    • 4. Granted patent
    • Title: Maximized memory throughput on parallel processing devices
    • Publication number: US08327123B2
    • Publication date: 2012-12-04
    • Application number: US13069384
    • Filing date: 2011-03-23
    • Inventors: Norbert Juffa; Brett W. Coon
    • IPC: G06F9/30
    • CPC: G06F9/3887; G06F9/3455; G06F9/3851; G06F9/3889
    • Abstract: In parallel processing devices, for streaming computations, processing of each data element of the stream may not be computationally intensive and thus processing may take relatively small amounts of time to compute as compared to the memory access times required to read the stream and write the results. Therefore, memory throughput often limits the performance of the streaming computation. Generally stated, provided are methods for achieving improved, optimized, or ultimately maximized memory throughput in such memory-throughput-limited streaming computations. Streaming computation performance is maximized by improving the aggregate memory throughput across the plurality of processing elements and threads. High aggregate memory throughput is achieved by balancing processing loads between threads and groups of threads and a hardware memory interface coupled to the parallel processing devices.
    • 8. Patent application
    • Title: MAXIMIZED MEMORY THROUGHPUT ON PARALLEL PROCESSING DEVICES
    • Publication number: US20110173414A1
    • Publication date: 2011-07-14
    • Application number: US13069384
    • Filing date: 2011-03-23
    • Inventors: Norbert Juffa; Brett W. Coon
    • IPC: G06F9/38
    • CPC: G06F9/3887; G06F9/3455; G06F9/3851; G06F9/3889
    • Abstract: In parallel processing devices, for streaming computations, processing of each data element of the stream may not be computationally intensive and thus processing may take relatively small amounts of time to compute as compared to the memory access times required to read the stream and write the results. Therefore, memory throughput often limits the performance of the streaming computation. Generally stated, provided are methods for achieving improved, optimized, or ultimately maximized memory throughput in such memory-throughput-limited streaming computations. Streaming computation performance is maximized by improving the aggregate memory throughput across the plurality of processing elements and threads. High aggregate memory throughput is achieved by balancing processing loads between threads and groups of threads and a hardware memory interface coupled to the parallel processing devices.
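The two records above cover the same patent family on memory-throughput-limited streaming computations: per-element work is cheap, so kernel speed is set by how fast the threads can collectively read the input stream and write the results. The snippet below is a minimal CUDA sketch of such a memory-bound streaming kernel, not the patented load-balancing mechanism; the kernel name (saxpy_stream), launch configuration, and array sizes are illustrative assumptions.

```cuda
// Illustrative memory-bound streaming kernel (hypothetical example, not the
// patented mechanism). Per-element arithmetic is trivial, so throughput is
// limited by memory traffic; a grid-stride loop keeps every thread issuing
// coalesced loads/stores so the memory interface stays saturated.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy_stream(const float* __restrict__ x,
                             const float* __restrict__ y,
                             float* __restrict__ out,
                             float a, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n; i += stride) {
        out[i] = a * x[i] + y[i];   // one multiply-add per two loads and a store
    }
}

int main() {
    const size_t n = 1 << 24;        // 16M elements, illustrative size
    float *x, *y, *out;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));
    // Enough blocks to occupy all multiprocessors; exact numbers are illustrative.
    saxpy_stream<<<1024, 256>>>(x, y, out, 2.0f, n);
    cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(x); cudaFree(y); cudaFree(out);
    return 0;
}
```

With consecutive threads touching consecutive elements, the loads and stores coalesce, so the resource being saturated is the aggregate bandwidth of the memory interface rather than arithmetic throughput.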
    • 9. Patent application
    • Title: COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS
    • Publication number: US20110078417A1
    • Publication date: 2011-03-31
    • Application number: US12890227
    • Filing date: 2010-09-24
    • Inventors: Brian FAHS; Ming Y. Siu; Brett W. Coon; John R. Nickolls; Lars Nyland
    • IPC: G06F9/38
    • CPC: G06F9/522; G06F8/458; G06F9/3004; G06F9/30087; G06F9/30145; G06F9/3851
    • Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction, the thread contributes to a scan or reduction result and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction, and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
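The abstract above describes fusing a barrier with an aggregation: each thread contributes a value at the barrier, and the reduction (or scan) result becomes available once all threads have arrived. The sketch below illustrates the idea with standard CUDA, a cooperative-thread-array (thread block) sum reduction in shared memory where __syncthreads() barriers mark the points at which partial and final results become visible; it is not the patented barrier-aggregation instruction. (CUDA does expose fused barrier-plus-reduction forms for predicates: __syncthreads_count, __syncthreads_and, __syncthreads_or.) The kernel name and problem sizes are illustrative assumptions.

```cuda
// Illustrative cooperative-thread-array (thread block) sum reduction
// (hypothetical example, not the patented barrier-aggregation instruction).
// Each thread contributes one value; __syncthreads() barriers mark where the
// partial and final aggregates become visible to all threads of the block.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void block_reduce(const float* in, float* out, int n) {
    extern __shared__ float buf[];          // one slot per thread
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    buf[tid] = (gid < n) ? in[gid] : 0.0f;  // each thread supplies its value
    __syncthreads();                        // barrier: all contributions visible

    // Tree reduction in shared memory; blockDim.x is assumed a power of two.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();                    // barrier after every step
    }

    // After the final barrier every thread of the block could read buf[0];
    // here only thread 0 writes the block's result.
    if (tid == 0) out[blockIdx.x] = buf[0];
}

int main() {
    const int n = 1024, threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    block_reduce<<<blocks, threads, threads * sizeof(float)>>>(in, out, n);
    cudaDeviceSynchronize();
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += out[b];
    printf("sum = %.1f (expected %d)\n", total, n);
    cudaFree(in); cudaFree(out);
    return 0;
}
```

In this sketch the per-block results in out[] are summed on the host; the patented approach instead folds the aggregation into the barrier instruction itself, so no separate reduction pass over shared memory is needed.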