    • 52. Granted Patent
    • Optimized scalar promotion with load and splat SIMD instructions
    • Publication No.: US08572586B2
    • Publication Date: 2013-10-29
    • Application No.: US13555435
    • Filing Date: 2012-07-23
    • Inventors: Alexandre E. Eichenberger; Michael K. Gschwind; John A. Gunnels
    • IPC: G06F9/30
    • CPC: G06F8/45
    • Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
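To make the abstract of entry 52 concrete, the following is a minimal, hypothetical C++ sketch of the underlying idea: a scalar operand that feeds SIMD arithmetic is broadcast ("splatted") into a vector once, and the placement of that splat is hoisted out of the loop rather than repeated per iteration. The Vec4 type, the splat/mul_add helpers, and the saxpy loop are illustrative assumptions, not the patented compiler transformation itself.

```cpp
#include <cstddef>

struct Vec4 { float v[4]; };                        // toy 4-wide SIMD value

static Vec4 splat(float s) {                        // broadcast a scalar into all lanes
    return Vec4{{s, s, s, s}};
}

static Vec4 mul_add(Vec4 a, Vec4 x, Vec4 y) {       // per-lane a*x + y
    Vec4 r;
    for (int l = 0; l < 4; ++l) r.v[l] = a.v[l] * x.v[l] + y.v[l];
    return r;
}

// y[i] += a * x[i]: the splat of 'a' is placed once, before the loop, so the
// vectorized body contains no repeated scalar-to-vector conversions.
void saxpy(float a, const float* x, float* y, std::size_t n) {
    Vec4 va = splat(a);                             // hypothetical hoisted splat
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        Vec4 vx = {{x[i], x[i + 1], x[i + 2], x[i + 3]}};
        Vec4 vy = {{y[i], y[i + 1], y[i + 2], y[i + 3]}};
        Vec4 vr = mul_add(va, vx, vy);
        for (int l = 0; l < 4; ++l) y[i + l] = vr.v[l];
    }
    for (; i < n; ++i)                              // scalar remainder loop
        y[i] += a * x[i];
}
```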
    • 53. Granted Patent
    • Optimized code generation targeting a high locality software cache
    • Publication No.: US08561044B2
    • Publication Date: 2013-10-15
    • Application No.: US12246602
    • Filing Date: 2008-10-07
    • Inventors: Tong Chen; Alexandre E. Eichenberger; Marc Gonzalez Tallada; John K. O'Brien; Kathryn M. O'Brien; Zehra N. Sura; Tao Zhang
    • IPC: G06F9/44
    • CPC: G06F8/4442
    • Mechanisms for optimized code generation targeting a high locality software cache are provided. Original computer code is parsed to identify memory references in the original computer code. Memory references are classified as either regular memory references or irregular memory references. Regular memory references are controlled by a high locality cache mechanism. Original computer code is transformed, by a compiler, to generate transformed computer code in which the regular memory references are grouped into one or more memory reference streams, each memory reference stream having a leading memory reference, a trailing memory reference, and one or more middle memory references. Transforming of the original computer code comprises inserting, into the original computer code, instructions to execute initialization, lookup, and cleanup operations associated with the leading memory reference and trailing memory reference in a different manner from initialization, lookup, and cleanup operations for the one or more middle memory references.
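As an informal illustration of entry 53, the sketch below shows how a regular (strided) memory stream can be driven through a software cache so that only the stream's leading reference pays the full lookup and miss handling, the middle references reuse the already-resolved line, and the trailing reference simply triggers the next lookup. SoftwareCache, the 128-byte line size, and sum_stream are assumed names for illustration only.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t LINE = 128;                      // assumed software-cache line size

struct SoftwareCache {
    std::vector<unsigned char> line;
    // Full lookup: bring up to one line of "global" data into local storage.
    // The copy stands in for the DMA/miss handling a real software cache does.
    const void* lookup(const void* global_addr, std::size_t bytes) {
        const auto* p = static_cast<const unsigned char*>(global_addr);
        std::size_t len = bytes < LINE ? bytes : LINE;
        line.assign(p, p + len);
        return line.data();
    }
};

// Transformed loop for the regular stream a[0..n): the leading reference of
// each line performs the lookup; the middle references reuse that line; the
// trailing reference falls off the line and starts the next lookup.
float sum_stream(SoftwareCache& sc, const float* a, std::size_t n) {
    constexpr std::size_t PER_LINE = LINE / sizeof(float);
    float total = 0.0f;
    for (std::size_t base = 0; base < n; base += PER_LINE) {
        std::size_t count = (n - base < PER_LINE) ? (n - base) : PER_LINE;
        const float* local = static_cast<const float*>(
            sc.lookup(a + base, count * sizeof(float)));   // leading reference: full lookup
        for (std::size_t i = 0; i < count; ++i)
            total += local[i];                             // middle references: no lookup
    }
    return total;
}
```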
    • 54. Granted Patent
    • Shared prefetching to reduce execution skew in multi-threaded systems
    • Publication No.: US08490071B2
    • Publication Date: 2013-07-16
    • Application No.: US12773454
    • Filing Date: 2010-05-04
    • Inventors: Alexandre E. Eichenberger; John A. Gunnels
    • IPC: G06F9/45
    • CPC: G06F12/0862; G06F8/4442; G06F9/30047; G06F9/383; G06F9/3851; G06F2212/6028
    • Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
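The following hypothetical sketch illustrates the prefetch-distribution idea of entry 54: all threads consume the same shared stream, but the prefetch instructions for that stream are partitioned so each thread issues only its own slice of them. The CHUNK size, the worker/parallel_sum helpers, and the use of the GCC/Clang __builtin_prefetch intrinsic are illustration choices, not the patent's exact mechanism.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

constexpr std::size_t CHUNK = 16;                 // elements covered per prefetch

void worker(const float* shared, std::size_t n, std::size_t tid,
            std::size_t nthreads, double& partial) {
    double local = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Thread 'tid' prefetches only every nthreads-th chunk ahead of the
        // shared stream; its peers prefetch the other chunks, so the prefetch
        // overhead is spread evenly instead of skewing one thread.
        if ((i / CHUNK) % nthreads == tid && i + CHUNK < n)
            __builtin_prefetch(shared + i + CHUNK);
        local += shared[i];                       // every thread consumes the shared data
    }
    partial = local;                              // per-thread slot, no synchronization
}

double parallel_sum(const float* shared, std::size_t n, std::size_t nthreads) {
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> team;
    for (std::size_t t = 0; t < nthreads; ++t)
        team.emplace_back(worker, shared, n, t, nthreads, std::ref(partial[t]));
    for (std::thread& th : team) th.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```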
    • 55. Granted Patent
    • Parallelization of irregular reductions via parallel building and exploitation of conflict-free units of work at runtime
    • Publication No.: US08468508B2
    • Publication Date: 2013-06-18
    • Application No.: US12576717
    • Filing Date: 2009-10-09
    • Inventors: Alexandre E. Eichenberger; Yangchun Luo; John K. O'Brien; Xiaotong Zhuang
    • IPC: G06F9/45
    • CPC: G06F8/456
    • An optimizing compiler device, a method, a computer program product which are capable of performing parallelization of irregular reductions. The method for performing parallelization of irregular reductions includes receiving, at a compiler, a program and selecting, at compile time, at least one unit of work (UW) from the program, each UW configured to operate on at least one reduction operation, where at least one reduction operation in the UW operates on a reduction variable whose address is determinable when running the program at a run-time. At run time, for each successive current UW, a list of reduction operations accessed by that unit of work is recorded. Further, it is determined at run time whether reduction operations accessed by a current UW conflict with any reduction operations recorded as having been accessed by prior selected units of work, and assigning the unit of work as a conflict free unit of work (CFUW) when no conflicts are found. Finally, there is scheduled, for parallel run-time operation, at least two or more processing threads to process a respective the at least two or more assigned CFUWs.
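A rough, hypothetical rendering of the conflict-free-unit-of-work (CFUW) idea in entry 55: at run time, the reduction elements touched by each unit of work are recorded, units whose element sets do not overlap are batched together, and each conflict-free batch can then run its units in parallel without locking the reduction array. The UnitOfWork layout and the greedy batching policy below are assumptions made for the example.

```cpp
#include <cstddef>
#include <thread>
#include <unordered_set>
#include <vector>

struct UnitOfWork {
    std::vector<std::size_t> targets;   // indices of reduction elements it updates
    double contribution = 1.0;          // toy payload added into each target
};

// Greedily place each UW into the first batch whose claimed target set it
// does not conflict with; a new batch is opened when every batch conflicts.
std::vector<std::vector<const UnitOfWork*>>
build_cfuw_batches(const std::vector<UnitOfWork>& work) {
    std::vector<std::vector<const UnitOfWork*>> batches;
    std::vector<std::unordered_set<std::size_t>> claimed;   // per-batch claimed indices
    for (const UnitOfWork& uw : work) {
        std::size_t b = 0;
        for (; b < batches.size(); ++b) {
            bool conflict = false;
            for (std::size_t t : uw.targets)
                if (claimed[b].count(t)) { conflict = true; break; }
            if (!conflict) break;                            // batch b is conflict-free for uw
        }
        if (b == batches.size()) { batches.emplace_back(); claimed.emplace_back(); }
        batches[b].push_back(&uw);
        claimed[b].insert(uw.targets.begin(), uw.targets.end());
    }
    return batches;
}

// Each batch runs its UWs on separate threads; disjoint target sets mean no
// synchronization is needed on the reduction array within a batch.
void run_batches(const std::vector<std::vector<const UnitOfWork*>>& batches,
                 std::vector<double>& reduction) {
    for (const auto& batch : batches) {
        std::vector<std::thread> team;
        for (const UnitOfWork* uw : batch)
            team.emplace_back([uw, &reduction] {
                for (std::size_t t : uw->targets) reduction[t] += uw->contribution;
            });
        for (std::thread& th : team) th.join();
    }
}
```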
    • 56. Granted Patent
    • Runtime dependence-aware scheduling using assist thread
    • Publication No.: US08464271B2
    • Publication Date: 2013-06-11
    • Application No.: US13443515
    • Filing Date: 2012-04-10
    • Inventors: Alexandre E. Eichenberger; Kathryn M. O'Brien; Xiaotong Zhuang
    • IPC: G06F9/46; G06F13/00
    • CPC: G06F8/445
    • A runtime dependence-aware scheduling of dependent iterations mechanism is provided. Computation is performed for one or more iterations of computer executable code by a main thread. Dependence information is determined for a plurality of memory accesses within the computer executable code using modified executable code using a set of dependence threads. Using the dependence information, a determination is made as to whether a subset of a set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time by the one or more available threads in the data processing system. If the subset of the set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time, the main thread is signaled to skip the subset of the set of uncompleted iterations and the set of assist threads is signaled to execute the subset of the set of uncompleted iterations.
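To ground entry 56 (and its published application, entry 59), here is a simplified, hypothetical sketch of the dependence-checking step: an assist-style version of the loop body reports only the addresses an iteration would touch, and an uncompleted iteration is considered safe to run ahead of time when its accesses do not overlap those of earlier uncompleted iterations. The AccessSet type, assist_body, and can_run_ahead are illustrative names; the real mechanism coordinates main and assist threads at run time.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

struct AccessSet {
    std::unordered_set<std::size_t> reads;
    std::unordered_set<std::size_t> writes;
};

// The "assist" version of the loop body: it computes which indices iteration i
// would access (here, a single indirect element idx[i]) without doing the real
// work or its side effects.
AccessSet assist_body(std::size_t i, const std::vector<std::size_t>& idx) {
    AccessSet a;
    a.reads.insert(idx[i]);
    a.writes.insert(idx[i]);
    return a;
}

// True if iteration 'later' is independent of every earlier uncompleted
// iteration, i.e. there is no write/write or read/write overlap, so a spare
// thread could execute it ahead of time.
bool can_run_ahead(const std::vector<AccessSet>& acc,
                   const std::vector<bool>& done,
                   std::size_t later) {
    for (std::size_t e = 0; e < later; ++e) {
        if (done[e]) continue;                       // completed iterations cannot conflict
        for (std::size_t w : acc[later].writes)
            if (acc[e].writes.count(w) || acc[e].reads.count(w)) return false;
        for (std::size_t r : acc[later].reads)
            if (acc[e].writes.count(r)) return false;
    }
    return true;
}
```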
    • 58. Granted Patent
    • Optimized scalar promotion with load and splat SIMD instructions
    • Publication No.: US08255884B2
    • Publication Date: 2012-08-28
    • Application No.: US12134495
    • Filing Date: 2008-06-06
    • Inventors: Alexandre E. Eichenberger; Michael K. Gschwind; John A. Gunnels
    • IPC: G06F9/45; G06F9/44
    • CPC: G06F8/45
    • Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
    • 59. Patent Application
    • Runtime Dependence-Aware Scheduling Using Assist Thread
    • Publication No.: US20120204189A1
    • Publication Date: 2012-08-09
    • Application No.: US13443515
    • Filing Date: 2012-04-10
    • Inventors: Alexandre E. Eichenberger; Kathryn M. O'Brien; Xiaotong Zhuang
    • IPC: G06F9/46
    • CPC: G06F8/445
    • A runtime dependence-aware scheduling of dependent iterations mechanism is provided. Computation is performed for one or more iterations of computer executable code by a main thread. Dependence information is determined for a plurality of memory accesses within the computer executable code using modified executable code using a set of dependence threads. Using the dependence information, a determination is made as to whether a subset of a set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time by the one or more available threads in the data processing system. If the subset of the set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time, the main thread is signaled to skip the subset of the set of uncompleted iterations and the set of assist threads is signaled to execute the subset of the set of uncompleted iterations.