Quick Patent Search - Fast search of global patents, free commercial patent database - IPRDB

31. Invention application

US20130283250A1 Thread Specific Compiler Generated Customization of Runtime Support for Application Programming Interfaces (Status: pending, published)
Publication No.: US20130283250A1
Publication date: 2013-10-24
Application No.: US13453411
Filing date: 2012-04-23
Applicants: Alexandre E. Eichenberger, John K.P. O'Brien
Inventors: Alexandre E. Eichenberger, John K.P. O'Brien
IPC classification: G06F9/45
CPC classification: G06F8/43
Abstract: Mechanisms are provided for generating a customized runtime library for source code. Source code is analyzed to identify a region of code implementing an application programming interface or programming standard of interest. An invocation tree data structure is generated based on results of analysis of functions of the application programming interface or programming standard of interest that the region of code invokes. A custom runtime library is generated based on the invocation tree data structure. The custom runtime library comprises only a subset of runtime library functions, less than a total number of runtime library functions for the application programming interface or programming standard of interest, actually invoked by the region of code and does not include all runtime library functions in the total number of runtime library functions for the application programming interface or programming standard of interest.
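The idea of keeping only the runtime functions a code region actually reaches can be sketched as a walk over a call table. This is an illustration under stated assumptions, not the patented mechanism: the function names (`rt_parallel_begin`, `rt_lock`, and so on) and the dictionary layout are hypothetical, and a plain reachability set stands in for the invocation tree data structure.

```python
def collect_runtime_subset(region_calls, runtime_callees):
    """Return the runtime functions reachable from the calls made by a code region.

    region_calls: runtime entry points the region invokes directly (hypothetical names).
    runtime_callees: maps each runtime function to the runtime functions it calls.
    """
    needed = set()
    stack = list(region_calls)
    while stack:
        fn = stack.pop()
        if fn in needed:
            continue                      # already visited
        needed.add(fn)
        stack.extend(runtime_callees.get(fn, ()))
    return needed


# Hypothetical full runtime library: six functions in total.
FULL_RUNTIME = {
    "rt_parallel_begin": ["rt_alloc_team", "rt_barrier"],
    "rt_alloc_team": ["rt_lock"],
    "rt_barrier": [],
    "rt_lock": [],
    "rt_task_spawn": ["rt_lock"],
    "rt_reduce": [],
}

# A region that only opens a parallel section needs four of the six functions.
subset = collect_runtime_subset(["rt_parallel_begin"], FULL_RUNTIME)
```

Here `subset` leaves out `rt_task_spawn` and `rt_reduce`, which is the "subset, less than the total number" property the abstract describes.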

32. Granted invention patent

US08516197B2 Write-through cache optimized for dependence-free parallel regions (Status: in force)
Publication No.: US08516197B2
Publication date: 2013-08-20
Application No.: US13025706
Filing date: 2011-02-11
Applicants: Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
Inventors: Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
IPC classification: G06F12/00
CPC classification: G06F12/0837
Abstract: An apparatus, method and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.

33. Granted invention patent

US08458442B2 Method and structure of using SIMD vector architectures to implement matrix multiplication (Status: lapsed)
Publication No.: US08458442B2
Publication date: 2013-06-04
Application No.: US12548129
Filing date: 2009-08-26
Applicants: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson
Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson
IPC classification: G06F15/00, G06F15/76
CPC classification: G06F9/3881, G06F9/3001, G06F9/30032, G06F9/30036, G06F9/3877, G06F17/16
Abstract: A structure (and method) including a plurality of coprocessing units and a controller that selectively loads data for processing on the plurality of coprocessing units, using a compound loading instruction. The compound loading instruction includes a plurality of low-level software instructions that preliminarily processes input data in a manner predetermined to simulate an effect of a single hardware loading instruction that would provide optimal loading of complex matrix data by loading input data in accordance with the effect of multiplying i·i=−1.

34. Granted invention patent

US08370575B2 Optimized software cache lookup for SIMD architectures (Status: in force)
Publication No.: US08370575B2
Publication date: 2013-02-05
Application No.: US11470638
Filing date: 2006-09-07
Applicants: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Tao Zhang
Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Tao Zhang
IPC classification: G06F12/00, G06F13/00, G06F13/28
CPC classification: G06F12/0864
Abstract: Process, cache memory, computer product and system for loading data associated with a requested address in a software cache. The process includes loading address tags associated with a set in a cache directory using a Single Instruction Multiple Data (SIMD) operation, determining a position of the requested address in the set using a SIMD comparison, and determining an actual data value associated with the position of the requested address in the set.
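The three steps in this abstract (load all tags of a set, compare them against the requested tag in one operation, use the match position to fetch the data) can be sketched in plain Python. This is a rough model only: the list comprehension stands in for the SIMD comparison, and the geometry constants and the `SoftwareCache` class are assumptions made for illustration, not details from the patent.

```python
from dataclasses import dataclass, field

WAYS = 4        # tags per set; assumed to fit one SIMD register
NUM_SETS = 8    # assumed directory size
LINE_SIZE = 16  # assumed bytes per cache line


@dataclass
class SoftwareCache:
    tags: list = field(default_factory=lambda: [[None] * WAYS for _ in range(NUM_SETS)])
    data: list = field(default_factory=lambda: [[None] * WAYS for _ in range(NUM_SETS)])

    @staticmethod
    def _split(address):
        """Split an address into (set index, tag, byte offset within the line)."""
        line = address // LINE_SIZE
        return line % NUM_SETS, line // NUM_SETS, address % LINE_SIZE

    def fill(self, address, way, line_bytes):
        """Install a line in a chosen way (replacement policy is out of scope here)."""
        set_index, tag, _ = self._split(address)
        self.tags[set_index][way] = tag
        self.data[set_index][way] = line_bytes

    def lookup(self, address):
        set_index, tag, offset = self._split(address)
        set_tags = self.tags[set_index]          # step 1: load every tag of the set
        mask = [t == tag for t in set_tags]      # step 2: lane-wise compare (SIMD stand-in)
        if not any(mask):
            return None                          # miss
        way = mask.index(True)                   # position of the matching lane
        return self.data[set_index][way][offset] # step 3: fetch the actual value


cache = SoftwareCache()
cache.fill(0x120, way=2, line_bytes=bytes(range(16)))
hit = cache.lookup(0x125)    # same line as 0x120, byte offset 5
miss = cache.lookup(0x225)   # maps to the same set but carries a different tag
```

Comparing every tag in the set unconditionally, instead of looping with an early exit, is what makes the lookup map onto a single vector compare on real SIMD hardware.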

35. Invention application

US20120246654A1 Constant Time Worker Thread Allocation Via Configuration Caching (Status: in force)
Publication No.: US20120246654A1
Publication date: 2012-09-27
Application No.: US13070811
Filing date: 2011-03-24
Applicants: Alexandre E. Eichenberger, John K.P. O'Brien
Inventors: Alexandre E. Eichenberger, John K.P. O'Brien
IPC classification: G06F9/46
CPC classification: G06F9/5066
Abstract: Mechanisms are provided for allocating threads for execution of a parallel region of code. A request for allocation of worker threads to execute the parallel region of code is received from a master thread. Cached thread allocation information identifying prior thread allocations that have been performed for the master thread are accessed. Worker threads are allocated to the master thread based on the cached thread allocation information. The parallel region of code is executed using the allocated worker threads.
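A minimal sketch of reusing a cached allocation is shown below. It assumes, as a simplification not stated in the abstract, that workers stay reserved for their master between parallel regions, so a repeated request reduces to one dictionary lookup. The `ThreadAllocator` class and its fields are hypothetical.

```python
class ThreadAllocator:
    """Toy allocator that remembers the workers it handed to each master thread."""

    def __init__(self, pool_size):
        self.free = set(range(pool_size))  # idle worker ids
        self.cache = {}                    # (master_id, requested) -> tuple of worker ids
        self.slow_path_count = 0           # how often the pool had to be searched

    def allocate(self, master_id, requested):
        key = (master_id, requested)
        workers = self.cache.get(key)      # fast path: reuse the prior allocation
        if workers is None:
            # Slow path: pick workers from the free pool and cache the choice.
            if requested > len(self.free):
                raise RuntimeError("not enough idle workers")
            workers = tuple(sorted(self.free)[:requested])
            self.free.difference_update(workers)
            self.cache[key] = workers
            self.slow_path_count += 1
        return workers


allocator = ThreadAllocator(pool_size=8)
first = allocator.allocate(master_id=0, requested=3)   # searches the pool
second = allocator.allocate(master_id=0, requested=3)  # served from the cache
other = allocator.allocate(master_id=1, requested=2)   # different master, new search
```

The second call for master 0 never touches the free pool, which is the behavior the title's "configuration caching" points at; a real runtime would also need invalidation when workers are reclaimed.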

36. Invention application

US20120198425A1 MANAGEMENT OF CONDITIONAL BRANCHES WITHIN A DATA PARALLEL SYSTEM (Status: lapsed)
Publication No.: US20120198425A1
Publication date: 2012-08-02
Application No.: US13016406
Filing date: 2011-01-28
Applicants: Alexandre E. Eichenberger, Brian Flachs, Dorit Nuzman, Ira Rosen, Ulrich Weigand, Ayal Zaks
Inventors: Alexandre E. Eichenberger, Brian Flachs, Dorit Nuzman, Ira Rosen, Ulrich Weigand, Ayal Zaks
IPC classification: G06F9/45
CPC classification: G06F8/456
Abstract: A compiler of a single instruction multiple data (SIMD) information handling system (IHS) identifies “if-then-else” statements that offer opportunity for conditional branch conversion. The compiler converts those “if-then-else” statements into “conditional branch and prepare” statements as well as “branch return” statements. The compiler compiles source code file information containing “if-then-else” statement opportunities into compiled code, namely an executable program. The SIMD IHS employs a processor or processors to execute the executable program. During execution, the processor generates and updates SIMD lane mask information to track and manage the conditional branch loops of the executing program. The processor saves branch addresses and employs SIMD lane masks to identify conditional branch loops with different branch conditions than previous conditional branch loops. The processor may reduce SIMD IHS processing time during processing of compiled code of the original “if-then-else” statements. The processor continues processing next statements inline after all SIMD lanes are complete, while providing speculative and parallel processing capability for multiple data operations of the executable program.
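The role of a lane mask in executing an "if-then-else" across SIMD lanes can be illustrated with generic masked execution. This sketch does not model the "conditional branch and prepare" or "branch return" statements from the abstract; it only shows, with a hypothetical helper, how a per-lane mask decides which lanes each branch updates and when a branch can be skipped entirely.

```python
def masked_if_then_else(values, cond, then_fn, else_fn):
    """Apply then_fn to lanes where cond holds and else_fn to the others."""
    mask = [cond(v) for v in values]   # one boolean per lane
    result = list(values)
    if any(mask):                      # skip the 'then' body if no lane needs it
        result = [then_fn(v) if m else r for v, m, r in zip(values, mask, result)]
    inverse = [not m for m in mask]
    if any(inverse):                   # skip the 'else' body if no lane needs it
        result = [else_fn(v) if m else r for v, m, r in zip(values, inverse, result)]
    return result, mask


lanes = [1, -2, 3, -4]
out, lane_mask = masked_if_then_else(
    lanes,
    cond=lambda v: v > 0,
    then_fn=lambda v: v * 10,
    else_fn=lambda v: -v,
)
```

After both masked passes every lane has been handled exactly once, so execution can continue inline with the next statement.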

37. Granted invention patent

US08056069B2 Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization (Status: lapsed)
Publication No.: US08056069B2
Publication date: 2011-11-08
Application No.: US11856284
Filing date: 2007-09-17
Applicants: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
IPC classification: G06F9/45, G06F7/52, G06F15/00
CPC classification: G06F8/4452, G06F8/445
Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.

38. Invention application

US20110088020A1 PARALLELIZATION OF IRREGULAR REDUCTIONS VIA PARALLEL BUILDING AND EXPLOITATION OF CONFLICT-FREE UNITS OF WORK AT RUNTIME (Status: lapsed)
Publication No.: US20110088020A1
Publication date: 2011-04-14
Application No.: US12576717
Filing date: 2009-10-09
Applicants: Alexandre E. Eichenberger, Yangchun Luo, John K. O'Brien, Xiaotong Zhuang
Inventors: Alexandre E. Eichenberger, Yangchun Luo, John K. O'Brien, Xiaotong Zhuang
IPC classification: G06F9/45
CPC classification: G06F8/456
Abstract: An optimizing compiler device, a method, a computer program product which are capable of performing parallelization of irregular reductions. The method for performing parallelization of irregular reductions includes receiving, at a compiler, a program and selecting, at compile time, at least one unit of work (UW) from the program, each UW configured to operate on at least one reduction operation, where at least one reduction operation in the UW operates on a reduction variable whose address is determinable when running the program at a run-time. At run time, for each successive current UW, a list of reduction operations accessed by that unit of work is recorded. Further, it is determined at run time whether reduction operations accessed by a current UW conflict with any reduction operations recorded as having been accessed by prior selected units of work, and assigning the unit of work as a conflict free unit of work (CFUW) when no conflicts are found. Finally, there is scheduled, for parallel run-time operation, at least two or more processing threads to process a respective the at least two or more assigned CFUWs.
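The run-time selection step described here (record what each unit of work touches, then accept a unit only if it does not overlap with units already accepted) can be sketched as a single greedy pass. The function name, the tuple layout, and the use of integer addresses are assumptions for illustration; how the deferred units are later processed is not covered.

```python
def build_conflict_free_group(units):
    """Split units of work into a conflict-free group and a deferred remainder.

    units: iterable of (name, addresses), where addresses is the set of
    reduction-variable addresses that the unit updates (known only at run time).
    """
    seen = set()          # addresses recorded for units accepted so far
    conflict_free = []
    deferred = []
    for name, addresses in units:
        if seen.isdisjoint(addresses):
            conflict_free.append(name)   # no overlap: safe to run in parallel
            seen.update(addresses)
        else:
            deferred.append(name)        # overlaps an accepted unit
    return conflict_free, deferred


units = [
    ("uw0", {0, 1}),
    ("uw1", {2, 3}),
    ("uw2", {1, 4}),   # shares address 1 with uw0
    ("uw3", {5}),
]
cfuws, later = build_conflict_free_group(units)
```

Units in `cfuws` update disjoint reduction variables, so separate threads can process them without locks or atomic updates.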

39. Invention application

US20110047334A1 Checkpointing in Speculative Versioning Caches (Status: lapsed)
Publication No.: US20110047334A1
Publication date: 2011-02-24
Application No.: US12544704
Filing date: 2009-08-20
Applicants: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind, Martin Ohmacht
Inventors: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind, Martin Ohmacht
IPC classification: G06F12/08, G06F12/00
CPC classification: G06F12/0842, G06F11/1405
Abstract: Mechanisms for generating checkpoints in a speculative versioning cache of a data processing system are provided. The mechanisms execute code within the data processing system, wherein the code accesses cache lines in the speculative versioning cache. The mechanisms further determine whether a first condition occurs indicating a need to generate a checkpoint in the speculative versioning cache. The checkpoint is a speculative cache line which is made non-speculative in response to a second condition occurring that requires a roll-back of changes to a cache line corresponding to the speculative cache line. The mechanisms also generate the checkpoint in the speculative versioning cache in response to a determination that the first condition has occurred.

40. Invention application

US20110040822A1 Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture (Status: lapsed)
Publication No.: US20110040822A1
Publication date: 2011-02-17
Application No.: US12542324
Filing date: 2009-08-17
Applicants: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
IPC classification: G06F17/16, G06F7/52
CPC classification: G06F17/16, G06F9/30014, G06F9/30032, G06F9/30036, G06F9/30043, G06F9/30109
Abstract: Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
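The sequence in this abstract (vector load, complex load and splat, cross multiply add, accumulate) can be traced with plain lists standing in for vector registers. The register layout `[re, im, re, im]`, the register width of two complex values, and all function names are assumptions for illustration; the actual instruction semantics in the patent may differ.

```python
def load_vector(a0, a1):
    """Load two complex values into one register as [re0, im0, re1, im1]."""
    return [a0.real, a0.imag, a1.real, a1.imag]


def load_splat(b):
    """Load one complex value and replicate it across the register."""
    return [b.real, b.imag, b.real, b.imag]


def cross_multiply_add(acc, va, vb):
    """Accumulate the complex products of matching (re, im) pairs into acc."""
    out = list(acc)
    for i in range(0, len(va), 2):
        ar, ai = va[i], va[i + 1]
        br, bi = vb[i], vb[i + 1]
        out[i] += ar * br - ai * bi       # real part; the minus sign comes from i*i = -1
        out[i + 1] += ar * bi + ai * br   # imaginary part from the cross terms
    return out


# One partial product of a matrix multiply: two elements of a column of A times one element of B.
va = load_vector(1 + 2j, 3 - 1j)
vb = load_splat(2 + 0.5j)
partial = cross_multiply_add([0.0, 0.0, 0.0, 0.0], va, vb)
```

Passing `partial` back in as `acc` for the next column of A and the next element of B accumulates the partial products, as in the last sentence of the abstract.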
