Patent Quick Search: fast search of global patents, a free patent database for commercial use - IPRDB

21. Invention application

US20080222623A1 Efficient Code Generation Using Loop Peeling for SIMD Loop Code with Multiple Misaligned Statements (Lapsed)
Publication No.: US20080222623A1
Publication date: 2008-09-11
Application No.: US12122050
Filing date: 2008-05-16
Applicants: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
IPC classification: G06F9/45
CPC classification: G06F8/447, G06F8/4441
Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.
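The peeling idea in this abstract can be illustrated with a small simulation. The sketch below is not the patented code generator: it handles a single statement, models vectors as Python list slices, and assumes a vector width of 4 elements and a buffer whose length is a multiple of that width. The names `peel`, `splice`, and `scale_in_place` are hypothetical.

```python
V = 4  # assumed vector width, in elements


def peel(start, n, v=V):
    """Split [start, start+n) into (prologue, steady-state, epilogue) ranges.

    The steady-state range begins and ends on multiples of v, so every
    access in it is a full, aligned vector.
    """
    end = start + n
    first_aligned = min(end, ((start + v - 1) // v) * v)  # round start up
    last_aligned = max(first_aligned, (end // v) * v)     # round end down
    return (start, first_aligned), (first_aligned, last_aligned), (last_aligned, end)


def splice(old, new, lo, hi):
    """Vector splice: take lanes [lo, hi) from new, keep the rest from old."""
    return [new[i] if lo <= i < hi else old[i] for i in range(len(old))]


def scale_in_place(mem, start, n, k, v=V):
    """Compute mem[i] *= k for i in [start, start+n) using only aligned accesses."""
    prologue, steady, epilogue = peel(start, n, v)
    # Peeled iterations: aligned load, compute, splice, aligned store.
    for lo, hi in (prologue, epilogue):
        if lo < hi:
            base = (lo // v) * v
            old = mem[base:base + v]
            new = [x * k for x in old]
            mem[base:base + v] = splice(old, new, lo - base, hi - base)
    # Steady state: whole aligned vectors, no splicing needed.
    for base in range(steady[0], steady[1], v):
        mem[base:base + v] = [x * k for x in mem[base:base + v]]
    return mem


print(scale_in_place(list(range(12)), 3, 7, 10))
```

Only the two peeled partial vectors pay for a splice; the loop over the steady-state range touches full vectors and needs none, which is the property the abstract attributes to the steady-state loop body.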

22. Invention application

US20080222391A1 Apparatus and Method for Optimizing Scalar Code Executed on a SIMD Engine by Alignment of SIMD Slots (Lapsed)
Publication No.: US20080222391A1
Publication date: 2008-09-11
Application No.: US12127491
Filing date: 2008-05-27
Applicants: Alexandre E. Eichenberger, John Kevin Patrick O'Brien
Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien
IPC classification: G06F9/44, G06F9/30, G06F9/315
CPC classification: G06F9/3885, G06F9/30032, G06F9/30036, G06F9/30109, G06F9/3824
Abstract: An apparatus and method for optimizing scalar code executed on a single instruction multiple data (SIMD) engine is provided that aligns the slots of SIMD registers. With the apparatus and method, a compiler is provided that parses source code and, for each statement in the program, generates an expression tree. The compiler inspects all storage inputs to scalar operations in the expression tree to determine their alignment in the SIMD registers. This alignment is propagated up the expression tree from the leaves. When the alignments of two operands in the expression tree are the same, the resulting alignment is the shared value. When the alignments of two operands in the expression tree are different, one operand is shifted. For shifted operands, a shift operation is inserted in the expression tree. The executable code is then generated for the expression tree and shifts are inserted where indicated.
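The bottom-up propagation described in this abstract can be sketched as a recursive tree rewrite. The node classes below (`Leaf`, `Op`, `Shift`) and the rule "always shift the right operand to the left operand's slot" are assumptions made for illustration; the abstract only says that one operand is shifted, not which one.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str
    slot: int  # SIMD slot in which the scalar sits after its load


@dataclass
class Shift:
    child: "Node"
    to_slot: int  # slot the value is moved into


@dataclass
class Op:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, Shift, Op]


def align(node):
    """Return (rewritten tree, slot of its result), inserting shifts as needed."""
    if isinstance(node, Leaf):
        return node, node.slot
    left, left_slot = align(node.left)
    right, right_slot = align(node.right)
    if left_slot != right_slot:
        # Assumed policy: move the right operand into the left operand's slot.
        right = Shift(right, left_slot)
    return Op(node.op, left, right), left_slot


# a + (b * c), with a and b in slot 1 and c in slot 3
tree = Op("+", Leaf("a", 1), Op("*", Leaf("b", 1), Leaf("c", 3)))
aligned_tree, result_slot = align(tree)
print(aligned_tree, result_slot)
```

In this example only `c` needs a shift: once it is moved to slot 1, the product and the sum already share an alignment, so no further shift is inserted higher in the tree.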

23. Granted patent

US07386842B2 Efficient data reorganization to satisfy data alignment constraints (Lapsed)
Publication No.: US07386842B2
Publication date: 2008-06-10
Application No.: US10862483
Filing date: 2004-06-07
Applicants: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
IPC classification: G06F9/45
CPC classification: G06F8/4452
Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization operations generated by the alignment handling.
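The abstract does not name its four placement policies, so the sketch below does not attempt to reproduce them. It only shows why placement matters, by counting the shifts needed under two hypothetical placements for a statement of the form `store = f(load_1, ..., load_n)`. Offsets are byte offsets within a vector; the function names are made up for this example.

```python
def shifts_via_zero(load_offsets, store_offset):
    """Hypothetical placement A: shift every misaligned load to offset 0,
    compute, then shift the result to the store's offset."""
    load_shifts = sum(1 for off in load_offsets if off != 0)
    store_shift = 1 if store_offset != 0 else 0
    return load_shifts + store_shift


def shifts_direct(load_offsets, store_offset):
    """Hypothetical placement B: shift each load straight to the store's
    offset, so the result needs no shift of its own."""
    return sum(1 for off in load_offsets if off != store_offset)


# a[i+3] = b[i+1] + c[i+2] with 4-byte elements: loads at 4 and 8, store at 12
print(shifts_via_zero([4, 8], 12), shifts_direct([4, 8], 12))

# a[i+3] = b[i+3] + c[i+3]: every stream already shares offset 12
print(shifts_via_zero([12, 12], 12), shifts_direct([12, 12], 12))
```

When all streams share the same misalignment, the second placement needs no shifts at all, while the first still pays for three.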

24. Invention application

US20080127059A1 GENERATING OPTIMIZED SIMD CODE IN THE PRESENCE OF DATA DEPENDENCES (In force)
Publication No.: US20080127059A1
Publication date: 2008-05-29
Application No.: US11535181
Filing date: 2006-09-26
Applicants: Alexandre E. Eichenberger, Amy K. Wang, Peng Wu, Peng Zhao
Inventors: Alexandre E. Eichenberger, Amy K. Wang, Peng Wu, Peng Zhao
IPC classification: G06F9/44
CPC classification: G06F8/447, G06F8/43
Abstract: A method for generating code, including identifying at least one portion of source code that is simdizable and has a dependence; analyzing the dependence for characteristics; based upon the characteristics, selecting a transformation from a predefined group of transformations; and applying the transformation to the at least one portion to generate SIMD code for the at least one portion.

25. Granted patent

US09223580B2 Systems, methods and computer products for cross-thread scheduling (In force)
Publication No.: US09223580B2
Publication date: 2015-12-29
Application No.: US11847556
Filing date: 2007-08-30
Applicants: Alexandre E. Eichenberger, Michael K. Gschwind, John A Gunnels, James L. McInnes, Mark P. Mendell
Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A Gunnels, James L. McInnes, Mark P. Mendell
IPC classification: G06F9/455, G06F9/46, G06F9/38, G06F9/45
CPC classification: G06F9/3851, G06F8/445, G06F9/3885
Abstract: Systems, methods and computer products for cross-thread scheduling. Exemplary embodiments include a cross thread scheduling method for compiling code, the method including scheduling a scheduling unit with a scheduler sub-operation in response to the scheduling unit being in a non-multithreaded part of the code and scheduling the scheduling unit with a cross-thread scheduler sub-operation in response to the scheduling unit being in a multithreaded part of the code.

26. Granted patent

US08954943B2 Analyze and reduce number of data reordering operations in SIMD code (In force)
Publication No.: US08954943B2
Publication date: 2015-02-10
Application No.: US11340452
Filing date: 2006-01-26
Applicants: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
IPC classification: G06F9/45, G06F15/00, G06F15/76
CPC classification: G06F8/443
Abstract: A method for analyzing data reordering operations in Single Instruction Multiple Data (SIMD) source code and generating executable code therefrom is provided. Input is received. One or more data reordering operations in the input are identified and each data reordering operation in the input is abstracted into a corresponding virtual shuffle operation so that each virtual shuffle operation forms part of an expression tree. One or more virtual shuffle trees are collapsed by combining virtual shuffle operations within at least one of the one or more virtual shuffle trees to form one or more combined virtual shuffle operations, wherein each virtual shuffle tree is a subtree of the expression tree that only contains virtual shuffle operations. Then code is generated for the one or more combined virtual shuffle operations.
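Collapsing a chain of shuffles amounts to composing their index patterns. The sketch below assumes the simplest case, single-input shuffles described by a list of source lane indices; real shuffle instructions often take two input vectors, which this example does not model. `apply_shuffle` and `combine` are hypothetical names.

```python
def apply_shuffle(pattern, vec):
    """Result lane k takes its value from input lane pattern[k]."""
    return [vec[i] for i in pattern]


def combine(outer, inner):
    """Single pattern equivalent to applying `inner` first, then `outer`."""
    return [inner[i] for i in outer]


vec = ["a", "b", "c", "d"]
inner = [3, 2, 1, 0]  # reverse the lanes
outer = [1, 1, 0, 2]  # duplicate and permute

two_steps = apply_shuffle(outer, apply_shuffle(inner, vec))
one_step = apply_shuffle(combine(outer, inner), vec)
print(combine(outer, inner), two_steps == one_step)
```

Because `combine` is associative, a whole subtree of such shuffles can be folded into one pattern before any code is emitted for it.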

27. Granted patent

US08904153B2 Vector loads with multiple vector elements from a same cache line in a scattered load operation (In force)
Publication No.: US08904153B2
Publication date: 2014-12-02
Application No.: US12876321
Filing date: 2010-09-07
Applicants: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
IPC classification: G06F9/345, G06F9/30, G06F9/38, G06F15/80
CPC classification: G06F9/30018, G06F9/30032, G06F9/30036, G06F9/30043, G06F9/3857, G06F15/8069
Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.
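A software model can make the address layout in this abstract concrete. The sketch below assumes a 16-element cache line, 4 result lanes, and a 4-bit offset field per lane packed into one integer; none of these widths come from the abstract, and the hardware selector units are modelled as a simple list comprehension.

```python
LANES = 4        # assumed number of data elements per scattered load
OFFSET_BITS = 4  # assumed field width: 16 elements per cache line


def pack_extended_address(offsets):
    """Pack one element offset per lane into a single extended address."""
    address = 0
    for lane, offset in enumerate(offsets):
        if not 0 <= offset < (1 << OFFSET_BITS):
            raise ValueError("offset does not fit in its address portion")
        address |= offset << (lane * OFFSET_BITS)
    return address


def scattered_load(cache_line, extended_address, lanes=LANES):
    """Each lane's selector extracts its own address portion and picks
    the matching element out of the cache line buffer."""
    mask = (1 << OFFSET_BITS) - 1
    return [
        cache_line[(extended_address >> (lane * OFFSET_BITS)) & mask]
        for lane in range(lanes)
    ]


line = [i * 10 for i in range(16)]
address = pack_extended_address([5, 0, 15, 5])
print(scattered_load(line, address))
```

The same element may be selected by more than one lane, as lanes 0 and 3 do here, since each selector works independently from the shared line buffer.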

28. Granted patent

US08881159B2 Constant time worker thread allocation via configuration caching (In force)
Publication No.: US08881159B2
Publication date: 2014-11-04
Application No.: US13070811
Filing date: 2011-03-24
Applicants: Alexandre E. Eichenberger, John K. P. O'Brien
Inventors: Alexandre E. Eichenberger, John K. P. O'Brien
IPC classification: G06F9/46, G06F9/50
CPC classification: G06F9/5066
Abstract: Mechanisms are provided for allocating threads for execution of a parallel region of code. A request for allocation of worker threads to execute the parallel region of code is received from a master thread. Cached thread allocation information identifying prior thread allocations that have been performed for the master thread is accessed. Worker threads are allocated to the master thread based on the cached thread allocation information. The parallel region of code is executed using the allocated worker threads.
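The caching step in this abstract can be sketched with a per-master dictionary. The `ThreadAllocator` class below is hypothetical and models workers as integer ids rather than real threads. It also does not reproduce the constant-time property named in the title: the availability check here is linear in the number of workers requested.

```python
class ThreadAllocator:
    """Toy allocator that remembers each master's previous worker set."""

    def __init__(self, num_workers):
        self.free = set(range(num_workers))
        self.cache = {}      # master id -> worker ids of its prior allocation
        self.cache_hits = 0

    def allocate(self, master, count):
        cached = self.cache.get(master)
        if cached is not None and len(cached) == count and self.free.issuperset(cached):
            # Fast path: the same configuration as last time is still available.
            workers = cached
            self.cache_hits += 1
        else:
            # Slow path: search the free pool and remember the result.
            if count > len(self.free):
                raise RuntimeError("not enough free workers")
            workers = tuple(sorted(self.free)[:count])
            self.cache[master] = workers
        self.free.difference_update(workers)
        return workers

    def release(self, workers):
        self.free.update(workers)


allocator = ThreadAllocator(8)
first = allocator.allocate("master-0", 3)
allocator.release(first)
second = allocator.allocate("master-0", 3)
print(first, second, allocator.cache_hits)
```

A master that repeatedly enters parallel regions of the same size gets the same workers back without a fresh search of the pool, which is the reuse the abstract describes.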

29. Granted patent

US08627043B2 Data parallel function call for determining if called routine is data parallel (Lapsed)
Publication No.: US08627043B2
Publication date: 2014-01-07
Application No.: US13430168
Filing date: 2012-03-26
Applicants: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
Inventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
IPC classification: G06F9/30
CPC classification: G06F9/30072, G06F8/456, G06F9/30189, G06F9/3851, G06F9/3887
Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.
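The two runtime determinations in this abstract give four caller/target combinations. The abstract does not say what is done in each case, so the behaviour below is an assumption: a scalar target is applied lane by lane for a data parallel caller, and a scalar caller's argument is wrapped in a one-lane vector for a data parallel target. The registry `DATA_PARALLEL`, the decorator, and `dispatch` are all hypothetical.

```python
DATA_PARALLEL = set()  # hypothetical registry of routines marked data parallel


def data_parallel(fn):
    """Mark a routine as operating on whole vectors (lists)."""
    DATA_PARALLEL.add(fn)
    return fn


def dispatch(target, arg, caller_is_data_parallel):
    """Choose how to run `target` from the caller/target combination."""
    target_is_data_parallel = target in DATA_PARALLEL
    if target_is_data_parallel and caller_is_data_parallel:
        return target(arg)                # vector caller, vector target
    if target_is_data_parallel:
        return target([arg])[0]           # scalar caller: one-lane vector
    if caller_is_data_parallel:
        return [target(x) for x in arg]   # scalar target: run per lane
    return target(arg)                    # scalar caller, scalar target


@data_parallel
def vec_double(xs):
    return [2 * x for x in xs]


def scalar_inc(x):
    return x + 1


print(dispatch(vec_double, [1, 2, 3], True))
print(dispatch(scalar_inc, [1, 2, 3], True))
```

The check happens at call time rather than compile time, so the same call site works whether or not the target was compiled as data parallel code.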

30. Granted patent

US08627042B2 Data parallel function call for determining if called routine is data parallel (Lapsed)
Publication No.: US08627042B2
Publication date: 2014-01-07
Application No.: US12649751
Filing date: 2009-12-30
Applicants: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
Inventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
IPC classification: G06F9/30
CPC classification: G06F9/30072, G06F8/456, G06F9/30189, G06F9/3851, G06F9/3887
Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.
