    • 2. Invention application
    • Optimized Scalar Promotion with Load and Splat SIMD Instructions
    • Publication: US20120290816A1, 2012-11-15
    • Application: US13555435, filed 2012-07-23
    • Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
    • G06F9/30
    • G06F8/45
    • Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
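The two-pass splat placement in the abstract above can be illustrated with a toy compiler pass. This is a minimal Python model, not the patented implementation: the `Op` IR, the opcode names (`load`, `load_splat`, `splat`, `vload`, `vmul`, `vadd`) and the functions `place_load_splats`, `place_separate_splats` and `optimize` are all hypothetical. It assumes straight-line code in which a scalar lives in the preferred slot of a vector register, so a value replicated into every lane (splatted) is valid for both scalar and SIMD consumers. The load-and-splat case stands in for the more general "vector operation-splat" of the first pass.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One instruction in a toy straight-line IR (hypothetical, for illustration)."""
    dest: str
    opcode: str                                   # e.g. "load", "add", "vload", "vmul", "splat"
    srcs: list[str] = field(default_factory=list)
    simd: bool = False                            # True if the op works on all vector lanes


def place_load_splats(code: list[Op]) -> list[Op]:
    """Pass 1: a scalar load whose result feeds any SIMD op becomes a load-and-splat."""
    simd_uses = {s for op in code if op.simd for s in op.srcs}
    return [
        Op(op.dest, "load_splat", list(op.srcs))
        if op.opcode == "load" and op.dest in simd_uses else op
        for op in code
    ]


def place_separate_splats(code: list[Op]) -> list[Op]:
    """Pass 2: insert a splat where a SIMD op reads a plain scalar; drop redundant splats."""
    kind: dict[str, str] = {}      # value -> "scalar" | "splat" | "vector"
    alias: dict[str, str] = {}     # value name -> name that later readers should use
    out: list[Op] = []
    for op in code:
        srcs = [alias.get(s, s) for s in op.srcs]
        if op.opcode == "splat" and kind.get(srcs[0]) == "splat":
            alias[op.dest] = srcs[0]              # already replicated in every lane: delete
            continue
        if op.simd:
            for idx in range(len(srcs)):
                s = alias.get(srcs[idx], srcs[idx])
                if kind.get(s) == "scalar":       # only the preferred slot is valid
                    tmp = f"{s}_splat"            # naive naming, assumed not to collide
                    out.append(Op(tmp, "splat", [s]))
                    kind[tmp] = "splat"
                    alias[s] = tmp                # later readers reuse the same splat
                    s = tmp
                srcs[idx] = s
        out.append(Op(op.dest, op.opcode, srcs, op.simd))
        if op.opcode in ("load_splat", "splat"):
            kind[op.dest] = "splat"
        else:
            kind[op.dest] = "vector" if op.simd else "scalar"
    return out


def optimize(code: list[Op]) -> list[Op]:
    """Run pass 1 (combined load-and-splat) and then pass 2 (separate splats)."""
    return place_separate_splats(place_load_splats(code))
```

In this model, a scalar load that feeds a `vmul` becomes a `load_splat`; a scalar produced by a scalar `add` and later read by a `vadd` gets a separate `splat` inserted just before the `vadd`; and an explicit `splat` of a value that is already splatted is deleted, with its uses redirected to the original value.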
    • 6. Invention application
    • Optimized Corner Turns for Local Storage and Bandwidth Reduction
    • Publication: US20090292758A1, 2009-11-26
    • Application: US12125996, filed 2008-05-23
    • Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
    • G06F7/52, G06F12/02
    • G06F17/16
    • A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at each corner turn. In accordance with the illustrative embodiment, such a corner turn is referred to as a "bounce" corner turn and results in a serpentine processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with at most three block transfers per time step. Therefore, the mechanism reduces the maximum bandwidth required and increases performance. In addition, it reduces the number of multi-buffered local store buffers.
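The effect of a "bounce" corner turn can be checked by counting block transfers. The following is a simplified model under stated assumptions, not the patent's mechanism: it assumes the local store retains only the most recently loaded A block and B block, ignores C-block traffic and multi-buffering, and uses hypothetical function names (`block_order`, `count_block_loads`). The serpentine order reverses the k direction at every corner turn and the j direction on alternate block rows, so each turn reuses one block that is already in local store.

```python
def block_order(m_blocks: int, n_blocks: int, k_blocks: int,
                serpentine: bool = True) -> list[tuple[int, int, int]]:
    """Visit order of (i, j, k) block steps for C[i][j] += A[i][k] * B[k][j]."""
    order = []
    k_forward = True
    for i in range(m_blocks):
        # "Bounce": walk the block row backwards on every other i.
        if serpentine and i % 2 == 1:
            js = range(n_blocks - 1, -1, -1)
        else:
            js = range(n_blocks)
        for j in js:
            ks = range(k_blocks) if k_forward else range(k_blocks - 1, -1, -1)
            order.extend((i, j, k) for k in ks)
            if serpentine:
                k_forward = not k_forward     # reverse k at every corner turn
    return order


def count_block_loads(order: list[tuple[int, int, int]]) -> int:
    """Count A/B block transfers, assuming only the last A and B block stay resident."""
    loads, cur_a, cur_b = 0, None, None
    for i, j, k in order:
        if (i, k) != cur_a:                   # A[i][k] not resident: load it
            loads, cur_a = loads + 1, (i, k)
        if (k, j) != cur_b:                   # B[k][j] not resident: load it
            loads, cur_b = loads + 1, (k, j)
    return loads
```

With `m_blocks` × `n_blocks` blocks of C and `k_blocks` >= 2, the plain order costs 2·m·n·k transfers in this model, while the serpentine order costs 2·m·n·k - (m·n - 1): exactly one load saved per corner turn. For a 2 × 2 × 2 block grid that is 16 versus 13.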
    • 7. Invention application
    • METHOD AND STRUCTURE FOR FAST IN-PLACE TRANSFORMATION OF STANDARD FULL AND PACKED MATRIX DATA FORMATS
    • Publication: US20090063607A1, 2009-03-05
    • Application: US11849272, filed 2007-09-01
    • Inventors: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
    • G06F7/32
    • G06F17/16, G06F7/78, G06F12/0207, G06F2212/454
    • A method and structure for an in-place transformation of matrix data. For a matrix A stored in either a standard full format or a packed format, and a transformation T having a compact representation, blocking parameters MB and NB are chosen based on the cache size. A sub-matrix A1 of A, of size M1 = m*MB by N1 = n*NB, is processed, and any residual remainder of A is saved in a buffer B. Sub-matrix A1 is processed by contiguously moving and contiguously transforming A1 in place into a New Data Structure (NDS); applying the transformation T, in units of MB*NB contiguous double words, to the NDS format of A1, thereby replacing A1 with the contents of T(A1); and moving and transforming NDS T(A1) to standard data format T(A1), with holes left for the remainder of A held in buffer B. The contents of buffer B are contiguously copied into the holes of A2, thereby yielding the in-place transformed matrix T(A).
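The first step of this abstract, moving a standard-format matrix in place into a contiguous blocked layout, can be sketched with a cycle-following permutation. This is a hedged illustration, not the patented algorithm: it assumes column-major (Fortran-style) full storage, assumes M and N are exact multiples of MB and NB (so there is no remainder and no buffer B), takes NDS to mean MB × NB blocks stored one after another in column-major block order with each block column-major internally, and omits the transformation T. The helper names `nds_index` and `to_nds_inplace` are hypothetical.

```python
def nds_index(r: int, c: int, M: int, MB: int, NB: int) -> int:
    """Position of element (r, c) in the assumed blocked layout (NDS)."""
    m = M // MB                                        # number of block rows
    block = (r // MB) + (c // NB) * m                  # blocks ordered column-major
    return block * MB * NB + (r % MB) + (c % NB) * MB  # column-major inside a block


def to_nds_inplace(a: list, M: int, N: int, MB: int, NB: int) -> None:
    """Permute a column-major M x N matrix `a` into contiguous MB x NB blocks, in place."""
    if M % MB or N % NB:
        raise ValueError("sketch assumes M % MB == 0 and N % NB == 0 (no remainder)")

    def dest(p: int) -> int:
        # Column-major storage: p = r + c*M.
        return nds_index(p % M, p // M, M, MB, NB)

    for start in range(M * N):
        # Rotate each permutation cycle exactly once, from its smallest index.
        p = dest(start)
        while p > start:
            p = dest(p)
        if p != start:
            continue                      # a smaller index leads this cycle
        carried, p = a[start], start
        while True:
            q = dest(p)
            a[q], carried = carried, a[q]  # drop the carried element, pick up the next
            p = q
            if p == start:
                break
```

Apart from a few scalars, no extra storage is used; the price is that finding cycle leaders can take quadratic time in the worst case. For M = 4, N = 6, MB = 2, NB = 3 and `a = list(range(24))`, the first six entries after the call are `[0, 1, 4, 5, 8, 9]`, i.e. the block holding rows 0 and 1, columns 0 to 2.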