    • 31. Patent application
    • Thread Specific Compiler Generated Customization of Runtime Support for Application Programming Interfaces
    • US20130283250A1
    • 2013-10-24
    • US13453411
    • 2012-04-23
    • Alexandre E. Eichenberger, John K.P. O'Brien
    • G06F9/45
    • G06F8/43
    • Mechanisms are provided for generating a customized runtime library for source code. Source code is analyzed to identify a region of code implementing an application programming interface or programming standard of interest. An invocation tree data structure is generated based on results of analysis of functions of the application programming interface or programming standard of interest that the region of code invokes. A custom runtime library is generated based on the invocation tree data structure. The custom runtime library comprises only a subset of runtime library functions, less than a total number of runtime library functions for the application programming interface or programming standard of interest, actually invoked by the region of code and does not include all runtime library functions in the total number of runtime library functions for the application programming interface or programming standard of interest.
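The abstract above can be illustrated with a minimal Python sketch. It assumes a made-up runtime library (`RUNTIME_CALLS`, with hypothetical function names that do not come from the patent or from any real library) and shows only the general idea: walk the runtime functions a code region invokes, record them in an invocation structure, and keep just that subset.

```python
from collections import deque

# Hypothetical runtime library: each function maps to the helpers it calls.
# All names here are illustrative assumptions, not a real API.
RUNTIME_CALLS = {
    "parallel_begin": ["thread_create", "barrier_init"],
    "parallel_end": ["barrier_wait", "thread_join"],
    "critical_enter": ["lock_acquire"],
    "critical_exit": ["lock_release"],
    "task_spawn": ["queue_push"],
    "queue_push": ["lock_acquire", "lock_release"],
    "thread_create": [],
    "thread_join": [],
    "barrier_init": [],
    "barrier_wait": [],
    "lock_acquire": [],
    "lock_release": [],
}


def build_invocation_tree(entry_points, calls=RUNTIME_CALLS):
    """Return {function: [callees]} for everything reachable from entry_points."""
    tree = {}
    pending = deque(entry_points)
    while pending:
        fn = pending.popleft()
        if fn in tree:
            continue  # already visited
        tree[fn] = list(calls[fn])
        pending.extend(calls[fn])
    return tree


def custom_runtime(entry_points, calls=RUNTIME_CALLS):
    """Keep only the runtime functions the code region actually reaches."""
    return sorted(build_invocation_tree(entry_points, calls))


# Runtime functions invoked directly by the analyzed code region (assumed).
region_calls = ["parallel_begin", "parallel_end"]
subset = custom_runtime(region_calls)
print(subset)
print(len(subset), "of", len(RUNTIME_CALLS), "runtime functions kept")
```

In this toy setup the region uses no locks or tasks, so the lock and queue helpers drop out of the generated subset.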
    • 32. Granted patent
    • Write-through cache optimized for dependence-free parallel regions
    • US08516197B2
    • 2013-08-20
    • US13025706
    • 2011-02-11
    • Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
    • G06F12/00
    • G06F12/0837
    • An apparatus, method, and computer program product are provided for improving the performance of a parallel computing system. A first hardware local cache controller, associated with a first local cache memory device of a first processor, detects an occurrence of false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs when the first hardware local cache controller updates a first portion of the first cache line in the first local cache memory device and a second hardware local cache controller subsequently updates a second portion of the first cache line in a second local cache memory device.
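To make the notion of false sharing concrete, here is a small Python sketch. The 64-byte line size, the `classify` helper, and the byte-granular `write_through` model are all assumptions made for illustration; the abstract does not describe how the hardware merges updates.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for this sketch)


def line_of(addr):
    """Index of the cache line that holds byte address addr."""
    return addr // LINE_SIZE


def classify(write_a, write_b):
    """Classify two writes from different processors, each given as (address, length)."""
    (a, alen), (b, blen) = write_a, write_b
    lines_a = set(range(line_of(a), line_of(a + alen - 1) + 1))
    lines_b = set(range(line_of(b), line_of(b + blen - 1) + 1))
    if not (lines_a & lines_b):
        return "no sharing"
    bytes_overlap = a < b + blen and b < a + alen
    return "true sharing" if bytes_overlap else "false sharing"


def write_through(memory_line, offset, data):
    """Write back only the modified bytes; the rest of the line is untouched."""
    memory_line[offset:offset + len(data)] = data


print(classify((0, 4), (8, 4)))   # same line, disjoint bytes
print(classify((0, 4), (2, 4)))   # same line, overlapping bytes
print(classify((0, 4), (64, 4)))  # different lines

# Two processors update different portions of one line; under the assumed
# byte-granular write-through, both updates survive in memory.
line = bytearray(LINE_SIZE)
write_through(line, 0, b"\x01" * 4)  # first processor, first portion
write_through(line, 8, b"\x02" * 4)  # second processor, second portion
```

The sketch suggests why tolerating false sharing can be harmless in a dependence-free region: the two processors never touch the same bytes, so neither update is lost.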
    • 36. Patent application
    • MANAGEMENT OF CONDITIONAL BRANCHES WITHIN A DATA PARALLEL SYSTEM
    • US20120198425A1
    • 2012-08-02
    • US13016406
    • 2011-01-28
    • Alexandre E. Eichenberger, Brian Flachs, Dorit Nuzman, Ira Rosen, Ulrich Weigand, Ayal Zaks
    • G06F9/45
    • G06F8/456
    • A compiler of a single instruction multiple data (SIMD) information handling system (IHS) identifies “if-then-else” statements that offer opportunity for conditional branch conversion. The compiler converts those “if-then-else” statements into “conditional branch and prepare” statements as well as “branch return” statements. The compiler compiles source code file information containing “if-then-else” statement opportunities into compiled code, namely an executable program. The SIMD IHS employs a processor or processors to execute the executable program. During execution, the processor generates and updates SIMD lane mask information to track and manage the conditional branch loops of the executing program. The processor saves branch addresses and employs SIMD lane masks to identify conditional branch loops with different branch conditions than previous conditional branch loops. The processor may reduce SIMD IHS processing time during processing of compiled code of the original “if-then-else” statements. The processor continues processing next statements inline after all SIMD lanes are complete, while providing speculative and parallel processing capability for multiple data operations of the executable program.
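The lane-mask idea in the abstract above can be emulated in plain Python. This is a generic sketch of masked if-then-else execution over SIMD lanes, not the patented "conditional branch and prepare" or "branch return" mechanism; the function name `simd_if_then_else` and the list-based lanes are assumptions.

```python
def simd_if_then_else(values, cond, then_op, else_op):
    """Emulate a masked if-then-else over SIMD lanes using Python lists."""
    mask = [cond(v) for v in values]  # one mask bit per lane
    result = list(values)
    # "then" path: skipped entirely when no lane takes it.
    if any(mask):
        result = [then_op(v) if m else r
                  for v, m, r in zip(values, mask, result)]
    # "else" path: runs under the inverted mask, skipped when unused.
    if not all(mask):
        result = [else_op(v) if not m else r
                  for v, m, r in zip(values, mask, result)]
    return mask, result


# Scalar source being emulated: x = 2 * x if x >= 0 else -x
mask, out = simd_if_then_else(
    [3, -1, 4, -5],
    cond=lambda v: v >= 0,
    then_op=lambda v: v * 2,
    else_op=lambda v: -v,
)
print(mask)
print(out)
```

Each path runs once for all lanes, and the mask decides which lanes commit a result. Execution continues inline once both paths are done, which loosely mirrors the "all SIMD lanes are complete" step in the abstract.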
    • 37. Granted patent
    • Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization
    • US08056069B2
    • 2011-11-08
    • US11856284
    • 2007-09-17
    • Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
    • G06F9/45; G06F7/52; G06F15/00
    • G06F8/4452; G06F8/445
    • A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.
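The aggregation steps in the abstract above can be sketched in Python. The loop body, the hardware vector length of 4, and the helper names are illustrative assumptions; lists stand in for SIMD registers. Two isomorphic stride-2 statements together cover a contiguous stream, so each iteration forms a virtual vector of length 2, and two iterations are packed into one hardware-length operation.

```python
def scalar_loop(a, c):
    """Original loop: two isomorphic statements, each with stride-2 accesses."""
    b = [0] * len(a)
    for i in range(len(a) // 2):
        b[2 * i] = a[2 * i] + c[2 * i]
        b[2 * i + 1] = a[2 * i + 1] + c[2 * i + 1]
    return b


def vector_add(x, y):
    """Stand-in for one SIMD add over contiguous elements."""
    return [p + q for p, q in zip(x, y)]


def aggregated_loop(a, c, hw_len=4, virtual_len=2):
    """Pack hw_len // virtual_len iterations into each vector operation."""
    iters_per_vector = hw_len // virtual_len
    n = len(a) // virtual_len  # iterations of the original loop
    b = [0] * len(a)
    i = 0
    while i + iters_per_vector <= n:
        lo = i * virtual_len
        b[lo:lo + hw_len] = vector_add(a[lo:lo + hw_len], c[lo:lo + hw_len])
        i += iters_per_vector
    # Leftover iterations that do not fill a whole hardware vector.
    for j in range(i, n):
        lo = j * virtual_len
        b[lo:lo + virtual_len] = vector_add(a[lo:lo + virtual_len],
                                            c[lo:lo + virtual_len])
    return b


a = list(range(10))
c = [10] * 10
print(aggregated_loop(a, c) == scalar_loop(a, c))
```

The sketch assumes `hw_len` is a multiple of `virtual_len` and that the arrays have even length. A real vectorizer would also have to handle alignment, which is omitted here.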
    • 38. Patent application
    • PARALLELIZATION OF IRREGULAR REDUCTIONS VIA PARALLEL BUILDING AND EXPLOITATION OF CONFLICT-FREE UNITS OF WORK AT RUNTIME
    • US20110088020A1
    • 2011-04-14
    • US12576717
    • 2009-10-09
    • Alexandre E. Eichenberger, Yangchun Luo, John K. O'Brien, Xiaotong Zhuang
    • G06F9/45
    • G06F8/456
    • An optimizing compiler device, a method, and a computer program product capable of parallelizing irregular reductions are disclosed. The method includes receiving a program at a compiler and selecting, at compile time, at least one unit of work (UW) from the program, each UW configured to operate on at least one reduction operation, where at least one reduction operation in the UW operates on a reduction variable whose address is determinable when the program is run. At run time, for each successive current UW, a list of reduction operations accessed by that unit of work is recorded. It is further determined at run time whether the reduction operations accessed by the current UW conflict with any reduction operations recorded as accessed by previously selected units of work; when no conflicts are found, the unit of work is assigned as a conflict-free unit of work (CFUW). Finally, at least two processing threads are scheduled to process, in parallel at run time, respective ones of the at least two assigned CFUWs.
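The run-time selection described above can be sketched as follows, assuming each unit of work is a list of `(index, value)` updates to a sum reduction `out[index] += value`. The round-based scheduling, the function names, and the use of a thread pool are illustrative assumptions rather than the patented scheme.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_unit(out, updates):
    """Run one unit of work: a sequence of out[idx] += val reductions."""
    for idx, val in updates:
        out[idx] += val


def parallel_irregular_reduction(units, size, workers=4):
    """Process units in rounds of mutually conflict-free units of work."""
    out = [0] * size
    remaining = list(range(len(units)))
    rounds = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining:
            seen, batch, deferred = set(), [], []
            for uw in remaining:
                touched = {idx for idx, _ in units[uw]}  # known only at run time
                if touched & seen:
                    deferred.append(uw)  # conflicts with a selected unit
                else:
                    seen |= touched      # record the accesses of this unit
                    batch.append(uw)     # conflict-free unit of work
            # Units in a batch touch disjoint indices, so no locks are needed.
            list(pool.map(lambda uw: apply_unit(out, units[uw]), batch))
            rounds.append(batch)
            remaining = deferred
    return out, rounds


units = [
    [(0, 1), (1, 1)],  # UW 0 touches indices 0 and 1
    [(1, 5)],          # UW 1 conflicts with UW 0 on index 1
    [(2, 7)],          # UW 2 is independent of UW 0
    [(0, 2), (2, 3)],  # UW 3 conflicts with UW 0 and UW 2
]
out, rounds = parallel_irregular_reduction(units, size=3)
print(out)
print(rounds)
```

Because the first remaining unit is always selected, every round makes progress and the loop terminates. Conflicting units are simply deferred to a later round.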