Quick Patent Search: fast search of global patents, free patent database for commercial use - IPRDB

1. Granted invention patent

US08516197B2 Write-through cache optimized for dependence-free parallel regions (in force)
Publication No.: US08516197B2
Publication date: 2013-08-20
Application No.: US13025706
Application date: 2011-02-11
Applicant(s): Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
Inventor(s): Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
IPC classification: G06F12/00
CPC classification: G06F12/0837
Abstract: An apparatus, method and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequently updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.

2. Granted invention patent

US08627010B2 Write-through cache optimized for dependence-free parallel regions (in force)
Publication No.: US08627010B2
Publication date: 2014-01-07
Application No.: US13604349
Application date: 2012-09-05
Applicant(s): Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
Inventor(s): Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
IPC classification: G06F12/00
CPC classification: G06F12/0837
Abstract: An apparatus and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequently updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.

3. Granted invention patent

US07617366B2 Method and apparatus for filtering snoop requests using mulitiple snoop caches (lapsed)
Publication No.: US07617366B2
Publication date: 2009-11-10
Application No.: US12113756
Application date: 2008-05-01
Applicant(s): Matthias A. Blumrich, Alan G. Gara, Mark E. Giampapa, Martin Ohmacht, Valentina Salapura
Inventor(s): Matthias A. Blumrich, Alan G. Gara, Mark E. Giampapa, Martin Ohmacht, Valentina Salapura
IPC classification: G06F13/28, G06F12/00
CPC classification: G06F12/0822, G06F12/0831
Abstract: A method and apparatus for detecting a cache wrap condition in a computing environment having a processor and a cache. A cache wrap condition is detected when the entire contents of a cache have been replaced, relative to a particular starting state. A set-associative cache is considered to have wrapped when all of the sets within the cache have been replaced. The starting point for cache wrap detection is the state of the cache sets at the time of the previous cache wrap. The method and apparatus are preferably implemented in a snoop filter having filter mechanisms that rely upon detecting the cache wrap condition. These snoop filter mechanisms requiring this information are operatively coupled with cache wrap detection logic adapted to detect the cache wrap event, and perform an indication step to the snoop filter mechanisms. In the various embodiments, cache wrap detection logic is implemented using registers and comparators, loadable counters, or a scoreboard data structure.
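The scoreboard variant named in the abstract above could be modeled roughly as follows. This is only an illustrative sketch, not the patented logic: the class name `CacheWrapDetector`, its methods, and the assumption of round-robin replacement (so that `ways` replacements in a set replace every line in that set) are hypothetical and do not come from the record.

```python
class CacheWrapDetector:
    """Scoreboard-style wrap detector for a set-associative cache (illustrative)."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # Replacements seen per set since the previous wrap (the scoreboard).
        self.replaced = [0] * num_sets
        # Number of sets whose lines have all been replaced since the previous wrap.
        self.sets_done = 0

    def on_replacement(self, set_index: int) -> bool:
        """Record one line replacement; return True when the cache has wrapped."""
        if self.replaced[set_index] < self.ways:
            self.replaced[set_index] += 1
            if self.replaced[set_index] == self.ways:
                self.sets_done += 1
        if self.sets_done == self.num_sets:
            # Wrap detected: the current state becomes the new starting point.
            self.replaced = [0] * self.num_sets
            self.sets_done = 0
            return True
        return False
```

With two sets of two ways each, the fourth replacement (the one that completes the last set) is the first call to return True, after which counting starts again from the new state.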

4. Invention patent application

US20090006923A1 COMBINED GROUP ECC PROTECTION AND SUBGROUP PARITY PROTECTION (in force)
Publication No.: US20090006923A1
Publication date: 2009-01-01
Application No.: US11768527
Application date: 2007-06-26
Applicant(s): Alan G. Gara, Dong Chen, Philip Heidelberger, Martin Ohmacht
Inventor(s): Alan G. Gara, Dong Chen, Philip Heidelberger, Martin Ohmacht
IPC classification: H03M13/00
CPC classification: G06F11/1076, G06F11/1064, G06F2212/403, H03M1/0687, H03M13/13, H03M13/2707, H03M13/271, H03M13/29, H03M13/616
Abstract: A method and system are disclosed for providing combined error code protection and subgroup parity protection for a given group of n bits. The method comprises the steps of identifying a number, m, of redundant bits for said error protection; and constructing a matrix P, wherein multiplying said given group of n bits with P produces m redundant error correction code (ECC) protection bits, and two columns of P provide parity protection for subgroups of said given group of n bits. In the preferred embodiment of the invention, the matrix P is constructed by generating permutations of m bit wide vectors with three or more, but an odd number of, elements with value one and the other elements with value zero; and assigning said vectors to rows of the matrix P.
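The row construction described in the abstract above (m-bit vectors with an odd number, three or more, of ones) can be sketched as below. The function names `build_p` and `check_bits` are hypothetical, and the sketch deliberately omits the additional property that two columns of P provide subgroup parity; it shows only the odd-weight rows and the multiplication of the n data bits by P over GF(2).

```python
from itertools import combinations


def build_p(n: int, m: int) -> list[int]:
    """Return n distinct m-bit rows, each with an odd number (>= 3) of ones."""
    rows = []
    for weight in range(3, m + 1, 2):
        for ones in combinations(range(m), weight):
            rows.append(sum(1 << bit for bit in ones))
            if len(rows) == n:
                return rows
    raise ValueError("m is too small to protect n data bits")


def check_bits(data: int, p_rows: list[int]) -> int:
    """Multiply the n-bit data vector by P over GF(2); returns the m check bits."""
    result = 0
    for i, row in enumerate(p_rows):
        if (data >> i) & 1:
            # Addition over GF(2) is XOR.
            result ^= row
    return result
```

Because every row is distinct and has odd weight, flipping a single data bit changes the check bits by exactly that bit's row of P, which is what lets a decoder locate a single-bit error.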

5. Granted invention patent

US07380071B2 Snoop filtering system in a multiprocessor system (in force)
Publication No.: US07380071B2
Publication date: 2008-05-27
Application No.: US11093127
Application date: 2005-03-29
Applicant(s): Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Inventor(s): Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
IPC classification: G06F13/28, G06F12/00
CPC classification: G06F12/0831, G06F12/0813, Y02D10/13
Abstract: A system and method for supporting cache coherency in a computing environment having multiple processing units, each unit having an associated cache memory system operatively coupled therewith. The system includes a plurality of interconnected snoop filter units, each snoop filter unit corresponding to and in communication with a respective processing unit, with each snoop filter unit comprising a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; and a point-to-point interconnect comprising communication links for directly connecting memory writing sources to corresponding receiving devices; and, a plurality of parallel operating filter devices coupled in one-to-one correspondence with each receiving device for processing snoop requests received thereat and one of forwarding requests or preventing forwarding of requests to its associated processing unit. Each of the plurality of parallel operating filter devices comprises parallel operating sub-filter elements, each simultaneously receiving an identical snoop request and implementing one or more different snoop filter algorithms for determining those snoop requests for data that are determined not cached locally at the associated processing unit and preventing forwarding of those requests to the processor unit. In this manner, the number of snoop requests forwarded to a processing unit is reduced, thereby increasing performance of the computing environment.

6. Granted invention patent

US07174434B2 Low latency memory access and synchronization (lapsed)
Publication No.: US07174434B2
Publication date: 2007-02-06
Application No.: US10468994
Application date: 2002-02-25
Applicant(s): Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
Inventor(s): Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
IPC classification: G06F12/12
CPC classification: G06F9/52
Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
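The single-load lock acquisition described in the abstract above can be pictured with a small software model. Everything here is a hypothetical illustration: the class `LockDevice`, its method names, and the release path are assumptions, and the internal `threading.Lock` merely stands in for the atomicity that the hardware locking device would provide.

```python
import threading


class LockDevice:
    """Software model of a hardware lock unit (illustrative only)."""

    def __init__(self, num_locks: int) -> None:
        self._held = [False] * num_locks
        # Stands in for the atomicity the hardware device would provide.
        self._guard = threading.Lock()

    def load(self, lock_id: int) -> bool:
        """A single read: returns True if the caller now owns the lock."""
        with self._guard:
            if self._held[lock_id]:
                return False
            # The device, not the processor, performs the follow-up write.
            self._held[lock_id] = True
            return True

    def release(self, lock_id: int) -> None:
        """Assumed release path; the abstract does not describe how locks are freed."""
        with self._guard:
            self._held[lock_id] = False
```

A processor in this model would call `load` once per attempt and access the shared resource only when the call returns True, so no separate store is issued by the processor to take the lock.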

7. Granted invention patent

US08255638B2 Snoop filter for filtering snoop requests (lapsed)
Publication No.: US08255638B2
Publication date: 2012-08-28
Application No.: US12113262
Application date: 2008-05-01
Applicant(s): Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Inventor(s): Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
IPC classification: G06F12/00, G06F13/00
CPC classification: G06F12/0822, G06F12/0831, G06F2212/507, Y02D10/13
Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.

8. Granted invention patent

US08161248B2 Simplifying and speeding the management of intra-node cache coherence (lapsed)
Publication No.: US08161248B2
Publication date: 2012-04-17
Application No.: US12953770
Application date: 2010-11-24
Applicant(s): Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Phillip Heidelberger, Dirk Hoenicke, Martin Ohmacht
Inventor(s): Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Phillip Heidelberger, Dirk Hoenicke, Martin Ohmacht
IPC classification: G06F12/00, G06F13/00, G06F13/28, G06F15/167
CPC classification: H05K7/20836, F24F11/77, G06F9/52, G06F9/526, G06F15/17381, G06F17/142, G09G5/008, H04L7/0338
Abstract: A method and apparatus for managing coherence between two processors of a two-processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message-passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

9. Granted invention patent

US08140925B2 Method and apparatus to debug an integrated circuit chip via synchronous clock stop and scan (lapsed)
Publication No.: US08140925B2
Publication date: 2012-03-20
Application No.: US11768791
Application date: 2007-06-26
Applicant(s): Ralph E. Bellofatto, Matthew R. Ellavsky, Alan G. Gara, Mark E. Giampapa, Thomas M. Gooding, Rudolf A. Haring, Lance G. Hehenberger, Martin Ohmacht
Inventor(s): Ralph E. Bellofatto, Matthew R. Ellavsky, Alan G. Gara, Mark E. Giampapa, Thomas M. Gooding, Rudolf A. Haring, Lance G. Hehenberger, Martin Ohmacht
IPC classification: G01R31/28, G06F1/12
CPC classification: G06F11/2236
Abstract: An apparatus and method for evaluating a state of an electronic or integrated circuit (IC), each IC including one or more processor elements for controlling operations of IC sub-units, and each IC supporting multiple frequency clock domains. The method comprises: generating a synchronized set of enable signals in correspondence with one or more IC sub-units for starting operation of one or more IC sub-units according to a determined timing configuration; counting, in response to one signal of the synchronized set of enable signals, a number of main processor IC clock cycles; and, upon attaining a desired clock cycle number, generating a stop signal for each unique frequency clock domain to synchronously stop a functional clock for each respective frequency clock domain; and, upon synchronously stopping all on-chip functional clocks on all frequency clock domains in a deterministic fashion, scanning out data values at a desired IC chip state. The apparatus and methodology enable construction of a cycle-by-cycle view of any part of the state of a running IC chip, using a combination of on-chip circuitry and software.

10. Granted invention patent

US08122197B2 Managing coherence via put/get windows (lapsed)
Publication No.: US08122197B2
Publication date: 2012-02-21
Application No.: US12543890
Application date: 2009-08-19
Applicant(s): Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
Inventor(s): Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
IPC classification: G06F12/00, G06F13/00, G06F13/28
CPC classification: H05K7/20836, F24F11/77, G06F9/52, G06F9/526, G06F15/17381, G06F17/142, G09G5/008, H04L7/0338
Abstract: A method and apparatus for managing coherence between two processors of a two-processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message-passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
