IPRDB patent quick search: fast search of global patents in a free, commercially usable patent database

61. Granted invention patent

US09857858B2 Managing power consumption and performance of computing systems (in force)
Publication No.: US09857858B2
Publication date: 2018-01-02
Application No.: US13976817
Filing date: 2012-05-17
Applicants: Devadatta V. Bodas, John H. Crawford, Alan G. Gara
Inventors: Devadatta V. Bodas, John H. Crawford, Alan G. Gara
IPC classification: G06F1/26, G06F1/32
CPC classification: G06F1/3206, G06F1/3209, G06F1/3234
Abstract: A method and system for managing power consumption and performance of computing systems are described herein. The method includes monitoring an overall power consumption of the computing systems to determine whether the overall power consumption is above or below an overall power consumption limit, and monitoring a performance of each computing system to determine whether the performance is within a performance tolerance. The method further includes adjusting the power consumption limits for the computing systems or the performances of the computing systems such that the overall power consumption is below the overall power consumption limit and the performance of each computing system is within the performance tolerance.
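The monitor-and-adjust loop described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the `Node` class, the proportional scaling rule, and the fractional `tolerance` parameter are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    power_w: float   # measured power draw, in watts
    limit_w: float   # per-node power limit, in watts
    perf: float      # measured performance, arbitrary units


def rebalance(nodes, overall_limit_w, target_perf, tolerance):
    """Hypothetical policy: if the total draw exceeds the overall limit,
    scale every node's limit proportionally, then report which nodes
    fall outside the performance tolerance."""
    total = sum(n.power_w for n in nodes)
    if total > overall_limit_w:
        scale = overall_limit_w / total
        for n in nodes:
            # Assumption: the new limit is the current draw scaled down.
            n.limit_w = n.power_w * scale
    out_of_tolerance = [
        n.name for n in nodes
        if abs(n.perf - target_perf) > tolerance * target_perf
    ]
    return total, out_of_tolerance
```

With two nodes drawing 300 W and 500 W against a 600 W overall limit, the sketch lowers the per-node limits to 225 W and 375 W, so the limits sum to the overall budget.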

62. Granted invention patent

US08677073B2 Snoop filter for filtering snoop requests (in force)
Publication No.: US08677073B2
Publication date: 2014-03-18
Application No.: US13587420
Filing date: 2012-08-16
Applicants: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
IPC classification: G06F13/28, G06F12/00
CPC classification: G06F12/0822, G06F12/0831, G06F2212/507, Y02D10/13
Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.

63. Granted invention patent

US08627010B2 Write-through cache optimized for dependence-free parallel regions (in force)
Publication No.: US08627010B2
Publication date: 2014-01-07
Application No.: US13604349
Filing date: 2012-09-05
Applicants: Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
Inventors: Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
IPC classification: G06F12/00
CPC classification: G06F12/0837
Abstract: An apparatus and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequently updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.

64. Granted invention patent

US08108738B2 Data eye monitor method and apparatus (lapsed)
Publication No.: US08108738B2
Publication date: 2012-01-31
Application No.: US11768810
Filing date: 2007-06-26
Applicants: Alan G. Gara, James A. Marcella, Martin Ohmacht
Inventors: Alan G. Gara, James A. Marcella, Martin Ohmacht
IPC classification: G06K5/04, G11B5/00, G11B20/20
CPC classification: G06F13/1689
Abstract: An apparatus and method for providing a data eye monitor. The data eye monitor apparatus utilizes an inverter/latch string circuit and a set of latches to save the data eye for providing an infinite persistent data eye. In operation, incoming read data signals are adjusted in the first stage individually and latched to provide the read data to the requesting unit. The data is also simultaneously fed into a balanced XOR tree to combine the transitions of all incoming read data signals into a single signal. This signal is passed along a delay chain and tapped at constant intervals. The tap points are fed into latches, capturing the transitions at a delay element interval resolution. Using XORs, differences between adjacent taps and therefore transitions are detected. The eye is defined by segments that show no transitions over a series of samples. The eye size and position can be used to readjust the delay of incoming signals and/or to control environment parameters like voltage, clock speed and temperature.
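The tap-comparison step in this abstract lends itself to a small software model. The sketch below assumes the incoming signals have already been combined into one signal and that each sample is a list of latched tap values (0 or 1); the function names and the "longest transition-free run" rule for locating the eye are illustrative assumptions, not details taken from the patent.

```python
def transitions(taps):
    """XOR each pair of adjacent taps; a 1 marks a transition between them."""
    return [a ^ b for a, b in zip(taps, taps[1:])]


def accumulate(samples):
    """OR the transition maps of all samples together, so that a position
    stays marked once any sample has shown a transition there."""
    seen = [0] * (len(samples[0]) - 1)
    for taps in samples:
        seen = [s | t for s, t in zip(seen, transitions(taps))]
    return seen


def find_eye(seen):
    """Return (start, length) of the longest run of positions that never
    showed a transition, or (0, 0) if every position did."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, bit in enumerate(seen):
        if bit == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len
```

For example, the samples `[0,0,1,1,1,1,1,0]` and `[0,1,1,1,1,1,0,0]` accumulate to the map `[1,1,0,0,0,1,1]`, so the modeled eye starts at position 2 and spans 3 tap intervals.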

65. Granted invention patent

US08095585B2 Efficient implementation of multidimensional fast Fourier transform on a distributed-memory parallel multi-node computer (lapsed)
Publication No.: US08095585B2
Publication date: 2012-01-10
Application No.: US11931898
Filing date: 2007-10-31
Applicants: Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
Inventors: Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
IPC classification: G06F17/14
CPC classification: H05K7/20836, F24F11/77, G06F9/52, G06F9/526, G06F15/17381, G06F17/142, G09G5/008, H04L7/0338
Abstract: The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via “all-to-all” distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The “all-to-all” re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
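The sequence in this abstract (transform along one dimension, redistribute all-to-all, transform along the other) can be imitated in a single process. The sketch below is a simplified model under several assumptions: a square array with one row per simulated node, a naive O(n²) DFT standing in for a real FFT, and a shuffled list of (source, destination) pairs standing in for the random-order exchange. It shows only that the result does not depend on the exchange order; it cannot show the network-utilization benefit the abstract claims.

```python
import cmath
import random


def dft(x):
    """Naive one-dimensional discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft2_simulated(grid, seed=0):
    """Two-dimensional transform of a square grid, one row per simulated node."""
    n = len(grid)
    # Step 1: each node transforms the row it holds.
    rows = [dft(r) for r in grid]
    # Step 2: all-to-all exchange. Node `src` sends element `dst` to node
    # `dst`; the order of the individual transfers is shuffled.
    pairs = [(src, dst) for src in range(n) for dst in range(n)]
    random.Random(seed).shuffle(pairs)
    cols = [[0j] * n for _ in range(n)]
    for src, dst in pairs:
        cols[dst][src] = rows[src][dst]
    # Step 3: each node transforms the column it now holds.
    cols = [dft(c) for c in cols]
    # Transpose back so the result is indexed [row frequency][column frequency].
    return [[cols[kc][kr] for kc in range(n)] for kr in range(n)]
```

For the grid `[[1, 2], [3, 4]]` the sketch yields approximately `[[10, -2], [-4, 0]]`, matching the direct two-dimensional DFT, whatever seed is used for the shuffle.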

66. Granted invention patent

US08001401B2 Power throttling of collections of computing elements (lapsed)
Publication No.: US08001401B2
Publication date: 2011-08-16
Application No.: US11768752
Filing date: 2007-06-26
Applicants: Ralph E. Bellofatto, Paul W. Coteus, Paul G. Crumley, Alan G. Gara, Mark E. Giampapa, Thomas M. Gooding, Rudolf A. Haring, Mark G. Megerian, Martin Ohmacht, Don D. Reed, Richard A. Swetz, Todd Takken
Inventors: Ralph E. Bellofatto, Paul W. Coteus, Paul G. Crumley, Alan G. Gara, Mark E. Giampapa, Thomas M. Gooding, Rudolf A. Haring, Mark G. Megerian, Martin Ohmacht, Don D. Reed, Richard A. Swetz, Todd Takken
IPC classification: G06F1/26
CPC classification: G06F1/3203, G06F1/206
Abstract: An apparatus and method for controlling power usage in a computer includes a plurality of computers communicating with a local control device, and a power source supplying power to the local control device and the computer. A plurality of sensors communicate with the computer for ascertaining power usage of the computer, and a system control device communicates with the computer for controlling power usage of the computer.

67. Granted invention patent

US07870343B2 Managing coherence via put/get windows (lapsed)
Publication No.: US07870343B2
Publication date: 2011-01-11
Application No.: US10468995
Filing date: 2002-02-25
Applicants: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
IPC classification: G06F12/00, G06F13/00, G06F13/28, G06F15/167
CPC classification: H05K7/20836, F24F11/77, G06F9/52, G06F9/526, G06F15/17381, G06F17/142, G09G5/008, H04L7/0338
Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

68. Granted invention patent

US07818514B2 Low latency memory access and synchronization (lapsed)
Publication No.: US07818514B2
Publication date: 2010-10-19
Application No.: US12196796
Filing date: 2008-08-22
Applicants: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
IPC classification: G06F12/06
CPC classification: G06F12/0862, G06F9/52, G06F2212/6028
Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
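The read-to-acquire locking idea in this abstract can be modeled in software. The sketch below is a hypothetical model only: the `LockDevice` class and its method names are invented for illustration, and a `threading.Lock` stands in for the atomicity that the hardware locking device would provide. The point it illustrates is that the caller issues one read, and the device itself records the ownership change.

```python
import threading


class LockDevice:
    """Hypothetical model of a locking device with read-to-acquire semantics."""

    def __init__(self, n_locks):
        self._held = [False] * n_locks
        # Stands in for hardware atomicity; not part of the modeled interface.
        self._guard = threading.Lock()

    def load(self, idx):
        """A single read of lock `idx`. Returns True if the caller now owns
        the lock, False if it was already held."""
        with self._guard:
            if self._held[idx]:
                return False
            # The device, not the caller, performs the follow-up write.
            self._held[idx] = True
            return True

    def release(self, idx):
        """Give the lock back so that a later load can acquire it."""
        with self._guard:
            self._held[idx] = False
```

A caller would spin on `load(idx)` until it returns True, use the shared resource, and then call `release(idx)`.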

69. Granted invention patent

US07788334B2 Multiple node remote messaging (in force)
Publication No.: US07788334B2
Publication date: 2010-08-31
Application No.: US11768784
Filing date: 2007-06-26
Applicants: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Burkhard Steinmacher-Burow, Pavlos Vranas
Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Burkhard Steinmacher-Burow, Pavlos Vranas
IPC classification: G06F15/167, G06F13/28
CPC classification: G06F15/16
Abstract: A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes includes a first compute node (A) sending a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes various steps including controlling a DMA engine at first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).

70. Invention application

US20090259713A1 NOVEL MASSIVELY PARALLEL SUPERCOMPUTER (in force)
Publication No.: US20090259713A1
Publication date: 2009-10-15
Application No.: US12492799
Filing date: 2009-06-26
Applicants: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
IPC classification: G06F15/76, G06F15/16, G06F11/28, G06F12/08, G06F9/02, G06F15/177
CPC classification: H05K7/20836, F24F11/77, G06F9/52, G06F9/526, G06F15/17381, G06F17/142, G09G5/008, H04L7/0338
Abstract: A novel massively parallel supercomputer of hundreds-of-teraOPS scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximize packet communications throughput and minimize latency. In the preferred embodiment, the multiple networks include three high-speed networks for parallel algorithm message passing including a Torus, Global Tree, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. For particular classes of parallel algorithms, or parts of parallel calculations, this architecture exhibits exceptional computational performance, and may be enabled to perform calculations for new classes of parallel algorithms. Additional networks are provided for external connectivity and used for Input/Output, System Management and Configuration, and Debug and Monitoring functions. Special node packaging techniques implementing midplane and other hardware devices facilitate partitioning of the supercomputer in multiple networks for optimizing supercomputing resources.
