会员体验
专利管家(专利管理)
工作空间(专利管理)
风险监控(情报监控)
数据分析(专利分析)
侵权分析(诉讼无效)
联系我们
交流群
官方交流:
QQ群: 891211   
微信请扫码    >>>
现在联系顾问~
热词
    • 41. 发明授权
    • System for efficient performance monitoring of a large number of simultaneous events
    • 系统用于高效率监控大量同时发生的事件
    • US07877759B2
    • 2011-01-25
    • US12324254
    • 2008-11-26
    • Alan G. GaraMichael K. GschwindValentina Salapura
    • Alan G. GaraMichael K. GschwindValentina Salapura
    • G06F3/00G06F9/44G06F9/46G06F13/00
    • G06F11/348Y02D10/34
    • A system for monitoring a large number of simultaneous events implements a hybrid counter array device having a first counter portion comprising counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. A second counter portion comprises a memory array device having addressable memory locations in correspondence with the counter devices, each addressable memory location for storing a second count value representing higher order bits. A control device monitors each of the counter devices and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location. The system includes interrupt pre-indication for providing fast interrupt trigger to a processor device when a count value related to an event equals a threshold value.
    • 一种用于监视大量同时事件的系统实现具有包括计数器装置的第一计数器部分的混合计数器阵列装置,每个计数器装置用于接收表示从事件源发生的事件的信号,并提供对应于较低次序的第一计数值 混合计数器阵列的位。 第二计数器部分包括具有与计数器装置对应的可寻址存储器位置的存储器阵列器件,每个可寻址存储器位置用于存储表示较高位的第二计数值。 控制装置监视每个计数器装置并且启动更新存储在相应的可寻址存储器位置处的对应的第二计数值的值。 当与事件相关的计数值等于阈值时,该系统包括用于向处理器设备提供快速中断触发的中断预指示。
    • 44. 发明授权
    • Low latency counter event indication
    • 低延迟计数器事件指示
    • US07782995B2
    • 2010-08-24
    • US12130724
    • 2008-05-30
    • Alan G. GaraValentina Salapura
    • Alan G. GaraValentina Salapura
    • G06M3/00
    • H03K23/54
    • A hybrid counter array device for counting events with interrupt indication includes a first counter portion comprising N counter devices, each for counting signals representing event occurrences and providing a first count value representing lower order bits. An overflow bit device associated with each respective counter device is additionally set in response to an overflow condition. The hybrid counter array includes a second counter portion comprising a memory array device having N addressable memory locations in correspondence with the N counter devices, each addressable memory location for storing a second count value representing higher order bits. An operatively coupled control device monitors each associated overflow bit device and initiates incrementing a second count value stored at a corresponding memory location in response to a respective overflow bit being set. The incremented second count value is compared to an interrupt threshold value stored in a threshold register, and, when the second counter value is equal to the interrupt threshold value, a corresponding “interrupt arm” bit is set to enable a fast interrupt indication. On a subsequent roll-over of the lower bits of that counter, the interrupt will be fired.
    • 一种用于对具有中断指示进行计数事件的混合计数器阵列装置,包括包括N个计数器装置的第一计数器部分,每个计数器装置用于计数表示事件发生的信号,并提供表示较低位的第一计数值。 与每个相应计数器装置相关联的溢出位装置响应于溢出状况被另外设置。 混合计数器阵列包括第二计数器部分,其包括具有与N个计数器装置对应的N个可寻址存储器位置的存储器阵列器件,每个可寻址存储器位置用于存储表示较高阶位的第二计数值。 操作耦合的控制设备监视每个相关联的溢出位设备并且响应于设置的相应溢出位而启动存储在相应存储器位置的第二计数值的递增。 将增加的第二计数值与存储在阈值寄存器中的中断阈值进行比较,并且当第二计数器值等于中断阈值时,相应的“中断臂”位被设置为使能快速中断指示。 在该计数器的低位后续翻转时,中断将被触发。
    • 47. 发明授权
    • Massively parallel supercomputer
    • 大型并行超级计算机
    • US07555566B2
    • 2009-06-30
    • US10468993
    • 2002-02-25
    • Matthias A. BlumrichDong ChenGeorge L. ChiuThomas M. CipollaPaul W. CoteusAlan G. GaraMark E. GiampapaPhilip HeidelbergerGerard V. KopcsayLawrence S. MokTodd E. Takken
    • Matthias A. BlumrichDong ChenGeorge L. ChiuThomas M. CipollaPaul W. CoteusAlan G. GaraMark E. GiampapaPhilip HeidelbergerGerard V. KopcsayLawrence S. MokTodd E. Takken
    • G06F15/16
    • H05K7/20836F24F11/77G06F9/52G06F9/526G06F15/17381G06F17/142G09G5/008H04L7/0338
    • A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency. In the preferred embodiment, the multiple networks include three high-speed networks for parallel algorithm message passing including a Torus, Global Tree, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. For particular classes of parallel algorithms, or parts of parallel calculations, this architecture exhibits exceptional computational performance, and may be enabled to perform calculations for new classes of parallel algorithms. Additional networks are provided for external connectivity and used for Input/Output, System Management and Configuration, and Debug and Monitoring functions. Special node packaging techniques implementing midplane and other hardware devices facilitates partitioning of the supercomputer in multiple networks for optimizing supercomputing resources.
    • 数百个teraOPS级别的新型大规模并行超级计算机包括基于片上系统技术的节点架构,即,每个处理节点包括单个专用集成电路(ASIC)。 在每个ASIC节点内是多个处理元件,每个处理元件由中央处理单元(CPU)和多个浮点处理器组成,以实现计算性能,封装密度,低成本以及功率和冷却​​要求的最佳平衡。 单个节点内的多个处理器可以单独使用或同时使用,以在任何时间点解决或执行的特定算法所要求的任何计算或通信组合上工作。 片上系统ASIC节点通过多个独立网络互连,从而最大限度地最大限度地提高了分组通信吞吐量并最大限度地减少了延迟。 在优选实施例中,多个网络包括用于并行算法消息传递的三个高速网络,包括提供全局障碍和通知功能的环形,全局树和全球异步网络。 这些多个独立网络可以根据用于优化算法处理性能的算法的需求或阶段来协同或独立地利用。 对于特定类别的并行算法或并行计算的部分,该架构具有出色的计算性能,并且可以启用对新类并行算法执行计算。 为外部连接提供附加网络,用于输入/输出,系统管理和配置以及调试和监控功能。 实现中平面和其他硬件设备的特殊节点打包技术有助于在多个网络中划分超级计算机,以优化超级计算资源。
    • 48. 发明授权
    • Method for prefetching non-contiguous data structures
    • 预取非连续数据结构的方法
    • US07529895B2
    • 2009-05-05
    • US11617276
    • 2006-12-28
    • Matthias A. BlumrichDong ChenPaul W. CoteusAlan G. GaraMark E. GiampapaPhilip HeidelbergerDirk HoenickeMartin OhmachtBurkhard D. Steinmacher-BurowTodd E. TakkenPavlos M. Vranas
    • Matthias A. BlumrichDong ChenPaul W. CoteusAlan G. GaraMark E. GiampapaPhilip HeidelbergerDirk HoenickeMartin OhmachtBurkhard D. Steinmacher-BurowTodd E. TakkenPavlos M. Vranas
    • G06F13/28
    • G06F12/0862G06F9/52G06F2212/6028
    • A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefect rather than some other predictive algorithm. This enables hardware to effectively prefect memory access patterns that are non-contiguous, but repetitive.
    • 与弱有序的多处理器系统相关联地提供低延迟存储器系统访问。 多处理器中的每个处理器共享资源,并且每个共享资源在锁定设备内具有关联的锁,其提供对多处理器中的多个处理器之间的同步的支持以及资源的有序共享。 当处理器拥有与该资源相关联的锁定时,处理器仅具有访问资源的权限,并且处理器拥有锁的尝试仅需要单个加载操作,而不是传统的原子负载后跟存储,使得处理器 只执行读取操作,并且硬件锁定装置执行后续的写入操作而不是处理器。 还公开了用于非连续数据结构的简单完善。 存储器线被重新定义,使得除了正常的物理存储器数据之外,每行包括足够大的指针以指向存储器中的任何其他行,其中指针用于确定哪个存储器行被提供而不是一些其它预测 算法。 这使得硬件能够有效地预处理不连续但重复的存储器访问模式。
    • 49. 发明申请
    • METHOD AND APPARATUS TO DEBUG AN INTEGRATED CIRCUIT CHIP VIA SYNCHRONOUS CLOCK STOP AND SCAN
    • 通过同步时钟停止和扫描来调试集成电路芯片的方法和设备
    • US20090006894A1
    • 2009-01-01
    • US11768791
    • 2007-06-26
    • Ralph E. BellofattoMatthew R. EllavskyAlan G. GaraMark E. GiampapaThomas M. GoodingRudolf A. HaringLance G. HehenbergerMartin Ohmacht
    • Ralph E. BellofattoMatthew R. EllavskyAlan G. GaraMark E. GiampapaThomas M. GoodingRudolf A. HaringLance G. HehenbergerMartin Ohmacht
    • G06F11/00
    • G06F11/2236
    • An apparatus and method for evaluating a state of an electronic or integrated circuit (IC), each IC including one or more processor elements for controlling operations of IC sub-units, and each the IC supporting multiple frequency clock domains. The method comprises: generating a synchronized set of enable signals in correspondence with one or more IC sub-units for starting operation of one or more IC sub-units according to a determined timing configuration; counting, in response to one signal of the synchronized set of enable signals, a number of main processor IC clock cycles; and, upon attaining a desired clock cycle number, generating a stop signal for each unique frequency clock domain to synchronously stop a functional clock for each respective frequency clock domain; and, upon synchronously stopping all on-chip functional clocks on all frequency clock domains in a deterministic fashion, scanning out data values at a desired IC chip state. The apparatus and methodology enables construction of a cycle-by-cycle view of any part of the state of a running IC chip, using a combination of on-chip circuitry and software.
    • 一种用于评估电子或集成电路(IC)的状态的装置和方法,每个IC包括用于控制IC子单元的操作的一个或多个处理器元件,以及每个支持多个时钟域的IC。 该方法包括:根据确定的定时配置,产生与一个或多个IC子单元相对应的用于开始一个或多个IC子单元的操作的同步的使能信号组; 计数,响应于同步的一组使能信号的一个信号,多个主处理器IC时钟周期; 并且在获得期望的时钟周期数时,产生用于每个唯一频率时钟域的停止信号以同步地停止每个相应频率时钟域的功能时钟; 并且在确定性地同时停止所有频率时钟域上的所有片上功能时钟时,以期望的IC芯片状态扫描数据值。 该装置和方法使得能够使用片上电路和软件的组合来构建运行中的IC芯片的状态的任何部分的逐周期视图。