    • 3. Granted invention patent
    • Low latency memory access and synchronization
    • US07174434B2
    • 2007-02-06
    • US10468994
    • 2002-02-25
    • Matthias A. Blumrich; Dong Chen; Paul W. Coteus; Alan G. Gara; Mark E. Giampapa; Philip Heidelberger; Dirk Hoenicke; Martin Ohmacht; Burkhard D. Steinmacher-Burow; Todd E. Takken; Pavlos M. Vranas
    • Matthias A. Blumrich; Dong Chen; Paul W. Coteus; Alan G. Gara; Mark E. Giampapa; Philip Heidelberger; Dirk Hoenicke; Martin Ohmacht; Burkhard D. Steinmacher-Burow; Todd E. Takken; Pavlos M. Vranas
    • G06F12/12
    • G06F9/52
    • A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
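The single-load lock acquisition described in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `LockDevice` and its methods `load` and `release` are hypothetical names, the abstract does not describe how a lock is released, and a `threading.Lock` stands in for the atomicity that the hardware device would provide.

```python
import threading


class LockDevice:
    """Hypothetical software model of a hardware locking device."""

    def __init__(self, num_locks):
        self._held = [False] * num_locks
        # Stand-in for hardware atomicity; not part of the patent abstract.
        self._guard = threading.Lock()

    def load(self, lock_id):
        """Model of the single load a processor issues to try to own a lock.

        Returns True if the caller now owns the lock, False otherwise.
        The follow-up write that marks the lock as held is done here,
        inside the device model, not by the caller.
        """
        with self._guard:
            if self._held[lock_id]:
                return False
            self._held[lock_id] = True
            return True

    def release(self, lock_id):
        """Assumed release path; the abstract does not specify one."""
        with self._guard:
            self._held[lock_id] = False


device = LockDevice(num_locks=2)
first_attempt = device.load(0)    # caller becomes the owner
second_attempt = device.load(0)   # lock is already held
device.release(0)
third_attempt = device.load(0)    # free again after release
```

The point of the sketch is that the caller only ever reads: ownership is decided by the value the load returns, and the state change happens on the device side.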
    • 4. Granted invention patent
    • Method for prefetching non-contiguous data structures
    • US07529895B2
    • 2009-05-05
    • US11617276
    • 2006-12-28
    • Matthias A. Blumrich; Dong Chen; Paul W. Coteus; Alan G. Gara; Mark E. Giampapa; Philip Heidelberger; Dirk Hoenicke; Martin Ohmacht; Burkhard D. Steinmacher-Burow; Todd E. Takken; Pavlos M. Vranas
    • Matthias A. Blumrich; Dong Chen; Paul W. Coteus; Alan G. Gara; Mark E. Giampapa; Philip Heidelberger; Dirk Hoenicke; Martin Ohmacht; Burkhard D. Steinmacher-Burow; Todd E. Takken; Pavlos M. Vranas
    • G06F13/28
    • G06F12/0862; G06F9/52; G06F2212/6028
    • A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
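The pointer-based prefetching in the abstract above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the class name `PointerPrefetchMemory`, the single-entry prefetch buffer, and especially the training rule (each line's pointer is set to whichever line was accessed next), which the abstract does not specify.

```python
class PointerPrefetchMemory:
    """Toy model: every memory line carries a pointer to another line."""

    def __init__(self, num_lines):
        # One pointer per line, naming the line to prefetch next.
        self.next_ptr = [None] * num_lines
        self.prefetched = None   # single-entry prefetch buffer (assumption)
        self.last_line = None
        self.hits = 0
        self.misses = 0

    def access(self, line):
        # Count whether the requested line was already prefetched.
        if line == self.prefetched:
            self.hits += 1
        else:
            self.misses += 1
        # Assumed training rule: the previous line's pointer now
        # points at the line that actually followed it.
        if self.last_line is not None:
            self.next_ptr[self.last_line] = line
        self.last_line = line
        # Follow this line's pointer to choose what to prefetch.
        self.prefetched = self.next_ptr[line]


memory = PointerPrefetchMemory(num_lines=16)
pattern = [0, 7, 3, 9]           # non-contiguous but repetitive
for _ in range(3):
    for line in pattern:
        memory.access(line)
```

In this model the first pass over the pattern only trains the pointers; later passes find most lines already prefetched, even though the addresses are not contiguous.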
    • 6. Granted invention patent
    • Low latency memory access and synchronization
    • US07818514B2
    • 2010-10-19
    • US12196796
    • 2008-08-22
    • Matthias A. Blumrich; Dong Chen; Paul W. Coteus; Alan G. Gara; Mark E. Giampapa; Philip Heidelberger; Dirk Hoenicke; Martin Ohmacht; Burkhard D. Steinmacher-Burow; Todd E. Takken; Pavlos M. Vranas
    • Matthias A. Blumrich; Dong Chen; Paul W. Coteus; Alan G. Gara; Mark E. Giampapa; Philip Heidelberger; Dirk Hoenicke; Martin Ohmacht; Burkhard D. Steinmacher-Burow; Todd E. Takken; Pavlos M. Vranas
    • G06F12/06
    • G06F12/0862; G06F9/52; G06F2212/6028
    • A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
    • 10. Granted invention patent
    • Arithmetic functions in torus and tree networks
    • US07313582B2
    • 2007-12-25
    • US10468991
    • 2002-02-25
    • Gyan Bhanot; Matthias A. Blumrich; Dong Chen; Alan G. Gara; Mark E. Giampapa; Philip Heidelberger; Burkhard D. Steinmacher-Burow; Pavlos M. Vranas
    • Gyan Bhanot; Matthias A. Blumrich; Dong Chen; Alan G. Gara; Mark E. Giampapa; Philip Heidelberger; Burkhard D. Steinmacher-Burow; Pavlos M. Vranas
    • G06F7/38
    • G06F15/17337
    • Methods and systems for performing arithmetic functions. In accordance with a first aspect of the invention, methods and apparatus are provided, working in conjunction with software algorithms and hardware implementation of class network routing, to achieve a very significant reduction in the time required for global arithmetic operations on the torus. This leads to greater scalability of applications running on large parallel machines. The invention involves three steps in improving the efficiency and accuracy of global operations: (1) Ensuring, when necessary, that all the nodes do the global operation on the data in the same order and so obtain a unique answer, independent of roundoff error; (2) Using the topology of the torus to minimize the number of hops and the bidirectional capabilities of the network to reduce the number of time steps in the data transfer operation to an absolute minimum; and (3) Using class function routing to reduce latency in the data transfer. With the method of this invention, every single element is injected into the network only once and it will be stored and forwarded without any further software overhead. In accordance with a second aspect of the invention, methods and systems are provided to efficiently implement global arithmetic operations on a network that supports the global combining operations. The latency of doing such global operations is greatly reduced by using these methods.
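Step (1) of the abstract above rests on the fact that floating-point addition is not associative, so nodes that add the same values in different orders can disagree. The sketch below illustrates that point only; it does not model the torus network or class function routing. The function `sum_in_order`, the sample values, and the per-node arrival orders are all made up for illustration.

```python
def sum_in_order(contributions, order):
    """Add the contributions one by one, following the given rank order."""
    total = 0.0
    for rank in order:
        total += contributions[rank]
    return total


# One value contributed by each of four hypothetical nodes (rank -> value).
contributions = {0: 1e16, 1: 1.0, 2: -1e16, 3: 1.0}

# Hypothetical arrival orders: each node sees the values in a different order.
arrival_orders = {
    0: [0, 1, 2, 3],
    1: [1, 3, 0, 2],
}

# Each node sums in its own arrival order: results may disagree.
naive = {node: sum_in_order(contributions, order)
         for node, order in arrival_orders.items()}

# Every node sums in the same agreed order (here: ascending rank).
agreed_order = sorted(contributions)
fixed = {node: sum_in_order(contributions, agreed_order)
         for node in arrival_orders}
```

With these values the arrival-order sums differ between the two nodes, while the agreed-order sums are bit-for-bit identical on every node, which is the property the abstract calls a unique answer independent of roundoff error.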