Quick Patent Search: fast search of global patents, free commercial patent database (IPRDB)

1. Granted patent

US08230423B2 Multithreaded processor architecture with operational latency hiding (status: in force)
Publication number: US08230423B2
Publication date: 2012-07-24
Application number: US11101601
Filing date: 2005-04-07
Applicants: Matteo Frigo, Ahmed Gheith, Volker Strumpen
Inventors: Matteo Frigo, Ahmed Gheith, Volker Strumpen
IPC classification: G06F9/46, G06F9/40, G06F7/38
CPC classification: G06F9/38, G06F9/30043, G06F9/3009, G06F9/30112, G06F9/3017, G06F9/3824, G06F9/383, G06F9/3851, G06F9/461, G06F9/4843, G06F11/2094
Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new context-switched threads. Context switching is used to hide the latency of both memory-access operations (i.e., loads and stores) and arithmetic/logical operations. When an operation executing in a thread incurs a latency having the potential to delay the instruction pipeline, the latency is hidden by performing a context switch to a different thread. When the result of the operation becomes available, a context switch back to that thread is performed to allow the thread to continue.
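
The latency-hiding idea in this abstract can be illustrated with a small cycle-count model. This is only a sketch under stated assumptions, not the patented design: the function names, the one-issue-per-cycle pipeline, the fixed lowest-index-first thread priority, and the representation of a thread as a list of operation latencies are all hypothetical choices made for illustration.

```python
def run_blocking(threads):
    """Baseline (assumed): the pipeline stalls for the full latency of every operation."""
    return sum(sum(ops) for ops in threads)


def run_with_context_switching(threads):
    """Toy model of latency hiding by context switching.

    threads: list of lists; each inner list holds the latency (in cycles)
    of the successive operations of one thread.
    Each cycle, the pipeline issues one operation from the first ready thread.
    A thread that issued an operation with latency k becomes ready again
    k cycles later; meanwhile other threads may issue.
    Returns the cycle at which the last operation completes.
    """
    next_op = [0] * len(threads)    # index of the next operation per thread
    ready_at = [0] * len(threads)   # cycle at which each thread may issue again
    cycle = 0
    finish = 0
    while any(next_op[t] < len(ops) for t, ops in enumerate(threads)):
        for t, ops in enumerate(threads):
            if next_op[t] < len(ops) and ready_at[t] <= cycle:
                latency = ops[next_op[t]]
                ready_at[t] = cycle + latency   # result available after `latency` cycles
                finish = max(finish, ready_at[t])
                next_op[t] += 1
                break                           # one issue per cycle
        cycle += 1                              # if no thread was ready, this is a stall
    return finish


if __name__ == "__main__":
    workload = [[3, 3], [3, 3], [3, 3]]   # three threads, two 3-cycle operations each
    print(run_blocking(workload), run_with_context_switching(workload))
```

With three threads of two 3-cycle operations each, the blocking model needs 18 cycles, while the switching model overlaps the latencies and finishes in 8.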

2. Granted patent

US08972703B2 Multithreaded processor architecture with operational latency hiding (status: in force)
Publication number: US08972703B2
Publication date: 2015-03-03
Application number: US13180724
Filing date: 2011-07-12
Applicants: Matteo Frigo, Ahmed Gheith, Volker Strumpen
Inventors: Matteo Frigo, Ahmed Gheith, Volker Strumpen
IPC classification: G06F9/38, G06F11/20
CPC classification: G06F9/38, G06F9/30043, G06F9/3009, G06F9/30112, G06F9/3017, G06F9/3824, G06F9/383, G06F9/3851, G06F9/461, G06F9/4843, G06F11/2094
Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new context-switched threads. Context switching is used to hide the latency of both memory-access operations (i.e., loads and stores) and arithmetic/logical operations. When an operation executing in a thread incurs a latency having the potential to delay the instruction pipeline, the latency is hidden by performing a context switch to a different thread. When the result of the operation becomes available, a context switch back to that thread is performed to allow the thread to continue.

3. Patent application

US20140075159A1 Multithreaded processor architecture with operational latency hiding (status: in force)
Publication number: US20140075159A1
Publication date: 2014-03-13
Application number: US13180724
Filing date: 2011-07-12
Applicants: Matteo Frigo, Ahmed Gheith, Volker Strumpen
Inventors: Matteo Frigo, Ahmed Gheith, Volker Strumpen
IPC classification: G06F9/38
CPC classification: G06F9/38, G06F9/30043, G06F9/3009, G06F9/30112, G06F9/3017, G06F9/3824, G06F9/383, G06F9/3851, G06F9/461, G06F9/4843, G06F11/2094
Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new context-switched threads. Context switching is used to hide the latency of both memory-access operations (i.e., loads and stores) and arithmetic/logical operations. When an operation executing in a thread incurs a latency having the potential to delay the instruction pipeline, the latency is hidden by performing a context switch to a different thread. When the result of the operation becomes available, a context switch back to that thread is performed to allow the thread to continue.

4. Patent application

US20060230409A1 Multithreaded processor architecture with implicit granularity adaptation (status: under examination, published)
Publication number: US20060230409A1
Publication date: 2006-10-12
Application number: US11101608
Filing date: 2005-04-07
Applicants: Matteo Frigo, Ahmed Gheith, Volker Strumpen
Inventors: Matteo Frigo, Ahmed Gheith, Volker Strumpen
IPC classification: G06F9/46
CPC classification: G06F9/4843
Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new threads and having a novel operational semantics. If a hardware thread is available to shepherd a forked thread, the fork and join instructions have thread creation and termination/synchronization semantics, respectively. If no hardware thread is available, however, the fork and join instructions assume subroutine call and return semantics respectively. The link register of the processor is used to determine whether a given join instruction should be treated as a thread synchronization operation or as a return from subroutine operation.
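
The dual fork/join semantics described in this abstract can be mimicked in software. The sketch below is an analogy only, assuming a hypothetical `ForkJoinPool` class built on Python's `threading` module: a per-fork handle stands in for the role the abstract assigns to the link register, and a counter of free slots stands in for the limited hardware threads.

```python
import threading


class ForkJoinPool:
    """Software analogy (hypothetical) of fork/join with granularity adaptation."""

    def __init__(self, hardware_threads):
        self._free = hardware_threads   # number of unused "hardware thread" slots
        self._lock = threading.Lock()

    def fork(self, fn, *args):
        with self._lock:
            spawn = self._free > 0
            if spawn:
                self._free -= 1
        handle = {"thread": None, "result": None}
        if spawn:
            # A slot is available: fork has thread-creation semantics.
            def body():
                handle["result"] = fn(*args)
            handle["thread"] = threading.Thread(target=body)
            handle["thread"].start()
        else:
            # No slot available: fork degrades to an ordinary subroutine call.
            handle["result"] = fn(*args)
        return handle

    def join(self, handle):
        if handle["thread"] is not None:
            # Thread semantics: synchronize with the child, then free its slot.
            handle["thread"].join()
            with self._lock:
                self._free += 1
        # Subroutine semantics: the call already returned; nothing to wait for.
        return handle["result"]


def fib(pool, n):
    """Recursive Fibonacci that forks one branch at every level."""
    if n < 2:
        return n
    child = pool.fork(fib, pool, n - 1)
    other = fib(pool, n - 2)
    return pool.join(child) + other


if __name__ == "__main__":
    print(fib(ForkJoinPool(2), 10))   # same result regardless of the slot count
```

The program text is identical whether zero or many slots exist; only the amount of actual parallelism changes, which is the adaptation the abstract describes.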

5. Patent application

US20060230408A1 Multithreaded processor architecture with operational latency hiding (status: in force)
Publication number: US20060230408A1
Publication date: 2006-10-12
Application number: US11101601
Filing date: 2005-04-07
Applicants: Matteo Frigo, Ahmed Gheith, Volker Strumpen
Inventors: Matteo Frigo, Ahmed Gheith, Volker Strumpen
IPC classification: G06F9/46
CPC classification: G06F9/38, G06F9/30043, G06F9/3009, G06F9/30112, G06F9/3017, G06F9/3824, G06F9/383, G06F9/3851, G06F9/461, G06F9/4843, G06F11/2094
Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new context-switched threads. Context switching is used to hide the latency of both memory-access operations (i.e., loads and stores) and arithmetic/logical operations. When an operation executing in a thread incurs a latency having the potential to delay the instruction pipeline, the latency is hidden by performing a context switch to a different thread. When the result of the operation becomes available, a context switch back to that thread is performed to allow the thread to continue.

6. Granted patent

US08060699B2 Spiral cache memory and method of operating a spiral cache (status: lapsed)
Publication number: US08060699B2
Publication date: 2011-11-15
Application number: US12270095
Filing date: 2008-11-13
Applicants: Volker Strumpen, Matteo Frigo
Inventors: Volker Strumpen, Matteo Frigo
IPC classification: G06F12/00, G06F13/00, G06F13/28
CPC classification: G06F12/123, G06F12/0897, G06F2212/271, Y02D10/13
Abstract: A memory provides reduction in access latency for frequently-accessed values by self-organizing to always move a requested value to a front-most central storage element of a spiral. The occupant of the central location is swapped backward, which continues backward through the spiral until an empty location is swapped-to, or the last displaced value is cast out of the last location in the spiral. The elements in the spiral may be cache memories or single elements. The resulting cache memory is self-organizing and for the one-dimensional implementation has a worst-case access time proportional to N, where N is the number of tiles in the spiral. A k-dimensional spiral cache has a worst-case access time proportional to N^(1/k). Further, a spiral cache system provides a basis for a non-inclusive system of cache memory, which reduces the amount of space and power consumed by a cache memory of a given size.
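
The move-to-front and swap-backward behavior described in this abstract can be sketched for the one-dimensional case as a plain list of tiles. This is a sequential illustration under assumptions, not the hardware design: the function name `access`, the use of `None` to mark an empty tile, and the single-value tiles are hypothetical, and the sketch ignores timing and the geometry of the spiral.

```python
def access(tiles, value):
    """Move `value` to the front-most tile, tiles[0].

    tiles: list of stored values, with None marking an empty tile
           (so None itself cannot be stored).
    Returns the value cast out of the last tile, or None if nothing was evicted.
    """
    if value in tiles:
        # Hit: the value leaves its current tile, which becomes empty.
        tiles[tiles.index(value)] = None
    carried = value
    for i in range(len(tiles)):
        # Place the carried value and pick up the previous occupant.
        tiles[i], carried = carried, tiles[i]
        if carried is None:
            return None      # swapped into an empty location: done
    return carried           # displaced out of the last location


if __name__ == "__main__":
    tiles = [None] * 3
    for v in ["a", "b", "c"]:
        access(tiles, v)
    print(tiles)                 # most recently accessed value is at the front
    access(tiles, "a")           # hit: "a" moves to the front, others shift back
    print(tiles)
    print(access(tiles, "d"))    # miss on a full array: the last value is cast out
```

In this model, recently accessed values stay near the front, and a value is evicted only when the array is full and a new value arrives.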

7. Patent application

US20100122035A1 SPIRAL CACHE MEMORY AND METHOD OF OPERATING A SPIRAL CACHE (status: lapsed)
Publication number: US20100122035A1
Publication date: 2010-05-13
Application number: US12270095
Filing date: 2008-11-13
Applicants: Volker Strumpen, Matteo Frigo
Inventors: Volker Strumpen, Matteo Frigo
IPC classification: G06F12/08, G06F12/00, G06F1/04
CPC classification: G06F12/123, G06F12/0897, G06F2212/271, Y02D10/13
Abstract: A spiral cache memory provides reduction in access latency for frequently-accessed values by self-organizing to always move a requested value to a front-most central storage element of the spiral. The occupant of the central location is swapped backward, which continues backward through the spiral until an empty location is swapped-to, or the last displaced value is cast out of the last location in the spiral. The elements in the spiral may be cache memories or single elements. The resulting cache memory is self-organizing and for the one-dimensional implementation has a worst-case access time proportional to N, where N is the number of tiles in the spiral. A k-dimensional spiral cache has a worst-case access time proportional to N^(1/k). Further, a spiral cache system provides a basis for a non-inclusive system of cache memory, which reduces the amount of space and power consumed by a cache memory of a given size.

8. Patent application

US20070260663A1 Cyclic segmented prefix circuits for mesh networks (status: in force)
Publication number: US20070260663A1
Publication date: 2007-11-08
Application number: US11408099
Filing date: 2006-04-20
Applicants: Matteo Frigo, Volker Strumpen
Inventors: Matteo Frigo, Volker Strumpen
IPC classification: G06F7/38
CPC classification: G06F7/506, G06F2207/5063
Abstract: Parallel prefix circuits for computing a cyclic segmented prefix operation with a mesh topology are disclosed. In one embodiment of the present invention, the elements (prefix nodes) of the mesh are arranged in row-major order. Values are accumulated toward the center of the mesh and partial results are propagated outward from the center of the mesh to complete the cyclic segmented prefix operation. This embodiment has been shown to be time-optimal. In another embodiment of the present invention, the prefix nodes are arranged such that the prefix node corresponding to the last element in the array is located at the center of the array. This alternative embodiment is not only time-optimal when accounting for wire-lengths (and therefore propagation delays), but it is also asymptotically optimal in terms of minimizing the number of segmented prefix operators.
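
For readers unfamiliar with the operation these circuits compute, the sketch below gives a sequential reference for a cyclic segmented prefix. It assumes one common definition (inclusive prefix, with a boolean flag marking the first element of each segment, and the array treated as circular so that the segment begun by the last flag wraps around to the start); the abstract does not spell out the definition, and the function name and signature are hypothetical. The code says nothing about the mesh layout or the timing claims.

```python
from operator import add


def cyclic_segmented_prefix(values, starts, op=add):
    """Sequential reference for an (assumed) inclusive cyclic segmented prefix.

    values: list of operands.
    starts: list of booleans; starts[i] is True if element i begins a segment.
    op:     associative binary operator (addition by default).
    The array is circular: elements before the first segment start belong
    to the segment begun by the last start flag.
    """
    n = len(values)
    if n != len(starts):
        raise ValueError("values and starts must have the same length")
    if not any(starts):
        raise ValueError("at least one segment start is required")
    first = starts.index(True)
    out = [None] * n
    acc = None
    for k in range(n):
        i = (first + k) % n          # walk the circle, beginning at a segment start
        acc = values[i] if starts[i] else op(acc, values[i])
        out[i] = acc
    return out


if __name__ == "__main__":
    print(cyclic_segmented_prefix([1, 2, 3, 4, 5],
                                  [False, False, True, False, True]))
```

In the example, the segment starting at the last element (value 5) wraps around and absorbs the first two elements, giving running sums 5, 6, 8 for that segment.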

9. Patent application

US20100122034A1 STORAGE ARRAY TILE SUPPORTING SYSTOLIC MOVEMENT OPERATIONS (status: in force)
Publication number: US20100122034A1
Publication date: 2010-05-13
Application number: US12270186
Filing date: 2008-11-13
Applicants: Volker Strumpen, Matteo Frigo
Inventors: Volker Strumpen, Matteo Frigo
IPC classification: G06F12/02, G06F12/08
CPC classification: G06F12/0897, G06F12/0855, G06F2212/271
Abstract: A tile for use in a tiled storage array provides re-organization of values within the tile array without requiring sophisticated global control. The tiles operate to move a requested value to a front-most storage element of the tile array according to a global systolic clock. The previous occupant of the front-most location is moved or swapped backward according to the systolic clock, and the new occupant is moved forward according to the systolic clock, according to the operation of the tiles, while providing for multiple in-flight access requests within the tile array. The placement heuristic that moves the values is determined according to the position of the tiles within the array and the behavior of the tiles. The movement of the values can be performed via only next-neighbor connections of adjacent tiles within the tile array.

10. Granted patent

US08527726B2 Tiled storage array with systolic move-to-front reorganization (status: in force)
Publication number: US08527726B2
Publication date: 2013-09-03
Application number: US12270132
Filing date: 2008-11-13
Applicants: Volker Strumpen, Matteo Frigo
Inventors: Volker Strumpen, Matteo Frigo
IPC classification: G06F12/00, G06F13/00, G06F13/28
CPC classification: G06F12/0811, G06F3/0689, G06F12/0897, G06F12/123, G06F2212/271
Abstract: A tiled storage array provides reduction in access latency for frequently-accessed values by re-organizing to always move a requested value to a front-most storage element of the array. The previous occupant of the front-most location is moved backward according to a systolic pulse, and the new occupant is moved forward according to the systolic pulse, preserving the uniqueness of the stored values within the array, and providing for multiple in-flight access requests within the array. The placement heuristic that moves the values according to the systolic pulse can be implemented by control logic within identical tiles, so that the placement heuristic moves the values according to the position of the tiles within the array. The movement of the values can be performed via only next-neighbor connections of adjacent tiles within the array.
