    • 1. Patent application
    • Performing A Local Reduction Operation On A Parallel Computer
    • US20120317399A1
    • 2012-12-13
    • US13585993
    • 2012-08-15
    • Michael A. Blocksome; Daniel A. Faraj
    • Michael A. Blocksome; Daniel A. Faraj
    • G06F15/76; G06F15/16; G06F9/02; G06F12/00
    • G06F15/17387; G06F15/17318
    • A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
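The copy and reduce steps in the abstract above can be illustrated with a small sequential simulation. This is a sketch under assumptions, not the patented implementation: the function name `local_reduce`, the chunk size, the use of addition as the reduction operator, and the rule that each reduction core copies exactly the chunks the other core will later reduce are all hypothetical choices made for illustration.

```python
def local_reduce(r0, r1, w, rd, chunk=2):
    """Element-wise sum of four input buffers, split across two simulated cores.

    r0, r1: input buffers of the two reduction cores
    w, rd:  input buffers of the network write and network read cores
    chunk:  interleaving granularity (assumed value)
    """
    n = len(r0)
    assert len(r1) == len(w) == len(rd) == n
    inputs = (r0, r1)

    # Step 1: interleaved copy into shared memory. Chunk k will be reduced by
    # core k % 2, so the *other* reduction core copies its data for that chunk.
    interleaved = [0] * n
    for start in range(0, n, chunk):
        owner = (start // chunk) % 2
        other = 1 - owner
        interleaved[start:start + chunk] = inputs[other][start:start + chunk]

    # Step 2: the network cores' buffers are copied to shared memory,
    # one by each reduction core.
    w_shared = list(w)    # copied by reduction core 0
    rd_shared = list(rd)  # copied by reduction core 1

    # Step 3: each core reduces its own input, every other interleaved chunk,
    # and the two copied network buffers. The two loops touch disjoint index
    # ranges, so on real hardware they could run in parallel.
    result = [0] * n
    for core in (0, 1):
        for start in range(core * chunk, n, 2 * chunk):
            for i in range(start, min(start + chunk, n)):
                result[i] = inputs[core][i] + interleaved[i] + w_shared[i] + rd_shared[i]
    return result
```

Under these assumptions, the result is the element-wise sum of the four buffers regardless of the chunk size chosen.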
    • 2. Patent application
    • Performing A Local Reduction Operation On A Parallel Computer
    • US20110258245A1
    • 2011-10-20
    • US12760020
    • 2010-04-14
    • Michael A. Blocksome; Daniel A. Faraj
    • Michael A. Blocksome; Daniel A. Faraj
    • G06F15/76; G06F15/16; G06F9/02; G06F12/06
    • G06F15/17387; G06F15/17318
    • A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
    • 3. Granted patent
    • Performing a local reduction operation on a parallel computer
    • US08332460B2
    • 2012-12-11
    • US12760020
    • 2010-04-14
    • Michael A. Blocksome; Daniel A. Faraj
    • Michael A. Blocksome; Daniel A. Faraj
    • G06F15/76; G06F15/16; G06F9/02; G06F12/06
    • G06F15/17387; G06F15/17318
    • A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
    • 4. Granted patent
    • Performing a local reduction operation on a parallel computer
    • US08458244B2
    • 2013-06-04
    • US13585993
    • 2012-08-15
    • Michael A. Blocksome; Daniel A. Faraj
    • Michael A. Blocksome; Daniel A. Faraj
    • G06F15/76; G06F15/16; G06F9/02; G06F12/00
    • G06F15/17387; G06F15/17318
    • A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
    • 5. Granted patent
    • Data communications for a collective operation in a parallel active messaging interface of a parallel computer
    • US08490112B2
    • 2013-07-16
    • US12959539
    • 2010-12-03
    • Daniel A. Faraj
    • Daniel A. Faraj
    • G06F9/46
    • G06F15/17318; G06F9/54
    • Algorithm selection for data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks, a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
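The bit-mask selection described in the abstract above might look roughly like the following. This is only a sketch: the abstract does not say what the mask bits mean or how a match is decided, so the `Trait` flags, the algorithm names, the 256-byte cutoff, and the "first registered algorithm whose bits are all set in the instruction mask" rule are assumptions for illustration, not part of PAMI.

```python
from enum import IntFlag


class Trait(IntFlag):
    """Hypothetical properties of a collective instruction."""
    CONTIGUOUS = 1
    SMALL_MESSAGE = 2
    POWER_OF_TWO_TASKS = 4


# Association of bit masks with algorithms (names are made up).
# Order matters: earlier entries are preferred when several match.
ALGORITHMS = [
    (Trait.CONTIGUOUS | Trait.SMALL_MESSAGE, "short_tree"),
    (Trait.POWER_OF_TWO_TASKS, "recursive_doubling"),
    (Trait(0), "generic_ring"),  # empty mask: matches any instruction
]


def build_mask(message_bytes, contiguous, task_count):
    """Construct the bit mask for one received collective instruction."""
    mask = Trait(0)
    if contiguous:
        mask |= Trait.CONTIGUOUS
    if message_bytes <= 256:  # assumed cutoff for a "small" message
        mask |= Trait.SMALL_MESSAGE
    if task_count > 0 and (task_count & (task_count - 1)) == 0:
        mask |= Trait.POWER_OF_TWO_TASKS
    return mask


def select_algorithm(mask):
    """Return the first algorithm whose required bits are all set in mask."""
    for required, name in ALGORITHMS:
        if (required & mask) == required:
            return name
    raise LookupError("no algorithm registered for this mask")
```

Keeping a catch-all entry with an empty mask at the end of the table means selection always succeeds, which is one plausible design but not something the abstract states.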
    • 6. Patent application
    • Runtime Optimization Of An Application Executing On A Parallel Computer
    • US20110258627A1
    • 2011-10-20
    • US12760111
    • 2010-04-14
    • Daniel A. Faraj; Brian E. Smith
    • Daniel A. Faraj; Brian E. Smith
    • G06F9/46
    • G06F9/52; G06F9/546; G06F11/3404
    • Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
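The decision logic in the abstract above reduces to a short predicate. The sketch below assumes each compute node reports the call site at which it identified the collective operation as a comparable value such as a string; the function name and the representation of call sites are hypothetical.

```python
def should_open_tuning_session(root_based, call_sites):
    """Decide whether a collective operation runs inside a tuning session.

    root_based: True if the collective has a root (e.g. a broadcast or reduce)
    call_sites: one call-site identifier per compute node (assumed format)
    """
    if not root_based:
        # Collectives without a root are always tuned.
        return True
    # Root-based collectives are tuned only when every node identified the
    # operation at the same call site.
    return len(set(call_sites)) == 1
```

When the function returns False, the collective would simply be executed without establishing a tuning session, as the abstract describes.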
    • 7. Granted patent
    • Optimized data communications in a parallel computer
    • US08812610B2
    • 2014-08-19
    • US13325440
    • 2011-12-14
    • Daniel A. Faraj
    • Daniel A. Faraj
    • G06F15/173; G06F15/167
    • G06F15/1735; G06F15/167; G06F15/17387
    • A parallel computer includes nodes that include a network adapter that couples the node in a point-to-point network and supports communications in opposite directions of each dimension. Optimized communications include: receiving, by a network adapter of a receiving compute node, a packet—from a source direction—that specifies a destination node and deposit hints. Each hint is associated with a direction within which the packet is to be deposited. If a hint indicates the packet to be deposited in the opposite direction: the adapter delivers the packet to an application on the receiving node; forwards the packet to a next node in the opposite direction if the receiving node is not the destination; and forwards the packet to a node in a direction of a subsequent dimension if the hints indicate that the packet is to be deposited in the direction of the subsequent dimension.
    • 8. Patent application
    • Determining Collective Barrier Operation Skew In A Parallel Computer
    • US20130145378A1
    • 2013-06-06
    • US13308917
    • 2011-12-01
    • Daniel A. Faraj
    • Daniel A. Faraj
    • G06F9/46
    • G06F9/522
    • Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.
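The skew calculation in the abstract above can be sketched as follows. The measurement itself (entering the barrier late on one node and timing its completion) depends on the machine, so it is passed in as a hypothetical `measure` callback; the function name and the use of integer microseconds are also assumptions.

```python
def barrier_skew(nodes, measure):
    """Return (skew, completion_times) for a group of compute nodes.

    nodes:   identifiers of the nodes in the operational group
    measure: callback that runs one barrier with the given node delayed and
             returns that node's barrier completion time (assumed interface)
    """
    # Each node takes one turn as the delayed node.
    completion_times = {node: measure(node) for node in nodes}
    # Skew is the spread between the slowest and fastest completion times.
    skew = max(completion_times.values()) - min(completion_times.values())
    return skew, completion_times


# Illustrative stand-in for real measurements, in microseconds.
sample_times = {0: 120, 1: 135, 2: 118}
skew, times = barrier_skew(sample_times.keys(), sample_times.get)
```

With the sample values shown, the skew is the difference between 135 and 118 microseconds.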
    • 9. Patent application
    • Administering Connection Identifiers For Collective Operations In A Parallel Computer
    • US20120030370A1
    • 2012-02-02
    • US12847573
    • 2010-07-30
    • Daniel A. Faraj; Brian E. Smith
    • Daniel A. Faraj; Brian E. Smith
    • G06F15/16
    • G06F13/36; G06F9/526
    • Administering connection identifiers for collective operations in a parallel computer, including prior to calling a collective operation, determining, by a first compute node of a communicator to receive an instruction to execute the collective operation, whether a value stored in a global connection identifier utilization buffer exceeds a predetermined threshold; if the value stored in the global ConnID utilization buffer does not exceed the predetermined threshold: calling the collective operation with a next available ConnID including retrieving, from an element of a ConnID buffer, the next available ConnID and locking the element of the ConnID buffer from access by other compute nodes; and if the value stored in the global ConnID utilization buffer exceeds the predetermined threshold: repeatedly determining whether the value stored in the global ConnID utilization buffer exceeds the predetermined threshold until the value stored in the global ConnID utilization buffer does not exceed the predetermined threshold.
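The threshold check and ConnID locking in the abstract above can be approximated in a single process. This sketch makes several assumptions: the class and method names are hypothetical, the "global ConnID utilization buffer" is modelled as a plain counter of IDs in use, and instead of spinning until the value drops, `try_acquire` returns `None` so the caller can poll again.

```python
class ConnIdPool:
    """Single-process stand-in for a shared ConnID buffer (illustrative only)."""

    def __init__(self, size, threshold):
        self.locked = [False] * size  # one lock flag per ConnID buffer element
        self.in_use = 0               # models the global utilization value
        self.threshold = threshold

    def try_acquire(self):
        """Return the next available ConnID, or None if the caller must retry."""
        if self.in_use > self.threshold:
            # Utilization exceeds the threshold: the caller should check again.
            return None
        for conn_id, is_locked in enumerate(self.locked):
            if not is_locked:
                self.locked[conn_id] = True  # lock the element against reuse
                self.in_use += 1
                return conn_id
        return None

    def release(self, conn_id):
        """Unlock a ConnID once its collective operation has completed."""
        if not self.locked[conn_id]:
            raise ValueError("ConnID is not currently locked")
        self.locked[conn_id] = False
        self.in_use -= 1
```

A caller would loop on `try_acquire()` until it returns an ID, which corresponds to the repeated threshold check in the abstract; a real implementation would need atomic updates across compute nodes, which this sketch does not attempt.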
    • 10. Granted patent
    • Synchronizing compute node time bases in a parallel computer
    • US08924763B2
    • 2014-12-30
    • US13327107
    • 2011-12-15
    • Dong Chen; Daniel A. Faraj; Thomas M. Gooding; Philip Heidelberger
    • Dong Chen; Daniel A. Faraj; Thomas M. Gooding; Philip Heidelberger
    • G06F1/12
    • G06F1/12; H04L12/413
    • Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.
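A small model shows why setting each node's time base to its root-to-node latency, as in the abstract above, leaves all nodes agreeing. The sketch assumes an idealized setting in which the pulse reaches each node exactly one latency after it is sent and local clocks tick at the same rate; the function name and the integer time units are hypothetical.

```python
def time_base_at(real_time, pulse_sent_at, latency):
    """Time base a node reads at `real_time` after synchronizing.

    The node receives the pulse at pulse_sent_at + latency and at that moment
    sets its time base to `latency`; the time base then advances with real time.
    """
    arrival = pulse_sent_at + latency
    if real_time < arrival:
        raise ValueError("node has not received the pulse yet")
    return latency + (real_time - arrival)


# Three nodes at different distances from the root (illustrative latencies).
latencies = [3, 7, 12]
readings = [time_base_at(150, pulse_sent_at=100, latency=l) for l in latencies]
```

In this model every reading equals the real time elapsed since the pulse was sent, so the latency term cancels and the nodes' time bases coincide.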