Quick Patent Search - Search Global Patents Quickly, Free Commercial Patent Database - IPRDB

1. Invention application

US20080127120A1 Method and apparatus for identifying instructions associated with execution events in a data space profiler (in force)
Publication No.: US20080127120A1
Publication date: 2008-05-29
Application No.: US11590288
Filing date: 2006-10-31
Applicants: Nicolai Kosche, Yukon L. Maruyama, Martin S. Itzkowitz
Inventors: Nicolai Kosche, Yukon L. Maruyama, Martin S. Itzkowitz
IPC classification: G06F9/44
CPC classification: G06F11/3447, G06F11/3409, G06F11/3476, G06F2201/835, G06F2201/86, G06F2201/865, G06F2201/88
Abstract: A system and method for profiling a software application may include means for capturing profiling information corresponding to an instruction identified as having executed coincident with the occurrence of a runtime event, and for associating the profiling information with the event in an event set. In some embodiments, the identified instruction, which may have triggered the event, may be located in the program code sequence at a predetermined position relative to the current program counter value at the time the event was detected. The predetermined relative position may be fixed dependent on the processor architecture and may also be dependent on the event type. The predetermined relative position may be zero, indicating that when the event was detected, the program counter value corresponded to an instruction associated with the event. If the identified instruction is an ambiguity-creating instruction, an indication of ambiguity may be associated with the event.
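
The lookup described in this abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `SKID_OFFSETS` table and its values, the `AMBIGUOUS_OPCODES` set, and the representation of code as a list of mnemonics indexed by instruction position.

```python
from dataclasses import dataclass

# Hypothetical table: (architecture, event type) -> predetermined position,
# in instructions, relative to the program counter at detection time.
# The architectures, event names, and offsets are invented for this sketch.
SKID_OFFSETS = {
    ("arch_a", "cache_miss"): -1,
    ("arch_a", "tlb_miss"): 0,
}

# Assumption: which opcodes count as ambiguity-creating in this sketch.
AMBIGUOUS_OPCODES = {"branch", "call", "return"}


@dataclass
class EventRecord:
    event_type: str
    pc_index: int      # program counter (as an instruction index) at detection
    instr_index: int   # instruction identified as associated with the event
    ambiguous: bool    # True if the identified instruction creates ambiguity


def identify_instruction(code, pc_index, arch, event_type):
    """Find the instruction at the predetermined offset from the PC."""
    offset = SKID_OFFSETS[(arch, event_type)]
    idx = pc_index + offset
    if not 0 <= idx < len(code):
        raise IndexError("offset points outside the code sequence")
    return EventRecord(event_type, pc_index, idx, code[idx] in AMBIGUOUS_OPCODES)
```

With `code = ["load", "add", "branch", "store"]`, a `cache_miss` detected at index 1 is attributed to the `load` at index 0, while one detected at index 3 lands on the `branch` and is flagged as ambiguous.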

2. Granted invention patent

US06651245B1 System and method for insertion of prefetch instructions by a compiler (in force)
Publication No.: US06651245B1
Publication date: 2003-11-18
Application No.: US09679433
Filing date: 2000-10-03
Applicants: Peter C. Damron, Nicolai Kosche
Inventors: Peter C. Damron, Nicolai Kosche
IPC classification: G06F9/45
CPC classification: G06F8/4442, G06F8/445, G06F12/0862, G06F2212/6028
Abstract: The present invention discloses a method and device for placing prefetch instructions in a low-level or assembly code instruction stream. It involves the use of a new concept called a martyr memory operation. When inserting prefetch instructions in a code stream, some instructions will still miss the cache because in some circumstances a prefetch cannot be added at all, or cannot be added early enough to allow the needed reference to be in cache before being referenced by an executing instruction. A subset of these instructions is identified using a new method and designated as martyr memory operations. Once identified, other memory operations that would also have been cache misses can “hide” behind the martyr memory operation and complete their prefetches while the processor, of necessity, waits for the martyr memory operation instruction to complete. This will increase the number of cache hits.

3. Granted invention patent

US06574713B1 Heuristic for identifying loads guaranteed to hit in processor cache (in force)
Publication No.: US06574713B1
Publication date: 2003-06-03
Application No.: US09685431
Filing date: 2000-10-10
Applicants: Nicolai Kosche, Peter C. Damron
Inventors: Nicolai Kosche, Peter C. Damron
IPC classification: G06F12/00
CPC classification: G06F8/4442, G06F12/0862, G06F2212/6028
Abstract: A heuristic algorithm is disclosed that identifies loads guaranteed to hit the processor cache and that further provides a “minimal” set of prefetches, which are scheduled/inserted during compilation of a program. The heuristic algorithm of the present invention uses the concept of a “cache line” (i.e., the data chunks received during memory operations) in conjunction with the concept of “related” memory operations to determine which prefetches are unnecessary for related memory operations, thus generating a minimal number of prefetches for related memory operations.
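
As a rough illustration of the cache-line idea in this abstract, the Python sketch below keeps one prefetch per cache line among related memory operations. The assumptions are labeled here and in the code: operations are "related" when they share a base address, each base is cache-line aligned, and the line size is 64 bytes. The patent's actual heuristic may differ.

```python
CACHE_LINE_BYTES = 64  # assumption: line size chosen for this sketch


def minimal_prefetches(mem_ops, line_bytes=CACHE_LINE_BYTES):
    """Return (prefetches, covered) for a list of (base, offset) memory ops.

    Assumes ops sharing a base are "related" and that each base is
    line-aligned, so offset // line_bytes identifies the cache line.
    """
    seen_lines = set()
    prefetches = []   # one (base, line_start_offset) per distinct cache line
    covered = []      # ops expected to hit because an earlier op shares the line
    for base, offset in mem_ops:
        line = (base, offset // line_bytes)
        if line in seen_lines:
            covered.append((base, offset))
        else:
            seen_lines.add(line)
            prefetches.append((base, (offset // line_bytes) * line_bytes))
    return prefetches, covered
```

For the operations `[("a", 0), ("a", 8), ("a", 64), ("b", 16), ("a", 72)]`, only three prefetches are kept; the accesses at `("a", 8)` and `("a", 72)` fall on lines already fetched and need none.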

4. Granted invention patent

US06564297B1 Compiler-based cache line optimization (in force)
Publication No.: US06564297B1
Publication date: 2003-05-13
Application No.: US09594430
Filing date: 2000-06-15
Applicant: Nicolai Kosche
Inventor: Nicolai Kosche
IPC classification: G06F12/02
CPC classification: G06F12/0802, G06F12/0862, G06F2212/6028
Abstract: Cache line optimization involves computing where cache misses are in a control flow and assigning probabilities to cache misses. Cache lines may be scheduled based on the assigned probabilities and where the cache misses are in the control flow. Cache line probabilities may be calculated based on the relationship of the cache line and where the cache misses are in the control flow. A control flow may be pruned before calculating cache line probabilities. Function call sites may be used to prune the control flow. Address generation of a cache miss may be duplicated to speculatively hoist address generation and the associated prefetch. References may be selected for optimization, identifying cache lines, and mapping the selected references. Dependencies within the cache lines may be determined and the cache lines may be scheduled based on the determined dependencies and probabilities of usefulness. Instructions may be scheduled based on the scheduled cache lines and the target machine model to maximize outstanding memory transactions. Cache lines may be scheduled across call sites.

5. Granted invention patent

US06421826B1 Method and apparatus for performing prefetching at the function level (in force)
Publication No.: US06421826B1
Publication date: 2002-07-16
Application No.: US09434715
Filing date: 1999-11-05
Applicants: Nicolai Kosche, Peter C. Damron
Inventors: Nicolai Kosche, Peter C. Damron
IPC classification: G06F9/44
CPC classification: G06F9/383
Abstract: One embodiment of the present invention provides a system for compiling source code into executable code that performs prefetching for memory operations within regions of code that tend to generate cache misses. The system operates by compiling a source code module containing programming language instructions into an executable code module containing instructions suitable for execution by a processor. Next, the system runs the executable code module in a training mode on a representative workload and keeps statistics on cache miss rates for functions within the executable code module. These statistics are used to identify a set of “hot” functions that generate a large number of cache misses. Next, explicit prefetch instructions are scheduled in advance of memory operations within the set of hot functions. In one embodiment, explicit prefetch operations are scheduled into the executable code module by activating prefetch generation at a start of an identified function, and by deactivating prefetch generation at a return from the identified function. In one embodiment, the system further schedules prefetch operations for the memory operations by identifying a subset of memory operations of a particular type within the set of hot functions, and scheduling explicit prefetch operations for memory operations belonging to the subset.
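
The two steps in this abstract (picking "hot" functions from training statistics, then switching prefetch generation on at function entry and off at return) might look roughly like the Python sketch below. The thresholds, the `(accesses, misses)` statistics format, and the flat `("enter" | "load" | "return", arg)` instruction stream are all assumptions made for illustration.

```python
def select_hot_functions(stats, min_miss_rate=0.05, min_misses=1000):
    """stats maps function name -> (accesses, misses) from a training run.

    The thresholds are illustrative assumptions, not values from the patent.
    """
    hot = set()
    for name, (accesses, misses) in stats.items():
        if accesses and misses >= min_misses and misses / accesses >= min_miss_rate:
            hot.add(name)
    return hot


def insert_prefetches(stream, hot):
    """Emit a prefetch before each load inside a hot function.

    A real scheduler would hoist the prefetch well ahead of the load;
    this sketch places it immediately before, only to show the toggling.
    """
    out = []
    enabled = False
    for op, arg in stream:
        if op == "enter":
            enabled = arg in hot          # activate at the start of a hot function
        elif op == "load" and enabled:
            out.append(("prefetch", arg))
        out.append((op, arg))
        if op == "return":
            enabled = False               # deactivate at the return
    return out
```

Given statistics in which only `f` crosses both thresholds, loads inside `f` gain a preceding prefetch and loads inside other functions are left unchanged.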

6. Granted invention patent

US08166462B2 Method and apparatus for sorting and displaying costs in a data space profiler (in force)
Publication No.: US08166462B2
Publication date: 2012-04-24
Application No.: US11516980
Filing date: 2006-09-07
Applicants: Nicolai Kosche, Arpana Jayaswal, Martin S. Itzkowitz
Inventors: Nicolai Kosche, Arpana Jayaswal, Martin S. Itzkowitz
IPC classification: G06F9/44, G06F9/45
CPC classification: G06F11/328, G06F11/3466, G06F11/3471, G06F2201/86
Abstract: A data space profiler may include a graphical user interface (GUI) for sorting, aggregating and displaying profile data associated with runtime events of a profiled software application. This profile data may include costs associated with events as well as extended address elements and other code behavior attributes associated with them. The GUI may include means for selecting a perspective from which cost data is to be presented as well as presentation options for displaying the data. The presentation options may include panning and zooming options, which may determine how the data is sorted and/or aggregated for display. The GUI may also include means for specifying filter criteria, which may be used to determine which data to display. By providing means to alternate the display of profile data according to different perspectives and filtering criteria, the GUI may facilitate identification of performance bottlenecks of the profiled application and the causes thereof.

7. Granted invention patent

US08136124B2 Method and apparatus for synthesizing hardware counters from performance sampling (in force)
Publication No.: US08136124B2
Publication date: 2012-03-13
Application No.: US11624526
Filing date: 2007-01-18
Applicants: Nicolai Kosche, Kenneth Tracton
Inventors: Nicolai Kosche, Kenneth Tracton
IPC classification: G06F9/44, G06F11/00
CPC classification: G06F11/3476, G06F11/3447, G06F2201/86, G06F2201/88
Abstract: A system and method for performance monitoring may use data collected from a hardware event agent comprising a hardware sampling mechanism and/or one or more hardware counters to increment one or more synthesized performance counters by an amount dependent on an expression involving the collected data. Each synthesized performance counter may be configured to count events of a different type and may comprise a machine addressable storage location. The event types may include various memory references or misses, branches, branch mispredictions, or any other event of interest in performance monitoring. The hardware event agent may comprise one or more instruction counters, cycle counters, timers, or other hardware performance counters. One hardware performance counter may be used in a time-multiplexed or data-multiplexed manner to monitor events of multiple event types. The hardware sampling mechanism may return a statistical packet for sampled instructions, which may be examined to determine the event type.
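
One way to picture synthesized counters is the Python sketch below: each sampled packet is examined to decide which event types it represents, and the matching counters are incremented by the sampling interval. The packet fields (`is_load`, `cache_miss`, `is_branch`, `mispredicted`) and the choice of the sampling interval as the increment expression are assumptions for illustration, not details from the patent.

```python
from collections import Counter


def classify(packet):
    """Examine a sampled-instruction packet and return its event types.

    The packet is a dict with hypothetical boolean fields.
    """
    events = []
    if packet.get("is_load"):
        events.append("load")
    if packet.get("cache_miss"):
        events.append("cache_miss")
    if packet.get("is_branch") and packet.get("mispredicted"):
        events.append("branch_mispredict")
    return events


def synthesize_counters(packets, sampling_interval):
    """Increment one synthesized counter per event type.

    Assumption: each sample stands for `sampling_interval` instructions,
    so that value is used as the increment for every matching counter.
    """
    counters = Counter()
    for packet in packets:
        for event_type in classify(packet):
            counters[event_type] += sampling_interval
    return counters
```

A single sampling mechanism thus feeds several counters at once, which matches the abstract's point that one hardware source can be multiplexed across event types.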

8. Invention application

US20080109796A1 Method and Apparatus for Associating User-Specified Data with Events in a Data Space Profiler (in force)
Publication No.: US20080109796A1
Publication date: 2008-05-08
Application No.: US11557874
Filing date: 2006-11-08
Applicant: Nicolai Kosche
Inventor: Nicolai Kosche
IPC classification: G06F9/45
CPC classification: G06F11/3612
Abstract: A system and method for profiling a software application may include means for operating on context-specific data and costs. The system may include a descriptor apparatus for specifying identifiers of extended address elements to be profiled and locations for storing corresponding data values. In some embodiments, a list of variables to be included in profiling may be registered with an event agent and values of the variables may be captured in response to detection of a system event. Registering variables to be profiled may involve conveying a list of the variables or a pointer to such a list to the event agent. The event agent may associate the values of the registered variables with the detected system event and may store them in an event space database. The database may be accessed by a data space profiler to identify performance bottlenecks dependent on one or more registered variable values.
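
The registration flow in this abstract can be sketched in Python as follows. The `EventAgent` class, its method names, and the use of getter callables to stand in for "a pointer to a list of variables" are hypothetical choices for this sketch; the in-memory list merely stands in for the event space database.

```python
class EventAgent:
    """Hypothetical event agent that captures registered variables per event."""

    def __init__(self):
        self._registered = {}  # variable name -> zero-argument getter
        self.event_db = []     # stand-in for the event space database

    def register(self, variables):
        """Register variables to profile: a mapping of name -> getter."""
        self._registered.update(variables)

    def on_event(self, event_type):
        """Capture current values and associate them with the detected event."""
        values = {name: getter() for name, getter in self._registered.items()}
        self.event_db.append({"event": event_type, "values": values})


def events_where(event_db, name, value):
    """Profiler-side query: events recorded while `name` held `value`."""
    return [e for e in event_db if e["values"].get(name) == value]
```

A program would register, for example, a getter for its current request identifier; each later event then carries the value that was live when the event was detected, so costs can be grouped by it.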

9. Granted invention patent

US07137111B2 Aggressive prefetch of address chains (in force)
Publication No.: US07137111B2
Publication date: 2006-11-14
Application No.: US09996088
Filing date: 2001-11-28
Applicants: Peter C. Damron, Nicolai Kosche
Inventors: Peter C. Damron, Nicolai Kosche
IPC classification: G06F9/44, G06F9/30
CPC classification: G06F9/30047, G06F9/3842
Abstract: Operations including inserted prefetch operations that correspond to addressing chains may be scheduled above memory access operations that are likely-to-miss, thereby exploiting latency of the “martyred” likely-to-miss operations and improving execution performance of resulting code. More generally, certain pre-executable counterparts of likely-to-stall operations that form dependency chains may be scheduled above operations that are themselves likely-to-stall.

10. Granted invention patent

US07039910B2 Technique for associating execution characteristics with instructions or operations of program code (in force)
Publication No.: US07039910B2
Publication date: 2006-05-02
Application No.: US10050387
Filing date: 2002-01-16
Applicants: Nicolai Kosche, Christopher P. Aoki, Peter C. Damron
Inventors: Nicolai Kosche, Christopher P. Aoki, Peter C. Damron
IPC classification: G06F9/45
CPC classification: G06F11/3466, G06F2201/86, G06F2201/865, G06F2201/885
Abstract: By maintaining consistency of instruction or operation identification between code prepared for profiling and that prepared using profiling results, efficacy of profile-directed code optimizations can be improved. In particular, profile-directed optimizations based on stall statistics are facilitated in an environment in which correspondence is maintained between (i) instructions or operations whose execution performance may be optimized (or which may provide an opportunity for optimization of other instructions or operations) and (ii) particular instructions or operations profiled.
