    • 1. Patent application
    • Method and apparatus for identifying instructions associated with execution events in a data space profiler
    • US20080127120A1
    • 2008-05-29
    • US11590288
    • 2006-10-31
    • Nicolai Kosche, Yukon L. Maruyama, Martin S. Itzkowitz
    • G06F9/44
    • G06F11/3447, G06F11/3409, G06F11/3476, G06F2201/835, G06F2201/86, G06F2201/865, G06F2201/88
    • A system and method for profiling a software application may include means for capturing profiling information corresponding to an instruction identified as having executed coincident with the occurrence of a runtime event, and for associating the profiling information with the event in an event set. In some embodiments, the identified instruction, which may have triggered the event, may be located in the program code sequence at a predetermined position relative to the current program counter value at the time the event was detected. The predetermined relative position may be fixed dependent on the processor architecture and may also be dependent on the event type. The predetermined relative position may be zero, indicating that when the event was detected, the program counter value corresponded to an instruction associated with the event. If the identified instruction is an ambiguity-creating instruction, an indication of ambiguity may be associated with the event.
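The attribution step described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the offset table, the architecture name `arch_a`, the event names, and the set of ambiguity-creating instructions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical table: (architecture, event type) -> instruction offset relative to the PC.
# The values are invented for illustration; 0 means the PC itself names the instruction.
BACKTRACK_OFFSET = {
    ("arch_a", "cache_miss"): -1,
    ("arch_a", "tlb_miss"): 0,
}

@dataclass
class EventRecord:
    event_type: str
    instruction_index: int
    ambiguous: bool

def attribute_event(arch, event_type, pc_index, code, ambiguity_creating):
    """Pick the instruction at the predetermined offset from the PC and flag ambiguity."""
    index = pc_index + BACKTRACK_OFFSET[(arch, event_type)]
    if not 0 <= index < len(code):
        raise IndexError("offset points outside the code sequence")
    # Assumption: an instruction is ambiguity-creating if it appears in the given set.
    return EventRecord(event_type, index, code[index] in ambiguity_creating)

code = ["load", "add", "branch", "store"]
record = attribute_event("arch_a", "cache_miss", 2, code, {"branch"})
```

Keeping the offsets in a table keyed by architecture and event type mirrors the abstract's statement that the relative position may depend on both.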
    • 2. Granted patent
    • System and method for insertion of prefetch instructions by a compiler
    • US06651245B1
    • 2003-11-18
    • US09679433
    • 2000-10-03
    • Peter C. Damron, Nicolai Kosche
    • G06F9/45
    • G06F8/4442, G06F8/445, G06F12/0862, G06F2212/6028
    • The present invention discloses a method and device for placing prefetch instructions in a low-level or assembly code instruction stream. It involves the use of a new concept called a martyr memory operation. When inserting prefetch instructions in a code stream, some instructions will still miss the cache because in some circumstances a prefetch cannot be added at all, or cannot be added early enough to allow the needed reference to be in cache before being referenced by an executing instruction. A subset of these instructions is identified using a new method and designated as martyr memory operations. Once identified, other memory operations that would also have been cache misses can “hide” behind the martyr memory operation and complete their prefetches while the processor, of necessity, waits for the martyr memory operation instruction to complete. This will increase the number of cache hits.
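A toy model may help make the martyr idea concrete. The sketch below is an assumption-laden simplification, not the method claimed in the patent: it assumes each memory operation has a stream position and a position at which its address becomes known, that a prefetch needs `latency` instructions of lead time, and that a later uncovered operation can hide behind a martyr if its address is known by the time the martyr stalls. The names `MemOp` and `classify` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MemOp:
    name: str
    pos: int         # position in the instruction stream
    addr_ready: int  # earliest position at which its address is known

def classify(ops, latency):
    """Label each op 'prefetched', 'martyr', or 'hidden' under the toy model above."""
    labels = {}
    martyr = None
    for op in sorted(ops, key=lambda o: o.pos):
        if op.pos - op.addr_ready >= latency:
            labels[op.name] = "prefetched"   # a prefetch fits early enough
        elif martyr is not None and op.addr_ready <= martyr.pos:
            labels[op.name] = "hidden"       # its prefetch completes during the martyr's stall
        else:
            labels[op.name] = "martyr"       # takes the miss so that later ops can hide
            martyr = op
    return labels

ops = [MemOp("A", 10, 0), MemOp("B", 12, 9), MemOp("C", 14, 11), MemOp("D", 20, 18)]
labels = classify(ops, latency=8)
```

In this example `B` cannot be prefetched in time and becomes a martyr, `C` hides behind it, and `D`, whose address is not known until after `B`, becomes a second martyr.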
    • 4. Granted patent
    • Compiler-based cache line optimization
    • US06564297B1
    • 2003-05-13
    • US09594430
    • 2000-06-15
    • Nicolai Kosche
    • G06F12/02
    • G06F12/0802, G06F12/0862, G06F2212/6028
    • Cache line optimization involves computing where cache misses are in a control flow and assigning probabilities to cache misses. Cache lines may be scheduled based on the assigned probabilities and where the cache misses are in the control flow. Cache line probabilities may be calculated based on the relationship of the cache line and where the cache misses are in the control flow. A control flow may be pruned before calculating cache line probabilities. Function call sites may be used to prune the control flow. Address generation of a cache miss may be duplicated to speculatively hoist address generation and the associated prefetch. References may be selected for optimization, identifying cache lines, and mapping the selected references. Dependencies within the cache lines may be determined and the cache lines may be scheduled based on the determined dependencies and probabilities of usefulness. Instructions may be scheduled based on the scheduled cache lines and the target machine model to maximize outstanding memory transactions. Cache lines may be scheduled across call sites.
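One way to picture "assigning probabilities to cache misses" is to propagate branch probabilities through an acyclic control flow graph and rank the blocks that contain misses by how likely they are to be reached. The sketch below does only that; the graph, the block names, and the choice of reach probability as the ranking key are assumptions for illustration and are not taken from the patent.

```python
def reach_probability(edges, entry):
    """edges: block -> list of (successor, branch probability); the graph must be acyclic."""
    order, seen = [], set()

    def visit(block):
        if block in seen:
            return
        seen.add(block)
        for successor, _ in edges.get(block, []):
            visit(successor)
        order.append(block)  # post-order; reversed, it is a topological order

    visit(entry)
    prob = {block: 0.0 for block in order}
    prob[entry] = 1.0
    for block in reversed(order):
        for successor, p in edges.get(block, []):
            prob[successor] += prob[block] * p
    return prob

edges = {
    "entry": [("A", 0.75), ("B", 0.25)],
    "A": [("C", 1.0)],
    "B": [("C", 0.5), ("D", 0.5)],
}
prob = reach_probability(edges, "entry")
miss_blocks = ["D", "C"]  # hypothetical blocks that contain cache misses
ranked = sorted(miss_blocks, key=prob.get, reverse=True)
```

A scheduler built on this ranking would consider the miss in `C` before the one in `D`, since `C` is reached far more often.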
    • 5. Granted patent
    • Method and apparatus for performing prefetching at the function level
    • US06421826B1
    • 2002-07-16
    • US09434715
    • 1999-11-05
    • Nicolai Kosche, Peter C. Damron
    • G06F9/44
    • G06F9/383
    • One embodiment of the present invention provides a system for compiling source code into executable code that performs prefetching for memory operations within regions of code that tend to generate cache misses. The system operates by compiling a source code module containing programming language instructions into an executable code module containing instructions suitable for execution by a processor. Next, the system runs the executable code module in a training mode on a representative workload and keeps statistics on cache miss rates for functions within the executable code module. These statistics are used to identify a set of “hot” functions that generate a large number of cache misses. Next, explicit prefetch instructions are scheduled in advance of memory operations within the set of hot functions. In one embodiment, explicit prefetch operations are scheduled into the executable code module by activating prefetch generation at a start of an identified function, and by deactivating prefetch generation at a return from the identified function. In one embodiment, the system further schedules prefetch operations for the memory operations by identifying a subset of memory operations of a particular type within the set of hot functions, and scheduling explicit prefetch operations for memory operations belonging to the subset.
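The selection of "hot" functions from training-run statistics could look something like the sketch below. The abstract does not say how the hot set is chosen, so the coverage threshold `share`, the function names, and the miss counts are all assumptions made for illustration.

```python
def hot_functions(miss_counts, share=0.8):
    """Smallest set of functions, in descending miss order, covering `share` of all misses."""
    total = sum(miss_counts.values())
    hot, covered = [], 0
    ranked = sorted(miss_counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, misses in ranked:
        if total == 0 or covered >= share * total:
            break
        hot.append(name)
        covered += misses
    return hot

def plan_prefetch(functions, hot):
    """Per function: would prefetch generation be switched on between entry and return?"""
    hot_set = set(hot)
    return {name: name in hot_set for name in functions}

# Hypothetical per-function cache miss counts from a training run.
miss_counts = {"parse": 700, "hash": 200, "log": 60, "init": 40}
hot = hot_functions(miss_counts)
plan = plan_prefetch(miss_counts, hot)
```

The per-function on/off plan corresponds to the abstract's description of activating prefetch generation at the start of an identified function and deactivating it at the return.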
    • 6. Granted patent
    • Method and apparatus for sorting and displaying costs in a data space profiler
    • US08166462B2
    • 2012-04-24
    • US11516980
    • 2006-09-07
    • Nicolai Kosche, Arpana Jayaswal, Martin S. Itzkowitz
    • G06F9/44, G06F9/45
    • G06F11/328, G06F11/3466, G06F11/3471, G06F2201/86
    • A data space profiler may include a graphical user interface (GUI) for sorting, aggregating and displaying profile data associated with runtime events of a profiled software application. This profile data may include costs associated with events as well as extended address elements and other code behavior attributes associated with them. The GUI may include means for selecting a perspective from which cost data is to be presented as well as presentation options for displaying the data. The presentation options may include panning and zooming options, which may determine how the data is sorted and/or aggregated for display. The GUI may also include means for specifying filter criteria, which may be used to determine which data to display. By providing means to alternate the display of profile data according to different perspectives and filtering criteria, the GUI may facilitate identification of performance bottlenecks of the profiled application and the causes thereof.
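Leaving the GUI aside, the sorting, aggregating and filtering that the abstract describes can be sketched as a small function over event records. The record fields (`function`, `cache_line`, `cost`) and the sample values are hypothetical; the patent does not specify a data layout.

```python
from collections import defaultdict

def aggregate_costs(events, perspective, keep=lambda event: True):
    """Sum event costs per value of `perspective`, most expensive first."""
    totals = defaultdict(int)
    for event in events:
        if keep(event):  # filter criteria decide which events are displayed
            totals[event[perspective]] += event["cost"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

events = [
    {"function": "parse", "cache_line": "0x40", "cost": 50},
    {"function": "hash", "cache_line": "0x80", "cost": 30},
    {"function": "parse", "cache_line": "0x80", "cost": 25},
]
by_function = aggregate_costs(events, "function")
by_line = aggregate_costs(events, "cache_line")
parse_lines = aggregate_costs(events, "cache_line", keep=lambda e: e["function"] == "parse")
```

Switching `perspective` from `function` to `cache_line` regroups the same cost data, which is the kind of change of view the abstract credits with exposing bottlenecks.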
    • 7. Granted patent
    • Method and apparatus for synthesizing hardware counters from performance sampling
    • US08136124B2
    • 2012-03-13
    • US11624526
    • 2007-01-18
    • Nicolai Kosche, Kenneth Tracton
    • G06F9/44, G06F11/00
    • G06F11/3476, G06F11/3447, G06F2201/86, G06F2201/88
    • A system and method for performance monitoring may use data collected from a hardware event agent comprising a hardware sampling mechanism and/or one or more hardware counters to increment one or more synthesized performance counters by an amount dependent on an expression involving the collected data. Each synthesized performance counter may be configured to count events of a different type and may comprise a machine addressable storage location. The event types may include various memory references or misses, branches, branch mispredictions, or any other event of interest in performance monitoring. The hardware event agent may comprise one or more instruction counters, cycle counters, timers, or other hardware performance counters. One hardware performance counter may be used in a time-multiplexed or data-multiplexed manner to monitor events of multiple event types. The hardware sampling mechanism may return a statistical packet for sampled instructions, which may be examined to determine the event type.
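A synthesized counter of the kind described above can be sketched as a software counter that is incremented per sampled packet. In this illustration the increment expression is simply the sampling interval, so each sample stands for that many events; the packet fields and the `classify` rules are hypothetical and not taken from the patent.

```python
from collections import Counter

def classify(packet):
    """Hypothetical rule mapping a sampled-instruction packet to an event type."""
    if packet.get("l2_miss"):
        return "l2_miss"
    if packet.get("mispredict"):
        return "branch_mispredict"
    return None

def synthesize(samples, sampling_interval):
    """Increment one synthesized counter per event type found in the samples."""
    counters = Counter()
    for packet in samples:
        event_type = classify(packet)
        if event_type is not None:
            # Assumed expression: each sample represents `sampling_interval` events.
            counters[event_type] += sampling_interval
    return counters

samples = [{"l2_miss": True}, {"l2_miss": False, "mispredict": True}, {"l2_miss": True}]
counters = synthesize(samples, sampling_interval=1000)
```

Because the event type is read from each packet, a single sampling source feeds several counters, loosely matching the data-multiplexed use mentioned in the abstract.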
    • 8. Patent application
    • Method and Apparatus for Associating User-Specified Data with Events in a Data Space Profiler
    • US20080109796A1
    • 2008-05-08
    • US11557874
    • 2006-11-08
    • Nicolai Kosche
    • G06F9/45
    • G06F11/3612
    • A system and method for profiling a software application may include means for operating on context-specific data and costs. The system may include a descriptor apparatus for specifying identifiers of extended address elements to be profiled and locations for storing corresponding data values. In some embodiments, a list of variables to be included in profiling may be registered with an event agent and values of the variables may be captured in response to detection of a system event. Registering variables to be profiled may involve conveying a list of the variables or a pointer to such a list to the event agent. The event agent may associate the values of the registered variables with the detected system event and may store them in an event space database. The database may be accessed by a data space profiler to identify performance bottlenecks dependent on one or more registered variable values.
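The registration flow in the abstract above (register variables with an event agent, capture their values when an event is detected, store them alongside the event) might be sketched as follows. The class `EventAgent`, its methods, and the use of zero-argument callables to read variable values are assumptions for illustration only.

```python
from collections import defaultdict

class EventAgent:
    def __init__(self):
        self._registered = {}  # name -> zero-argument callable returning the current value
        self.event_db = []     # stand-in for the event space database

    def register(self, variables):
        """Register a mapping of variable names to value readers."""
        self._registered.update(variables)

    def on_event(self, event_type, cost):
        """Capture registered values at detection time and store them with the event."""
        snapshot = {name: read() for name, read in self._registered.items()}
        self.event_db.append({"event": event_type, "cost": cost, **snapshot})

state = {"request_id": 7}
agent = EventAgent()
agent.register({"request_id": lambda: state["request_id"]})
agent.on_event("cache_miss", 120)
state["request_id"] = 8
agent.on_event("cache_miss", 30)

# A profiler could then group costs by a registered variable to look for bottlenecks.
cost_by_request = defaultdict(int)
for row in agent.event_db:
    cost_by_request[row["request_id"]] += row["cost"]
```

Grouping by `request_id` here shows request 7 carrying most of the cost, which is the sort of variable-dependent bottleneck the abstract says the database makes visible.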