    • 1. Granted invention patent
    • System and method for adaptive pruning
    • US08301584B2
    • 2012-10-30
    • US10737123
    • 2003-12-16
    • Wei Fan; Haixun Wang; Philip S. Yu
    • G06F7/00; G06F3/00
    • G06F17/30539; G06F17/30598
    • Disclosed is a method and structure for searching data in databases using an ensemble of models. First, the invention performs training. This training orders the models within the ensemble by prediction accuracy and joins different numbers of models together, in that order, to form sub-ensembles. Next, in the training process, the invention calculates a confidence value for each of the sub-ensembles. The confidence is a measure of how closely results from the sub-ensemble will match results from the ensemble. The size of each sub-ensemble varies with the level of confidence, whereas the size of the ensemble is fixed. After the training, the invention can make a prediction. First, the invention selects a sub-ensemble that meets a given level of confidence. As the level of confidence is raised, a sub-ensemble that has more models will be selected, and as the level of confidence is lowered, a sub-ensemble that has fewer models will be selected. Finally, the invention applies the selected sub-ensemble, in place of the ensemble, to an example to make a prediction.
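The training and selection steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of majority voting, and the definition of confidence as the fraction of training examples on which a sub-ensemble agrees with the full ensemble are all assumptions made for illustration.

```python
from collections import Counter


def vote(models, x):
    """Majority vote of the given models on example x.

    Ties go to the label seen first, i.e. to the more accurate model
    when the models are passed in accuracy order.
    """
    return Counter(m(x) for m in models).most_common(1)[0][0]


def train_pruner(models, examples, labels):
    """Order models by accuracy and score every prefix sub-ensemble.

    Returns (ordered_models, confidences). confidences[k - 1] is the
    fraction of examples on which the first k models agree with the
    full ensemble (an assumed confidence measure).
    """
    def accuracy(model):
        hits = sum(model(x) == y for x, y in zip(examples, labels))
        return hits / len(examples)

    ordered = sorted(models, key=accuracy, reverse=True)
    full = [vote(ordered, x) for x in examples]
    confidences = []
    for k in range(1, len(ordered) + 1):
        agree = sum(vote(ordered[:k], x) == f for x, f in zip(examples, full))
        confidences.append(agree / len(examples))
    return ordered, confidences


def select_sub_ensemble(ordered, confidences, level):
    """Return the smallest prefix whose confidence meets the requested level."""
    for k, confidence in enumerate(confidences, start=1):
        if confidence >= level:
            return ordered[:k]
    return ordered  # fall back to the full ensemble
```

Raising `level` can only move the selection toward a longer prefix, which matches the abstract's statement that higher confidence selects a sub-ensemble with more models.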
    • 2. Invention patent application
    • System and Method for Classifying Data Streams with Very Large Cardinality
    • US20120166382A1
    • 2012-06-28
    • US13400863
    • 2012-02-21
    • Charu C. Aggarwal; Philip S. Yu
    • G06N5/02
    • G06N99/005; G06K9/6267
    • An object and attributes that describe that object are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class, a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table, resulting in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern, producing a discriminatory power for each attribute pattern. Attribute patterns having a discriminatory power above a given threshold are selected and added to the associated sketch table values. The sketch table with the largest overall sum is identified, and the associated class is assigned to the object belonging to the attribute patterns.
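A rough Python sketch of the per-class sketch tables described above follows. The class name `SketchTable`, the SHA-256-based hash functions, and the definition of discriminatory power as the largest class share of the estimated counts are assumptions for illustration; the abstract does not specify them.

```python
import hashlib


class SketchTable:
    """One sketch per class: several parallel hash tables of counters."""

    def __init__(self, rows=4, width=64):
        self.rows, self.width = rows, width
        self.cells = [[0] * width for _ in range(rows)]

    def _slot(self, row, pattern):
        # Assumed hash family: SHA-256 salted with the row index.
        digest = hashlib.sha256(f"{row}:{pattern}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, pattern):
        """Record one training occurrence of an attribute pattern."""
        for r in range(self.rows):
            self.cells[r][self._slot(r, pattern)] += 1

    def estimate(self, pattern):
        """Lowest counter across the parallel hash tables."""
        return min(self.cells[r][self._slot(r, pattern)] for r in range(self.rows))


def classify(sketches, patterns, threshold=0.6):
    """Sum estimates of sufficiently discriminative patterns per class.

    sketches maps class label -> SketchTable. If no pattern passes the
    threshold, the first class in the mapping is returned.
    """
    totals = {label: 0 for label in sketches}
    for pattern in patterns:
        estimates = {label: s.estimate(pattern) for label, s in sketches.items()}
        total = sum(estimates.values())
        if total == 0:
            continue
        power = max(estimates.values()) / total  # assumed definition
        if power > threshold:
            for label, value in estimates.items():
                totals[label] += value
    return max(totals, key=totals.get)
```

Taking the minimum across rows limits the effect of hash collisions, because a collision can only inflate a counter, never reduce it.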
    • 4. Granted invention patent
    • Systems and methods for condensation-based privacy in strings
    • US08010541B2
    • 2011-08-30
    • US11540406
    • 2006-09-30
    • Charu C. Aggarwal; Philip S. Yu
    • G06F17/30
    • G06F21/6245
    • Novel methods and systems for the privacy-preserving mining of string data with the use of simple template-based models. Such template-based models are effective in practice and preserve important statistical characteristics of the strings, such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications, which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.
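The condensation idea (summarize a group of strings, then emit pseudo-strings drawn from the summary) can be sketched as follows. This is a simplified illustration, not the patent's template model: it assumes the strings in a group are already aligned and of equal length, and it uses position-wise symbol frequencies as the summary statistics. The names `condense` and `generate` are hypothetical.

```python
import random
from collections import Counter


def condense(group):
    """Summary statistics for a group: symbol counts at each position."""
    length = len(group[0])
    if any(len(s) != length for s in group):
        raise ValueError("this sketch assumes equal-length, aligned strings")
    return [Counter(s[i] for s in group) for i in range(length)]


def generate(stats, n, rng):
    """Draw n pseudo-strings, sampling each position from its counts."""
    pseudo = []
    for _ in range(n):
        chars = [
            rng.choices(list(counts.keys()), weights=list(counts.values()))[0]
            for counts in stats
        ]
        pseudo.append("".join(chars))
    return pseudo


group = ["ACGT", "ACGA", "TCGT"]
stats = condense(group)
pseudo_strings = generate(stats, 5, random.Random(0))
```

Only the group-level counts are retained, so a generated pseudo-string reflects the composition of the group rather than any single original record.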
    • 7. Invention patent application
    • SYSTEM AND METHOD FOR SCALABLE COST-SENSITIVE LEARNING
    • US20100169252A1
    • 2010-07-01
    • US12690502
    • 2010-01-20
    • Wei Fan; Haixun Wang; Philip S. Yu
    • G06N3/12; G06F15/18
    • G06N99/005
    • A method (and structure) for processing an inductive learning model for a dataset of examples includes dividing the dataset of examples into a plurality of subsets of data and generating, using a processor on a computer, a learning model using examples of a first subset of data of the plurality of subsets of data. The learning model being generated for the first subset comprises an initial stage of an evolving aggregate learning model (ensemble model) for an entirety of the dataset, the ensemble model thereby providing an evolving estimated learning model for the entirety of the dataset if all the subsets were to be processed. The generating of the learning model using data from a subset includes calculating a value for at least one parameter that provides an objective indication of an adequacy of a current stage of the ensemble model.
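The progressive, subset-by-subset construction of an ensemble can be sketched in Python as below. Everything specific here is an assumption for illustration: the base learner (per-class means of a one-dimensional feature), the interleaved split, and the adequacy parameter (change in hold-out accuracy between stages). The abstract names no concrete learner or statistic, and this sketch ignores misclassification costs.

```python
from statistics import mean


def train_centroids(chunk):
    """Base learner: per-class mean of a 1-D feature. chunk is [(x, label), ...]."""
    by_label = {}
    for x, y in chunk:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(ensemble, x):
    """Each base model votes for its nearest class mean; majority wins."""
    votes = {}
    for centroids in ensemble:
        label = min(centroids, key=lambda y: abs(x - centroids[y]))
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def progressive_fit(data, n_subsets, holdout, tolerance=0.01):
    """Grow the ensemble one subset at a time and stop once it looks adequate.

    Adequacy (assumed): hold-out accuracy changed by less than `tolerance`
    between two consecutive stages.
    """
    ensemble, history = [], []
    for i in range(n_subsets):
        chunk = data[i::n_subsets]  # interleaved split into n_subsets parts
        ensemble.append(train_centroids(chunk))
        accuracy = sum(predict(ensemble, x) == y for x, y in holdout) / len(holdout)
        history.append(accuracy)
        if len(history) >= 2 and abs(history[-1] - history[-2]) < tolerance:
            break  # remaining subsets are not processed
    return ensemble, history
```

The point of the early stop is that the remaining subsets never need to be scanned once the current stage of the ensemble is judged adequate.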
    • 8. Granted invention patent
    • Model-based self-optimizing distributed information management
    • US07720841B2
    • 2010-05-18
    • US11538525
    • 2006-10-04
    • Xiaohui Gu; Philip S. Yu; Shu-Ping Chang
    • G06F13/14
    • G06F17/30575; G06F11/3447; G06F11/3495; G06F17/30463; G06F2201/81; Y10S707/966
    • Disclosed are a method, information processing system, and computer readable medium for managing data collection in a distributed processing system. The method includes dynamically collecting at least one statistical query pattern associated with a selected group of information processing nodes. The statistical query pattern is dynamically collected from a plurality of information processing nodes in a distributed processing system. At least one operating attribute distribution associated with an operating attribute that has been queried for the selected group is dynamically monitored. The selected group is dynamically configured, based on the query pattern and the operating attribute distribution, to periodically push a set of attributes associated with each information processing node in the selected group.
    • 9. Invention patent application
    • RESOURCE ADAPTIVE SPECTRUM ESTIMATION OF STREAMING DATA
    • US20090074043A1
    • 2009-03-19
    • US12177300
    • 2008-07-22
    • Deepak Srinivac Turaga; Michail Vlachos; Philip S. Yu
    • H04B17/00
    • G06F17/141
    • Streaming environments typically dictate incomplete or approximate algorithm execution, in order to cope with sudden surges in the data rate. Such limitations are even more accentuated in mobile environments (such as sensor networks) where computational and memory resources are typically limited. Introduced herein is a novel “resource adaptive” algorithm for spectrum and periodicity estimation on a continuous stream of data. The formulation is based on the derivation of a closed-form incremental computation of the spectrum, augmented by an intelligent load-shedding scheme that can adapt to available CPU resources. Experimentation indicates that the proposed technique can be a viable and resource efficient solution for real-time spectrum estimation.
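A closed-form incremental spectrum update of the kind this abstract mentions can be illustrated with the standard sliding DFT, in which each coefficient of the new window is obtained from the old one as (X_k - x_oldest + x_new) * exp(2*pi*i*k/N). The sketch below swaps in that textbook technique; it is not taken from the patent. Restricting the tracked `bins` is only a crude, assumed stand-in for the load-shedding scheme, which the abstract does not detail.

```python
import cmath
from collections import deque


def dft(window, bins):
    """Direct DFT of the window for the requested frequency bins."""
    n = len(window)
    return {
        k: sum(x * cmath.exp(-2j * cmath.pi * k * m / n) for m, x in enumerate(window))
        for k in bins
    }


class SlidingSpectrum:
    """Maintains DFT coefficients of the most recent N samples of a stream."""

    def __init__(self, first_window, bins=None):
        self.window = deque(first_window)
        n = len(self.window)
        # Tracking fewer bins costs less per sample (assumed shedding knob).
        self.bins = list(range(n)) if bins is None else list(bins)
        self.coeffs = dft(self.window, self.bins)
        self.twiddle = {k: cmath.exp(2j * cmath.pi * k / n) for k in self.bins}

    def push(self, sample):
        """Slide the window by one sample with O(1) work per tracked bin."""
        oldest = self.window.popleft()
        self.window.append(sample)
        for k in self.bins:
            self.coeffs[k] = (self.coeffs[k] - oldest + sample) * self.twiddle[k]

    def power(self):
        """Squared magnitude of each tracked coefficient."""
        return {k: abs(c) ** 2 for k, c in self.coeffs.items()}
```

Each `push` costs time proportional to the number of tracked bins, instead of recomputing a full transform over the window.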