    • 1. Granted invention patent
    • System and method for load shedding in data mining and knowledge discovery from stream data
    • US08060461B2
    • 2011-11-15
    • US12372568
    • 2009-02-17
    • Yun Chi; Haixun Wang; Philip S. Yu
    • G06F7/00; G06F17/00
    • G06K9/6297; H04L43/028
    • Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
    • 2. Granted invention patent
    • System and method for scalable cost-sensitive learning
    • US07904397B2
    • 2011-03-08
    • US12690502
    • 2010-01-20
    • Wei Fan; Haixun Wang; Philip S. Yu
    • G06F15/18; G06N3/00; G06N3/12
    • G06N99/005
    • A method (and structure) for processing an inductive learning model for a dataset of examples, includes dividing the dataset of examples into a plurality of subsets of data and generating, using a processor on a computer, a learning model using examples of a first subset of data of the plurality of subsets of data. The learning model being generated for the first subset comprises an initial stage of an evolving aggregate learning model (ensemble model) for an entirety of the dataset, the ensemble model thereby providing an evolving estimated learning model for the entirety of the dataset if all the subsets were to be processed. The generating of the learning model using data from a subset includes calculating a value for at least one parameter that provides an objective indication of an adequacy of a current stage of the ensemble model.
    • 4. Granted invention patent
    • System and method for ranked keyword search on graphs
    • US07702620B2
    • 2010-04-20
    • US11693471
    • 2007-03-29
    • Hao He; Philip S. Yu; Haixun Wang
    • G06F17/30
    • G06F17/30625; G06F17/30616; G06F17/30675; Y10S707/99933
    • Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques highly rely on graph traversal in running time. The previous lack of more knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function while the block is used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.
    • 5. Granted invention patent
    • System and method for sequence-based subspace pattern clustering
    • US07565346B2
    • 2009-07-21
    • US10858541
    • 2004-05-31
    • Wei Fan; Haixun Wang; Philip S. Yu
    • G06F17/30
    • G06K9/6215; Y10S707/99936
    • Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including e-Commerce target marketing, bioinformatics (large scale scientific data analysis), and automatic computing (web usage analysis), etc. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides the huge data volume, many data sets are also characterized by their sequentiality, for instance, customer purchase records and network event logs are usually modeled as data sequences. Hence, it becomes important to enable pattern-based clustering methods i) to handle large datasets, and ii) to discover pattern similarity embedded in data sequences. There is presented herein a novel method that offers this capability.
    • 7. Invention patent application
    • SYSTEM AND METHOD FOR RANKED KEYWORD SEARCH ON GRAPHS
    • US20080243811A1
    • 2008-10-02
    • US11693471
    • 2007-03-29
    • Hao He; Philip S. Yu; Haixun Wang
    • G06F17/30
    • G06F17/30625; G06F17/30616; G06F17/30675; Y10S707/99933
    • Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques highly rely on graph traversal in running time. The previous lack of more knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function while the block is used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.
    • 8. Invention patent application
    • METHOD AND SYSTEM FOR INDEXING AND SERIALIZING DATA
    • US20080215520A1
    • 2008-09-04
    • US11681486
    • 2007-03-02
    • Xiaohui Gu; Lipyeow Lim; Haixun Wang; Min Wang
    • G06N5/02
    • G06F17/30911
    • The present invention provides a computer implemented method, an apparatus, and a computer usable program product for indexing data. A controller identifies a set of data to be indexed, wherein a set of data structure trees represents the set of data. The controller merges the set of data structure trees to form a unified tree, wherein the unified tree contains a node for each unit of data in the set of data. The controller assigns an identifier to the node for each unit of data in the set of data that describes the node within the unified tree. The controller then serializes the unified tree to form a set of sequential series that represents the set of data structure trees, wherein the set of sequential series forms an index for the set of data.
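The last abstract above (METHOD AND SYSTEM FOR INDEXING AND SERIALIZING DATA) describes a pipeline: merge a set of data structure trees into a unified tree, assign each node an identifier, then serialize into sequences that act as an index. The following is a minimal Python sketch of that idea, not the patented method. Everything concrete in it is an assumption made for illustration: trees are modelled as `(label, children)` tuples, nodes that share the same root-to-node label path are merged, identifiers are preorder numbers, and the names `UnifiedNode`, `merge_trees`, `assign_identifiers`, `serialize`, and `build_index` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedNode:
    """Node of the unified tree (hypothetical structure)."""
    label: str
    ident: int = -1  # filled in by assign_identifiers
    children: dict = field(default_factory=dict)  # label -> UnifiedNode


def merge_trees(trees):
    """Merge (label, children) trees; equal label paths share one node (assumption)."""
    root = UnifiedNode(label="<root>")  # virtual root holding every input tree

    def insert(parent, tree):
        label, kids = tree
        node = parent.children.setdefault(label, UnifiedNode(label))
        for kid in kids:
            insert(node, kid)

    for tree in trees:
        insert(root, tree)
    return root


def assign_identifiers(root):
    """Number unified-tree nodes in preorder; one possible identifier scheme."""
    counter = 0
    stack = [root]
    while stack:
        node = stack.pop()
        node.ident = counter
        counter += 1
        # Push children in reverse so the first child is popped first.
        stack.extend(reversed(list(node.children.values())))
    return counter


def serialize(root, tree):
    """Map one input tree to the preorder sequence of its unified-tree identifiers."""
    sequence = []

    def walk(parent, subtree):
        label, kids = subtree
        node = parent.children[label]
        sequence.append(node.ident)
        for kid in kids:
            walk(node, kid)

    walk(root, tree)
    return sequence


def build_index(trees):
    """Return one identifier sequence per input tree; together they form the index."""
    root = merge_trees(trees)
    assign_identifiers(root)
    return [serialize(root, tree) for tree in trees]


if __name__ == "__main__":
    trees = [
        ("book", [("title", []), ("author", [])]),
        ("book", [("author", []), ("year", [])]),
    ]
    print(build_index(trees))  # [[1, 2, 3], [1, 3, 4]]
```

In this sketch two trees that share a path (`book`, then `author`) receive the same identifiers for those nodes, so shared structure shows up as shared values in the sequences. Merging same-label siblings is a simplification; a fuller treatment would need to keep repeated siblings distinct.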