    • 1. Granted invention patent
    • System and method for load shedding in data mining and knowledge discovery from stream data
    • US08060461B2
    • 2011-11-15
    • US12372568
    • 2009-02-17
    • Yun Chi; Haixun Wang; Philip S. Yu
    • Yun Chi; Haixun Wang; Philip S. Yu
    • G06F7/00; G06F17/00
    • G06K9/6297; H04L43/028
    • Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
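The abstract above combines a Markov prediction of each stream's feature distribution with a quality-of-decision (QoD) measure that guides which streams to observe. The sketch below is only an illustration under stated assumptions: the two-state transition matrix, the stream names, and the margin-based `quality_of_decision` are hypothetical and are not taken from the patent, which does not define QoD in this abstract.

```python
from typing import Dict, List

Matrix = List[List[float]]

def predict(dist: List[float], transition: Matrix) -> List[float]:
    """One Markov step: new[j] = sum_i dist[i] * transition[i][j]."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

def quality_of_decision(dist: List[float]) -> float:
    """Hypothetical QoD: margin between the two most likely states.
    A small margin means the decision made without observing is uncertain."""
    top = sorted(dist, reverse=True)
    return top[0] - top[1]

def choose_streams_to_observe(predicted: Dict[str, List[float]], budget: int) -> List[str]:
    """Spend the observation budget on the streams with the lowest QoD;
    the remaining streams are shed and classified from the prediction alone."""
    ranked = sorted(predicted, key=lambda name: quality_of_decision(predicted[name]))
    return ranked[:budget]

# Assumed transition matrix shared by both streams (illustrative values only).
transition = [[0.9, 0.1], [0.5, 0.5]]
predicted = {
    "stream_a": predict([1.0, 0.0], transition),  # last seen in state 0
    "stream_b": predict([0.0, 1.0], transition),  # last seen in state 1
}
selected = choose_streams_to_observe(predicted, budget=1)
```

Here `stream_b` has the flatter predicted distribution, so it is the one selected for actual inspection when only one stream can be processed.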
    • 2. Granted invention patent
    • System and method for load shedding in data mining and knowledge discovery from stream data
    • US07493346B2
    • 2009-02-17
    • US11058944
    • 2005-02-16
    • Yun Chi; Haixun Wang; Philip S. Yu
    • Yun Chi; Haixun Wang; Philip S. Yu
    • G06F12/00; G06F17/30; G06F9/46
    • G06K9/6297; H04L43/028
    • Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
    • 3. Granted invention patent
    • Systems and methods for maintaining closed frequent itemsets over a data stream sliding window
    • US07496592B2
    • 2009-02-24
    • US11046926
    • 2005-01-31
    • Yun Chi; Haixun Wang; Philip S. Yu
    • Yun Chi; Haixun Wang; Philip S. Yu
    • G06F17/00
    • G06F17/30516; G06F17/30539; Y10S707/99943
    • Towards mining closed frequent itemsets over a sliding window using limited memory space, a synopsis data structure monitors transactions in the sliding window so that one can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets, but monitoring only frequent itemsets makes it difficult to detect new itemsets when they become frequent. Herein, there is introduced a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets include a boundary between closed frequent itemsets and the rest of the itemsets. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET.
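To make the term "closed frequent itemset over a sliding window" concrete, the following sketch recomputes the closed frequent itemsets from scratch for each window position. It is a brute-force baseline written for illustration only; it does not implement the closed enumeration tree (CET) described in the abstract, and the window size, support threshold, and transactions are assumed values.

```python
from collections import deque
from itertools import combinations

def closed_frequent_itemsets(window, min_support):
    """Return {itemset: support} for itemsets that are frequent in the window
    and have no proper superset with the same support (i.e. are closed)."""
    counts = {}
    for transaction in window:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    # A superset with equal support is itself frequent, so checking
    # against the frequent itemsets is sufficient.
    return {
        s: c for s, c in frequent.items()
        if not any(s < t and c == frequent[t] for t in frequent)
    }

# Sliding window holding the three most recent transactions (assumed size).
window = deque(maxlen=3)
for transaction in [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]:
    window.append(transaction)
first = closed_frequent_itemsets(window, min_support=2)

window.append({"b"})  # the oldest transaction {"a", "b"} slides out
second = closed_frequent_itemsets(window, min_support=2)
```

Enumerating every subset is exponential in transaction length, which is the cost that an incremental structure such as the CET is meant to avoid.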
    • 4. Invention patent application
    • SYSTEM AND METHOD FOR LOAD SHEDDING IN DATA MINING AND KNOWLEDGE DISCOVERY FROM STREAM DATA
    • US20090187914A1
    • 2009-07-23
    • US12372568
    • 2009-02-17
    • Yun Chi; Haixun Wang; Philip S. Yu
    • Yun Chi; Haixun Wang; Philip S. Yu
    • G06F9/46; G06N5/02
    • G06K9/6297; H04L43/028
    • Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
    • 5. Granted invention patent
    • System and method for scalable cost-sensitive learning
    • US07904397B2
    • 2011-03-08
    • US12690502
    • 2010-01-20
    • Wei Fan; Haixun Wang; Philip S. Yu
    • Wei Fan; Haixun Wang; Philip S. Yu
    • G06F15/18; G06N3/00; G06N3/12
    • G06N99/005
    • A method (and structure) for processing an inductive learning model for a dataset of examples, includes dividing the dataset of examples into a plurality of subsets of data and generating, using a processor on a computer, a learning model using examples of a first subset of data of the plurality of subsets of data. The learning model being generated for the first subset comprises an initial stage of an evolving aggregate learning model (ensemble model) for an entirety of the dataset, the ensemble model thereby providing an evolving estimated learning model for the entirety of the dataset if all the subsets were to be processed. The generating of the learning model using data from a subset includes calculating a value for at least one parameter that provides an objective indication of an adequacy of a current stage of the ensemble model.
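The idea of growing an ensemble one data subset at a time, while tracking a parameter that indicates whether the current stage is adequate, can be sketched as follows. Everything specific here is an assumption made for illustration: the threshold base learner, the use of validation accuracy as the adequacy parameter, and the `tolerance` stopping rule are hypothetical and are not the patent's own definitions.

```python
from statistics import mean

def train_stump(subset):
    """Base learner: threshold halfway between the two class means.
    Assumes both classes are present and class 1 has the larger x values."""
    mean0 = mean(x for x, y in subset if y == 0)
    mean1 = mean(x for x, y in subset if y == 1)
    return (mean0 + mean1) / 2

def ensemble_predict(thresholds, x):
    """Majority vote over all base learners trained so far."""
    votes = sum(1 for t in thresholds if x > t)
    return 1 if 2 * votes > len(thresholds) else 0

def accuracy(thresholds, data):
    return sum(ensemble_predict(thresholds, x) == y for x, y in data) / len(data)

def progressive_ensemble(subsets, validation, tolerance=0.01):
    """Add one model per subset; stop once the adequacy value stabilizes."""
    thresholds, history = [], []
    for subset in subsets:
        thresholds.append(train_stump(subset))
        history.append(accuracy(thresholds, validation))
        if len(history) >= 2 and abs(history[-1] - history[-2]) < tolerance:
            break  # remaining subsets are never scanned
    return thresholds, history

subsets = [
    [(1, 0), (2, 0), (8, 1), (9, 1)],
    [(0, 0), (3, 0), (7, 1), (10, 1)],
    [(2, 0), (3, 0), (6, 1), (9, 1)],
]
validation = [(1, 0), (4, 0), (6, 1), (9, 1)]
thresholds, history = progressive_ensemble(subsets, validation)
```

In this toy run the adequacy value stops changing after the second subset, so the third subset is skipped, which is the kind of saving the evolving-ensemble approach aims for.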
    • 6. Granted invention patent
    • System and method for ranked keyword search on graphs
    • US07702620B2
    • 2010-04-20
    • US11693471
    • 2007-03-29
    • Hao He; Philip S. Yu; Haixun Wang
    • Hao He; Philip S. Yu; Haixun Wang
    • G06F17/30
    • G06F17/30625; G06F17/30616; G06F17/30675; Y10S707/99933
    • Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques highly rely on graph traversal in running time. The previous lack of more knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function while the block is used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.
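A backward search for keyword queries on a graph starts from the nodes that contain each keyword and follows edges in reverse until it finds nodes that can reach every keyword. The sketch below shows only that plain idea, using breadth-first search and a sum-of-distances score. It is not the cost-balanced expansion algorithm, the block index, or the scoring function of the patent; the graph, keywords, and scoring rule are assumed for illustration.

```python
from collections import deque

def backward_distances(reverse_adj, sources):
    """Breadth-first search along reversed edges from the keyword nodes."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for pred in reverse_adj.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist

def ranked_roots(edges, keyword_nodes):
    """Return (score, root) pairs for nodes that reach every keyword,
    ranked by total distance to the keywords (lower is better)."""
    reverse_adj = {}
    for src, dst in edges:
        reverse_adj.setdefault(dst, []).append(src)
    per_keyword = [backward_distances(reverse_adj, nodes)
                   for nodes in keyword_nodes.values()]
    common = set(per_keyword[0]).intersection(*per_keyword[1:])
    return sorted((sum(d[n] for d in per_keyword), n) for n in common)

# Hypothetical directed graph and keyword-to-node mapping.
edges = [("r", "a"), ("r", "b"), ("a", "x"), ("b", "y"), ("s", "x"), ("s", "b")]
keyword_nodes = {"graph": ["x"], "index": ["y"]}
answers = ranked_roots(edges, keyword_nodes)
```

Both `r` and `s` reach the two keyword nodes, and `s` ranks first because its total distance is shorter.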
    • 7. Granted invention patent
    • System and method for sequence-based subspace pattern clustering
    • US07565346B2
    • 2009-07-21
    • US10858541
    • 2004-05-31
    • Wei Fan; Haixun Wang; Philip S. Yu
    • Wei Fan; Haixun Wang; Philip S. Yu
    • G06F17/30
    • G06K9/6215; Y10S707/99936
    • Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including e-Commerce target marketing, bioinformatics (large scale scientific data analysis), and automatic computing (web usage analysis), etc. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides the huge data volume, many data sets are also characterized by their sequentiality, for instance, customer purchase records and network event logs are usually modeled as data sequences. Hence, it becomes important to enable pattern-based clustering methods i) to handle large datasets, and ii) to discover pattern similarity embedded in data sequences. There is presented herein a novel method that offers this capability.
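What "a coherent pattern of rise and fall in subspaces" means can be shown with the pairwise pScore test used by the pCluster algorithm that the abstract cites as prior art. The sketch below checks only that coherence condition for two objects; it is not the sequence-based method of the patent, and the sample vectors and `delta` threshold are assumed values.

```python
from itertools import combinations

def p_score(x, y, a, b):
    """Difference in how objects x and y change between dimensions a and b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))

def coherent(x, y, dims, delta):
    """True if x and y rise and fall together on every pair of dims,
    up to a tolerance of delta."""
    return all(p_score(x, y, a, b) <= delta for a, b in combinations(dims, 2))

# y equals x shifted by 10 on dimensions 0..2, but diverges on dimension 3.
x = [1, 5, 3, 20]
y = [11, 15, 13, 0]

in_subspace = coherent(x, y, dims=[0, 1, 2], delta=0)
in_full_space = coherent(x, y, dims=[0, 1, 2, 3], delta=0)
```

The two objects are far apart in value, so distance-based clustering would separate them, yet they match perfectly as a pattern on the subspace of dimensions 0 to 2.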
    • 8. Invention patent application
    • SYSTEM AND METHOD FOR RANKED KEYWORD SEARCH ON GRAPHS
    • US20080243811A1
    • 2008-10-02
    • US11693471
    • 2007-03-29
    • Hao He; Philip S. Yu; Haixun Wang
    • Hao He; Philip S. Yu; Haixun Wang
    • G06F17/30
    • G06F17/30625; G06F17/30616; G06F17/30675; Y10S707/99933
    • Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques highly rely on graph traversal in running time. The previous lack of more knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function while the block is used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.
    • 9. Granted invention patent
    • System and method for mining time-changing data streams
    • US07565369B2
    • 2009-07-21
    • US10857030
    • 2004-05-28
    • Wei Fan; Haixun Wang; Philip S. Yu
    • Wei Fan; Haixun Wang; Philip S. Yu
    • G06F17/30
    • G06N99/005; Y10S707/99943; Y10S707/99945
    • A general framework for mining concept-drifting data streams using weighted ensemble classifiers. An ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., is trained from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. An empirical study shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.
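The weighting step described above, where classifiers trained on older chunks count for less once the concept drifts, can be sketched as a weighted vote. The sketch assumes that each classifier's weight is simply its accuracy on the most recent labeled chunk; that proxy, the threshold classifiers, and the sample chunk are illustrative assumptions and not the weighting formula of the patent.

```python
from collections import defaultdict

def accuracy(classifier, chunk):
    return sum(classifier(x) == y for x, y in chunk) / len(chunk)

def weigh(classifiers, latest_chunk):
    """Assumed weighting: accuracy on the most recent chunk of the stream."""
    return [accuracy(c, latest_chunk) for c in classifiers]

def weighted_vote(classifiers, weights, x):
    """Each classifier votes for its predicted label with its weight."""
    totals = defaultdict(float)
    for classifier, weight in zip(classifiers, weights):
        totals[classifier(x)] += weight
    return max(totals, key=totals.get)

# Two classifiers learned under an old concept, one under the current concept.
classifiers = [
    lambda x: int(x > 5),  # trained on an old chunk
    lambda x: int(x > 4),  # trained on an old chunk
    lambda x: int(x > 2),  # trained on the newest chunk
]
latest_chunk = [(1, 0), (3, 1), (4, 1), (5, 1)]

weights = weigh(classifiers, latest_chunk)
label = weighted_vote(classifiers, weights, 3)
```

An unweighted majority would label `x = 3` as 0 because two of the three models are outdated; with weights the up-to-date model prevails.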
    • 10. Invention patent application
    • Query integrity assurance in database outsourcing
    • US20080183656A1
    • 2008-07-31
    • US11626847
    • 2007-01-25
    • Chang-Shing Perng; Haixun Wang; Jian Yin; Philip S. Yu
    • Chang-Shing Perng; Haixun Wang; Jian Yin; Philip S. Yu
    • G06F17/30
    • G06F21/64; G06F17/30286; G06F21/6245; G06F2221/2115
    • A method, system and computer program product for confirming the validity of data returned from a data store. A data store contains a primary data set encrypted using a first encryption and a secondary data set using a second encryption. The secondary data set is a subset of the primary data set. A client issues a substantive query against the data store to retrieve a primary data result belonging to the primary data set. A query interface issues at least one validating query against the data store. Each validating query returns a secondary data result belonging to the secondary data set. The query interface receives the secondary data result and provides a data invalid notification if data satisfying the substantive query included in an unencrypted form of the secondary data result is not contained in an unencrypted form of the primary data result.
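The validation rule in the last sentence of the abstract reduces to a set check: every record of the secondary (subset) result that satisfies the substantive query must also appear in the primary result. The sketch below works on already-decrypted rows and leaves out both encryption layers, so it is only an illustration of the check itself; the row layout, the predicate, and the function name `check_result` are hypothetical.

```python
def check_result(primary_result, secondary_result, predicate):
    """Return (is_valid, missing_rows). Rows from the secondary data set that
    satisfy the query but are absent from the primary result signal tampering."""
    expected = {row for row in secondary_result if predicate(row)}
    missing = expected - set(primary_result)
    return (not missing), missing

# Rows are (id, salary); the secondary data set is a subset of the primary one.
secondary_rows = [(2, 70), (4, 120)]

def salary_at_least_60(row):
    return row[1] >= 60

honest_result = [(2, 70), (3, 90), (4, 120)]
tampered_result = [(2, 70), (3, 90)]  # the store silently dropped row 4

honest_ok, honest_missing = check_result(honest_result, secondary_rows, salary_at_least_60)
tampered_ok, tampered_missing = check_result(tampered_result, secondary_rows, salary_at_least_60)
```

Because the store cannot tell which encrypted records belong to the secondary set, omitting rows risks detection whenever an omitted row happens to be in that subset.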