Quick Patent Search - Fast search of global patents, free commercial patent database - IPRDB

51. Invention application

US20080172360A1 QUERYING DATA AND AN ASSOCIATED ONTOLOGY IN A DATABASE MANAGEMENT SYSTEM (status: under examination, published)
Publication No.: US20080172360A1
Publication date: 2008-07-17
Application No.: US11623952
Filing date: 2007-01-17
Applicant: Lipyeow Lim, Haixun Wang, Min Wang
Inventor: Lipyeow Lim, Haixun Wang, Min Wang
IPC classification: G06F7/06
CPC classification: G06F16/24534, G06F16/24564, G06F16/8358
Abstract: A method, apparatus, and computer program product for querying data in a database. An ontology is associated with the data in the database. A query containing a query predicate is received. The query predicate is expanded using implications from the ontology to form a modified query. The modified query is rewritten to include subsumption checking.
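
The predicate-expansion step in the abstract of item 51 can be pictured with a small sketch. This is not the patented method: the toy ontology, the names `ONTOLOGY`, `subsumed_terms`, and `expand_predicate`, and the choice to approximate subsumption checking with an IN-list are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to the terms it directly subsumes.
ONTOLOGY = {
    "wine": ["red_wine", "white_wine"],
    "red_wine": ["merlot", "zinfandel"],
    "white_wine": ["riesling"],
}

def subsumed_terms(term, ontology):
    """Return the term plus every term it subsumes, in breadth-first order."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in ontology.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen

def expand_predicate(column, term, ontology):
    """Rewrite `column = term` as a parameterized IN-list over all subsumed terms."""
    terms = subsumed_terms(term, ontology)
    placeholders = ", ".join("?" for _ in terms)
    return f"{column} IN ({placeholders})", terms

clause, params = expand_predicate("type", "red_wine", ONTOLOGY)
print(clause, params)  # type IN (?, ?, ?) ['red_wine', 'merlot', 'zinfandel']
```

A query that asks for `type = 'red_wine'` would then also match rows stored as `merlot` or `zinfandel`, which is the effect the abstract attributes to expanding the predicate with implications from the ontology.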

52. Invention application

US20050114331A1 Near-neighbor search in pattern distance spaces (status: under examination, published)
Publication No.: US20050114331A1
Publication date: 2005-05-26
Application No.: US10722776
Filing date: 2003-11-26
Applicant: Haixun Wang, Philip Yu
Inventor: Haixun Wang, Philip Yu
IPC classification: G06F7/00, G06F19/00, G06K9/62
CPC classification: G06K9/6232, G06K9/6228, G16B25/00, G16B40/00
Abstract: Similarity searching techniques are provided. In one aspect, a method for use in finding near-neighbors in a set of objects comprises the following steps. Subspace pattern similarities that the objects in the set exhibit in multi-dimensional spaces are identified. Subspace correlations are defined between two or more of the objects in the set based on the identified subspace pattern similarities for use in identifying near-neighbor objects. A pattern distance index may be created. A method of performing a near-neighbor search of one or more query objects against a set of objects is also provided.

53. Invention application

US20050027710A1 Methods and apparatus for mining attribute associations (status: lapsed)
Publication No.: US20050027710A1
Publication date: 2005-02-03
Application No.: US10630992
Filing date: 2003-07-30
Applicant: Sheng Ma, Chang-shing Perng, Haixun Wang, Philip Yu
Inventor: Sheng Ma, Chang-shing Perng, Haixun Wang, Philip Yu
IPC classification: G06F7/00, G06F17/18, G06F17/30
CPC classification: G06F17/18, G06F17/30539, Y10S707/99936, Y10S707/99937, Y10S707/99943
Abstract: Attribute association discovery techniques that support relational-based data mining are disclosed. In one aspect of the invention, a technique for mining attribute associations in a relational data set comprises the following steps/operations. Multiple items are obtained from the relational data set. Then, attribute associations are discovered using: (i) multi-attribute mining templates formed from at least a portion of the multiple items; and (ii) one or more mining preferences specified by a user. The invention provides a novel architecture for the mining search space so as to exploit the inter-relationships among patterns of different templates. The framework is relational-sensitive and supports interactive and online mining.

54. Granted invention patent

US06574623B1 Query transformation and simplification for group by queries with rollup/grouping sets in relational database management systems (status: in force)
Publication No.: US06574623B1
Publication date: 2003-06-03
Application No.: US09639021
Filing date: 2000-08-15
Applicant: Ting Yu Leung, Haixun Wang
Inventor: Ting Yu Leung, Haixun Wang
IPC classification: G06F17/30
CPC classification: G06F17/30454, G06F17/30463, Y10S707/99933, Y10S707/99934, Y10S707/99935
Abstract: A method, apparatus, and article of manufacture for optimizing database queries, wherein the query is analyzed to determine whether the query includes at least one GROUP BY operation that computes at least one of the following: (1) a ROLLUP and (2) a GROUPING SET, and when it does, the query is rewritten to optimize one or more predicates that are applied after the GROUP BY operation. The query is also analyzed to determine whether the query includes at least one GROUP BY operation that computes two or more stacked GROUP BY operations, and when it does, the query is rewritten to collapse the stacked GROUP BY operations into a single GROUP BY operation.
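
The second rewrite named in the abstract of item 54, collapsing stacked GROUP BY operations into one, can be checked on a toy table. The sketch below is only an illustration under stated assumptions: the `sales` table and its columns are invented, SQLite (via Python's `sqlite3`) stands in for the database system, and the aggregate is SUM, for which the two forms are equivalent. ROLLUP and GROUPING SETS are not shown because SQLite does not provide them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "boston", 10), ("east", "boston", 5), ("east", "nyc", 7), ("west", "sf", 3)],
)

# Stacked form: an inner GROUP BY (region, city) feeds an outer GROUP BY (region).
stacked = """
    SELECT region, SUM(city_total) AS total
    FROM (SELECT region, city, SUM(amount) AS city_total
          FROM sales
          GROUP BY region, city) AS per_city
    GROUP BY region
    ORDER BY region
"""

# Collapsed form: a single GROUP BY (region) computed directly on the base table.
collapsed = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""

stacked_rows = conn.execute(stacked).fetchall()
collapsed_rows = conn.execute(collapsed).fetchall()
print(stacked_rows)                    # [('east', 22), ('west', 3)]
print(stacked_rows == collapsed_rows)  # True
```

The equivalence holds here because SUM can be computed from partial sums. An aggregate such as AVG would need a different rewrite, so a collapse of this kind has to depend on which aggregates the stacked query uses.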

55. Granted invention patent

US08301584B2 System and method for adaptive pruning (status: lapsed)
Publication No.: US08301584B2
Publication date: 2012-10-30
Application No.: US10737123
Filing date: 2003-12-16
Applicant: Wei Fan, Haixun Wang, Philip S. Yu
Inventor: Wei Fan, Haixun Wang, Philip S. Yu
IPC classification: G06F7/00, G06F3/00
CPC classification: G06F17/30539, G06F17/30598
Abstract: Disclosed is a method and structure for searching data in databases using an ensemble of models. First the invention performs training. This training orders models within the ensemble in order of prediction accuracy and joins different numbers of models together to form sub-ensembles. The models are joined together in the sub-ensemble in the order of prediction accuracy. Next in the training process, the invention calculates confidence values of each of the sub-ensembles. The confidence is a measure of how closely results from the sub-ensemble will match results from the ensemble. The size of each of the sub-ensembles is variable depending upon the level of confidence, while, to the contrary, the size of the ensemble is fixed. After the training, the invention can make a prediction. First, the invention selects a sub-ensemble that meets a given level of confidence. As the level of confidence is raised, a sub-ensemble that has more models will be selected and as the level of confidence is lowered, a sub-ensemble that has fewer models will be selected. Finally, the invention applies the selected sub-ensemble, in place of the ensemble, to an example to make a prediction.
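
The training and selection steps described in the abstract of item 55 can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the threshold classifiers, the ten-point data set, majority voting as the combination rule (ties go to class 0), and measuring confidence as the fraction of examples on which a sub-ensemble agrees with the full ensemble.

```python
def threshold_model(t):
    """Hypothetical base model: predicts 1 when x >= t, else 0."""
    return lambda x: 1 if x >= t else 0

def majority_vote(models, x):
    """Combine 0/1 predictions by strict majority; ties resolve to 0."""
    votes = sum(m(x) for m in models)
    return 1 if 2 * votes > len(models) else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def train_sub_ensembles(models, data):
    """Rank models by accuracy, then score each prefix against the full ensemble."""
    ranked = sorted(models, key=lambda m: accuracy(m, data), reverse=True)
    table = []
    for k in range(1, len(ranked) + 1):
        sub = ranked[:k]
        agree = sum(majority_vote(sub, x) == majority_vote(ranked, x) for x, _ in data)
        table.append((k, agree / len(data)))
    return ranked, table

def select_size(table, level):
    """Smallest sub-ensemble whose confidence meets the requested level."""
    for k, confidence in table:
        if confidence >= level:
            return k
    return table[-1][0]

data = [(x, 1 if x >= 5 else 0) for x in range(10)]
models = [threshold_model(8), threshold_model(4), threshold_model(7)]
ranked, table = train_sub_ensembles(models, data)
print(table)                    # [(1, 0.7), (2, 1.0), (3, 1.0)]
print(select_size(table, 0.6))  # 1
print(select_size(table, 0.9))  # 2
chosen = ranked[:select_size(table, 0.9)]
print(majority_vote(chosen, 7))  # 1
```

Raising the requested confidence from 0.6 to 0.9 moves the selection from one model to two, which matches the abstract's statement that a higher confidence level selects a sub-ensemble with more models.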

56. Invention application

US20110078187A1 SEMANTIC QUERY BY EXAMPLE (status: under examination, published)
Publication No.: US20110078187A1
Publication date: 2011-03-31
Application No.: US12566882
Filing date: 2009-09-25
Applicant: Lipyeow Lim, Haixun Wang, Min Wang
Inventor: Lipyeow Lim, Haixun Wang, Min Wang
IPC classification: G06F17/30, G06F17/27
CPC classification: G06F17/30598, G06F17/30404, G06F17/30672, G06F17/30734
Abstract: A computer-implemented method, system, and computer program product for producing a semantic query by example are provided. The method includes receiving examples of potential results from querying a database table with an associated ontology, and extracting features from the database table and the examples based on the associated ontology. The method further includes training a classifier based on the examples and the extracted features, and applying the classifier to the database table to obtain a semantic query result. The method also includes outputting the semantic query result to a user interface, and requesting user feedback of satisfaction with the semantic query result. The method additionally includes updating the classifier and the semantic query result iteratively in response to the user feedback.

57. Invention application

US20100169252A1 SYSTEM AND METHOD FOR SCALABLE COST-SENSITIVE LEARNING (status: in force)
Publication No.: US20100169252A1
Publication date: 2010-07-01
Application No.: US12690502
Filing date: 2010-01-20
Applicant: Wei Fan, Haixun Wang, Philip S. Yu
Inventor: Wei Fan, Haixun Wang, Philip S. Yu
IPC classification: G06N3/12, G06F15/18
CPC classification: G06N99/005
Abstract: A method (and structure) for processing an inductive learning model for a dataset of examples includes dividing the dataset of examples into a plurality of subsets of data and generating, using a processor on a computer, a learning model using examples of a first subset of data of the plurality of subsets of data. The learning model being generated for the first subset comprises an initial stage of an evolving aggregate learning model (ensemble model) for an entirety of the dataset, the ensemble model thereby providing an evolving estimated learning model for the entirety of the dataset if all the subsets were to be processed. The generating of the learning model using data from a subset includes calculating a value for at least one parameter that provides an objective indication of an adequacy of a current stage of the ensemble model.

58. Invention application

US20090049035A1 System and method for indexing type-annotated web documents (status: under examination, published)
Publication No.: US20090049035A1
Publication date: 2009-02-19
Application No.: US11891921
Filing date: 2007-08-14
Applicant: Hao He, Haixun Wang, Philip Shilung Yu
Inventor: Hao He, Haixun Wang, Philip Shilung Yu
IPC classification: G06F7/06, G06F17/30
CPC classification: G06F16/951
Abstract: Methods and apparatus generate an index for use in a document retrieval system where the index is organized by type and keyword. Redundancy in the index is reduced by organizing type entries in a hierarchy of internal and leaf nodes. Determining whether to generate an inverted list for a type is based on the position of the type in the hierarchy; generally inverted lists are generated only for types corresponding to leaf nodes. Redundancy is further reduced by re-using inverted lists generated for keywords for types when there is an overlap between keywords and types. Search performance using the document retrieval index is improved by adding entries corresponding to combinations of keywords and types. The intersections of inverted lists associated with the keywords and types comprising the combinations are determined and added to the index for use in search operations. Determining whether to add an entry for a keyword-type combination is made on a cost-benefit analysis dependent, at least in part, on the proximity of the keyword to type in documents containing the combination.
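
The leaf-only inverted lists described in the abstract of item 58 can be sketched as follows. The type hierarchy, the three documents, and the function names are hypothetical, and the sketch assumes each document is annotated only with leaf types. It does not model the keyword-type combination entries or the cost-benefit analysis.

```python
# Hypothetical type hierarchy: internal type -> child types. Types absent here are leaves.
TYPE_CHILDREN = {"entity": ["person", "place"], "person": ["scientist", "artist"]}

# Hypothetical documents, each annotated with keywords and leaf types.
DOCS = {
    1: {"keywords": {"relativity"}, "types": {"scientist"}},
    2: {"keywords": {"painting"}, "types": {"artist"}},
    3: {"keywords": {"relativity", "museum"}, "types": {"place"}},
}

def build_index(docs, type_children):
    """Build keyword lists and type lists, storing type lists for leaf types only."""
    keyword_lists, leaf_type_lists = {}, {}
    for doc_id, doc in docs.items():
        for keyword in doc["keywords"]:
            keyword_lists.setdefault(keyword, set()).add(doc_id)
        for type_name in doc["types"]:
            if type_name not in type_children:  # leaf node in the hierarchy
                leaf_type_lists.setdefault(type_name, set()).add(doc_id)
    return keyword_lists, leaf_type_lists

def type_postings(type_name, leaf_type_lists, type_children):
    """Postings for a type; internal types are derived as the union of their leaves."""
    if type_name not in type_children:
        return set(leaf_type_lists.get(type_name, set()))
    postings = set()
    for child in type_children[type_name]:
        postings |= type_postings(child, leaf_type_lists, type_children)
    return postings

def search(keyword, type_name, keyword_lists, leaf_type_lists, type_children):
    """Documents containing the keyword and an instance of the type."""
    by_keyword = keyword_lists.get(keyword, set())
    by_type = type_postings(type_name, leaf_type_lists, type_children)
    return sorted(by_keyword & by_type)

keyword_lists, leaf_type_lists = build_index(DOCS, TYPE_CHILDREN)
print(search("relativity", "person", keyword_lists, leaf_type_lists, TYPE_CHILDREN))  # [1]
print(search("relativity", "entity", keyword_lists, leaf_type_lists, TYPE_CHILDREN))  # [1, 3]
```

No list is stored for `person` or `entity`, so the index avoids repeating the postings of their descendants; the price is a union at query time, which is the kind of trade-off the abstract's cost-benefit analysis addresses for precomputed combinations.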

59. Granted invention patent

US07475070B2 System and method for tree structure indexing that provides at least one constraint sequence to preserve query-equivalence between xml document structure match and subsequence match (status: lapsed)
Publication No.: US07475070B2
Publication date: 2009-01-06
Application No.: US11035889
Filing date: 2005-01-14
Applicant: Wei Fan, Haixun Wang, Philip S. Yu
Inventor: Wei Fan, Haixun Wang, Philip S. Yu
IPC classification: G06F17/30, G06F17/00
CPC classification: G06F17/30935, Y10S707/99933, Y10S707/99936
Abstract: Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. Herein, there is addressed the problem of query equivalence with respect to this transformation, and there is introduced a performance-oriented principle for sequencing tree structures. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. There is identified a class of sequencing methods for this purpose, and there is presented a novel subsequence matching algorithm that observes query equivalence. Also introduced is a performance-oriented principle to guide the sequencing of tree structures. For any given XML dataset, the principle finds an optimal sequencing strategy according to its schema and its data distribution; there is thus presented herein a novel method that realizes this principle.

60. Granted invention patent

US07464068B2 System and method for continuous diagnosis of data streams (status: lapsed)
Publication No.: US07464068B2
Publication date: 2008-12-09
Application No.: US10880913
Filing date: 2004-06-30
Applicant: Wei Fan, Haixun Wang, Philip S. Yu
Inventor: Wei Fan, Haixun Wang, Philip S. Yu
IPC classification: G06F17/30
CPC classification: G06F17/30017, G06F2216/03, Y10S707/99931, Y10S707/99935
Abstract: In connection with the mining of time-evolving data streams, a general framework is provided that mines changes and reconstructs models from a data stream with unlabeled instances or a limited number of labeled instances. In particular, there are defined herein statistical profiling methods that extend a classification tree in order to guess the percentage of drifts in the data stream without any labeled data. Exact error can be estimated by actively sampling a small number of true labels. If the estimated error is significantly higher than empirical expectations, there are preferably re-sampled a small number of true labels to reconstruct the decision tree from the leaf node level.
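
One way to picture the label-free drift guess in the abstract of item 60 is to compare how instances spread over a tree's leaves at training time and on the unlabeled stream. The sketch below is an assumption throughout: the abstract does not give the statistic, so half the sum of absolute differences between leaf proportions is used as a plausible stand-in, and the one-feature tree `leaf_of` and both data sets are invented.

```python
from collections import Counter

LEAVES = ("A", "B", "C")

def leaf_of(x):
    """Hypothetical classification tree over one numeric feature; returns a leaf id."""
    if x < 3:
        return "A"
    if x < 6:
        return "B"
    return "C"

def leaf_distribution(xs):
    """Fraction of instances that reach each leaf (no labels needed)."""
    counts = Counter(leaf_of(x) for x in xs)
    return {leaf: counts[leaf] / len(xs) for leaf in LEAVES}

def leaf_change(train_xs, stream_xs):
    """Half the L1 distance between the two leaf distributions, in [0, 1]."""
    p = leaf_distribution(train_xs)
    q = leaf_distribution(stream_xs)
    return sum(abs(p[leaf] - q[leaf]) for leaf in LEAVES) / 2

train_xs = list(range(10))                    # leaves A, B, C hold 30%, 30%, 40%
stream_xs = [5, 6, 7, 8, 9, 9, 9, 9, 9, 9]    # leaves A, B, C hold 0%, 10%, 90%
drift = leaf_change(train_xs, stream_xs)
print(round(drift, 3))  # 0.5
```

A value near 0 suggests the stream still populates the leaves as the training data did, while a large value would be the cue, in the abstract's terms, to sample a few true labels and estimate the actual error.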
