Quick Patent Search - Fast search of global patents, free commercial patent database - IPRDB

71. Invention application

US20050131873A1 System and method for adaptive pruning (Lapsed)
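Publication (announcement) number: US20050131873A1
Publication (announcement) date: 2005-06-16
Application number: US10737123
Application date: 2003-12-16
Applicant: Wei Fan, Haixun Wang, Philip Yu
Inventor: Wei Fan, Haixun Wang, Philip Yu
IPC classification: G06F17/30
CPC classification: G06F17/30539, G06F17/30598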
Abstract: Disclosed is a method and structure for searching data in databases using an ensemble of models. First, the invention performs training. This training orders models within the ensemble in order of prediction accuracy and joins different numbers of models together to form sub-ensembles. The models are joined together in the sub-ensemble in the order of prediction accuracy. Next, in the training process, the invention calculates confidence values of each of the sub-ensembles. The confidence is a measure of how closely results from the sub-ensemble will match results from the ensemble. The size of each of the sub-ensembles is variable depending upon the level of confidence, while, to the contrary, the size of the ensemble is fixed. After the training, the invention can make a prediction. First, the invention selects a sub-ensemble that meets a given level of confidence. As the level of confidence is raised, a sub-ensemble that has more models will be selected, and as the level of confidence is lowered, a sub-ensemble that has fewer models will be selected. Finally, the invention applies the selected sub-ensemble, in place of the ensemble, to an example to make a prediction.
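
The training and selection steps in this abstract can be sketched roughly as follows. This is only an illustrative reading, not the patented method: it assumes models are plain callables, that the ensemble predicts by majority vote, and that confidence is the fraction of validation examples on which a sub-ensemble agrees with the full ensemble. The names `vote`, `train_pruning`, and `select_sub_ensemble` are hypothetical.

```python
from collections import Counter


def vote(models, x):
    """Majority vote over the models; ties go to the label seen first."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def train_pruning(models, validation):
    """Order models by accuracy and score each prefix sub-ensemble.

    validation is a list of (example, label) pairs. Returns
    (ordered_models, confidences), where confidences[k - 1] is the
    fraction of validation examples on which the first k models agree
    with the full ensemble (an assumed definition of confidence).
    """
    def accuracy(model):
        return sum(model(x) == y for x, y in validation) / len(validation)

    ordered = sorted(models, key=accuracy, reverse=True)
    full = [vote(ordered, x) for x, _ in validation]
    confidences = []
    for k in range(1, len(ordered) + 1):
        agree = sum(vote(ordered[:k], x) == f
                    for (x, _), f in zip(validation, full))
        confidences.append(agree / len(validation))
    return ordered, confidences


def select_sub_ensemble(ordered, confidences, level):
    """Return the smallest prefix sub-ensemble whose confidence meets level."""
    for k, confidence in enumerate(confidences, start=1):
        if confidence >= level:
            return ordered[:k]
    return ordered  # fall back to the full ensemble


# Hypothetical toy data: three threshold classifiers on the integers 0..9.
validation = [(x, x >= 5) for x in range(10)]
models = [lambda x: x >= 3, lambda x: x >= 4, lambda x: x >= 3]
ordered, confidences = train_pruning(models, validation)
small = select_sub_ensemble(ordered, confidences, 0.9)
large = select_sub_ensemble(ordered, confidences, 0.95)
print(len(small), len(large))
```

Raising the requested confidence level can only keep or increase the number of models selected, which matches the behaviour the abstract describes.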

72. Invention application

US20050114314A1 Index structure for supporting structural XML queries (Lapsed)
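Publication (announcement) number: US20050114314A1
Publication (announcement) date: 2005-05-26
Application number: US10723206
Application date: 2003-11-26
Applicant: Wei Fan, Haixun Wang, Philip Yu
Inventor: Wei Fan, Haixun Wang, Philip Yu
IPC classification: G06F17/30
CPC classification: G06F17/30911, Y10S707/99933, Y10S707/99943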
Abstract: The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).
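
The idea of turning documents and queries into structure-encoded sequences and then testing for a non-contiguous subsequence match can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from the patent: each sequence item is a (tag, ancestor-path) pair produced by a preorder walk, only element tags are encoded (no text values or wild-cards), and matching is a linear scan rather than a lookup in a B+Tree-backed index. The helper names `encode` and `is_subsequence` are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode(elem, prefix=()):
    """Preorder list of (tag, ancestor-tag path) pairs for an element tree."""
    sequence = [(elem.tag, prefix)]
    for child in elem:
        sequence.extend(encode(child, prefix + (elem.tag,)))
    return sequence


def is_subsequence(query, doc):
    """True if the query items occur in doc in order, gaps allowed."""
    remaining = iter(doc)
    # `item in remaining` advances the iterator past the first match,
    # so later query items can only match later document items.
    return all(item in remaining for item in query)


# Hypothetical document and queries.
doc = encode(ET.fromstring("<P><S><N/><L/></S><B><L/></B></P>"))
hit = encode(ET.fromstring("<P><S><L/></S></P>"))
miss = encode(ET.fromstring("<P><B><N/></B></P>"))

print(is_subsequence(hit, doc))
print(is_subsequence(miss, doc))
```

A plain subsequence test like this one can report matches that are not true structural matches for some branching queries, so it should be read as an illustration of the encoding only.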

73. Invention application

US20050114298A1 System and method for indexing weighted-sequences in large databases (In force)
Publication (announcement) number: US20050114298A1
Publication (announcement) date: 2005-05-26
Application number: US10723229
Application date: 2003-11-26
Applicant: Wei Fan, Chang-Shing Perng, Haixun Wang, Philip Yu
Inventor: Wei Fan, Chang-Shing Perng, Haixun Wang, Philip Yu
IPC classification: G06F17/30
CPC classification: G06F17/30327, G06F17/30548, Y10S707/99943
Abstract: The present invention provides an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure in which each element in the sequence is associated with a weight. A series of network events, for instance, is a weighted-sequence because each event is associated with a timestamp. Querying a large sequence database by events' occurrence patterns is a first step towards understanding the temporal causal relationships among the events. The index structure proposed herein enables the efficient retrieval from the database of all subsequences (contiguous and non-contiguous) that match a given query sequence both by events and by weights. The index structure also takes into consideration the nonuniform frequency distribution of events in the sequence data.
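
To make "matching both by events and by weights" concrete, here is a brute-force sketch. It is not the index structure from the abstract, which is designed to avoid exactly this kind of scan. The sketch assumes a sequence is a list of (event, timestamp) pairs and a query is a list of (event, offset) pairs, where each offset is the required time distance from the first matched event. The function name `find_matches` and the query format are hypothetical.

```python
def find_matches(sequence, query):
    """Return index tuples of subsequences of `sequence` that match `query`.

    sequence: list of (event, timestamp) pairs in time order.
    query: list of (event, offset) pairs; offset is measured from the
    timestamp of the first matched event, so the first offset is ignored.
    Matches may be contiguous or non-contiguous.
    """
    matches = []

    def extend(qi, start, base, picked):
        if qi == len(query):
            matches.append(tuple(picked))
            return
        event, offset = query[qi]
        for i in range(start, len(sequence)):
            seq_event, timestamp = sequence[i]
            if seq_event != event:
                continue
            if qi == 0:
                # The first query event fixes the reference timestamp.
                extend(1, i + 1, timestamp, [i])
            elif timestamp - base == offset:
                extend(qi + 1, i + 1, base, picked + [i])

    extend(0, 0, None, [])
    return matches


# Hypothetical event log: (event, timestamp).
log = [("a", 0), ("b", 2), ("a", 3), ("b", 5), ("c", 6)]
print(find_matches(log, [("a", 0), ("b", 2)]))
print(find_matches(log, [("a", 0), ("c", 3)]))
```

The second query matches the non-contiguous pair at positions 2 and 4, since event "c" occurs exactly 3 time units after the second "a".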

74. Granted invention patent

US06871201B2 Method for building space-splitting decision tree (Lapsed)
Publication (announcement) number: US06871201B2
Publication (announcement) date: 2005-03-22
Application number: US09918952
Application date: 2001-07-31
Applicant: Philip Shi-lung Yu, Haixun Wang
Inventor: Philip Shi-lung Yu, Haixun Wang
IPC classification: G06F7/00, G06F17/30, G06K9/62
CPC classification: G06F17/30705, G06K9/6282, Y10S707/99935, Y10S707/99937, Y10S707/99943
Abstract: A method is provided for data classification that achieves improved interpretability and accuracy while preserving the efficiency and scalability of univariate decision trees. To build a compact decision tree, the method searches for clusters in subspaces to enable multivariate splitting based on weighted distances to such a cluster. To classify an instance more accurately, the method performs a nearest neighbor (NN) search among the potential nearest leaf nodes of the instance. The similarity measure used in the NN search is based on Euclidean distances defined in different subspaces for different leaf nodes. Since instances are scored by their similarity to a certain class, this approach provides an effective means for target selection that is not supported well by conventional decision trees.
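
A multivariate split based on weighted distance to a cluster in a subspace might look like the sketch below. The abstract does not say how clusters are found, how weights are chosen, or how the threshold is set, so the centroid, weights, subspace dimensions, and threshold here are all assumed inputs, and the names `weighted_distance` and `split_by_cluster` are hypothetical.

```python
import math


def weighted_distance(x, centroid, weights, dims):
    """Weighted Euclidean distance from x to centroid, using only `dims`."""
    return math.sqrt(sum(weights[d] * (x[d] - centroid[d]) ** 2 for d in dims))


def split_by_cluster(instances, centroid, weights, dims, threshold):
    """Partition instances by weighted distance to a cluster centroid.

    Returns (near, far): instances within `threshold` of the centroid in
    the subspace `dims`, and all remaining instances.
    """
    near, far = [], []
    for x in instances:
        if weighted_distance(x, centroid, weights, dims) <= threshold:
            near.append(x)
        else:
            far.append(x)
    return near, far


# Hypothetical 3-attribute instances; the split looks only at attributes 0 and 1.
instances = [(0, 0, 9), (3, 4, 0), (1, 1, 5)]
near, far = split_by_cluster(
    instances,
    centroid=(0, 0, 0),
    weights=(1.0, 1.0, 1.0),
    dims=(0, 1),
    threshold=2.0,
)
print(near, far)
```

Because the third attribute is outside the chosen subspace, the instance (0, 0, 9) still counts as close to the centroid, which is the point of splitting in a subspace rather than on a single attribute or the full space.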
