Quick Patent Search: fast search of global patents, free commercial patent database - IPRDB

1. Granted invention patent

US07698294B2 Content object indexing using domain knowledge (Status: in force)
Publication No.: US07698294B2
Publication date: 2010-04-13
Application No.: US11275509
Filing date: 2006-01-11
Applicants: Wei-Ying Ma, Lie Lu, Ji-Rong Wen, Zhiwei Li, Zaiqing Nie, Hsiao-Wuen Hon
Inventors: Wei-Ying Ma, Lie Lu, Ji-Rong Wen, Zhiwei Li, Zaiqing Nie, Hsiao-Wuen Hon
IPC classification: G06F17/30
CPC classification: G06F17/30613
Abstract: A content object indexing process including creating a content object knowledge index, calculating a description vector of a target content object, and indexing the target content object by searching for the description vector in the content object knowledge database. It may be difficult to search for an exact content object such as a music file or academic researcher as a conventional search index may not include related hierarchical information. A content object indexing process may add hierarchical information taken from a content object knowledge index and incorporate the hierarchical information into the index entry for a specific content object. An application of such a content object indexing process may be a world wide web search engine.
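The indexing idea in this abstract can be illustrated with a small sketch. It is not the patented method: the knowledge index contents, the bag-of-words description vector, and the cosine-similarity match are all assumptions chosen for illustration, and the names `KNOWLEDGE_INDEX`, `description_vector`, and `index_content_object` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge index: each entry pairs descriptive terms with
# hierarchical information (here, a category path). Contents are made up.
KNOWLEDGE_INDEX = [
    {"path": ["Music", "Rock", "The Beatles", "Abbey Road"],
     "terms": "beatles abbey road come together rock"},
    {"path": ["Music", "Classical", "Beethoven", "Symphony No. 5"],
     "terms": "beethoven symphony fifth orchestra classical"},
]

def description_vector(text):
    """Assumed representation: a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def index_content_object(text, knowledge_index=KNOWLEDGE_INDEX):
    """Look up the closest knowledge entry and copy its hierarchy
    into the index entry for the target content object."""
    vec = description_vector(text)
    scored = [(cosine(vec, description_vector(e["terms"])), e)
              for e in knowledge_index]
    score, best = max(scored, key=lambda pair: pair[0])
    hierarchy = best["path"] if score > 0 else []
    return {"text": text, "hierarchy": hierarchy, "score": score}
```

Calling `index_content_object("come together mp3 beatles")` returns an entry whose `hierarchy` is the first category path, which a search engine could then use to match queries on the artist or album rather than only on the file's own text.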

2. Invention patent application

US20070162408A1 Content Object Indexing Using Domain Knowledge (Status: in force)
Publication No.: US20070162408A1
Publication date: 2007-07-12
Application No.: US11275509
Filing date: 2006-01-11
Applicants: Wei-Ying Ma, Lie Lu, Ji-Rong Wen, Zhiwei Li, Zaiqing Nie, Hsiao-Wuen Hon
Inventors: Wei-Ying Ma, Lie Lu, Ji-Rong Wen, Zhiwei Li, Zaiqing Nie, Hsiao-Wuen Hon
IPC classification: G06N5/02
CPC classification: G06F17/30613
Abstract: A content object indexing process including creating a content object knowledge index, calculating a description vector of a target content object, and indexing the target content object by searching for the description vector in the content object knowledge database. It may be difficult to search for an exact content object such as a music file or academic researcher as a conventional search index may not include related hierarchical information. A content object indexing process may add hierarchical information taken from a content object knowledge index and incorporate the hierarchical information into the index entry for a specific content object. An application of such a content object indexing process may be a world wide web search engine.

3. Granted invention patent

US08538898B2 Interactive framework for name disambiguation (Status: in force)
Publication No.: US08538898B2
Publication date: 2013-09-17
Application No.: US13118404
Filing date: 2011-05-28
Applicants: Zhengdong Lu, Zaiqing Nie, Gang Luo, Yong Cao, Ji-Rong Wen, Wei-Ying Ma
Inventors: Zhengdong Lu, Zaiqing Nie, Gang Luo, Yong Cao, Ji-Rong Wen, Wei-Ying Ma
IPC classification: G06N5/00
CPC classification: G06N99/005, G06F17/30616
Abstract: A “Name Disambiguator” provides various techniques for implementing an interactive framework for resolving or disambiguating entity names (associated with objects such as publications) for entity searches where two or more same or similar names may refer to different entities. More specifically, the Name Disambiguator uses a combination of user input and automatic models to address the disambiguation problem. In various embodiments, the Name Disambiguator uses a two-part process, including: 1) a global SVM trained from large sets of documents or objects in a simulated interactive mode, and 2) further personalization of local SVM models (associated with individual names or groups of names such as, for example, a group of coauthors) derived from the global SVM model. The result of this process is that large sets of documents or objects are rapidly and accurately condensed or clustered into ordered sets that are organized by entity names.

4. Granted invention patent

US07720830B2 Hierarchical conditional random fields for web extraction (Status: lapsed)
Publication No.: US07720830B2
Publication date: 2010-05-18
Application No.: US11461400
Filing date: 2006-07-31
Applicants: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie, Jun Zhu
Inventors: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie, Jun Zhu
IPC classification: G06F7/00, G06F17/30, G06F17/00, G06F15/173
CPC classification: G06F17/3089, G06F17/30994
Abstract: A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.

5. Granted invention patent

US07529748B2 Information classification paradigm (Status: in force)
Publication No.: US07529748B2
Publication date: 2009-05-05
Application No.: US11276818
Filing date: 2006-03-15
Applicants: Ji-Rong Wen, Yan-Feng Sun, Wei-Ying Ma, Zaiqing Nie, Renkuan Jiang
Inventors: Ji-Rong Wen, Yan-Feng Sun, Wei-Ying Ma, Zaiqing Nie, Renkuan Jiang
IPC classification: G06F17/30
CPC classification: G06F17/30707, Y10S707/99933, Y10S707/99937
Abstract: A mechanism to classify source documents into one of two categories, either likely to contain desired information or unlikely to contain desired information. Generally, some form of rules-based classification in conjunction with deeper analysis using advanced techniques on difficult cases is utilized. The rules-based classification is generally good for eliminating cases from further consideration and for identifying documents of interest based on generally discernible relationships between data or based on the presence or absence of data. The deeper analysis is used to uncover more complex relationships between data that may identify documents of interest. Portions of the process may use the entire document while other portions of the process may use only a portion of the document.
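The two-stage flow described in this abstract (cheap rules first, deeper analysis only for the difficult cases) might look like the sketch below. The specific rules, the keyword vote standing in for the "advanced techniques", and the names `rule_stage`, `deep_stage`, and `classify` are all hypothetical and are not taken from the patent.

```python
def rule_stage(doc):
    """Return True or False when a simple rule decides the case,
    or None when the document is a difficult case."""
    text = doc.lower()
    if not text.strip():
        return False  # absence of data: eliminate from consideration
    if "price:" in text and "sku" in text:
        return True   # easily discernible relationship between fields
    return None       # undecided: defer to deeper analysis

def deep_stage(doc, keywords=("buy", "shipping", "warranty"), threshold=2):
    """Placeholder for the deeper analysis: a keyword vote that looks
    only at the first 200 characters (a portion of the document)."""
    head = doc.lower()[:200]
    return sum(k in head for k in keywords) >= threshold

def classify(doc):
    """True = likely to contain the desired information."""
    decided = rule_stage(doc)
    return decided if decided is not None else deep_stage(doc)
```

In this sketch the rule stage inspects the whole document while the deeper stage inspects only its head, mirroring the abstract's remark that different portions of the process may use different portions of the document.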

6. Invention patent application

US20080027969A1 HIERARCHICAL CONDITIONAL RANDOM FIELDS FOR WEB EXTRACTION (Status: lapsed)
Publication No.: US20080027969A1
Publication date: 2008-01-31
Application No.: US11461400
Filing date: 2006-07-31
Applicants: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie, Jun Zhu
Inventors: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie, Jun Zhu
IPC classification: G06F7/00
CPC classification: G06F17/3089, G06F17/30994
Abstract: A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.

7. Invention patent application

US20080027910A1 WEB OBJECT RETRIEVAL BASED ON A LANGUAGE MODEL (Status: lapsed)
Publication No.: US20080027910A1
Publication date: 2008-01-31
Application No.: US11459857
Filing date: 2006-07-25
Applicants: Ji-Rong Wen, Shuming Shi, Wei-Ying Ma, Yunxiao Ma, Zaiqing Nie
Inventors: Ji-Rong Wen, Shuming Shi, Wei-Ying Ma, Yunxiao Ma, Zaiqing Nie
IPC classification: G06F17/30
CPC classification: G06F17/30687, G06F17/30864, G06F17/30896, Y10S707/99936
Abstract: A method and system is provided for determining relevance of an object to a term based on a language model. The relevance system provides records extracted from web pages that relate to the object. To determine the relevance of the object to a term, the relevance system first determines, for each record of the object, a probability of generating that term using a language model of the record of that object. The relevance system then calculates the relevance of the object to the term by combining the probabilities. The relevance system may also weight the probabilities based on the accuracy or reliability of the extracted information for each data source.
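One way to read the combination step in this abstract is as a weighted mixture of per-record term probabilities. The sketch below assumes a unigram language model with add-one smoothing and a weighted linear combination; the abstract does not specify either choice, and the names `term_probability` and `object_relevance` are hypothetical.

```python
from collections import Counter

def term_probability(term, record_text, vocab_size, alpha=1.0):
    """P(term | record) under an assumed unigram model with
    additive (Laplace) smoothing."""
    tokens = record_text.lower().split()
    counts = Counter(tokens)
    return (counts[term.lower()] + alpha) / (len(tokens) + alpha * vocab_size)

def object_relevance(term, records):
    """Combine per-record probabilities for one object.

    records: non-empty list of (record_text, source_weight) pairs, where
    source_weight is an assumed positive reliability score for the data
    source the record was extracted from.
    """
    vocab = {tok for text, _ in records for tok in text.lower().split()}
    vocab.add(term.lower())
    total_weight = sum(weight for _, weight in records)
    weighted = sum(weight * term_probability(term, text, len(vocab))
                   for text, weight in records)
    return weighted / total_weight
```

For example, with `records = [("digital camera zoom lens", 0.9), ("camera bag", 0.3)]`, the term "camera" scores higher than "tripod", and the record from the more reliable source contributes more to the result.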

8. Invention patent application

US20060235875A1 Method and system for identifying object information (Status: in force)
Publication No.: US20060235875A1
Publication date: 2006-10-19
Application No.: US11106383
Filing date: 2005-04-13
Applicants: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie
Inventors: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie
IPC classification: G06F7/00
CPC classification: G06F17/30864, Y10S707/99933, Y10S707/99936
Abstract: A method and system for identifying object information of an information page is provided. An information extraction system identifies the object blocks of an information page. The extraction system classifies the object blocks into object types. Each object type has associated attributes that define a schema for the information of the object type. The extraction system identifies object elements within an object block that may represent an attribute value for the object. After the object elements are identified, the extraction system attempts to identify which object elements correspond to which attributes of the object type in a process referred to as “labeling.” The extraction system uses an algorithm to determine the confidence that a certain object element corresponds to a certain attribute. The extraction system then selects the set of labels with the highest confidence as being the labels for the object elements.
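The final selection step in this abstract, choosing the set of labels with the highest confidence, can be sketched as an exhaustive search over assignments. The abstract does not name the confidence algorithm, so the toy heuristics in `confidence`, the product used as the joint score, and the `ATTRIBUTES` schema are assumptions for illustration only.

```python
from itertools import permutations

# Hypothetical schema for a "product" object type.
ATTRIBUTES = ("name", "price", "brand")

def confidence(element, attribute):
    """Toy confidence that an element holds a value for an attribute."""
    if attribute == "price":
        return 0.9 if element.startswith("$") else 0.05
    if attribute == "brand":
        return 0.7 if element.istitle() and " " not in element else 0.2
    # attribute == "name": multi-word strings look more like product names
    return 0.6 if " " in element else 0.3

def best_labeling(elements):
    """Try every assignment of distinct attributes to the elements and
    keep the one whose joint confidence (product) is highest.
    Assumes distinct elements and len(elements) <= len(ATTRIBUTES)."""
    best, best_score = None, -1.0
    for attrs in permutations(ATTRIBUTES, len(elements)):
        score = 1.0
        for element, attribute in zip(elements, attrs):
            score *= confidence(element, attribute)
        if score > best_score:
            best, best_score = dict(zip(elements, attrs)), score
    return best, best_score
```

Exhaustive enumeration is only practical for small schemas; it is used here to keep the selection step explicit rather than to suggest how a production extractor would search.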

9. Invention patent application

US20110264658A1 WEB OBJECT RETRIEVAL BASED ON A LANGUAGE MODEL (Status: under examination, published)
Publication No.: US20110264658A1
Publication date: 2011-10-27
Application No.: US13175796
Filing date: 2011-07-01
Applicants: Ji-Rong Wen, Shuming Shi, Wei-Ying Ma, Yunxiao Ma, Zaiqing Nie
Inventors: Ji-Rong Wen, Shuming Shi, Wei-Ying Ma, Yunxiao Ma, Zaiqing Nie
IPC classification: G06F17/30
CPC classification: G06F16/3346, G06F16/951, G06F16/986, Y10S707/99936
Abstract: A method and system is provided for determining relevance of an object to a term based on a language model. The relevance system provides records extracted from web pages that relate to the object. To determine the relevance of the object to a term, the relevance system first determines, for each record of the object, a probability of generating that term using a language model of the record of that object. The relevance system then calculates the relevance of the object to the term by combining the probabilities. The relevance system may also weight the probabilities based on the accuracy or reliability of the extracted information for each data source.

10. Invention patent application

US20100281009A1 HIERARCHICAL CONDITIONAL RANDOM FIELDS FOR WEB EXTRACTION (Status: under examination, published)
Publication No.: US20100281009A1
Publication date: 2010-11-04
Application No.: US12776308
Filing date: 2010-05-07
Applicants: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie, Jun Zhu
Inventors: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie, Jun Zhu
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F16/958, G06F16/904
Abstract: A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.
