Quick Patent Search - Fast search of global patents, free patent database for commercial use - IPRDB

1. Granted invention patent

US09483463B2 Method and system for motif extraction in electronic documents (in force)
Publication number: US09483463B2
Publication date: 2016-11-01
Application number: US13608312
Filing date: 2012-09-10
Applicants: Matthias Galle, Jean-Michel Renders
Inventors: Matthias Galle, Jean-Michel Renders
IPC classification: G10L15/22, G10L15/187, G06Q30/04, G06Q30/02, G06Q40/00, G06F17/27, G06F17/24, G06F19/18, G06F19/22, G06F7/24
CPC classification: G06F17/2775, G06F17/248
Abstract: A method, system, and computer program product for extracting text motifs from electronic documents is disclosed. A user provides a largest-maximal repeat or a super-maximal repeat as a first text block. Occurrences of the first text block are detected, and second text blocks in the vicinity of those occurrences are identified on the basis of predefined parameters. The text motifs are determined by combining the first text block and the second text blocks. Finally, the text motifs are extracted from the electronic documents.
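The abstract takes a super-maximal repeat as its first text block but does not say how such repeats are found. As a minimal sketch (not the patented procedure), the brute-force Python below lists the super-maximal repeats of a tokenized document, i.e. token sequences that occur at least twice and are not contained in any longer repeat. The function names are hypothetical, and the exhaustive n-gram enumeration only suits short inputs; suffix-tree or suffix-array methods are the usual choice at scale.

```python
from collections import Counter

# Hypothetical helpers for illustration; brute force, suitable only for short token lists.


def repeated_ngrams(tokens):
    """All token sequences occurring at >= 2 positions (occurrences may overlap)."""
    n = len(tokens)
    counts = Counter(tuple(tokens[i:j]) for i in range(n) for j in range(i + 1, n + 1))
    return {gram for gram, c in counts.items() if c >= 2}


def super_maximal_repeats(tokens):
    """Repeats that are not a proper substring of any other repeat.

    A repeat r lies inside a longer repeat iff some one-token extension
    (t + r or r + t) is itself a repeat, so only those extensions are checked.
    """
    repeats = repeated_ngrams(tokens)
    vocab = set(tokens)
    result = [
        r for r in repeats
        if not any((t,) + r in repeats or r + (t,) in repeats for t in vocab)
    ]
    # Longest first, then lexicographic, for a deterministic order.
    return sorted(result, key=lambda r: (-len(r), r))
```

For example, `super_maximal_repeats("a b c x a b c y a b".split())` returns `[('a', 'b', 'c')]`: `a b` also repeats, but it is contained in the longer repeat `a b c`.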

2. Granted invention patent

US08880525B2 Full and semi-batch clustering (in force)
Publication number: US08880525B2
Publication date: 2014-11-04
Application number: US13437079
Filing date: 2012-04-02
Applicants: Matthias Galle, Jean-Michel Renders
Inventors: Matthias Galle, Jean-Michel Renders
IPC classification: G06F17/30
CPC classification: G06F17/30, G06F17/30707
Abstract: A method for clustering documents is provided. Each document is represented by a multidimensional data point. The data points are initially assigned to respective clusters and serve as their initial representative points. Thereafter, in an iterative process, the data points are clustered by assigning each data point to clusters based on a comparison measure between the data point and the cluster (or its representative point) and a threshold on that measure. Based on this clustering, a new representative point can be computed for each cluster. Optionally, overlapping clusters are merged. In the next iteration, the new representative points are used as the representative points. An assignment of the documents to the clusters is output, based on the clustering of the data points in the latest iteration. Multiple batches may be processed, retaining the initial clusters to which the original batch was assigned.
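The iterative loop described in the abstract can be sketched in Python as follows. Several details are assumptions, not taken from the patent: cosine similarity as the comparison measure, the centroid as the new representative point, merging any clusters that share a member, turning points that match no representative into singleton clusters, and stopping once the assignment stops changing. Function names are hypothetical.

```python
import math

# Illustrative sketch; comparison measure, merge rule and stopping rule are assumptions.


def cosine(u, v):
    """Cosine similarity, used here as the assumed comparison measure."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean, used here as the new representative point."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def merge_overlapping(groups):
    """Union groups that share at least one member (transitively)."""
    merged = []  # invariant: sets in `merged` are pairwise disjoint
    for g in groups:
        g = set(g)
        disjoint = []
        for m in merged:
            if m & g:
                g |= m
            else:
                disjoint.append(m)
        disjoint.append(g)
        merged = disjoint
    return merged


def threshold_cluster(points, threshold=0.9, max_iter=20):
    """Return labels where labels[i] is the cluster id of points[i]."""
    n = len(points)
    reps = [list(p) for p in points]  # each point starts as its own representative
    partition = None
    for _ in range(max_iter):
        # Assign every point to each cluster whose representative passes the threshold.
        groups = [{i for i in range(n) if cosine(points[i], r) >= threshold} for r in reps]
        groups = merge_overlapping(g for g in groups if g)
        # Points that matched no representative become singleton clusters.
        covered = set().union(*groups)
        groups += [{i} for i in range(n) if i not in covered]
        groups.sort(key=min)
        new_partition = [sorted(g) for g in groups]
        if new_partition == partition:
            break  # assignments stopped changing
        partition = new_partition
        reps = [centroid([points[i] for i in g]) for g in partition]
    labels = [0] * n
    for cid, members in enumerate(partition):
        for i in members:
            labels[i] = cid
    return labels
```

For instance, `threshold_cluster([[1, 0], [0.99, 0.1], [0, 1], [0.1, 0.99]], threshold=0.95)` returns `[0, 0, 1, 1]`. Merging any shared member keeps the output a partition, at the cost of possible chaining when the threshold is low.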

3. Invention patent application

US20140074455A1 METHOD AND SYSTEM FOR MOTIF EXTRACTION IN ELECTRONIC DOCUMENTS (in force)
Publication number: US20140074455A1
Publication date: 2014-03-13
Application number: US13608312
Filing date: 2012-09-10
Applicants: Matthias Galle, Jean-Michel Renders
Inventors: Matthias Galle, Jean-Michel Renders
IPC classification: G06F17/27
CPC classification: G06F17/2775, G06F17/248
Abstract: A method, system, and computer program product for extracting text motifs from electronic documents is disclosed. A user provides a largest-maximal repeat or a super-maximal repeat as a first text block. Occurrences of the first text block are detected, and second text blocks in the vicinity of those occurrences are identified on the basis of predefined parameters. The text motifs are determined by combining the first text block and the second text blocks. Finally, the text motifs are extracted from the electronic documents.
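The abstract leaves the "predefined parameters" for locating second text blocks unspecified. The Python sketch below assumes two hypothetical ones: `window`, the number of tokens after each occurrence treated as its vicinity, and `min_support`, the number of occurrences that must share a context token at the same offset. Each surviving context token is combined with the first text block into a motif, with `*` marking skipped tokens. This is an illustrative reading of the abstract, not the claimed method.

```python
from collections import Counter

# Hypothetical names and parameters; an illustrative reading, not the patented method.
GAP = "*"  # wildcard marking skipped tokens inside a motif


def find_occurrences(tokens, block):
    """Start positions at which the token sequence `block` occurs."""
    m = len(block)
    return [i for i in range(len(tokens) - m + 1) if tuple(tokens[i:i + m]) == block]


def extract_motifs(tokens, block, window=3, min_support=2):
    """Combine `block` with context tokens found to its right.

    window      -- how many tokens after each occurrence count as its vicinity
    min_support -- how many occurrences must share an (offset, token) pair
    """
    block = tuple(block)
    support = Counter()
    for start in find_occurrences(tokens, block):
        end = start + len(block)
        # Each offset appears once per occurrence, so this counts occurrences.
        for offset, tok in enumerate(tokens[end:end + window]):
            support[(offset, tok)] += 1
    return [
        block + (GAP,) * offset + (tok,)
        for (offset, tok), count in sorted(support.items())
        if count >= min_support
    ]
```

For the tokens of `"invoice no 123 total 50 invoice no 456 total 70"`, the block `("invoice", "no")` yields the single motif `("invoice", "no", "*", "total")`, since only `total` recurs at the same offset after both occurrences.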

4. Invention patent application

US20130262465A1 FULL AND SEMI-BATCH CLUSTERING (in force)
Publication number: US20130262465A1
Publication date: 2013-10-03
Application number: US13437079
Filing date: 2012-04-02
Applicants: Matthias Galle, Jean-Michel Renders
Inventors: Matthias Galle, Jean-Michel Renders
IPC classification: G06F17/30
CPC classification: G06F17/30, G06F17/30707
Abstract: A method for clustering documents is provided. Each document is represented by a multidimensional data point. The data points are initially assigned to respective clusters and serve as their initial representative points. Thereafter, in an iterative process, the data points are clustered by assigning each data point to clusters based on a comparison measure between the data point and the cluster (or its representative point) and a threshold on that measure. Based on this clustering, a new representative point can be computed for each cluster. Optionally, overlapping clusters are merged. In the next iteration, the new representative points are used as the representative points. An assignment of the documents to the clusters is output, based on the clustering of the data points in the latest iteration. Multiple batches may be processed, retaining the initial clusters to which the original batch was assigned.
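The abstract's closing sentence, processing further batches while retaining the clusters of the original batch, admits a simple reading sketched below in Python. Assumptions, not from the source: the original batch has already been reduced to representative points, a new point joins the most similar existing cluster when its cosine similarity clears `threshold`, and otherwise it seeds a new cluster represented by that point (a leader-style rule). The function names are hypothetical, and the original representatives are copied and never modified.

```python
import math

# Hypothetical sketch of semi-batch assignment; cosine similarity is an assumed measure.


def cosine(u, v):
    """Cosine similarity (an assumed choice of comparison measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def assign_new_batch(new_points, original_reps, threshold=0.9):
    """Assign a new batch without touching the original-batch clusters.

    Returns (labels, reps): labels[i] is the cluster id of new_points[i];
    reps extends a copy of original_reps with any newly created clusters.
    """
    reps = [list(r) for r in original_reps]  # copy: original clusters are retained as-is
    labels = []
    for p in new_points:
        scores = [cosine(p, r) for r in reps]
        best = max(range(len(reps)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            labels.append(best)  # joins the most similar existing cluster
        else:
            reps.append(list(p))  # seeds a new cluster (leader-style)
            labels.append(len(reps) - 1)
    return labels, reps
```

With original representatives `[[1, 0], [0, 1]]` and the new batch `[[0.98, 0.05], [0.7, 0.7], [0.72, 0.69]]` at `threshold=0.95`, the labels are `[0, 2, 2]`: the first point joins an original cluster, the second seeds cluster 2, and the third joins it.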
