    • 7. Invention application
    • METHOD AND APPARATUS FOR DOCUMENT CLUSTERING AND DOCUMENT SKETCHING
    • US20070005589A1
    • 2007-01-04
    • US11427781
    • 2006-06-29
    • SREENIVAS GOLLAPUDI
    • SREENIVAS GOLLAPUDI
    • G06F17/30
    • G06F17/30616; G06F17/3071; Y10S707/99933; Y10S707/99935; Y10S707/99942; Y10S707/99943
    • A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, that automatically classifies new documents into the right clusters, and that may change the number or parameters of clusters under various circumstances. A second embodiment of the invention provides a technique for comparing two documents, in which a fingerprint or sketch of each document is computed. In particular, this embodiment of the invention uses a specific algorithm to compute the document's fingerprint. One embodiment uses a sentence in the document as a logical delimiter or window from which significant words are extracted and, thereafter, a hash is computed of all pair-wise permutations. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and the inverse document frequency.
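
The sketching idea in the abstract (sentence as window, significant words chosen by weight, a hash of every pair-wise permutation) can be illustrated with a small Python sketch. This is not the patent's claimed algorithm, only one reading of the abstract under stated assumptions: the names `idf_table`, `sketch`, and `jaccard`, the `top_k` cutoff, the naive sentence splitter, the smoothed IDF formula, the truncated SHA-1 hash, and the use of Jaccard overlap to compare two sketches are all hypothetical choices that the abstract does not specify.

```python
import hashlib
import math
import re
from collections import Counter
from itertools import permutations


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(corpus):
    """Smoothed inverse document frequency for every word in the corpus."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0
            for w, df in doc_freq.items()}


def sketch(doc, idf, top_k=3):
    """Hash every ordered pair of the top_k heaviest words of each sentence.

    Word weight is term frequency in the document times IDF; top_k is a
    hypothetical cutoff for what counts as a "significant" word.
    """
    tf = Counter(tokenize(doc))
    unseen_idf = max(idf.values(), default=1.0)  # words absent from the corpus

    def weight(word):
        return tf[word] * idf.get(word, unseen_idf)

    hashes = set()
    for sentence in split_sentences(doc):
        # Sort by descending weight, breaking ties alphabetically.
        words = sorted(set(tokenize(sentence)),
                       key=lambda w: (-weight(w), w))[:top_k]
        for first, second in permutations(words, 2):
            digest = hashlib.sha1(f"{first} {second}".encode("utf-8")).hexdigest()
            hashes.add(digest[:16])
    return hashes


def jaccard(a, b):
    """Set overlap of two sketches (assumed comparison measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Under these assumptions, two documents are compared by building one IDF table for the collection, calling `sketch` on each document, and passing the two hash sets to `jaccard`; a score near 1 suggests near-duplicates, and the same score could serve as the similarity that a clustering step uses to place a new document.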