    • 4. Granted invention patent
    • Techniques for clustering structurally similar web pages
    • US07680858B2
    • 2010-03-16
    • US11481734
    • 2006-07-05
    • Krishna Leela Poola; Arun Ramanujapuram
    • Krishna Leela Poola; Arun Ramanujapuram
    • G06F17/30
    • G06F17/3071; G06F17/30864; G06K9/6219; G06K9/6224
    • Web page clustering techniques described herein are URL Clustering and Page Clustering, whereby clustering algorithms cluster together pages that are structurally similar. Regarding URL clustering, because similarly structured pages have similar patterns in their URLs, grouping similar URL patterns will group structurally similar pages. Embodiments of URL clustering may involve: (a) URL normalization and (b) URL variation computation. Regarding page clustering, page feature-based techniques further cluster any given set of homogenous clusters, reducing the number of clusters based on the underlying page code. Embodiments of page clustering may reduce the number of clusters based on the tag probabilities and the tag sequence, utilizing an Approximate Nearest Neighborhood (ANN) graph along with evaluation of intra-cluster and inter-cluster compactness.
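The two URL clustering steps named above, (a) URL normalization and (b) URL variation computation, can be illustrated with a small Python sketch. The abstract does not define either step, so the specifics here are assumptions: normalization lowercases the host, splits the path into segments and keeps only query parameter names, and "variation" is taken to mean the number of distinct values seen at each path position. The names `normalize_url`, `cluster_urls` and the `max_distinct` threshold are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def normalize_url(url):
    """Assumed normalization: lowercase the host, split the path into
    segments, and keep only the sorted names of query parameters."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = tuple(s for s in parts.path.split("/") if s)
    params = tuple(sorted(name for name, _ in parse_qsl(parts.query)))
    return host, segments, params


def cluster_urls(urls, max_distinct=2):
    """Group URLs into patterns such as 'host/item/*'.

    URLs are bucketed by shape (host, path depth, query parameter names).
    Within a bucket, a path position whose number of distinct values exceeds
    `max_distinct` (an illustrative threshold) is treated as variable and
    replaced by '*'."""
    normalized = {u: normalize_url(u) for u in urls}
    shapes = defaultdict(list)
    for u, (host, segments, params) in normalized.items():
        shapes[(host, len(segments), params)].append(u)

    clusters = defaultdict(list)
    for (host, depth, params), members in shapes.items():
        # URL variation: count distinct values observed at each path position.
        distinct = [set() for _ in range(depth)]
        for u in members:
            for i, seg in enumerate(normalized[u][1]):
                distinct[i].add(seg)
        variable = [len(values) > max_distinct for values in distinct]
        for u in members:
            segments = normalized[u][1]
            pattern = "/".join("*" if variable[i] else s for i, s in enumerate(segments))
            key = f"{host}/{pattern}"
            if params:
                key += "?" + "&".join(params)
            clusters[key].append(u)
    return dict(clusters)


urls = [
    "http://Shop.example.com/item/101",
    "http://shop.example.com/item/102",
    "http://shop.example.com/item/103",
    "http://shop.example.com/help/faq",
]
result = cluster_urls(urls)
print(result)
```

Positions with many distinct values (here the item IDs) collapse into a wildcard, so the three item pages share one pattern and the help page gets its own.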
    • 5. Invention patent application
    • Techniques for clustering structurally similar webpages based on page features
    • US20080010292A1
    • 2008-01-10
    • US11481809
    • 2006-07-05
    • Krishna Leela Poola
    • Krishna Leela Poola
    • G06F17/30
    • G06F17/3071; G06F17/2211; G06F17/2264; G06F17/248; G06F17/30896; G06K9/6219; Y10S707/99935
    • Web page clustering techniques described herein are URL Clustering and Page Clustering, whereby clustering algorithms cluster together pages that are structurally similar. Regarding URL clustering, because similarly structured pages have similar patterns in their URLs, grouping similar URL patterns will group structurally similar pages. Embodiments of URL clustering may involve: (a) URL normalization and (b) URL variation computation. Regarding page clustering, page feature-based techniques further cluster any given set of homogenous clusters, reducing the number of clusters based on the underlying page code. Embodiments of page clustering may reduce the number of clusters based on the tag probabilities and the tag sequence, utilizing an Approximate Nearest Neighborhood (ANN) graph along with evaluation of intra-cluster and inter-cluster compactness.
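For the page clustering step, the following sketch builds a nearest-neighbour graph over pages from the two features the abstract names, tag probabilities and the tag sequence. How the patent combines them is not stated; here similarity is assumed to be the mean of a cosine similarity between tag-frequency distributions and `difflib.SequenceMatcher`'s ratio over tag sequences. Exact k-nearest neighbours stand in for an approximate nearest neighbour (ANN) index for brevity, and `cluster_pages`, `k` and `threshold` are illustrative.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Crude regex for opening tag names; a real system would use an HTML parser.
TAG_RE = re.compile(r"<\s*([a-zA-Z][a-zA-Z0-9]*)")


def tag_sequence(html):
    return [t.lower() for t in TAG_RE.findall(html)]


def tag_probabilities(seq):
    counts = Counter(seq)
    total = sum(counts.values()) or 1
    return {tag: c / total for tag, c in counts.items()}


def cosine(p, q):
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def page_similarity(a, b):
    """Assumed score: mean of tag-probability cosine and tag-sequence match ratio."""
    return (cosine(tag_probabilities(a), tag_probabilities(b))
            + SequenceMatcher(None, a, b).ratio()) / 2


def cluster_pages(pages, k=2, threshold=0.8):
    """Link pages that are mutual k-nearest neighbours with similarity >= threshold
    and return the connected components as sorted clusters of page names."""
    seqs = {name: tag_sequence(html) for name, html in pages.items()}
    names = list(seqs)
    neighbours = {}
    for a in names:
        scored = sorted(((page_similarity(seqs[a], seqs[b]), b) for b in names if b != a),
                        reverse=True)
        neighbours[a] = {b for score, b in scored[:k] if score >= threshold}

    parent = {n: n for n in names}  # union-find over mutual-neighbour edges

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in names:
        for b in neighbours[a]:
            if a in neighbours[b]:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


pages = {
    "p1": "<html><body><div><p>a</p><p>b</p></div></body></html>",
    "p2": "<html><body><div><p>c</p><p>d</p></div></body></html>",
    "p3": "<html><body><table><tr><td>x</td></tr></table></body></html>",
}
print(cluster_pages(pages))
```

On real crawls the exact k-NN loop is quadratic in the number of pages, which is the cost an ANN graph is meant to avoid.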
    • 8. Granted invention patent
    • System and method for detecting templates of a website using hyperlink analysis
    • US07962523B2
    • 2011-06-14
    • US12101293
    • 2008-04-11
    • Krishna Leela Poola
    • Krishna Leela Poola
    • G06F7/00; G06F17/30; G06F13/14
    • G06F17/30887
    • The present invention relates to methods, systems, and computer readable media comprising instructions for detecting templates within one or more web pages comprising a website. The method of the present invention comprises generating one or more groups of hyperlinks within a respective web page of the one or more web pages comprising the website. An in-link score is calculated for a given uniform resource locator associated with the one or more web pages comprising the website. The hyperlink groups in which the uniform resource locators associated with the one or more web pages comprising the website appear are identified. A template score is assigned to the identified hyperlink groups on the basis of the in-link score associated with the uniform resource locators to which the hyperlinks comprising the hyperlink group correspond. The hyperlink groups with template scores exceeding a given template score threshold are thereafter identified as templates.
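The template detection steps above can be sketched in Python. The abstract leaves the hyperlink grouping and the scoring formulas open, so this sketch assumes the groups are already extracted (for example one group per navigation block), takes a URL's in-link score to be the fraction of the site's pages that link to it, and scores a group by the mean in-link score of its targets. `in_link_scores`, `detect_templates` and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict


def in_link_scores(site):
    """site maps each page URL to a list of hyperlink groups (lists of target URLs).
    Assumed in-link score: fraction of the site's pages that link to the URL."""
    linking_pages = defaultdict(set)
    for page, groups in site.items():
        for group in groups:
            for target in group:
                linking_pages[target].add(page)
    n = len(site)
    return {url: len(pages) / n for url, pages in linking_pages.items()}


def detect_templates(site, threshold=0.5):
    """Return hyperlink groups whose template score exceeds the threshold."""
    scores = in_link_scores(site)
    templates = set()
    for groups in site.values():
        for group in groups:
            if not group:
                continue
            # Assumed template score: mean in-link score of the group's targets.
            score = sum(scores[t] for t in group) / len(group)
            if score > threshold:
                templates.add(tuple(sorted(group)))
    return templates


nav = ["/", "/about", "/contact"]
site = {
    "/": [nav, ["/news/1", "/news/2"]],
    "/about": [nav, ["/team"]],
    "/contact": [nav],
    "/news/1": [nav, ["/news/2"]],
}
print(detect_templates(site))
```

Navigation links repeated on every page score 1.0 and are flagged, while content links that appear on only one or two pages stay at or below the threshold.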
    • 9. Granted invention patent
    • Method for organizing structurally similar web pages from a web site
    • US07941420B2
    • 2011-05-10
    • US11838351
    • 2007-08-14
    • Krishna Prasad Chitrapura; Krishna Leela Poola
    • Krishna Prasad Chitrapura; Krishna Leela Poola
    • G06F17/30
    • G06F17/30896
    • Techniques are described for organizing structurally similar web pages for a website. Fingerprints are made of the structure of the web pages using shingling by placing the web page's HTML tags and attributes in sequence and encoding the tags and attributes using a standard encoding technique. Fixed-size portions of the encoded sequence are taken and a set of values extracted using independent hash functions to compute the shingles. Alternatively, a DOM tree representation of HTML of the web page is generated and each path of the DOM tree encoded and values extracted using independent hash functions to compute the shingles. A specified number of shingles are retained as the fingerprint. The pages are then clustered based upon the URL and the similarity of the shingles. The clustered hierarchical organization of pages is further pruned by various criteria including similarity of shingles or support of the cluster node in the hierarchy.
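The shingling fingerprint described above can be sketched in Python. This assumes a min-hash reading of "independent hash functions": each of `num_hashes` salted BLAKE2b hashes is applied to every fixed-size window of the tag-and-attribute sequence, and the minimum per hash function is kept, giving a fingerprint with a fixed number of values. Window size, hash count and the names `TagCollector`, `fingerprint` and `similarity` are illustrative; the DOM-path variant and the URL-based hierarchy are not shown.

```python
import hashlib
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start tags and their attribute names in document order."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)
        self.tokens.extend("@" + name for name, _ in attrs)


def structure_tokens(html):
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


def fingerprint(html, window=4, num_hashes=8):
    """For each salted hash function, keep the minimum hash over all windows."""
    tokens = structure_tokens(html)
    shingles = {" ".join(tokens[i:i + window])
                for i in range(max(1, len(tokens) - window + 1))}
    fp = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # each salt yields a distinct hash function
        fp.append(min(
            int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big")
            for s in shingles))
    return fp


def similarity(fp_a, fp_b):
    """Fraction of positions where the minima agree (a Jaccard estimate)."""
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)


page_a = "<html><body><div class='x'><p>a</p><p>b</p></div></body></html>"
page_b = "<html><body><div class='y'><p>c</p><p>d</p></div></body></html>"
page_c = "<html><body><table><tr><td>x</td></tr></table></body></html>"
fa, fb, fc = (fingerprint(p) for p in (page_a, page_b, page_c))
print(similarity(fa, fb), similarity(fa, fc))
```

The fraction of agreeing minima estimates the Jaccard similarity of the two pages' shingle sets, so pages with the same tag structure that differ only in text and attribute values get a similarity of 1.0.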
    • 10. Granted invention patent
    • Techniques for clustering structurally similar web pages based on page features
    • US07676465B2
    • 2010-03-09
    • US11481809
    • 2006-07-05
    • Krishna Leela Poola
    • Krishna Leela Poola
    • G06F7/00; G06F17/30
    • G06F17/3071; G06F17/2211; G06F17/2264; G06F17/248; G06F17/30896; G06K9/6219; Y10S707/99935
    • Web page clustering techniques described herein are URL Clustering and Page Clustering, whereby clustering algorithms cluster together pages that are structurally similar. Regarding URL clustering, because similarly structured pages have similar patterns in their URLs, grouping similar URL patterns will group structurally similar pages. Embodiments of URL clustering may involve: (a) URL normalization and (b) URL variation computation. Regarding page clustering, page feature-based techniques further cluster any given set of homogenous clusters, reducing the number of clusters based on the underlying page code. Embodiments of page clustering may reduce the number of clusters based on the tag probabilities and the tag sequence, utilizing an Approximate Nearest Neighborhood (ANN) graph along with evaluation of intra-cluster and inter-cluster compactness.
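The abstract mentions evaluating intra-cluster and inter-cluster compactness but gives no formula. One standard measure that combines both is the Davies-Bouldin index; whether the patent uses it is not stated. The sketch below scores two candidate groupings of hypothetical tag-probability vectors (the feature values are invented for illustration, and `davies_bouldin` is an illustrative name).

```python
import math


def centroid(points):
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a clustering (lower is better).

    clusters: list of at least two non-empty lists of equal-length feature
    tuples. Spread s_i is the mean distance of a cluster's points to its
    centroid; the index averages, over clusters, the worst ratio
    (s_i + s_j) / d(c_i, c_j). Assumes no two centroids coincide."""
    cents = [centroid(c) for c in clusters]
    spread = [sum(math.dist(p, cen) for p in c) / len(c) for c, cen in zip(clusters, cents)]
    k = len(clusters)
    return sum(
        max((spread[i] + spread[j]) / math.dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ) / k


# Hypothetical feature vectors: (P(div), P(p), P(table)) for four pages.
a, b = (0.2, 0.4, 0.0), (0.2, 0.35, 0.0)
c, d = (0.0, 0.0, 0.4), (0.0, 0.05, 0.45)
good = [[a, b], [c, d]]
bad = [[a, c], [b, d]]
print(davies_bouldin(good), davies_bouldin(bad))
```

A lower index means clusters are tight relative to the distance between them, so the grouping that keeps structurally similar pages together scores better.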