    • 8. Granted invention patent
    • Techniques for crawling dynamic web content
    • Publication number: US07536389B1
    • Publication date: 2009-05-19
    • Application number: US11064278
    • Filing date: 2005-02-22
    • Inventors: Bangalore Subbaramaiah Prabhakar, Shivakumar Ganesan, Yarram Sunil Kumar, Shreekanth Karvaje, Binu Raj
    • IPC: G06F17/30; G06F15/16
    • CPC: G06F17/30864; Y10S707/92; Y10S707/99931; Y10S707/99953
    • An automated form filler and script executor is integrated with a web browser engine, which is communicatively coupled to a web crawler, thereby enabling the crawler to identify dynamic web content based on submission of forms completed by the form filler. The crawler is capable of identifying web pages containing forms that require submission, and JavaScript code that requires execution, respectively, for requesting dynamic web content from a server. The crawler passes a representation of such web pages to the browser engine. The form filler systematically completes the form based on various combinations of search parameter values provided by the web page for requesting dynamic content. Request messages are constructed by the browser engine and passed to the crawler for submission to the server. The dynamic content, received by the crawler from the server in response to the request, can be indexed according to conventional search engine indexing techniques.
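The abstract says the form filler completes a form "based on various combinations of search parameter values provided by the web page" but gives no procedure. The following is a minimal sketch, assuming each form field offers a finite list of values (for example the `<option>` values of a `<select>` element) and that the form submits by GET. It enumerates every combination as a request URL. `enumerate_form_requests`, the `max_requests` cap, and the example URL are hypothetical and not taken from the patent.

```python
from itertools import product
from urllib.parse import urlencode


def enumerate_form_requests(action_url, fields, max_requests=100):
    """Yield one GET request URL per combination of form field values.

    `fields` maps a form field name to the values the page offers for it
    (assumed here to be the <option> values of a <select> element).
    `max_requests` is a hypothetical cap that keeps the crawl bounded.
    """
    names = sorted(fields)  # fixed field order so the enumeration is deterministic
    value_lists = [fields[name] for name in names]
    for count, combo in enumerate(product(*value_lists)):
        if count >= max_requests:
            break
        # Build the query string the browser engine would submit for this combination.
        query = urlencode(list(zip(names, combo)))
        yield f"{action_url}?{query}"


if __name__ == "__main__":
    form = {"make": ["ford", "honda"], "year": ["2004", "2005"]}
    for url in enumerate_form_requests("http://example.com/search", form):
        print(url)
```

With two fields of two values each, this yields four URLs, starting with `http://example.com/search?make=ford&year=2004`. The cap matters in practice because the number of combinations grows multiplicatively with each added field.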
    • 9. Granted invention patent
    • Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages
    • Publication number: US07610267B2
    • Publication date: 2009-10-27
    • Application number: US11203832
    • Filing date: 2005-08-13
    • Inventors: Parashuram Kulkarni, Thejas Madhavan Nair, Binu Raj
    • IPC: G06F17/30
    • CPC: G06F17/3089; G06F17/30864; Y10S707/99932; Y10S707/99933
    • Automated crawling of page links associated with a site domain that was previously crawled involves computing the dynamicity of a site based on totals of continuous dead links, live links and/or prerequisite pages encountered while crawling page links corresponding to the site. The degree to which links are crawled is optimized based on the dynamicity of the site. Some pages require that another particular page (i.e., a prerequisite page) is retrieved from the host prior to retrieving a given page, e.g., so that the prerequisite page can set a cookie. Prerequisite pages are determined based on stored information about pages that were retrieved, during a previous crawl, prior to retrieving a page. Prerequisite pages are identified to a search system so that when a user clicks on the URL for the page, the request is redirected to the prerequisite page to set the cookie appropriately.
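The abstract gives neither a formula for a site's dynamicity nor a rule for how far to crawl. The sketch below fills both gaps with its own assumptions. It stops recrawling a site after a hypothetical number of continuous dead links, and it scores dynamicity as the share of visited links that were dead or needed a prerequisite page. It also treats the page fetched immediately before a given page in the previous crawl as that page's prerequisite whenever a cold fetch of the page fails. `CrawlStats`, `recrawl_site`, `dynamicity`, `find_prerequisites`, and `stop_after_dead` are all illustrative names, not the patent's.

```python
from dataclasses import dataclass


@dataclass
class CrawlStats:
    """Per-site counters gathered while recrawling (names are illustrative)."""
    live: int = 0
    dead: int = 0
    prerequisite: int = 0
    max_consecutive_dead: int = 0


def recrawl_site(link_statuses, stop_after_dead=3):
    """Walk a site's previously crawled links, stopping once it looks too dynamic.

    `link_statuses` holds one outcome per link in crawl order: "live",
    "dead", or "prereq" (live only after fetching a prerequisite page).
    `stop_after_dead` is a hypothetical limit on continuous dead links.
    """
    stats = CrawlStats()
    run = 0  # length of the current run of continuous dead links
    for status in link_statuses:
        if status == "dead":
            stats.dead += 1
            run += 1
            stats.max_consecutive_dead = max(stats.max_consecutive_dead, run)
            if run >= stop_after_dead:
                break  # stop spending crawl budget on this site
        else:
            run = 0
            if status == "live":
                stats.live += 1
            else:
                stats.prerequisite += 1
    return stats


def dynamicity(stats):
    """Hypothetical score: share of visited links that were dead or needed a prerequisite."""
    visited = stats.live + stats.dead + stats.prerequisite
    return 0.0 if visited == 0 else (stats.dead + stats.prerequisite) / visited


def find_prerequisites(previous_crawl_order, cold_fetch_failures):
    """Map each page that fails on a cold fetch to the page fetched just before it last crawl."""
    prereqs = {}
    for i, url in enumerate(previous_crawl_order):
        if url in cold_fetch_failures and i > 0:
            prereqs[url] = previous_crawl_order[i - 1]
    return prereqs
```

For example, `find_prerequisites(["/home", "/search", "/results"], {"/results"})` maps `/results` to `/search`. A search system could then redirect a click on `/results` through `/search` first so that the cookie is set, as the abstract describes.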
    • 10. Invention patent application
    • Unsupervised learning tool for feature correction
    • Publication number: US20070043707A1
    • Publication date: 2007-02-22
    • Application number: US11253023
    • Filing date: 2005-10-17
    • Inventors: Parashuram Kulkarni, Binu Raj
    • IPC: G06F17/30
    • CPC: G06F17/30861
    • Techniques for correcting miscategorized features excerpted from web pages are provided. For each of several categories and several pages on a particular web site, a separate feature may be excerpted from that page and associated with that page in relation to that category. Often, many of the “high confidence” features that have been associated with the same category are found to be associated with similar characteristics regardless of the pages from which those features were excerpted. Thus, a set of category characteristics, which are often found associated with the “high confidence” features in a particular category, may be determined. For each page, a candidate feature that is associated with the set of category characteristics may be identified in that page. If, in relation to the particular category, a feature other than the candidate feature is associated with that page, then that other feature may be replaced by the candidate feature.
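The following is a minimal sketch of the correction step for a single category. It assumes each excerpted feature carries a set of characteristics (for example its HTML tag and CSS class) and a confidence score. The category characteristics are taken to be those shared by at least a hypothetical fraction of the high-confidence features. Each page's feature is then replaced by the first element on that page carrying all of them. The data representation, thresholds, and the names `category_characteristics` and `correct_features` are assumptions, not the application's.

```python
from collections import Counter


def category_characteristics(features, high_confidence=0.8, min_share=0.6):
    """Characteristics shared by most high-confidence features of one category.

    `features` maps page id -> (text, characteristics, confidence), where
    characteristics is a frozenset of strings. Both thresholds are hypothetical.
    """
    confident = [chars for _, chars, conf in features.values() if conf >= high_confidence]
    if not confident:
        return frozenset()
    counts = Counter(c for chars in confident for c in chars)
    return frozenset(c for c, n in counts.items() if n / len(confident) >= min_share)


def correct_features(features, page_elements, **thresholds):
    """Return page id -> corrected feature text for one category.

    `page_elements` maps page id -> list of (text, characteristics) candidates
    found on that page, in document order.
    """
    wanted = category_characteristics(features, **thresholds)
    corrected = {}
    for page, (text, _chars, _conf) in features.items():
        corrected[page] = text  # keep the current feature unless a candidate matches
        if not wanted:
            continue
        for cand_text, cand_chars in page_elements.get(page, []):
            if wanted <= cand_chars:
                # First element carrying every category characteristic wins.
                corrected[page] = cand_text
                break
    return corrected
```

For instance, if two high-confidence "price" features are `<span class="price">` elements while a third page excerpted a promotional `<div>`, that third page's feature is swapped for its own `<span class="price">` element. Requiring a superset match keeps the rule conservative: pages with no matching candidate keep their original feature.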