Patent Quick Search - Search global patents quickly, free patent database for commercial use - IPRDB

1. Granted patent

US07685112B2 Method and apparatus for retrieving and indexing hidden pages (status: active)
Publication No.: US07685112B2
Publication date: 2010-03-23
Application No.: US11570330
Filing date: 2005-05-27
Applicants: Alexandros Ntoulas, Junghoo Cho, Petros Zerfos
Inventors: Alexandros Ntoulas, Junghoo Cho, Petros Zerfos
IPC classification: G06F17/30, G06F15/173
CPC classification: G06F17/30864
Abstract: A method and system for autonomously downloading and indexing Hidden Web pages from Websites includes the steps of selecting a query term and issuing a query to a site-specific search interface containing Hidden Web pages. A results index is then acquired and the Hidden Web pages are downloaded from the results index. A plurality of potential query terms are then identified from the downloaded Hidden Web pages. The efficiency of each potential query term is then estimated and a next query term is selected from the plurality of potential query terms, wherein the next selected query term has the greatest efficiency. The next selected query term is then issued to the site-specific search interface using the next query term. The process is repeated until all or most of the Hidden Web pages are discovered.

2. Patent application

US20080097958A1 Method and Apparatus for Retrieving and Indexing Hidden Pages (status: active)
Publication No.: US20080097958A1
Publication date: 2008-04-24
Application No.: US11570330
Filing date: 2005-05-27
Applicants: Alexandros Ntoulas, Junghoo Cho, Petros Zerfos
Inventors: Alexandros Ntoulas, Junghoo Cho, Petros Zerfos
IPC classification: G06F15/16
CPC classification: G06F17/30864
Abstract: A method and system is provided for autonomously downloading and indexing Hidden Web pages from Websites having site-specific search interfaces. The method may be implemented using a crawler program or the like to autonomously cull Hidden Web content. The method includes the steps of selecting a query term and issuing a query to a site-specific search interface containing Hidden Web pages. A results index is then acquired and the Hidden Web pages are downloaded from the results index. A plurality of potential query terms are then identified from the downloaded Hidden Web pages. The efficiency of each potential query term is then estimated and a next query term is selected from the plurality of potential query terms, wherein the next selected query term has the greatest efficiency. The next selected query term is then issued to the site-specific search interface using the next query term. The process is repeated until all or most of the Hidden Web pages are discovered. In one aspect of the invention, the efficiency of each potential query term is expressed as a ratio of number of new documents returned for the potential query term to the cost associated with issuing the potential query.
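The query-selection loop described in the abstract can be sketched in Python. This is only an illustration under stated assumptions, not the patented implementation: `SITE`, `search`, `estimated_efficiency`, and `crawl` are hypothetical names, the site is an in-memory toy, the cost of every query is assumed to be 1, and the efficiency estimate uses document frequency among already-downloaded pages as a crude proxy for the number of new documents a term would return. The abstract does not say how the estimate is computed.

```python
# Toy stand-in for a site whose pages are reachable only through its search box.
SITE = {
    "d1": "liver disease treatment",
    "d2": "liver transplant surgery",
    "d3": "heart surgery recovery",
    "d4": "heart disease diet",
    "d5": "diet recovery plan",
}


def search(term):
    """Hypothetical site-specific search interface: ids of pages containing term."""
    return {doc_id for doc_id, text in SITE.items() if term in text.split()}


def estimated_efficiency(term, downloaded, cost=1.0):
    """Assumed estimator: document frequency in the downloaded pages per unit cost."""
    doc_freq = sum(1 for text in downloaded.values() if term in text.split())
    return doc_freq / cost


def crawl(seed, max_queries=10):
    """Issue the seed term, then repeatedly issue the most efficient unseen term."""
    downloaded = {}  # doc_id -> page text
    issued = []
    term = seed
    while term is not None and len(issued) < max_queries:
        issued.append(term)
        for doc_id in search(term):
            downloaded[doc_id] = SITE[doc_id]  # "download" each result page
        # Candidate terms are the words seen so far that were not yet issued.
        candidates = {w for text in downloaded.values() for w in text.split()}
        candidates -= set(issued)
        if not candidates:
            break
        # Sorting first makes tie-breaking deterministic (alphabetical).
        term = max(sorted(candidates),
                   key=lambda w: estimated_efficiency(w, downloaded))
    return downloaded, issued
```

Calling `crawl("liver", max_queries=4)` on this toy site reaches all five pages. A real crawler does not know the site's size, so the sketch stops on a query budget rather than on full coverage.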

3. Granted patent

US06754650B2 System and method for regular expression matching using index (status: active)
Publication No.: US06754650B2
Publication date: 2004-06-22
Application No.: US09850825
Filing date: 2001-05-08
Applicants: Junghoo Cho, Sridhar Rajagopalan
Inventors: Junghoo Cho, Sridhar Rajagopalan
IPC classification: G06F17/30
CPC classification: G06F17/30622, G06F17/30864, Y10S707/99932, Y10S707/99933
Abstract: A system and method for executing a regular expression (regex) query against a large data repository such as the World Wide Web includes an index engine that constructs multigram indices based on regex. A run time then receives a regex query and accesses the indices to return a set of potentially matching pages, which are then efficiently and quickly searched for matches to the regex query.
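The two-stage idea in the abstract (an index narrows the repository to potentially matching pages, then the regex runs only on those) can be illustrated with a small Python sketch. It makes simplifying assumptions that are not from the patent: it indexes fixed-length trigrams rather than selectively chosen multigrams, and the caller supplies a literal string that every match must contain instead of deriving it from the regex. The function names are hypothetical.

```python
import re
from collections import defaultdict


def build_ngram_index(pages, n=3):
    """Map every n-character substring to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for i in range(len(text) - n + 1):
            index[text[i:i + n]].add(page_id)
    return index


def candidate_pages(index, pages, required_literal, n=3):
    """Return pages that contain every n-gram of the required literal."""
    if len(required_literal) < n:
        return set(pages)  # literal too short to filter on; scan everything
    result = None
    for i in range(len(required_literal) - n + 1):
        posting = set(index.get(required_literal[i:i + n], ()))
        result = posting if result is None else result & posting
    return result


def regex_search(index, pages, pattern, required_literal, n=3):
    """Run the regex only on pages that survive the index lookup."""
    regex = re.compile(pattern)
    survivors = candidate_pages(index, pages, required_literal, n)
    return sorted(p for p in survivors if regex.search(pages[p]))
```

For example, with the pattern `\w+@example\.(com|org)` every match must contain `@example.`, so only pages whose trigram postings cover that literal are scanned. The index lookup can return false positives, which is why the regex is still applied to each surviving page.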

4. Patent application

US20060294124A1 Unbiased page ranking (status: pending, published)
Publication No.: US20060294124A1
Publication date: 2006-12-28
Application No.: US11033691
Filing date: 2005-01-12
Applicant: Junghoo Cho
Inventor: Junghoo Cho
IPC classification: G06F7/00
CPC classification: G06F16/951
Abstract: The pages in a network of linked pages are ranked based on the quality of the pages. Page quality is obtained by determining the change over time of the link structure of the page, which is obtained by determining the link structure of the page at different periods of time by taking multiple snapshots of the link structure of the network. The link structures are approximated by their PageRanks, page quality being determined by the formula: $Q(p) \approx D \cdot \frac{\Delta PR(p)}{PR(p)} + PR(p)$, where Q(p) is the quality of the page, PR(p) is the current PageRank of the page, ΔPR(p) is the change over time in the PageRank of the page, and D is a constant that determines the relative weight of the terms ΔPR(p)/PR(p) and PR(p).
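As a worked illustration of the quality formula, the following Python sketch computes Q(p) from two PageRank snapshots. It assumes that ΔPR(p) is the difference between the current and the earlier snapshot and that D is supplied by the caller; the abstract fixes neither choice, and the function name `page_quality` is hypothetical.

```python
def page_quality(pr_now, pr_prev, d=1.0):
    """Compute Q(p) = D * dPR(p) / PR(p) + PR(p) for pages in both snapshots.

    pr_now and pr_prev map page ids to PageRank values; pages that are missing
    from the earlier snapshot or have a zero current PageRank are skipped.
    """
    quality = {}
    for page, pr in pr_now.items():
        if page not in pr_prev or pr == 0:
            continue
        delta = pr - pr_prev[page]  # assumed definition of the change over time
        quality[page] = d * delta / pr + pr
    return quality


def rank_by_quality(pr_now, pr_prev, d=1.0):
    """Page ids ordered from highest to lowest estimated quality."""
    quality = page_quality(pr_now, pr_prev, d)
    return sorted(quality, key=quality.get, reverse=True)
```

With `pr_prev = {"a": 0.2, "b": 0.5}`, `pr_now = {"a": 0.4, "b": 0.5}`, and `d = 0.5`, page `a` scores about 0.65 and page `b` scores 0.5, so the fast-growing page outranks the page with the higher but static PageRank.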

5. Granted patent

US06317740B1 Method and apparatus for assigning keywords to media objects (status: active)
Publication No.: US06317740B1
Publication date: 2001-11-13
Application No.: US09216521
Filing date: 1998-12-16
Applicants: Sougata Mukherjea, Junghoo Cho
Inventors: Sougata Mukherjea, Junghoo Cho
IPC classification: G06F17/30
CPC classification: G06F17/30017, G06F17/30244, Y10S707/914, Y10S707/915, Y10S707/916, Y10S707/917, Y10S707/921, Y10S707/99931, Y10S707/99933, Y10S707/99934, Y10S707/99935
Abstract: A method and apparatus are defined for assigning keywords to media objects in files. The media objects include image, video and audio, for example. Various criteria are used for assigning the keywords, including measuring visual distance, measuring syntactical distance, detecting regular patterns in a table and detecting groups of images.
