Quick patent search: fast search of global patents, free commercial patent database (IPRDB)

1. Granted patent

US06907380B2 Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer (in force)
Publication number: US06907380B2
Publication date: 2005-06-14
Application number: US10726254
Filing date: 2003-12-01
Applicants: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
Inventors: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
IPC classification: G06K9/62, G06F101/14, G06F17/18, G06F17/30
CPC classification: G06K9/6218, Y10S707/99936, Y10S707/99937
Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, ..., SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
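
The five steps in this abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the abstract leaves both the error metric and "clustering method A" open, so the sketch assumes squared Euclidean distance and a Lloyd-style weighted k-means with a naive first-k initialization. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed error metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float],
                    k: int, iters: int = 20) -> List[Point]:
    """Stand-in for 'clustering method A': Lloyd-style weighted k-means."""
    centers = list(points[:k])  # naive deterministic init (assumption)
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each center to the weighted mean of its points; keep it if empty.
        centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers


def divide_and_conquer(points: Sequence[Point], k: int, num_pieces: int) -> List[Point]:
    # 1) Partition S into P disjoint pieces.
    size = -(-len(points) // num_pieces)  # ceiling division
    pieces = [points[i:i + size] for i in range(0, len(points), size)]
    inter_centers: List[Point] = []
    inter_weights: List[float] = []
    for piece in pieces:
        # 2) Find k intermediate centers for this piece.
        centers = weighted_kmeans(piece, [1.0] * len(piece), min(k, len(piece)))
        # 3) Assign each point to its nearest intermediate center, and
        # 4) weight each center by the number of points assigned to it.
        counts = [0] * len(centers)
        for p in piece:
            counts[nearest(p, centers)] += 1
        inter_centers.extend(centers)
        inter_weights.extend(float(c) for c in counts)
    # 5) Cluster the weighted intermediate centers to get the k final centers.
    return weighted_kmeans(inter_centers, inter_weights, k)
```

Because each piece is clustered independently, step 2 can run in parallel or incrementally as new pieces arrive; only the small set of weighted intermediate centers has to be held for step 5.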

2. Granted patent

US06684177B2 Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer (in force)
Publication number: US06684177B2
Publication date: 2004-01-27
Application number: US09854212
Filing date: 2001-05-10
Applicants: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
Inventors: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
IPC classification: G06F101/14
CPC classification: G06K9/6218, Y10S707/99936, Y10S707/99937
Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, ..., SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.

3. Granted patent

US07363301B2 Database aggregation query result estimator (in force)
Publication number: US07363301B2
Publication date: 2008-04-22
Application number: US11246355
Filing date: 2005-10-07
Applicants: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
IPC classification: G06F17/30
CPC classification: G06F17/30489, G06F17/30536, G06F2216/03, Y10S707/957, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99942, Y10S707/99943
Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
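
A rough Python sketch of the idea in this abstract, for a SUM query over one numeric column. It assumes the number of outliers is fixed in advance, that the sliding window runs over the sorted values, and that the remaining data is sampled uniformly without replacement; none of these details are stated in the abstract, and the function names are hypothetical.

```python
import random
from statistics import pvariance
from typing import List, Sequence, Tuple


def split_outliers(values: Sequence[float], num_outliers: int) -> Tuple[List[float], List[float]]:
    """Return (outliers, rest), where rest is the lowest-variance window of the sorted values."""
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    # Slide a fixed-size window over the sorted values and keep the tightest one.
    best_start = min(
        range(num_outliers + 1),
        key=lambda s: pvariance(ordered[s:s + window]),
    )
    rest = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return outliers, rest


def estimate_sum_with_outliers(values: Sequence[float], num_outliers: int,
                               sample_size: int, rng: random.Random) -> float:
    outliers, rest = split_outliers(values, num_outliers)
    exact_part = sum(outliers)                 # outliers are aggregated exactly
    sample = rng.sample(rest, sample_size)     # uniform sample of the remainder
    scaled_part = sum(sample) * len(rest) / sample_size  # extrapolate to all of rest
    return exact_part + scaled_part
```

Pulling the extreme values out before sampling is what keeps the variance of the extrapolated part low; a uniform sample over the raw data could miss or over-represent them and swing the estimate widely.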

4. Patent application

US20060085410A1 Sampling for queries (in force)
Publication number: US20060085410A1
Publication date: 2006-04-20
Application number: US11296036
Filing date: 2005-12-07
Applicants: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
IPC classification: G06F17/30
CPC classification: G06F17/30536, G06F17/30489, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99942
Abstract: Results of a database query are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled, and an aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.
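
The inverse-probability weighting described here can be illustrated with a short Python sketch. It assumes each tuple is included independently with its own probability (how those probabilities are derived from the workload is not covered by the abstract and is left out), and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def weighted_sample(values: Sequence[float], probabilities: Sequence[float],
                    rng: random.Random) -> List[Tuple[float, float]]:
    """Keep each value independently with its own probability; remember that probability."""
    return [(v, p) for v, p in zip(values, probabilities) if rng.random() < p]


def estimate_weighted_sum(sample: Sequence[Tuple[float, float]]) -> float:
    """Scale each sampled value by the inverse of its inclusion probability."""
    return sum(v / p for v, p in sample)
```

A tuple kept with probability 0.5 stands in for two tuples of the original table, so its value counts double in the estimate. Tuples that the workload touches often can be given higher probabilities so that they are rarely missing from the sample.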

5. Granted patent

US07191181B2 Database aggregation query result estimator (in force)
Publication number: US07191181B2
Publication date: 2007-03-13
Application number: US10873569
Filing date: 2004-06-22
Applicants: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
IPC classification: G06F17/30
CPC classification: G06F17/30489, G06F17/30536, G06F2216/03, Y10S707/957, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99942, Y10S707/99943
Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group-by clauses.

6. Granted patent

US08396886B1 Continuous processing language for real-time data streams (in force)
Publication number: US08396886B1
Publication date: 2013-03-12
Application number: US11346119
Filing date: 2006-02-02
Applicants: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
Inventors: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F17/30533, G06F17/30516
Abstract: A computer software language capable of expressing registered queries that operate on one or more data streams continuously. The language of the present invention is based on a publish/subscribe model in that queries subscribe to data streams and publish to data streams. Also, the language of the present invention can express queries that operate directly on data streams. Since queries expressed in the language of the present invention may be executed continuously and directly on data streams, the language includes a clause for specifying time-based and/or row-based windows for the input data stream. Operations are then performed on the data within such windows. In one embodiment, the language is also SQL-like and includes a clause for defining named windows (which can be used in any number of queries); a clause for detecting a pattern; and correlated database subqueries for correlating data stream data with database tables.
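
To make the distinction between row-based and time-based windows concrete, here is a small Python sketch of a continuously updated average over each kind of window. It does not reproduce the patented language or its syntax; the class names, the choice of an average as the operation, and the rule that a row expires once it is at least `span_seconds` old are all assumptions made for illustration.

```python
from collections import deque


class RowWindow:
    """Keeps the last max_rows values of a stream (row-based window)."""

    def __init__(self, max_rows: int) -> None:
        self.rows = deque(maxlen=max_rows)  # oldest row is dropped automatically

    def insert(self, value: float) -> float:
        self.rows.append(value)
        return sum(self.rows) / len(self.rows)  # re-evaluate on every arrival


class TimeWindow:
    """Keeps the values that arrived within the last span_seconds (time-based window)."""

    def __init__(self, span_seconds: float) -> None:
        self.span = span_seconds
        self.rows = deque()

    def insert(self, timestamp: float, value: float) -> float:
        self.rows.append((timestamp, value))
        # Expire rows that have fallen out of the time span.
        while self.rows[0][0] <= timestamp - self.span:
            self.rows.popleft()
        return sum(v for _, v in self.rows) / len(self.rows)
```

Each call to `insert` plays the role of a new stream element arriving at a registered query: the window contents are updated first, and the operation is then re-run over whatever the window currently holds.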

7. Granted patent

US07296011B2 Efficient fuzzy match for evaluating data records (in force)
Publication number: US07296011B2
Publication date: 2007-11-13
Application number: US10600083
Filing date: 2003-06-20
Applicants: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F17/30542, G06F17/30303, Y10S707/99933
Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
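
As an illustration of matching on q-grams, the Python sketch below scores two strings by the Jaccard overlap of their q-gram sets and returns the closest reference entry above a threshold. The similarity function disclosed in the patent is not given in the abstract, so plain Jaccard overlap is a substitute chosen here for simplicity; the function names, the default q = 3, and the 0.5 threshold are likewise assumptions.

```python
from typing import Optional, Sequence, Set


def qgrams(text: str, q: int = 3) -> Set[str]:
    """Set of all length-q substrings of the lowercased text."""
    s = text.lower()
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def similarity(a: str, b: str, q: int = 3) -> float:
    """Jaccard overlap of the two q-gram sets (a stand-in similarity function)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)


def fuzzy_match(record: str, reference: Sequence[str],
                threshold: float = 0.5) -> Optional[str]:
    """Return the most similar reference entry, or None if nothing is close enough."""
    best = max(reference, key=lambda r: similarity(record, r))
    return best if similarity(record, best) >= threshold else None
```

A transposition such as "Beoing Company" only disturbs the few q-grams that span the swapped letters, so most q-grams still agree with "Boeing Company" and the record can be cleaned to the reference value. This version scans the whole reference list; an efficient implementation would index the q-grams instead.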

8. Patent application

US20060053103A1 Database aggregation query result estimator (in force)
Publication number: US20060053103A1
Publication date: 2006-03-09
Application number: US11246354
Filing date: 2005-10-07
Applicants: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
IPC classification: G06F17/30
CPC classification: G06F17/30489, G06F17/30536, G06F2216/03, Y10S707/957, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99942, Y10S707/99943
Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

9. Granted patent

US07577638B2 Sampling for queries (in force)
Publication number: US07577638B2
Publication date: 2009-08-18
Application number: US11296034
Filing date: 2005-12-07
Applicants: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
IPC classification: G06F17/30
CPC classification: G06F17/30536, G06F17/30489, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99942
Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group-by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.
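
The selection of high-variance sub-relations can be sketched in Python as below. The sketch assumes the sub-relations are simply the groups of a single group-by key, ranks them by population variance, and takes as outliers the values farthest from each selected group's mean; the abstract does not specify how outliers are chosen within a sub-relation, so that rule and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Hashable, Iterable, List, Tuple


def build_outlier_index(rows: Iterable[Tuple[Hashable, float]],
                        num_groups: int, per_group: int) -> Dict[Hashable, List[float]]:
    """Map each of the num_groups highest-variance groups to its per_group most extreme values."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)  # sub-relation induced by the group-by key
    # Rank sub-relations by variance, highest first.
    ranked = sorted(groups, key=lambda k: pvariance(groups[k]), reverse=True)
    index: Dict[Hashable, List[float]] = {}
    for key in ranked[:num_groups]:
        center = mean(groups[key])
        by_distance = sorted(groups[key], key=lambda v: abs(v - center), reverse=True)
        index[key] = by_distance[:per_group]
    return index
```

Spending the outlier budget on the sub-relations with the highest variance targets the groups where a uniform sample would give the least reliable estimate.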

10. Granted patent

US07383253B1 Publish and subscribe capable continuous query processor for real-time data streams (in force)
Publication number: US07383253B1
Publication date: 2008-06-03
Application number: US11015963
Filing date: 2004-12-17
Applicants: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
Inventors: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
IPC classification: G06F17/30
CPC classification: G06F17/30516, Y10S707/918, Y10S707/99933
Abstract: A Continuous Query Processor processes queries on continuously updating data sources or data streams and includes a Publication Manager for accepting published structured elements of data from data stream Publishers, a Subscription Manager for giving structured elements of data to one or more data stream Subscribers, a Query Module Manager for processing queries represented by Query Modules, a Query Module Store for maintaining deployed Query Modules, a Query Primitive Manager performing processing for various primitives that comprise a Query Module, and a Schedule Manager for coordinating when a primitive within a Query Module gets processed in order to maintain that each continuous query is continuously updated immediately upon the arrival of structured element data affecting any part of a continuous query.
