    • 5. Granted invention patent
    • Database aggregation query result estimator
    • US07191181B2
    • 2007-03-13
    • US10873569
    • 2004-06-22
    • Sarajit Chaudhuri; Vivek R. Narasayya; Rajeev Motwani; Mayur D. Datar
    • G06F17/30
    • G06F17/30489; G06F17/30536; G06F2216/03; Y10S707/957; Y10S707/99932; Y10S707/99933; Y10S707/99935; Y10S707/99942; Y10S707/99943
    • Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data and aggregated separately. The remaining data, without the outliers, is then sampled in one of many known ways to provide a statistically relevant sample, which is aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low-selectivity queries or queries with a GROUP BY clause.
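The estimation flow described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `split_outliers` and `estimate_sum`, the fixed outlier count, and the use of uniform random sampling are all assumptions made for this example, and only the SUM aggregate is shown.

```python
import random
from statistics import pvariance


def split_outliers(values, num_outliers):
    """Split values into (outliers, rest).

    The sorted values are scanned with a sliding window of size
    len(values) - num_outliers; the window with the lowest variance is
    kept as `rest`, and everything outside it is treated as an outlier.
    (Hypothetical helper; recomputes the variance per window for clarity.)
    """
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    if window <= 0:
        return ordered, []
    best_start = min(
        range(num_outliers + 1),
        key=lambda start: pvariance(ordered[start:start + window]),
    )
    rest = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return outliers, rest


def estimate_sum(values, num_outliers, sample_size, rng):
    """Estimate SUM(values): exact sum of outliers plus an extrapolated sample sum."""
    outliers, rest = split_outliers(values, num_outliers)
    exact_part = sum(outliers)  # outliers are aggregated exactly
    if not rest:
        return exact_part
    k = min(sample_size, len(rest))
    sample = rng.sample(rest, k)  # uniform sample without replacement (assumed)
    scaled_part = sum(sample) * len(rest) / k  # extrapolate to all of `rest`
    return exact_part + scaled_part
```

Because the extreme values are summed exactly, the sampled part only has to cover the low-variance remainder, which is what keeps the extrapolation error small in this sketch.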
    • 6. Granted invention patent
    • Continuous processing language for real-time data streams
    • US08396886B1
    • 2013-03-12
    • US11346119
    • 2006-02-02
    • Mark Tsimelzon; Aleksey Sanin; Rajeev Motwani; Glenn Robert Seidman; Gayatri Patel
    • G06F7/00; G06F17/30
    • G06F17/30533; G06F17/30516
    • A computer software language capable of expressing registered queries that operate continuously on one or more data streams. The language of the present invention is based on a publish/subscribe model in that queries subscribe to data streams and publish to data streams. Also, the language of the present invention can express queries that operate directly on data streams. Since queries expressed in the language of the present invention may be executed continuously and directly on data streams, the language includes a clause for specifying time-based and/or row-based windows for the input data stream. Operations are then performed on the data within such windows. In one embodiment, the language is also SQL-like and includes a clause for defining named windows (which can be used in any number of queries), a clause for detecting a pattern, and correlated database subqueries for correlating data stream data with database tables.
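To make the difference between row-based and time-based windows concrete, here is a small sketch of the two window semantics. It does not reproduce the patented language or its syntax: the classes `RowWindow` and `TimeWindow`, their methods, and the choice of numeric timestamps in seconds are hypothetical and exist only for this illustration.

```python
from collections import deque


class RowWindow:
    """Row-based window: keeps only the most recent `max_rows` rows."""

    def __init__(self, max_rows):
        # A bounded deque drops the oldest row automatically when full.
        self.rows = deque(maxlen=max_rows)

    def push(self, timestamp, value):
        self.rows.append((timestamp, value))

    def values(self):
        return [value for _, value in self.rows]


class TimeWindow:
    """Time-based window: keeps rows newer than `span_seconds` before the latest row."""

    def __init__(self, span_seconds):
        self.span = span_seconds
        self.rows = deque()

    def push(self, timestamp, value):
        # Assumes rows arrive in non-decreasing timestamp order.
        self.rows.append((timestamp, value))
        cutoff = timestamp - self.span
        # Evict rows whose timestamp is at or before the cutoff.
        while self.rows and self.rows[0][0] <= cutoff:
            self.rows.popleft()

    def values(self):
        return [value for _, value in self.rows]


def window_average(window):
    """A continuous-query-style operation evaluated over the current window contents."""
    vals = window.values()
    return sum(vals) / len(vals) if vals else None
```

A continuous query in this sketch would call `push` for each arriving row and re-evaluate an operation such as `window_average` afterwards, so the result always reflects the current window rather than the whole stream.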
    • 7. Granted invention patent
    • Efficient fuzzy match for evaluating data records
    • US07296011B2
    • 2007-11-13
    • US10600083
    • 2003-06-20
    • Surajit Chaudhuri; Kris Ganjam; Venkatesh Ganti; Rajeev Motwani
    • G06F7/00; G06F17/30
    • G06F17/30542; G06F17/30303; Y10S707/99933
    • To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
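As a rough illustration of q-gram based fuzzy matching against a reference table, the sketch below uses plain Jaccard similarity over character q-gram sets. This is a swapped-in, simpler measure and not the similarity function disclosed in the patent; the names `qgrams`, `similarity`, and `fuzzy_match`, the default `q=3`, and the `0.5` threshold are all assumptions chosen for the example.

```python
def qgrams(text, q=3):
    """Return the set of lowercase character q-grams of `text`."""
    normalized = text.lower()
    if len(normalized) < q:
        # Strings shorter than q are treated as a single token.
        return {normalized}
    return {normalized[i:i + q] for i in range(len(normalized) - q + 1)}


def similarity(a, b, q=3):
    """Jaccard similarity of the q-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def fuzzy_match(record, reference, threshold=0.5, q=3):
    """Return the closest reference entry, or None if nothing reaches `threshold`.

    This scans every reference entry; an efficient system would use an
    index over q-grams instead of a linear scan.
    """
    best = max(reference, key=lambda entry: similarity(record, entry, q), default=None)
    if best is None or similarity(record, best, q) < threshold:
        return None
    return best
```

For example, `fuzzy_match("Mircosoft Corporation", ["Microsoft Corporation", "Boeing Company"])` selects the first entry, because the transposed letters disturb only a few of the shared 3-grams, while an input with no q-grams in common with any entry yields `None`.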