Quick Patent Search | Fast search of global patents, a free patent database for commercial use | IPRDB

1. Granted invention patent

US11620986B2 Cold fusing sequence-to-sequence models with language models (In force)
Publication No.: US11620986B2
Publication date: 2023-04-04
Application No.: US17061455
Filing date: 2020-10-01
Applicant: Baidu USA, LLC
Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
IPC classification: G10L15/06, G06N3/08, G10L15/183, G06N3/04, G06N3/088, G10L15/16
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
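The abstract does not say how the language model is fused into the decoder. The sketch below is an illustration only. It follows the gating idea commonly described for Cold Fusion:

- The language model's logits are projected to a hidden vector.
- A sigmoid gate, computed from the decoder state and that vector, scales it.
- The gated vector is concatenated to the decoder state before the output softmax.

The single linear layers, the function and parameter names, and the toy sizes are assumptions. They are not the claimed embodiments.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product: one output per row of w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cold_fusion_step(s: Vector, lm_logits: Vector, w_lm: Matrix,
                     w_gate: Matrix, b_gate: Vector, w_out: Matrix) -> Vector:
    """One decoder step of a Cold-Fusion-style gated fusion (hypothetical shapes).

    s:         Seq2Seq decoder state, size d_s
    lm_logits: pre-trained LM logits over the vocabulary, size V
    w_lm:      d_h x V projection of the LM logits (assumed single linear layer)
    w_gate:    d_h x (d_s + d_h) gate weights; b_gate has size d_h
    w_out:     V x (d_s + d_h) output projection (assumed single linear layer)
    """
    h_lm = matvec(w_lm, lm_logits)                     # LM hidden vector
    pre_gate = matvec(w_gate, s + h_lm)                # gate input: concatenation [s ; h_lm]
    gate = [sigmoid(z + b) for z, b in zip(pre_gate, b_gate)]
    fused = s + [g * h for g, h in zip(gate, h_lm)]    # concatenation [s ; gate * h_lm]
    return softmax(matvec(w_out, fused))               # distribution over the next token


if __name__ == "__main__":
    s = [0.5, -0.5]                                     # d_s = 2
    lm_logits = [1.0, 2.0, 3.0]                         # V = 3
    w_lm = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.1]]           # d_h = 2
    w_gate = [[0.2, 0.1, 0.0, 0.5], [0.0, 0.4, 0.3, 0.1]]
    b_gate = [0.0, -1.0]
    w_out = [[0.3, 0.0, 0.5, 0.1], [0.1, 0.2, 0.0, 0.4], [0.0, 0.1, 0.2, 0.3]]
    print(cold_fusion_step(s, lm_logits, w_lm, w_gate, b_gate, w_out))
```

The gate lets the decoder decide, per hidden unit, how much of the language model's signal to use. A strongly negative gate bias effectively switches the language model off for that unit.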

2. Invention application

US20160171974A1 SYSTEMS AND METHODS FOR SPEECH TRANSCRIPTION (Pending, published)
Publication No.: US20160171974A1
Publication date: 2016-06-16
Application No.: US14735002
Filing date: 2015-06-09
Applicant: BAIDU USA LLC
Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Y. Ng
IPC classification: G10L15/06, G10L15/16, G10L15/26
Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained. Embodiments of the system can also handle challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
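The abstract mentions novel data synthesis techniques but does not describe them. One common way to synthesize noisy training data is to superimpose a noise track on a clean utterance at a chosen signal-to-noise ratio (SNR). It is shown here purely as an illustration and is not presented as the patented technique. The function names `power` and `mix_at_snr` and the plain-list waveform representation are assumptions.

```python
import math
from typing import List


def power(x: List[float]) -> float:
    """Mean power of a waveform (average squared amplitude)."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Superimpose noise onto clean speech so the mixture has the requested SNR in dB."""
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Loop (tile) the noise track if it is shorter than the utterance, then trim to length.
    reps = len(clean) // len(noise) + 1
    noise = (noise * reps)[: len(clean)]
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise segment is silent")
    # SNR_dB = 10 * log10(P_clean / P_noise), so the target P_noise = P_clean / 10^(SNR_dB / 10).
    target_p_noise = power(clean) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_p_noise / p_noise)
    return [c + scale * n for c, n in zip(clean, noise)]


if __name__ == "__main__":
    # Toy 1-second 440 Hz tone at 16 kHz plus a deterministic pseudo-noise track.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [((i * 7919) % 101) / 50.0 - 1.0 for i in range(4000)]
    noisy = mix_at_snr(clean, noise, snr_db=10.0)
    residual = [m - c for m, c in zip(noisy, clean)]
    print("measured SNR (dB):", 10 * math.log10(power(clean) / power(residual)))
```

Drawing `snr_db` and the noise clip at random for each training example is one way to turn a modest clean corpus into a large, varied noisy one.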

3. Granted invention patent

US10657955B2 Systems and methods for principled bias reduction in production speech models (In force)
Publication No.: US10657955B2
Publication date: 2020-05-19
Application No.: US15884239
Filing date: 2018-01-30
Applicant: Baidu USA, LLC
Inventors: Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
IPC classification: G10L17/18, G10L15/16, G10L15/04, G10L15/22, G10L15/02, G10L25/18
Abstract: Described herein are systems and methods to identify and address sources of bias in an end-to-end speech model. In one or more embodiments, the end-to-end model may be a recurrent neural network with two 2D-convolutional input layers, followed by multiple bidirectional recurrent layers and one fully connected layer before a softmax layer. In one or more embodiments, the network is trained end-to-end using the CTC loss function to directly predict sequences of characters from log spectrograms of audio. With optimized recurrent layers and training together with alignment information, some unwanted bias induced by using purely forward-only recurrences may be removed in a deployed model.
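The abstract says the network is trained with the CTC loss to predict character sequences directly. Under CTC, the network emits one label per audio frame, and a special blank symbol is one of the possible labels. A transcript is read off by merging repeated labels and then dropping blanks.

The sketch below shows greedy (best-path) decoding under that rule. The abstract does not say which decoder the embodiments use. The blank index, the alphabet string, and the function name are hypothetical.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str,
                      blank: int = 0) -> str:
    """Best-path CTC decoding.

    frame_probs: per-frame scores (probabilities or log-probabilities) over len(alphabet) labels
    alphabet:    label characters indexed like frame_probs; position `blank` is the CTC blank
    """
    # Pick the highest-scoring label in every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best:
        # Merge consecutive repeats, then drop blanks.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    def frame(i: int, n: int = 4) -> List[float]:
        # Toy frame that puts most of its probability mass on label i.
        return [0.7 if j == i else 0.1 for j in range(n)]

    alphabet = "_cat"  # index 0 ('_') is the blank
    frames = [frame(k) for k in (1, 1, 0, 2, 0, 3, 3, 0)]
    print(ctc_greedy_decode(frames, alphabet))  # prints: cat
```

A blank between two identical labels is what lets CTC spell a doubled letter. For example, the frame labels "t _ t" decode to "tt", while "t t" collapses to "t".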

4. Granted invention patent

US10540961B2 Convolutional recurrent neural networks for small-footprint keyword spotting (In force)
Publication No.: US10540961B2
Publication date: 2020-01-21
Application No.: US15688221
Filing date: 2017-08-28
Applicant: Baidu USA, LLC
Inventors: Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates
IPC classification: G10L15/16, G06F3/16, G10L15/18, G10L21/0208, G06N3/04, G06N3/08
Abstract: Described herein are systems and methods for creating and using Convolutional Recurrent Neural Networks (CRNNs) for small-footprint keyword spotting (KWS) systems. Inspired by the large-scale state-of-the-art speech recognition systems, in embodiments, the strengths of convolutional layers to utilize the structure in the data in time and frequency domains are combined with recurrent layers to utilize context for the entire processed frame. The effects of architecture parameters were examined to determine preferred model embodiments given the performance versus model size tradeoff. Various training strategies are provided to improve performance. In embodiments, using only ~230k parameters and yielding acceptably low latency, a CRNN model embodiment demonstrated high accuracy and robust performance in a wide range of environments.
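The abstract frames model selection as a trade-off between performance and model size, with one embodiment using about 230k parameters. A quick way to reason about such a budget is to count parameters layer by layer.

The default configuration below is hypothetical and chosen for illustration. It is not the patented embodiment:

- 40 frequency bins
- one convolutional layer
- two bidirectional GRU layers
- two dense layers
- two output classes

The single-bias-per-gate GRU convention is also an assumption. Frameworks differ in how they count recurrent biases.

```python
def conv2d_params(in_ch: int, out_ch: int, kh: int, kw: int) -> int:
    """Weights plus one bias per output channel."""
    return out_ch * in_ch * kh * kw + out_ch


def gru_params(input_dim: int, hidden: int) -> int:
    """Three gates (update, reset, candidate), each with input weights,
    recurrent weights and one bias vector (assumed convention)."""
    return 3 * (hidden * input_dim + hidden * hidden + hidden)


def dense_params(in_dim: int, out_dim: int) -> int:
    return in_dim * out_dim + out_dim


def crnn_params(n_freq: int = 40, channels: int = 32, kh: int = 20, kw: int = 5,
                stride_f: int = 8, gru_hidden: int = 32, gru_layers: int = 2,
                dense: int = 64, n_classes: int = 2) -> int:
    """Total parameter count of a hypothetical small CRNN keyword spotter."""
    total = conv2d_params(1, channels, kh, kw)
    # Frequency bins left after an unpadded convolution with stride stride_f.
    freq_out = (n_freq - kh) // stride_f + 1
    if freq_out < 1:
        raise ValueError("kernel is taller than the input")
    rnn_in = channels * freq_out  # channels x remaining bins feed each time step
    for _ in range(gru_layers):
        total += 2 * gru_params(rnn_in, gru_hidden)  # forward + backward directions
        rnn_in = 2 * gru_hidden                      # next layer sees both directions
    total += dense_params(rnn_in, dense)
    total += dense_params(dense, n_classes)
    return total


if __name__ == "__main__":
    total = crnn_params()
    print(total, "parameters; within a 230k budget:", total <= 230_000)
```

Most of this count comes from the recurrent layers. Their hidden-by-hidden weights grow roughly quadratically with `gru_hidden`, so recurrent width is a natural knob when trading accuracy against size.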

5. Granted invention patent

US11562733B2 Deep learning models for speech recognition (In force)
Publication No.: US11562733B2
Publication date: 2023-01-24
Application No.: US16542243
Filing date: 2019-08-15
Applicant: BAIDU USA LLC
Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Eisen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Ng
IPC classification: G10L15/06, G10L15/26, G10L15/16, G06N3/04, G06N3/08
Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained. Embodiments of the system can also handle challenging noisy environments better than widely used, state-of-the-art commercial speech systems.

6. Granted invention patent

US10540957B2 Systems and methods for speech transcription (In force)
Publication No.: US10540957B2
Publication date: 2020-01-21
Application No.: US14735002
Filing date: 2015-06-09
Applicant: BAIDU USA LLC
Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Y. Ng
IPC classification: G06N3/04, G10L15/16, G06N3/08, G10L15/06, G10L15/26
Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained. Embodiments of the system can also handle challenging noisy environments better than widely used, state-of-the-art commercial speech systems.

7. Invention application

US20180261213A1 CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR SMALL-FOOTPRINT KEYWORD SPOTTING (Pending, published)
Publication No.: US20180261213A1
Publication date: 2018-09-13
Application No.: US15688221
Filing date: 2017-08-28
Applicant: Baidu USA, LLC
Inventors: Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates
IPC classification: G10L15/16, G06N3/04
CPC classification: G10L15/16, G06F3/16, G06N3/0445, G06N3/0454, G06N3/049, G06N3/08, G06N7/005, G10L15/063, G10L15/18, G10L21/0208, G10L2015/088
Abstract: Described herein are systems and methods for creating and using Convolutional Recurrent Neural Networks (CRNNs) for small-footprint keyword spotting (KWS) systems. Inspired by the large-scale state-of-the-art speech recognition systems, in embodiments, the strengths of convolutional layers to utilize the structure in the data in time and frequency domains are combined with recurrent layers to utilize context for the entire processed frame. The effects of architecture parameters were examined to determine preferred model embodiments given the performance versus model size tradeoff. Various training strategies are provided to improve performance. In embodiments, using only ~230k parameters and yielding acceptably low latency, a CRNN model embodiment demonstrated high accuracy and robust performance in a wide range of environments.
