    • 2. Granted invention patent
    • Altering audio to improve automatic speech recognition
    • US09251787B1
    • 2016-02-02
    • US13627890
    • 2012-09-26
    • Gregory M. Hart; William Spencer Worley, III
    • Gregory M. Hart; William Spencer Worley, III
    • H03G3/20; G10L15/20
    • G10L15/22; G10L15/20; G10L15/265; G10L17/005; G10L2015/223; G11B27/005; H03G3/32; H03G5/02; H04R3/12
    • Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device. While outputting the audio, a microphone of the device may capture sound within the environment and may generate an audio signal based on the captured sound. The device may then analyze the audio signal to identify speech of a user within the signal, with the speech indicating that the user is going to provide a subsequent command to the device. Thereafter, the device may alter the output of the audio (e.g., attenuate the audio, pause the audio, switch from stereo to mono, etc.) to facilitate speech recognition of the user's subsequent command.
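The three alterations named in the abstract above (attenuate, pause, switch from stereo to mono) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `PlaybackController` class, its method names, the `duck_gain` default, and the choice to model "pause" as emitting silence. Detecting the user's speech in the microphone signal is out of scope; the sketch only shows what might happen to the output once that speech has been identified.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = Tuple[float, float]  # one stereo sample: (left, right)


@dataclass
class PlaybackController:
    """Hypothetical controller; names and defaults are illustrative only."""
    mode: str = "attenuate"   # "attenuate", "pause", or "mono"
    duck_gain: float = 0.2    # assumed attenuation factor while listening
    listening: bool = False   # True after the user's initial speech is identified

    def on_wake_speech(self) -> None:
        # Called when speech indicating an upcoming command has been identified.
        self.listening = True

    def on_command_done(self) -> None:
        # Called once the subsequent command has been captured.
        self.listening = False

    def process(self, frames: List[Frame]) -> List[Frame]:
        """Return the frames to send to the speaker."""
        if not self.listening:
            return list(frames)
        if self.mode == "pause":
            # Pausing is modeled here as silence of the same length.
            return [(0.0, 0.0) for _ in frames]
        if self.mode == "mono":
            # Downmix: both channels carry the average of left and right.
            return [((l + r) / 2, (l + r) / 2) for l, r in frames]
        if self.mode == "attenuate":
            return [(l * self.duck_gain, r * self.duck_gain) for l, r in frames]
        raise ValueError(f"unknown mode: {self.mode!r}")
```

A caller would invoke `on_wake_speech()` when the initial speech is identified, keep passing output audio through `process()`, and call `on_command_done()` afterwards to restore normal playback.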
    • 3. Granted invention patent
    • Audio signal transmission techniques
    • US09111542B1
    • 2015-08-18
    • US13430407
    • 2012-03-26
    • Gregory M. Hart; Jeffrey P. Bezos
    • Gregory M. Hart; Jeffrey P. Bezos
    • G10L21/00; G10L19/00
    • G10L15/20; G10L15/30; G10L19/00; G10L21/00; G10L21/0216; G10L21/0272; G10L2021/02082; G10L2021/02166
    • A voice interaction architecture that compiles multiple audio signals captured at different locations within an environment, determines a time offset between a primary audio signal and other captured audio signals and identifies differences between the primary signal and the other signal(s). Thereafter, the architecture may provide the primary audio signal, an indication of the determined time offset(s) and the identified differences to remote computing resources for further processing. For instance, the architecture may send this information to a network-accessible distributed computing platform that performs beamforming and/or automatic speech recognition (ASR) on the received audio. The distributed computing platform may in turn determine a response to provide based upon the beamforming and/or ASR.
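The abstract above mentions determining a time offset between a primary signal and another captured signal, then identifying their differences. One common way to estimate such an offset is brute-force cross-correlation; the abstract does not say which method is used, so the sketch below is an assumption, as are the function names `estimate_offset` and `residual` and the `max_lag` parameter. Sending the result to remote computing resources is not shown.

```python
from typing import List, Sequence


def estimate_offset(primary: Sequence[float], other: Sequence[float],
                    max_lag: int) -> int:
    """Return the lag (in samples) such that other[n + lag] best matches primary[n].

    Uses plain cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, p in enumerate(primary):
            m = n + lag
            if 0 <= m < len(other):
                score += p * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def residual(primary: Sequence[float], other: Sequence[float],
             lag: int) -> List[float]:
    """Sample-wise difference other[n + lag] - primary[n].

    Positions where the shifted signal has no sample are treated as 0.0.
    """
    out = []
    for n, p in enumerate(primary):
        m = n + lag
        o = other[m] if 0 <= m < len(other) else 0.0
        out.append(o - p)
    return out
```

Under these assumptions, a device could transmit the primary signal once, plus a lag and a (hopefully small) residual per additional microphone, rather than every full signal.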
    • 4. Granted invention patent
    • Speech-inclusive device interfaces
    • US08700392B1
    • 2014-04-15
    • US12879981
    • 2010-09-10
    • Gregory M. Hart; Ian W. Freed; Gregg Elliott Zehr; Jeffrey P. Bezos
    • Gregory M. Hart; Ian W. Freed; Gregg Elliott Zehr; Jeffrey P. Bezos
    • G10L15/00
    • G10L15/25
    • A user can provide input to a computing device through various combinations of speech, movement, and/or gestures. A computing device can analyze captured audio data to determine any speech information it contains. The computing device can simultaneously capture image or video information which can be used to assist in analyzing the audio information. For example, image information is utilized by the device to determine when someone is speaking, and the movement of the person's lips can be analyzed to assist in determining the words that were spoken. Any gestures or other motions can assist in the determination as well. By combining various types of data to determine user input, the accuracy of a process such as speech recognition can be improved, and the need for lengthy application training processes can be avoided.
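The idea of combining audio with lip movement to decide which words were spoken can be illustrated with a simple weighted late fusion of per-word scores. This is one possible approach chosen for illustration, not the method described in the patent; the function name `fuse_scores`, the `audio_weight` parameter, and the example scores are all hypothetical.

```python
from typing import Dict


def fuse_scores(audio: Dict[str, float], visual: Dict[str, float],
                audio_weight: float = 0.7) -> str:
    """Pick the candidate word with the highest weighted sum of scores.

    `audio` and `visual` map candidate words to confidence scores from an
    (assumed) speech recognizer and an (assumed) lip-movement analyzer.
    A word missing from one source scores 0.0 for that source.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    candidates = set(audio) | set(visual)
    if not candidates:
        raise ValueError("no candidate words supplied")

    def combined(word: str) -> float:
        return (audio_weight * audio.get(word, 0.0)
                + (1.0 - audio_weight) * visual.get(word, 0.0))

    # Sort first so that ties are broken deterministically (alphabetically).
    return max(sorted(candidates), key=combined)


# Illustrative scores: the audio alone slightly prefers "night", but the lips
# visibly close at the start of the word, which favors "might".
audio_scores = {"night": 0.55, "might": 0.45}
visual_scores = {"night": 0.2, "might": 0.8}
best = fuse_scores(audio_scores, visual_scores)
```

With `audio_weight=1.0` the visual scores are ignored entirely, which makes it easy to compare audio-only and fused decisions.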