Quick Patent Search - Fast search of global patents, free patent database for commercial use - IPRDB

1. Granted patent

US11625604B2 Reinforcement learning using distributed prioritized replay (Active)
Publication No.: US11625604B2
Publication date: 2023-04-11
Application No.: US16641751
Filing date: 2018-10-29
Applicant: DeepMind Technologies Limited
Inventors: David Budden, Gabriel Barth-Maron, John Quan, Daniel George Horgan
IPC classification: G06N3/08, G06N3/04, G06N20/00, G06N3/088
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. One of the systems includes (i) a plurality of actor computing units, in which each of the actor computing units is configured to maintain a respective replica of the action selection neural network and to perform a plurality of actor operations, and (ii) one or more learner computing units, in which each of the one or more learner computing units is configured to perform a plurality of learner operations.
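
The abstract does not spell out the actor and learner operations, so the following is only a rough sketch of what the "prioritized replay" in the title could look like. It assumes that actors write experience tuples with a priority into a shared memory and that learners sample in proportion to priority and write refreshed priorities back. The class `PrioritizedReplay` and its methods are hypothetical names chosen for illustration, not taken from the patent.

```python
import random


class PrioritizedReplay:
    """Toy shared replay memory with proportional prioritized sampling."""

    def __init__(self, seed=0):
        self.items = []       # stored experience tuples
        self.priorities = []  # one non-negative priority per tuple
        self.rng = random.Random(seed)

    def add(self, experience, priority):
        # Assumed actor operation: store a tuple with a locally computed priority.
        self.items.append(experience)
        self.priorities.append(float(priority))

    def sample(self, k):
        # Assumed learner operation: draw indices with probability
        # proportional to the stored priorities.
        return self.rng.choices(range(len(self.items)), weights=self.priorities, k=k)

    def update(self, index, priority):
        # Assumed learner operation: refresh a priority after a training step.
        self.priorities[index] = float(priority)


# Three hypothetical actors each contribute one tuple; a learner samples a batch.
replay = PrioritizedReplay()
for actor_id in range(3):
    replay.add({"actor": actor_id}, priority=1.0 + actor_id)
batch_indices = replay.sample(4)
```

Sampling with replacement keeps the sketch short; a production system would typically use a sum-tree or similar structure rather than a linear scan over priorities.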

2. Patent application

US20230101930A1 GENERATING IMPLICIT PLANS FOR ACCOMPLISHING GOALS IN AN ENVIRONMENT USING ATTENTION OPERATIONS OVER PLANNING EMBEDDINGS (Active)
Publication No.: US20230101930A1
Publication date: 2023-03-30
Application No.: US17794780
Filing date: 2021-02-08
Applicant: DeepMind Technologies Limited
Inventors: Samuel Ritter, Ryan Faulkner, David Nunes Raposo
IPC classification: G06N3/04, G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment to accomplish a goal. In one aspect, a method comprises: generating a respective planning embedding corresponding to each of multiple experience tuples in an external memory, wherein each experience tuple characterizes interaction of the agent with the environment at a respective previous time step; processing the planning embeddings using a planning neural network to generate an implicit plan for accomplishing the goal; and selecting the action to be performed by the agent at the time step using the implicit plan.

3. Patent application

US20230084700A1 TRAINING ACTION SELECTION NEURAL NETWORKS USING LOOK-AHEAD SEARCH (Active)
Publication No.: US20230084700A1
Publication date: 2023-03-16
Application No.: US17948016
Filing date: 2022-09-19
Applicant: DeepMind Technologies Limited
Inventors: Karen Simonyan, David Silver, Julian Schrittwieser
IPC classification: G06N3/08, G06N7/00
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media, for training an action selection neural network. One of the methods includes receiving an observation characterizing a current state of the environment; determining a target network output for the observation by performing a look-ahead search of possible future states of the environment starting from the current state until the environment reaches a possible future state that satisfies one or more termination criteria, wherein the look-ahead search is guided by the neural network in accordance with current values of the network parameters; selecting an action to be performed by the agent in response to the observation using the target network output generated by performing the look-ahead search; and storing, in an exploration history data store, the target network output in association with the observation for use in updating the current values of the network parameters.

4. Patent application

US20230083486A1 LEARNING ENVIRONMENT REPRESENTATIONS FOR AGENT CONTROL USING PREDICTIONS OF BOOTSTRAPPED LATENTS (Active)
Publication No.: US20230083486A1
Publication date: 2023-03-16
Application No.: US17797886
Filing date: 2021-02-08
Applicant: DeepMind Technologies Limited
Inventors: Zhaohan Guo, Mohammad Gheshlaghi Azar, Bernardo Avila Pires, Florent Altché, Jean-Bastien François Laurent Grill, Bilal Piot, Remi Munos
IPC classification: G06N3/08, G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an environment representation neural network of a reinforcement learning system that controls an agent to perform a given task. In one aspect, the method includes: receiving a current observation input and a future observation input; generating, from the future observation input, a future latent representation of the future state of the environment; processing the current observation input, using the environment representation neural network, to generate a current internal representation of the current state of the environment; generating, from the current internal representation, a predicted future latent representation; evaluating an objective function measuring a difference between the future latent representation and the predicted future latent representation; and determining, based on a determined gradient of the objective function, an update to the current values of the environment representation parameters.

5. Patent application

US20230082326A1 TRAINING MULTI-OBJECTIVE NEURAL NETWORK REINFORCEMENT LEARNING SYSTEMS (Active)
Publication No.: US20230082326A1
Publication date: 2023-03-16
Application No.: US17797203
Filing date: 2021-02-08
Applicant: DeepMind Technologies Limited
Inventors: Abbas Abdolmaleki, Sandy Han Huang
IPC classification: G06N3/08
Abstract: There is provided a method for training a neural network system by reinforcement learning, the neural network system being configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy that aims to satisfy a plurality of objectives. The method comprises obtaining a set of one or more trajectories. Each trajectory comprises a state of an environment, an action applied by the agent to the environment according to a previous policy in response to the state, and a set of rewards for the action, each reward relating to a corresponding objective of the plurality of objectives. The method further comprises determining an action-value function for each of the plurality of objectives based on the set of one or more trajectories. Each action-value function determines an action value representing an estimated return according to the corresponding objective that would result from the agent performing a given action in response to a given state according to the previous policy. The method further comprises determining an updated policy based on a combination of the action-value functions for the plurality of objectives.

6. Patent application

US20230073326A1 PLANNING FOR AGENT CONTROL USING LEARNED HIDDEN STATES (Active)
Publication No.: US20230073326A1
Publication date: 2023-03-09
Application No.: US17794797
Filing date: 2021-01-28
Applicant: DeepMind Technologies Limited
Inventors: Julian Schrittwieser, Ioannis Antonoglou, Thomas Keisuke Hubert
IPC classification: G06N7/00, G06N5/00, G06K9/62
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting actions to be performed by an agent interacting with an environment to cause the agent to perform a task. One of the methods includes: receiving a current observation characterizing a current environment state of the environment; performing a plurality of planning iterations to generate plan data that indicates a respective value to performing the task of the agent performing each of the set of actions in the environment and starting from the current environment state, wherein performing each planning iteration comprises selecting a sequence of actions to be performed by the agent starting from the current environment state based on outputs generated by a dynamics model and a prediction model; and selecting, from the set of actions, an action to be performed by the agent in response to the current observation based on the plan data.

7. Patent application

US20220366245A1 TRAINING ACTION SELECTION NEURAL NETWORKS USING HINDSIGHT MODELLING (Active)
Publication No.: US20220366245A1
Publication date: 2022-11-17
Application No.: US17763901
Filing date: 2020-09-23
Applicant: DeepMind Technologies Limited
Inventors: Arthur Clement Guez, Fabio Viola, Theophane Guillaume Weber, Lars Buesing, Nicolas Manfred Otto Heess
IPC classification: G06N3/08
Abstract: A reinforcement learning method and system that selects actions to be performed by a reinforcement learning agent interacting with an environment. A causal model is implemented by a hindsight model neural network and trained using hindsight, i.e., using future environment state trajectories. As the method and system do not have access to this future information when selecting an action, the hindsight model neural network is used to train a model neural network which is conditioned on data from current observations, which learns to predict an output of the hindsight model neural network.

8. Patent application

US20220343157A1 ROBUST REINFORCEMENT LEARNING FOR CONTINUOUS CONTROL WITH MODEL MISSPECIFICATION (Active)
Publication No.: US20220343157A1
Publication date: 2022-10-27
Application No.: US17620164
Filing date: 2020-06-17
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: Daniel J. Mankowitz, Nir Levine, Rae Chan Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Todd Andrew Hester, Timothy Arthur Mann, Martin Riedmiller
IPC classification: G06N3/08, G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to current values of the Q network parameters by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the observation-action-reward tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch of observation-action-reward tuples.

9. Patent application

US20220326663A1 EXPLORATION USING HYPER-MODELS (Active)
Publication No.: US20220326663A1
Publication date: 2022-10-13
Application No.: US17639504
Filing date: 2020-09-25
Applicant: DeepMind Technologies Limited
Inventors: Benjamin Van Roy, Xiuyuan Lu, Vikranth Reddy Dwaracherla, Zheng Wen, Morteza Ibrahimi, Ian David Moffat Osband
IPC classification: G05B13/02, G05B13/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes sampling one or more index variables from a continuous space of possible index variables in accordance with a probability distribution over the continuous space; for each index variable: processing the index variable using a hypermodel, in accordance with values of a plurality of parameters of the hypermodel, to generate an output that specifies values of a plurality of parameters of an environment model; and generating an action selection output using the environment model in accordance with the values of the plurality of parameters of the environment model that are specified by the hypermodel output for the index variable; and selecting the action to be performed by the agent at the time step using the one or more action selection outputs for the one or more index variables.
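
The steps in this abstract can be illustrated with a toy bandit. The sketch below makes several assumptions that the abstract does not state: the index variable is drawn from a standard normal distribution, the hypermodel is a simple affine map, the environment model's parameters are the predicted mean reward of each action, and the per-index outputs are combined by summation. The names `hypermodel` and `select_action` are hypothetical.

```python
import random


def hypermodel(index, params):
    # Affine hypermodel (an assumption): maps an index variable to the
    # parameters of the environment model. params = (bias, weight rows).
    bias, weight = params
    return [b + sum(w * z for w, z in zip(row, index)) for b, row in zip(bias, weight)]


def select_action(hyper_params, num_indices=1, index_dim=2, rng=None):
    rng = rng or random.Random(0)
    num_actions = len(hyper_params[0])
    totals = [0.0] * num_actions
    for _ in range(num_indices):
        # Sample an index variable from a continuous space (standard normal here).
        z = [rng.gauss(0.0, 1.0) for _ in range(index_dim)]
        # The hypermodel output specifies the environment model's parameters;
        # in this toy bandit they are the predicted reward of each action.
        predicted_rewards = hypermodel(z, hyper_params)
        for action, reward in enumerate(predicted_rewards):
            totals[action] += reward
    # Act greedily with respect to the combined action selection outputs.
    return max(range(num_actions), key=lambda a: totals[a])


# Three actions; zero weights mean the sampled index has no effect here.
params = ([0.1, 0.9, 0.3], [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
chosen = select_action(params, num_indices=5)
```

With non-zero weights, different sampled indices yield different plausible environment models, which is what drives exploration in this kind of scheme.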

10. Granted patent

US11462034B2 Generating images using neural networks (Active)
Publication No.: US11462034B2
Publication date: 2022-10-04
Application No.: US17198096
Filing date: 2021-03-10
Applicant: DeepMind Technologies Limited
Inventors: Aaron Gerard Antonius van den Oord, Nal Emmerich Kalchbrenner, Karen Simonyan
IPC classification: G06V30/194, G06N3/04, G06N3/08, H04N19/52, H04N19/50, G06V10/56, H04N19/186, H04N19/172, H04N19/182, G06K9/62
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating images using neural networks. One of the methods includes generating the output image pixel by pixel from a sequence of pixels taken from the output image, comprising, for each pixel in the output image, generating a respective score distribution over a discrete set of possible color values for each of the plurality of color channels.
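
As a rough illustration of pixel-by-pixel generation, the sketch below samples each color channel of each pixel from a distribution over a discrete set of values, conditioned on what has already been generated. The function `toy_scores` is a hypothetical stand-in for the neural network, and the raster-scan ordering and softmax normalization are assumptions, not details given in the abstract.

```python
import math
import random


def softmax(scores):
    # Turn raw scores into a probability distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_scores(previous_values, num_values):
    # Hypothetical stand-in for the network: favour values close to the
    # most recently generated one.
    last = previous_values[-1] if previous_values else num_values // 2
    return [-abs(v - last) for v in range(num_values)]


def generate_image(height, width, channels=3, num_values=256, seed=0):
    rng = random.Random(seed)
    generated = []  # flat sequence of channel values generated so far
    image = []
    for _ in range(height):
        row = []
        for _ in range(width):
            pixel = []
            for _ in range(channels):
                # One score distribution over discrete color values per channel.
                probs = softmax(toy_scores(generated, num_values))
                value = rng.choices(range(num_values), weights=probs, k=1)[0]
                pixel.append(value)
                generated.append(value)
            row.append(tuple(pixel))
        image.append(row)
    return image


# A 2 x 3 image with 3 channels and 8 possible values per channel.
image = generate_image(2, 3, num_values=8)
```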
