What are the required elements to solve an RL problem?
Let’s consider a problem where an agent can be in various states and, in each state, can choose an action from a set of actions. Problems of this type are called sequential decision problems. A Markov decision process (MDP) is the mathematical framework that captures such a fully observable, non-deterministic environment, with a Markovian transition model and additive rewards, in which the agent acts. The solution to an MDP is an optimal policy: a choice of action for every state that maximizes the overall cumulative reward. Thus, the transition model, which represents the agent’s environment (when the environment is known), and the optimal policy, which decides what action the agent should perform in each state, are the required elements for training the agent to learn a specific behavior.
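To make the two elements concrete, here is a minimal sketch of a known transition model and the optimal policy derived from it. Everything in it is an illustrative assumption rather than part of the original answer: the two states, the two actions, the probabilities and rewards, the discount factor `gamma`, and the choice of value iteration as the solution method (it is one standard way to solve a known MDP, not the only one).

```python
# Hypothetical two-state MDP; all names and numbers are illustrative assumptions.
STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]

# Transition model: P[(state, action)] -> list of (probability, next_state, reward)
P = {
    ("s0", "stay"): [(1.0, "s0", 0.0)],
    ("s0", "move"): [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],  # non-deterministic
    ("s1", "stay"): [(1.0, "s1", 2.0)],
    ("s1", "move"): [(1.0, "s0", 0.0)],
}


def q_value(V, s, a, gamma):
    """Expected discounted return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])


def value_iteration(gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(q_value(V, s, a, gamma) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    # The policy picks, in every state, the action with the highest expected return.
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a, gamma)) for s in STATES}
    return V, policy


V, policy = value_iteration()
print(policy)
```

In this toy setup the policy maps each state to one action, which is exactly the "choice of action for every state" described above. When the transition model is not known, the agent would have to estimate it or learn values directly from experience, which this sketch does not cover.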