The basic reinforcement learning model consists of: a set of environment states S; a set of actions A; rules for transitioning between states; rules that determine the scalar immediate reward of a transition; and rules that describe what the agent observes. The rules are often stochastic. The observation typically includes the scalar immediate reward associated with the last transition. In many works the agent is also assumed to observe the current environmental state, in which case we speak of full observability; in the opposing case we speak of partial observability. Sometimes the set of actions available to the agent is restricted (e.g., you cannot spend more money than you possess).

A reinforcement learning agent interacts with its environment in discrete time steps. At each time t, the agent receives an observation o_t, which typically includes the reward r_t. It then chooses an action a_t from the set of available actions, which is subsequently sent to the environment. The environment moves to a new state and the reward associated with the transition is determined. The goal of a reinforcement learning agent is to collect as much reward as possible. The agent can choose any action as a function of the history, and it can even randomize its action selection.
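The interaction loop above can be sketched in a few lines of code. The following is a minimal illustration, not a definitive implementation: the environment (a hypothetical five-state chain where moving right toward the last state yields a reward of 1) and the names `ChainEnv` and `run_episode` are invented for this example. It shows the core cycle: the agent observes, chooses an action (here, possibly at random), the environment transitions, and a scalar reward is emitted.

```python
import random

class ChainEnv:
    """Toy fully observable environment: states 0..n-1 on a line.

    Action 0 moves left, action 1 moves right. Reaching the last
    state ends the episode with reward 1. Transitions here are
    deterministic, though in general they may be stochastic.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state  # full observability: observation == state

    def step(self, action):
        delta = 1 if action == 1 else -1
        self.state = max(0, min(self.n_states - 1, self.state + delta))
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=20):
    """Run one agent-environment interaction loop, returning total reward."""
    obs = env.reset()
    total = 0.0
    for t in range(max_steps):
        action = policy(obs)                   # agent picks an action from A
        obs, reward, done = env.step(action)   # environment transitions, emits reward
        total += reward
        if done:
            break
    return total


# A randomized policy: the agent may choose actions as a function of the
# observation, or simply at random, as noted in the text.
random.seed(0)
random_return = run_episode(ChainEnv(), policy=lambda obs: random.choice([0, 1]))
```

A deterministic policy that always moves right collects the reward in four steps; the randomized policy may or may not reach the goal within the step limit, illustrating why the agent's goal is framed as collecting as much reward as possible.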
"I became convinced - and I remain so even today - that one can achieve universality, not through religion, not through emotions or tradition, but through the sciences.
Through a scientific way of thinking.
But even with that, one can get nowhere without general ideas, points of departure. Scientific thought is only a means through which to realize my ideas, which are not of scientific origin. These ideas are born of intuition, some kind of vision. None of this was clear to me then, but I worked instinctively in this direction." — Iannis Xenakis, in Balint Varga, "Conversations with Iannis Xenakis"