# Reinforcement Learning

## Why am I writing about it? 🤔

While looking for another subject to learn, my thoughts kept circling around various AI problems. I wrote about Deep Neural Networks in Swift (here and here) last year, and I want to continue doing so. I made myself a learning plan structured like this: Deep Learning >> Artificial Intelligence basics >> Reinforcement Learning. But while waiting for the 5th part of Andrew Ng's Deep Learning MOOC, I found out about *Reinforcement Learning: An Introduction* (which I might refer to as *the book* in these posts) and got hooked. This series will be my journey through this very exciting area.

What is Reinforcement Learning? It's already 2018, and last year brought amazing accomplishments in the area, like DeepMind's AlphaGo and OpenAI's Dota 2 bot, which got me very, very excited (and worried, but mostly the former 😉). Both games are very controlled environments, but(!) they are still pretty complex problems. If such an algorithm can play a game, it can do much more. I know there are skeptics, but I believe it could be very useful already. There are just not enough people besides academics trying and imagining. The best definition of RL I have found so far is in Sutton's book:

Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them.
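To make that definition a bit more concrete, here is a minimal sketch of "discovering which actions yield the most reward by trying them". It is not code from *the book*: the function name `run_bandit`, the reward means, the noise level and the `epsilon` value are all assumptions I picked for illustration. The agent is never told which action is best. It only sees a noisy number after each try.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit (hypothetical, illustrative setup)."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # running estimate of each action's reward
    counts = [0] * k       # how many times each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(k)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(k), key=lambda a: estimates[a])

        # The environment returns only a noisy reward, never the "right" action.
        reward = rng.gauss(true_means[action], 1.0)

        counts[action] += 1
        # Incremental sample average: Q <- Q + (reward - Q) / n
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Three made-up actions; the last one pays best on average.
estimates, counts = run_bandit([0.2, 0.5, 1.5])
print("estimated rewards:", estimates)
print("times each action was tried:", counts)
```

After enough tries the agent should end up choosing the best-paying action most of the time, even though nobody told it which one that was.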

## Index

This page is a directory and a hub for all the posts on the topic. I will reference it every now and then. I will try to do both `Swift` and `Python` versions, but no promises once we reach some harder topics down the road. The `Python` ecosystem and community are more into the subject 🐍. The first few posts will be a journey through *the book*. Further down the road, with more experience, I hope to get ideas to do something fun and different.

- K-armed bandit and basics – covered: policy, reward, value function, exploration, exploitation, greedy, k-armed bandit algorithm
- Non-stationary k-armed bandit problem – covered: optimistic, realistic, weighted, UCB, greedy, non-stationary problem, stationary problem
- Finite Markov Decision Process – covered: Finite Markov Decision Process, Reinforcement Learning basics