# Category: Reinforcement Learning

## Temporal Difference Learning in Python

Temporal-Difference Learning (or TD Learning) is an important and novel idea in reinforcement learning. It's the first point where you can really see patterns emerging, with everything building upon previous knowledge. Hop in for some theory and Python code.

## Monte Carlo method in Python

In this post, we will explore our first reinforcement learning methods for estimating value. It's the first taste of real RL in this series. I bet you've heard the term Monte Carlo method before. Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. - ...
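To make "repeated random sampling" concrete, here is a minimal sketch that is not taken from the post itself: it estimates π by drawing random points in the unit square and counting how many land inside the quarter circle. The function name `estimate_pi` and its `seed` parameter are hypothetical, chosen only for this illustration.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate pi by sampling points uniformly in the unit square.

    Hypothetical helper for illustration; `seed` only makes runs repeatable.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # The point falls inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square's area.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n))
```

The estimate gets closer to π as the number of samples grows, which is the same idea that lets Monte Carlo methods in RL estimate values by averaging over many sampled episodes.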

## Dynamic programming in Python

What is dynamic programming? Behind this strange and mysterious name hides a pretty straightforward concept. Dynamic programming, or DP for short, is a collection of methods used to calculate optimal policies, that is, to solve the Bellman equations. Before you get too hyped up, note that it has severe limitations, which make DP of very limited practical use. Here ...

## Finite Markov Decision Process: a high-level introduction

I wanted to avoid writing this post, as there will be zero code. But since I intended this series to be stand-alone, I have to write it. To move further, I first have to establish a definition of a Finite Markov Decision Process. It is a crucial assumption. Solving the problem of Finite Markov Decision ...

## Non-stationary K-armed bandit problem in Python

Recently I described a simple K-armed bandit problem and its solution. I also gave a short introduction to the Reinforcement Learning problem. Today I am still going to focus on the same problem, with a little more terminology and a few different algorithms (or rather, a few different variants). I am not going to exhaust the topic as it's ...

## Reinforcement Learning Basics: Multi-Armed Bandit Problem

Basics: I gave a very basic introduction to what Reinforcement Learning is in the series hub. There are four basic terms worth knowing when reading about RL: a policy, a reward signal, a value function, and an environment model. I will skip the model, as we will explore model-free learning for now. We will get ...