Tag: Artificial Intelligence

Temporal Difference Learning in Python

Temporal-Difference Learning (or TD Learning) is quite an important and novel idea. It's the first point where you can really see some patterns emerging, with everything building upon previous knowledge. Hop in for some theory and Python code.

Monte Carlo method in Python

In this post, we will explore our first reinforcement learning methods for estimating value. It's the first taste of real RL in this series. I bet you've heard the term Monte Carlo method before. Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. - ...

Dynamic programming in Python

What is dynamic programming? Behind this strange and mysterious name hides a pretty straightforward concept. Dynamic programming, or DP for short, is a collection of methods used to calculate optimal policies, that is, to solve the Bellman equations. Before you get any more hyped up, there are severe limitations to it, which make the use of DP very limited. Here ...

Finite Markov Decision Process a high-level introduction

I wanted to avoid writing this post, as there will be zero code. But since I intended this series to be stand-alone, I have to write it. So, to move further, I first have to establish a definition of a Finite Markov Decision Process. It is a crucial assumption. Solving the problem of Finite Markov Decision ...

Non stationary K-armed bandit problem in Python

Recently I described a simple K-bandit problem and its solution. I also gave a short introduction to the Reinforcement Learning problem. Today I am still going to focus on the same problem, with a little more terminology and a few different algorithms (or rather a few different variants). I am not going to exhaust the topic as it's ...

Reinforcement Learning Basics: Multi Armed Bandit Problem

Basics: I introduced the very basics of what Reinforcement Learning is in the series hub. There are 4 basic terms worth knowing when reading about RL: a policy, a reward signal, a value function and an environment model. I will skip the model, as we will explore model-free learning for now. We will get ...