Markov Decision Process Explorer

Visualize, simulate, and optimize Markov Decision Processes with advanced reinforcement learning algorithms

Load Preset Examples

Choose from pre-built MDP examples to get started quickly

Simple 3-State MDP (beginner)

A basic 3-state MDP with deterministic transitions. Great for learning the basics.

3 states · 2 actions · γ = 0.9
2x2 Grid World (beginner)

A classic grid world problem where an agent navigates to reach a goal while avoiding obstacles.

4 states · 4 actions · γ = 0.95
Gambler's Problem (intermediate)

A classic reinforcement learning problem where a gambler tries to reach a target amount.

6 states · 5 actions · γ = 0.9
Robot Navigation (intermediate)

A robot navigating through a simple environment with obstacles and goals.

5 states · 4 actions · γ = 0.8
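As a sketch of what a preset encodes, here is one possible Python representation of a small 3-state, 2-action MDP with deterministic transitions, solved with value iteration. The transition structure, rewards, and variable names are illustrative assumptions, not the tool's actual data format.

```python
# Hypothetical encoding of a preset like the Simple 3-State MDP.
# The successor states and rewards below are illustrative assumptions.

GAMMA = 0.9

# P[state][action] -> list of (next_state, probability, reward)
P = {
    "S0": {"a": [("S1", 1.0, 0.0)], "b": [("S2", 1.0, 0.0)]},
    "S1": {"a": [("S2", 1.0, 1.0)], "b": [("S0", 1.0, 0.0)]},
    "S2": {"a": [("S2", 1.0, 0.0)], "b": [("S0", 1.0, 0.0)]},
}

def value_iteration(P, gamma, tol=1e-8):
    """Compute optimal state values by repeatedly applying the
    Bellman optimality update until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(
                sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration(P, GAMMA)
```

With these (assumed) transitions, the optimal policy cycles S1 → S2 → S0 → S1, earning reward 1 every three steps, so V(S1) = 1 / (1 − γ³).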

MDP Configuration

Configure your Markov Decision Process using the interactive controls below

States

Define the possible states in your MDP

1, 2, 3

Actions

Define the available actions in your MDP

a, b

Discount Factor (γ)

Controls how much future rewards are valued relative to immediate rewards

Current value: 0.95 (high discount: future rewards matter more)
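The discount factor weights a reward received t steps in the future by γ^t, so the return of an episode is the sum of γ^t · r_t. A quick numeric sketch, assuming a constant stream of unit rewards, shows how strongly γ changes the total:

```python
# Effect of the discount factor on the return of a constant reward stream.
# With reward 1 at every step, the infinite-horizon return is 1 / (1 - gamma).

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [1.0] * 200  # long but finite stream of unit rewards

low = discounted_return(rewards, 0.5)    # near 1 / (1 - 0.5)  = 2: myopic
high = discounted_return(rewards, 0.95)  # near 1 / (1 - 0.95) = 20: far-sighted
```

At γ = 0.5 the agent effectively looks only a couple of steps ahead, while at γ = 0.95 rewards dozens of steps away still contribute meaningfully.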

Transitions

Configure state transitions, probabilities, and rewards for each action

State S0
  Action a: probabilities 0.90, 0.10 (total 1.000)
  Action b: probability 1.00 (total 1.000)

State S1
  Action a: probability 1.00 (total 1.000)
  Action b: probability 1.00 (total 1.000)

State S2
  Action a: probability 1.00 (total 1.000)
  Action b: probability 1.00 (total 1.000)
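One way to hold a transition table like the one above in code is a mapping from (state, action) pairs to probability distributions over next states. The successor states below are placeholders of my own choosing (the panel lists only probability mass, not which state each entry points to); the validation helper checks the same "Total: 1.000" invariant the configurator enforces.

```python
# Transition model mirroring the probabilities configured above.
# Successor states are illustrative assumptions: the panel shows only
# probability mass, not which state each entry points to.

transitions = {
    ("S0", "a"): {"S1": 0.90, "S2": 0.10},
    ("S0", "b"): {"S1": 1.00},
    ("S1", "a"): {"S2": 1.00},
    ("S1", "b"): {"S0": 1.00},
    ("S2", "a"): {"S0": 1.00},
    ("S2", "b"): {"S1": 1.00},
}

def validate(transitions, tol=1e-9):
    """Check that every (state, action) row is a proper probability
    distribution: non-negative entries that sum to 1."""
    for key, dist in transitions.items():
        total = sum(dist.values())
        assert abs(total - 1.0) < tol, f"{key} sums to {total}"
        assert all(p >= 0 for p in dist.values()), f"{key} has a negative probability"

validate(transitions)
```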

Monte Carlo Simulation

Configure simulation parameters and run Monte Carlo analysis
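A Monte Carlo analysis of the kind this panel runs can be sketched as repeated rollouts: sample episodes from the transition model under some policy and average their discounted returns. The model, rewards, and uniformly random policy below are illustrative assumptions, not the tool's actual simulator.

```python
import random

# Minimal Monte Carlo sketch: sample episodes under a uniformly random
# policy and average the discounted return from the start state.
# The transition model and rewards here are illustrative assumptions.

GAMMA = 0.95

# P[state][action] -> list of (next_state, probability, reward)
P = {
    "S0": {"a": [("S1", 0.9, 0.0), ("S2", 0.1, 0.0)], "b": [("S2", 1.0, 0.0)]},
    "S1": {"a": [("S2", 1.0, 1.0)], "b": [("S0", 1.0, 0.0)]},
    "S2": {"a": [("S0", 1.0, 0.0)], "b": [("S1", 1.0, 0.0)]},
}

def rollout(P, start, gamma, horizon, rng):
    """Sample one episode of fixed length; return its discounted return."""
    s, g, discount = start, 0.0, 1.0
    for _ in range(horizon):
        action = rng.choice(sorted(P[s]))   # uniformly random policy
        r, cum = rng.random(), 0.0
        for s2, p, reward in P[s][action]:  # sample the next state
            cum += p
            if r <= cum:
                break
        g += discount * reward
        discount *= gamma
        s = s2
    return g

def monte_carlo_value(P, start, gamma, episodes=2000, horizon=100, seed=0):
    """Average discounted return over many sampled episodes."""
    rng = random.Random(seed)
    returns = [rollout(P, start, gamma, horizon, rng) for _ in range(episodes)]
    return sum(returns) / len(returns)

est = monte_carlo_value(P, "S0", GAMMA)
```

Since every per-step reward is at most 1, the estimate is bounded above by 1 / (1 − γ) = 20; increasing `episodes` tightens the estimate at the usual 1/√n Monte Carlo rate.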


Configure your MDP using the visual configurator above to see the graph.