Reinforcement Learning (RL)

Learn the inner workings of RL in an interactive and visually appealing way.

How RL Works

  • Agent: The decision-maker that interacts with the environment by choosing actions based on its policy.
  • Environment: The system or world in which the agent operates. Provides observations and rewards based on the agent’s actions.
  • Observation: The agent observes the current state of the environment to gather information.
  • Actions: Based on observations, the agent selects an action from a set of possible actions to influence the environment.
  • Rewards: The environment provides feedback (rewards or penalties) to evaluate the effectiveness of the agent’s actions.
  • Policy: A strategy (π) that defines how the agent selects actions based on the current state or observation.
  • Objective: The agent learns to maximize cumulative reward by improving its policy over time through trial and error (see the interaction-loop sketch after this list).
  • Applications:
      - Robotics (e.g., navigation and control)
      - Game AI (e.g., mastering chess or video games)
      - Finance (e.g., portfolio optimization)
      - Autonomous systems (e.g., traffic control)
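
To make the observe-act-reward loop concrete, here is a minimal sketch of the agent-environment cycle using tabular Q-learning on a toy corridor environment. The corridor, its states and actions, and the hyperparameter values below are illustrative assumptions for this sketch, not part of any specific library:

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # toy corridor: states 0..4, actions 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # Q-table: estimated return for each (state, action)
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed learning rate, discount, exploration rate

for episode in range(500):
    state = 0                          # observation: agent starts at the left end
    for step in range(100):            # cap episode length for safety
        # Policy: epsilon-greedy over current Q estimates (ties broken randomly)
        if rng.random() < epsilon:
            action = rng.integers(n_actions)                                 # explore
        else:
            action = rng.choice(np.flatnonzero(Q[state] == Q[state].max()))  # exploit

        # Environment: deterministic move; reward +1 only on reaching the goal
        next_state = min(n_states - 1, max(0, state + (1 if action == 1 else -1)))
        reward = 1.0 if next_state == n_states - 1 else 0.0

        # Q-learning update: move Q toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])

        state = next_state
        if state == n_states - 1:      # goal reached: episode ends
            break

print(Q)  # after training, "right" (action 1) should have the higher value in every state

Each pass through the inner loop is one turn of the RL cycle described above: the agent observes a state, its policy picks an action, the environment returns a reward and a new state, and the update rule improves the policy's value estimates.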

Deep Q-Network (DQN) Code Example

Here's how we can define the layers of a Deep Q-Network (DQN):

import tensorflow as tf
from tensorflow.keras import layers

def build_dqn(state_shape, num_actions):
    """
    Deep Q-Network (DQN) Model
    :param state_shape: Shape of the input state (e.g., (84, 84, 4) for Atari games)
    :param num_actions: Number of possible actions in the environment
    :return: A Keras model that maps a state to one Q-value per action
    """
    model = tf.keras.Sequential([
        layers.Input(shape=state_shape),  # Input Layer (state)
        layers.Conv2D(32, kernel_size=8, strides=4, activation="relu"),  # Convolutional Layer 1
        layers.Conv2D(64, kernel_size=4, strides=2, activation="relu"),  # Convolutional Layer 2
        layers.Conv2D(64, kernel_size=3, strides=1, activation="relu"),  # Convolutional Layer 3
        layers.Flatten(),  # Flatten layer to convert 3D feature maps into 1D
        layers.Dense(512, activation="relu"),  # Fully Connected Layer
        layers.Dense(num_actions, activation="linear")  # Output Layer (Q-values for each action)
    ])
    return model

# Instantiate the DQN Model
state_shape = (84, 84, 4)  # Example: four stacked 84x84 grayscale frames
num_actions = 6  # Example: 6 possible actions in the environment
dqn_model = build_dqn(state_shape, num_actions)
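
Building on the model above, here is a sketch of how its Q-values could drive epsilon-greedy action selection. The all-zeros state is an illustrative placeholder for a real observation from the environment, and epsilon = 0.1 is an assumed exploration rate:

import numpy as np

epsilon = 0.1
state = np.zeros((1, *state_shape), dtype=np.float32)  # batch of one placeholder observation

if np.random.rand() < epsilon:
    action = np.random.randint(num_actions)        # explore: uniformly random action
else:
    q_values = dqn_model(state)                    # forward pass, shape: (1, num_actions)
    action = int(tf.argmax(q_values[0]).numpy())   # exploit: action with the highest Q-value

During training, epsilon is typically annealed from near 1 toward a small value, so the agent explores broadly at first and increasingly trusts its learned Q-values later on.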