About MELGYM

../_images/logo.png

MELGYM is a Python tool designed to facilitate interactive control over MELCOR 1.8.6 simulations.

Every control feature in MELCOR is governed by Control Functions (CFs). However, MELCOR’s batch execution mode makes it difficult to interactively control and modify these functions under user-defined conditions. Control conditions must be defined a priori and often require chaining several CFs in a cumbersome and unintuitive way.

MELGYM enables the use of external, user-defined controllers—such as reinforcement learning agents or any other custom control algorithm.

../_images/mdp-simp.png

Warning

The MELGEN and MELCOR executables are NOT INCLUDED in this package, as they are not freely distributable. MELGYM requires both executables, which must be located at the following paths: melgym/exec/MELGEN and melgym/exec/MELCOR.

Note

Access to MELGEN/MELCOR executables can be requested via the Sandia National Laboratories website.

How MELGYM works

MELGYM’s control system is based primarily on MELCOR’s restart capabilities. The restart mechanism in MELCOR involves:

  1. Dumping the current simulation state into a restart file generated by MELGEN (i.e., creating a simulation checkpoint).

  2. Performing a warm start from the last recorded state.

../_images/melcor.png

The MELGEN/MELCOR execution process

These two steps are performed periodically, according to the restart frequency specified in the MELCOR input configuration.

MELGYM introduces intermediate steps between restart dumps and warm starts, during which control functions or other user-defined components can be modified.

This allows simulations to be discretised into small time intervals, inserting intermediate steps where the model is continuously adapted.

../_images/timeline.png

MELGYM introduces additional restart steps to enable fine-grained control over MELCOR elements

Note

The elements that MELGYM can modify depend on the overwrite capabilities of the MELCOR input. See the MELCOR Code Manual for more information.

Thus, MELGYM modifies the input model every few simulation cycles. After each warm start, the model resumes from the last saved state and continues under the new configuration defined by the external controller.

../_images/restart.png

Control actions modify the MELCOR model just before each warm start

The following image summarizes the main functions and the low-level agent-environment interaction.

../_images/functions.png

Low-level diagram of the agent–environment interaction loop

Environments

Every MELGYM environment inherits from the MelcorEnv class. To function properly, users should consider the following:

  • The CFs to be controlled must be specified when creating or registering the environment.

  • EDF records to be used as observations must be defined in the MELCOR input file.

  • Environment-specific parameters, such as the action space, must be set in the environment constructor.

  • Users must define the reward function and the termination/truncation conditions, as they are specific to each environment.

  • The EDF dump and restart frequencies must be consistent with the control_horizon, and properly defined in the MELCOR input file.

Tip

Refer to API reference for a detailed definition of the MelcorEnv class and its methods.

Reinforcement learning integration

Formulation

Reinforcement Learning (RL) algorithms are particularly useful for learning control policies through interaction between an agent and a simulated environment.

In RL, an agent interacts with a dynamic process—called the environment—over a discrete sequence of time steps \(\mathcal{T} = \{0,1,2,...\}\). In MELGYM, the environment corresponds to a MELCOR simulation with which an agent interacts.

../_images/mdp-melgym.png

The Partially Observable Markov Decision Process implemented in MELGYM

At each time step \(t\), the agent receives an observation \(o_t \in \mathcal{O}\), representing a subset of variables that define the current state \(s_t \in \mathcal{S}\). Based on this observation, it selects an action \(a_t \in \mathcal{A}\), which modifies the MELCOR model. The simulation then proceeds, generating a new state \(s_{t+1}\) and a reward \(r_t \in \mathbb{R}\) that evaluates the outcome of the transition, thereby guiding the learning process.

The agent follows a policy function \(\pi\), such that \(a_t \sim \pi(\cdot|s_t)\). The goal is to find an optimal policy \(\pi^*\) that maximizes expected cumulative reward: \(G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}\), where \(\gamma \in [0,1]\) is a discount factor weighting future rewards.

When RL is combined with neural networks, it yields deep reinforcement learning (DRL). These methods employ parameterized policies \(\pi_\theta\)—such as neural networks with weights \(\theta\)—and use gradient-based optimization to approximate the optimal policy.

RL control in MELGYM

MELGYM enables the integration of DRL-based controllers into MELCOR simulations. This is achieved via the intermediate restarts described earlier.

../_images/mdp.png

Agent-environment interaction loop

EDF outputs from MELCOR are parsed into observations used by the agent to determine the next control action and calculate the associated reward. Actions involve modifying control elements permitted by the MELGYM environment. The set of observed variables, available actions, and the reward and termination conditions are defined by each specific environment.

Note

MELGYM adheres to the standard Gymnasium interface, and its environments implement typical methods such as reset, step, and render. The agent implementation depends on user preferences. The Stable-Baselines3 library is a well-tested option that is highly recommended.

Tip

After this introduction, head to section Examples for a practical guide on using a DRL controller with MELGYM.