Bachelor Thesis: Automated ELM Detection with RNNs (ASDEX Upgrade)

This bachelor thesis designed and evaluated a recurrent neural network (RNN) for **automatic detection of Edge Localized Modes (ELMs)** in the ASDEX Upgrade tokamak. The approach uses the divertor plate current (a measure of particle/heat flux to the divertor) as a single high‑fidelity diagnostic and converts expert ELM start/end labels into a binary time series suitable for supervised learning. A lightweight, single‑layer RNN (100 units, ReLU) trained with cross‑entropy loss detects ~95% of ELMs on unseen discharges, with start/end timing errors of ≈0.1 ms (typically ≤10% of the ELM duration). A simple probability‑based post‑processor turns patchy per‑timestep predictions into contiguous ELM windows, enabling robust start/end timestamp extraction at near‑real‑time rates for batch analysis.
What are Edge Localized Modes (ELMs)?
In H‑mode, steep edge pressure gradients can trigger edge‑localized instabilities (ELMs) that eject bursts of particles and heat to plasma‑facing components. These bursts degrade confinement and can damage hardware in future reactors. Detecting and timing ELMs reliably is key for control, protection, and physics studies.
Manual & Rule‑based Detection—Why Automate?
Conventional workflows rely on manual inspection or a rule‑based peak‑detection algorithm that requires human‑tuned thresholds for each shot. This makes large‑scale analysis slow and inconsistent and is a poor fit for real‑time pipelines. This work replaces manual cutoff tuning with a learned detector and lightweight post‑processing.
Data & Preprocessing
Input: divertor plate current from selected ASDEX Upgrade discharges. Labels: expert start/end times converted to a binary sequence (1 during an ELM, 0 otherwise). To reduce noise and runtime, the signal is compressed with a reduction coefficient rc (rc = 8 used for training), averaging consecutive samples and retaining mid‑window timestamps.
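A minimal NumPy sketch of this step, assuming a uniformly sampled signal. The function names and the mid‑window convention (`rc // 2`) are illustrative; the block averaging with rc = 8 and the binary label conversion follow the text:

```python
import numpy as np

def compress_signal(t, y, rc=8):
    """Average consecutive blocks of rc samples; keep one mid-window timestamp per block."""
    n = (len(y) // rc) * rc                    # drop the ragged tail
    y_c = y[:n].reshape(-1, rc).mean(axis=1)   # block averages reduce noise and runtime
    t_c = t[:n].reshape(-1, rc)[:, rc // 2]    # retain the mid-window timestamp
    return t_c, y_c

def labels_to_binary(t, elm_windows):
    """Convert expert (start, end) pairs into a 0/1 sequence over timestamps t."""
    lab = np.zeros(len(t), dtype=np.int64)
    for start, end in elm_windows:
        lab[(t >= start) & (t <= end)] = 1     # 1 during an ELM, 0 otherwise
    return lab
```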
Model & Training
Model: single‑layer RNN with 100 hidden units and ReLU, linear readout to two classes. Loss: cross‑entropy. Optimization: trained in several sequences to avoid numerical instabilities (occasional NaNs) and allow learning‑rate decay. The best model reached a stable loss ≈ 0.035 after ~3.4k epochs (aggregated across sequences).
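A sketch of the stated architecture in PyTorch. The framework choice, optimizer, scheduler settings, and gradient clipping are illustrative assumptions; the single ReLU RNN layer with 100 hidden units, two‑class linear readout, and cross‑entropy loss follow the text:

```python
import torch
import torch.nn as nn

class ELMDetector(nn.Module):
    """Single-layer RNN (100 hidden units, ReLU) with a linear two-class readout."""
    def __init__(self, hidden_size=100):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size,
                          nonlinearity="relu", batch_first=True)
        self.readout = nn.Linear(hidden_size, 2)   # classes: no-ELM / ELM

    def forward(self, x):           # x: (batch, timesteps, 1) compressed signal
        h, _ = self.rnn(x)          # per-timestep hidden states
        return self.readout(h)      # per-timestep class logits

model = ELMDetector()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)               # optimizer/LR assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 500, gamma=0.5)  # decay schedule assumed

def train_step(x, targets):
    """One step; targets: (batch, timesteps) int64 labels from the binary series."""
    logits = model(x)
    loss = criterion(logits.reshape(-1, 2), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # guard against NaN spikes
    optimizer.step()
    return loss.item()
```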
Post‑processing
Softmax probabilities are thresholded and clustered to repair short misclassifications inside an ELM. Tuned defaults: confidence ≥ 0.60, min cluster width ≥ 10 timesteps, max gap between positive timesteps ≤ 12; thresholds scale with rc. The output is a set of contiguous ELM windows with start/end timestamps.
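A sketch of the clustering step using the tuned defaults quoted above; the exact merging logic is an illustrative reconstruction, not the thesis code:

```python
import numpy as np

def extract_elm_windows(t, probs, conf=0.60, min_width=10, max_gap=12):
    """Turn per-timestep ELM probabilities into contiguous (start, end) windows."""
    pos = np.flatnonzero(probs >= conf)       # confident ELM timesteps
    windows = []
    if pos.size == 0:
        return windows
    start = prev = pos[0]
    for i in pos[1:]:
        if i - prev <= max_gap:               # bridge short holes inside an ELM
            prev = i
        else:
            if prev - start + 1 >= min_width: # drop spuriously short clusters
                windows.append((t[start], t[prev]))
            start = prev = i
    if prev - start + 1 >= min_width:
        windows.append((t[start], t[prev]))
    return windows
```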
Results
On a held‑out test set of ASDEX Upgrade shots (e.g., #37472, #34245, #30729, #40303), the RNN correctly identified ~95% of ELMs. Start/end timing error is ~0.1 ms (typically ≤10% of the mean ELM duration). Performance is robust for rc in the range ~6–12 and best at rc = 8. For single‑shot runs, the reference peak‑detector is faster (~35 s per shot), but for batch runs the RNN scales efficiently and becomes advantageous.
Impact
The detector enables fast, consistent ELM timing across large discharge sets and is suited as a first‑pass analysis. It provides a foundation for real‑time monitoring and future extensions (e.g., multi‑diagnostic input or ELM forecasting).
Acknowledgments
Supervision: Dr. Matthias Willensdorfer. Institutions: TU Wien (Institute of Applied Physics) and Max‑Planck‑Institut für Plasmaphysik, Garching. Thesis date: 2023‑08‑23 (Vienna).
Highlights
1. Single‑diagnostic pipeline: divertor plate current as input
2. RNN architecture: single layer, 100 hidden units, ReLU activation
3. Label engineering: start/end timestamps → binary time series
4. Time‑series smoothing via reduction coefficient (rc = 8) for speed
5. Probability‑threshold post‑processing to recover contiguous ELM windows
6. Benchmark vs. internal peak‑detection tool (manual cutoff required)
Challenges & Solutions
- High sampling rate and noisy experimental signals → required smoothing
- Prediction ‘holes’ within ELMs → needed custom post‑processing
- Training stability (NaN spikes) → trained in sequences with LR scheduling
- Limited, heterogeneous shots and partial diagnostics → careful shot selection
- Balancing timing resolution vs. runtime with compression (rc)
Skills & Takeaways
- Designing compact RNNs for long 1‑D time series
- Converting sparse event labels into learnable sequences
- Post‑processing ML probabilities into physics‑meaningful events
- Hyperparameter tuning (hidden size, rc, thresholds) vs. generalization
- Practical training hygiene for stability (sequenced training, LR decay)