ParticleGPT is my working frame for treating particle collision events as structured sequences rather than flat tables. The core idea is to make collider events look enough like language that modern sequence modeling tools become useful, while preserving the physics constraints that make the generated samples meaningful.

The hard parts are not just model architecture. They are tokenization, event boundaries, starter prompts, validation, untokenization, and metrics that catch physically implausible generations. This post is a placeholder for the longer writeup I will use to document those design decisions.
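To make the tokenization, event-boundary, and untokenization steps concrete, here is a minimal sketch of one possible scheme. Everything in it is an assumption chosen for illustration: the `(pdg_id, pt, eta, phi)` particle layout, the `<event>` and `</event>` boundary tokens, the uniform 64-bin discretization, and the kinematic ranges. It is not the scheme the longer writeup will settle on.

```python
import math
from typing import List, Tuple

# Hypothetical particle record: (pdg_id, pt in GeV, eta, phi).
Particle = Tuple[int, float, float, float]

# Assumed boundary tokens and binning; all values are illustrative.
BOS, EOS = "<event>", "</event>"
N_BINS = 64
RANGES = {"pt": (0.0, 500.0), "eta": (-5.0, 5.0), "phi": (-math.pi, math.pi)}
FIELDS = ["pid", "pt", "eta", "phi"]


def to_bin(name: str, value: float) -> int:
    """Map a continuous value to a uniform bin index, clipping to the range."""
    lo, hi = RANGES[name]
    clipped = min(max(value, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * N_BINS)
    return min(idx, N_BINS - 1)  # the upper edge falls into the last bin


def from_bin(name: str, idx: int) -> float:
    """Map a bin index back to the center of that bin."""
    lo, hi = RANGES[name]
    width = (hi - lo) / N_BINS
    return lo + (idx + 0.5) * width


def tokenize(event: List[Particle]) -> List[str]:
    """Flatten one event into tokens, wrapped in explicit boundary tokens."""
    tokens = [BOS]
    for pdg_id, pt, eta, phi in event:
        tokens += [
            f"pid:{pdg_id}",
            f"pt:{to_bin('pt', pt)}",
            f"eta:{to_bin('eta', eta)}",
            f"phi:{to_bin('phi', phi)}",
        ]
    tokens.append(EOS)
    return tokens


def untokenize(tokens: List[str]) -> List[Particle]:
    """Invert tokenize, rejecting sequences that are structurally invalid."""
    if len(tokens) < 2 or tokens[0] != BOS or tokens[-1] != EOS:
        raise ValueError("missing event boundary tokens")
    body = tokens[1:-1]
    if len(body) % len(FIELDS) != 0:
        raise ValueError("incomplete particle record")
    event = []
    for i in range(0, len(body), len(FIELDS)):
        parts = [tok.split(":", 1) for tok in body[i:i + len(FIELDS)]]
        if [p[0] for p in parts] != FIELDS:
            raise ValueError("unexpected token order inside particle record")
        pdg_id = int(parts[0][1])
        pt, eta, phi = (from_bin(n, int(p[1])) for n, p in zip(FIELDS[1:], parts[1:]))
        event.append((pdg_id, pt, eta, phi))
    return event


# Toy event with made-up values: an electron-positron pair.
toy_event = [(11, 45.2, 0.3, 1.0), (-11, 38.7, -1.2, -2.0)]
toy_tokens = tokenize(toy_event)
toy_roundtrip = untokenize(toy_tokens)
```

The round trip is lossy by design: `untokenize` returns bin centers, so each kinematic value comes back within half a bin width of the original. The structural checks in `untokenize` double as a cheap validation step for generated sequences, since a sample with a missing boundary token or a truncated particle record is rejected before any physics metric is computed.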

I am especially interested in the space between generative-model evaluation and high-energy physics phenomenology: how to tell whether a model has learned useful event structure rather than merely reproducing marginal distributions.
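A toy illustration of why marginals alone are not enough, under loudly stated assumptions: the two features below are synthetic Gaussian stand-ins (not real collider data), and the "generator" is just a shuffle. Shuffling one feature across events leaves its one-dimensional distribution exactly unchanged while destroying its correlation with the other feature, so any metric that only compares per-feature histograms would score the shuffled sample as perfect.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out to avoid dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic "events": two features that are strongly correlated by construction.
# The names and numbers are hypothetical, chosen only for illustration.
leading = [rng.gauss(50.0, 10.0) for _ in range(2000)]
subleading = [0.8 * x + rng.gauss(0.0, 2.0) for x in leading]

# A stand-in "generator" that reproduces each marginal exactly
# but ignores the joint structure: shuffle one feature across events.
fake_subleading = subleading[:]
rng.shuffle(fake_subleading)

# The marginal of the shuffled feature is identical to the original...
same_marginal = sorted(fake_subleading) == sorted(subleading)

# ...but the per-event correlation is gone.
real_corr = pearson(leading, subleading)
fake_corr = pearson(leading, fake_subleading)
```

Here `same_marginal` is true by construction, while `real_corr` is close to one and `fake_corr` is close to zero. Metrics that probe joint structure, such as correlations or derived quantities built from several particles in the same event, are what separate the two samples.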