Case study

Rubin Variable Star Workflow

A reproducible, end-to-end pipeline that transforms raw Rubin catalog data into a ranked shortlist of candidate variable stars using engineered variability metrics and an interpretable composite index.

Outcomes

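Confusion matrix for the initial machine learning classifier trained on the first photometric data release. The model performs strongly on long-period variables (LPVs) and RR Lyrae, which together comprise over 95% of the available sample, indicating that it copes well with noisy observations for the dominant classes. Because the sample is so imbalanced, results for the rarer classes rest on far fewer examples and should be read with that in mind.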

Problem

Time-domain astronomical catalogs are large, noisy, and difficult to translate into concrete decisions. The goal was to design a transparent workflow that converts catalog-level variability signals into a ranked candidate list suitable for inspection, comparison, and follow-up.

Approach

  • Performed exploratory data analysis (EDA) on large-scale observational data to separate meaningful variability signal from noise.
  • Engineered complementary variability metrics and standardized them to enable consistent comparison.
  • Combined metrics into an interpretable composite index to support ranking and prioritization.
  • Implemented a top-N selection step that produces inspectable outputs (tables and saved artifacts).
  • Packaged the workflow as a runnable demo with documented assumptions and clear scope boundaries.
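
The metric, standardization, composite index, and top-N steps above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the three metrics (percentile amplitude, median absolute deviation, reduced chi-square), the robust z-score, the equal weights, the function names, and the synthetic light curves are all assumptions chosen for the example.

```python
import numpy as np

def variability_metrics(mag, err):
    """Three illustrative per-object metrics (assumed, not the project's own)."""
    mag = np.asarray(mag, dtype=float)
    err = np.asarray(err, dtype=float)
    # Percentile amplitude: less sensitive to single outliers than max - min.
    amplitude = np.percentile(mag, 95) - np.percentile(mag, 5)
    # Median absolute deviation about the median magnitude.
    mad = np.median(np.abs(mag - np.median(mag)))
    # Reduced chi-square about the inverse-variance weighted mean.
    w = 1.0 / err**2
    wmean = np.sum(w * mag) / np.sum(w)
    chi2_red = np.sum(w * (mag - wmean) ** 2) / (mag.size - 1)
    return np.array([amplitude, mad, chi2_red])

def robust_standardize(X):
    """Column-wise robust z-score: (x - median) / (1.4826 * MAD)."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    scale = 1.4826 * np.median(np.abs(X - med), axis=0)
    scale[scale == 0] = 1.0  # guard against constant columns
    return (X - med) / scale

def composite_index(Z, weights=None):
    """Weighted sum of standardized metrics; equal weights by default."""
    if weights is None:
        weights = np.full(Z.shape[1], 1.0 / Z.shape[1])
    return Z @ np.asarray(weights, dtype=float)

def top_n(ids, scores, n):
    """Return the n highest-scoring (id, score) pairs, best first."""
    order = np.argsort(scores)[::-1][:n]
    return [(ids[i], float(scores[i])) for i in order]

# Synthetic catalog (made-up data): 50 constant stars plus 3 sinusoidal variables.
rng = np.random.default_rng(0)
n_epochs, sigma = 40, 0.02
t = np.sort(rng.uniform(0.0, 100.0, n_epochs))
err = np.full(n_epochs, sigma)

ids, rows = [], []
for i in range(50):
    mag = 15.0 + rng.normal(0.0, sigma, n_epochs)
    ids.append(f"const-{i}")
    rows.append(variability_metrics(mag, err))
for j, period in enumerate([0.6, 7.0, 30.0]):
    mag = 15.0 + 0.5 * np.sin(2 * np.pi * t / period) + rng.normal(0.0, sigma, n_epochs)
    ids.append(f"var-{j}")
    rows.append(variability_metrics(mag, err))

Z = robust_standardize(np.vstack(rows))
scores = composite_index(Z)
shortlist = top_n(ids, scores, n=3)
for name, score in shortlist:
    print(f"{name}: {score:.1f}")
```

Standardizing each metric before combining keeps one large-valued metric (here the reduced chi-square) from dominating the index purely through its units. The equal weights are one simple choice among many; in a real workflow they would be a documented assumption that readers can inspect and change.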

Why this matters

Rather than stopping at exploratory analysis, this project produces decision-ready outputs: a ranked shortlist with inspectable artifacts and documented assumptions. This pattern of transparent scoring, prioritization, and repeatability translates directly to strategy analytics, operations, and decision science contexts.