Case study

Rubin Variable Star Workflow

A decision pipeline that turns large, noisy Rubin Observatory catalog data into a prioritized shortlist of candidate variable stars. The workflow was designed to support ranking, review, and follow-up decisions through transparent scoring, repeatable analysis, and interpretable model outputs.

Outcomes

Confusion matrix for the initial variable star classification model, showing strong performance on long-period variables (LPVs) and RR Lyrae. Beyond model performance, the key value of this step was operational: it created a repeatable way to triage large volumes of candidates and focus review effort where it is most useful.

Problem

Large astronomical catalogs create a prioritization problem. Researchers cannot manually inspect every object with equal attention, especially when observations are noisy, incomplete, or continually updated at scale. The challenge was to turn raw catalog-level signals into a structured decision process that answers three questions: which candidates should be reviewed first, why they ranked highly, and how the process can be repeated as more data becomes available.

Approach

  • Translated an open-ended research problem into a structured ranking and prioritization workflow.
  • Explored large-scale observational data to identify signals that could support candidate triage.
  • Engineered interpretable variability metrics that could be compared consistently across objects.
  • Combined metrics into a composite index to create a transparent prioritization framework.
  • Added machine learning classification to support scalable review as additional catalog data becomes available.
  • Packaged the workflow with documented assumptions, outputs, and scope boundaries for reuse by future researchers.
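
The metric and composite-index steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the three metrics (a percentile-based amplitude, a reduced chi-square against a constant-brightness model, and an excess variance), the equal default weights, the robust z-score normalization, and the names `variability_metrics` and `composite_index` are all hypothetical choices made for the example. The light curves are synthetic.

```python
import math
import random
import statistics


def variability_metrics(mags, errs):
    """Return three simple variability metrics for one light curve.

    The choice of metrics is an assumption made for this sketch.
    """
    weights = [1.0 / e ** 2 for e in errs]
    wmean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)

    # Robust amplitude: half of the 5th-to-95th percentile range.
    cuts = statistics.quantiles(mags, n=20)
    amplitude = 0.5 * (cuts[-1] - cuts[0])

    # Reduced chi-square against a constant-brightness model.
    chi2 = sum(((m - wmean) / e) ** 2 for m, e in zip(mags, errs))
    chi2_red = chi2 / (len(mags) - 1)

    # Excess variance: observed scatter beyond what the errors explain.
    mean_sq_err = statistics.fmean([e ** 2 for e in errs])
    excess_var = statistics.variance(mags) - mean_sq_err

    return {"amplitude": amplitude, "chi2_red": chi2_red, "excess_var": excess_var}


def composite_index(metric_rows, weights=None):
    """Combine per-object metrics into one score per object.

    Each metric is converted to a robust z-score (median and MAD across
    objects) so metrics on different scales can be summed with weights.
    """
    names = sorted(metric_rows[0])
    if weights is None:
        # Hypothetical default: every metric counts equally.
        weights = {name: 1.0 / len(names) for name in names}

    scores = [0.0] * len(metric_rows)
    for name in names:
        values = [row[name] for row in metric_rows]
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        scale = 1.4826 * mad if mad > 0 else 1.0
        for i, v in enumerate(values):
            scores[i] += weights[name] * (v - med) / scale
    return scores


# Synthetic demo: four constant stars plus one sinusoidal variable (index 4).
rng = random.Random(0)
times = [10.0 * i / 49 for i in range(50)]
errs = [0.02] * len(times)
curves = [[15.0 + rng.gauss(0.0, 0.02) for _ in times] for _ in range(4)]
curves.append(
    [15.0 + 0.5 * math.sin(2 * math.pi * t / 3.0) + rng.gauss(0.0, 0.02) for t in times]
)

metrics = [variability_metrics(c, errs) for c in curves]
scores = composite_index(metrics)

# Highest score first: this is the review shortlist order.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Because each metric stays visible alongside the combined score, a reviewer can see why an object ranked highly, not only that it did. In this sketch the weights are the main tuning knob; a real workflow would need to justify them against labeled examples.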

Why this matters

This project is less about astronomy than about a transferable analytics pattern: turning complex, noisy data into a repeatable decision system. The same structure applies in strategy, operations, product, and program analytics, where teams need to prioritize limited attention, explain recommendations, and build workflows that scale as new data arrives.