Work · Nike · 2025

A regulated data migration with a fixed deadline

A population of legacy consent records needed to move through a fixed renewal path. The work was not just writing a batch job. It was proving the deadline math, keeping the live platform safe, and leaving a trail the team could defend later.

Role
Lead engineer — spike, design, build, rollout, validation
Stack
Batch data processing, production write paths, infrastructure as code, and source-of-record validation
Outcome
Migration completed within the regulatory window, with validation against the source of record and a follow-up reconciliation for edge cases.

Selected outcomes

  • Large population of regulated consent records moved through the required renewal path within the deadline.
  • Validation compared post-migration state against the source of record instead of relying on count checks alone.
  • Pipeline stayed inside its capacity envelope without competing with live traffic.
  • Rollback path designed and tested before the higher-risk rollout phases.

The problem in one paragraph

The platform stores consent records keyed by jurisdiction and policy. For the relevant regulated consent category, records had to be renewed on a defined cadence. The product fix was to re-prompt affected users on their next eligible visit. The data work was to mark those records correctly without disturbing valid consent that did not need to move.
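
To make that selection concrete, here is a minimal sketch of the kind of predicate involved, with illustrative field names, a placeholder category key, and a placeholder cadence; the real schema and cadence belong to the platform and are not reproduced here.

```python
from datetime import datetime, timedelta

# Illustrative values only; the real cadence and category key are the platform's.
RENEWAL_CADENCE = timedelta(days=365)
REGULATED_CATEGORY = "regulated-consent"

def needs_renewal(record: dict, now: datetime) -> bool:
    """Return True only for records in the regulated category whose last
    grant is older than the renewal cadence; valid consent that does not
    need to move is left untouched."""
    if record["category"] != REGULATED_CATEGORY:
        return False
    granted_at: datetime = record["granted_at"]  # tz-aware timestamp from the source of record
    return now - granted_at >= RENEWAL_CADENCE
```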

Constraints

A few things made this more than a one-time script:

  1. Volume. A large population of records spread across data slices that could not be treated like a simple table scan.
  2. Liveness. The migration had to coexist with production traffic, not block it.
  3. Reversibility. Every step needed a way back. Marking a record incorrectly as "needs re-prompt" is recoverable; marking a record as renewed when it was not is a different class of mistake.
  4. Auditability. Compliance reviewers needed the trail. Every record touched needed a corresponding entry in the audit pipeline that downstream legal/compliance teams could reconcile against (a sketch of that pairing follows this list).
  5. Time. The deadline was real.
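
The reversibility and auditability constraints suggest a pairing pattern: every write goes in the recoverable direction only, and every write emits a matching audit entry. The sketch below is a hypothetical illustration; `write_consent` and `emit_audit` stand in for the platform's real write path and audit pipeline.

```python
import json
import uuid
from datetime import datetime, timezone

def mark_needs_reprompt(record_id: str, write_consent, emit_audit) -> None:
    """Flag one record for re-prompt and emit a matching audit entry.

    The batch job only ever writes the recoverable state ("needs
    re-prompt"); it never marks a record as renewed.
    """
    entry = {
        "event_id": str(uuid.uuid4()),
        "record_id": record_id,
        "action": "mark_needs_reprompt",
        "occurred_at": datetime.now(timezone.utc).isoformat(),
    }
    write_consent(record_id, status="needs_reprompt")  # recoverable direction only
    emit_audit(json.dumps(entry))                      # one audit entry per record touched
```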

Approach

The shape ended up being three stages:

Spike. First, answer the question "what does the data actually look like?" Source-of-record queries sized the affected population by rule-relevant attributes and record age. This is where the deadline math happened: given the renewal cadence, what needs to move by what date, and what sustained rate is required through the window.
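
The deadline math itself is plain arithmetic. A sketch with made-up figures (the real population size and dates came from the source-of-record queries and are not these):

```python
from datetime import date

# Placeholder figures, not the actual population or window.
affected_records = 5_000_000
window_start = date(2025, 1, 6)
deadline = date(2025, 3, 31)

days_available = (deadline - window_start).days
records_per_day = affected_records / days_available
records_per_second = records_per_day / 86_400

print(f"need {records_per_day:,.0f} records/day, "
      f"about {records_per_second:.1f} records/sec sustained through the window")
```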

Data pipeline. A batch job read from source-of-record data, applied the renewal criteria, and wrote back through the platform's normal write path. Bounded filters kept the job scoped; throttled writes kept it from competing with live traffic for capacity. The run was defined through infrastructure as code so it could be reproduced across environments.
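
A rough sketch of that shape, treating the source-of-record reader, the renewal criteria, and the platform write path as injected callables; the throttle value is an assumption for illustration, not the production number.

```python
import time
from typing import Callable, Iterable

def run_batch(
    read_slice: Callable[[str], Iterable[dict]],  # bounded read of one data slice
    should_migrate: Callable[[dict], bool],       # renewal criteria (e.g. needs_renewal above)
    write_record: Callable[[dict], None],         # the platform's normal write path
    slices: Iterable[str],
    max_writes_per_sec: float = 50.0,             # assumed throttle, tuned against live traffic
) -> None:
    min_interval = 1.0 / max_writes_per_sec
    for slice_key in slices:                      # bounded filters keep each pass scoped
        for record in read_slice(slice_key):
            if not should_migrate(record):
                continue
            started = time.monotonic()
            write_record(record)
            # Client-side throttle so the batch never competes with
            # live traffic for write capacity.
            remaining = min_interval - (time.monotonic() - started)
            if remaining > 0:
                time.sleep(remaining)
```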

Validation. A second query set compared the post-migration state against the source of record. Not a count check — a record-level diff. Edge cases got logged and triaged manually. That validation is what made the deadline-met story honest rather than aspirational.
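
A minimal sketch of the difference between a count check and a record-level diff, assuming hypothetical status fields on both sides:

```python
def diff_against_source(
    source: dict[str, dict],    # record id -> expected post-migration state
    migrated: dict[str, dict],  # record id -> observed state after the run
) -> list[str]:
    """Compare every record against the source of record and return the
    ids that do not match; these are logged and triaged manually."""
    mismatches: list[str] = []
    for record_id, expected in source.items():
        actual = migrated.get(record_id)
        if actual is None or actual["status"] != expected["status"]:
            mismatches.append(record_id)
    return mismatches
```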

Tradeoffs

The tradeoffs that mattered:

  • Throughput versus liveness. Throttled writes made the run slower, but they kept the batch out of the live platform's way; the deadline math had to hold at the throttled rate, not the theoretical maximum.
  • Coverage versus schedule. Long-tail edge cases were deferred to a follow-up reconciliation rather than allowed to block the main run.
  • Direction of error. Wherever the job had to choose, it chose the recoverable mistake: re-prompting a user who did not strictly need it, never marking a record renewed when it was not.

What I'd do next

The reconciliation tail was the thing I'd most want to design differently next time. A cleaner mechanism for catching long-tail edge cases up front — instead of leaving them as a follow-up — would have made the rollout feel less like two stages and more like one.

Also: more dashboards earlier. I built dashboards reactively as the rollout surfaced questions. Building them ahead of time would have saved a few stressful hours of "wait, is the throughput we expected actually the throughput we're seeing?"