Designing Work Allocation Systems at Amazon Scale

The Problem Space

Work allocation sounds simple: you have tasks, you have workers, match them up. But at Amazon's scale, the devil is in the details. The Product Safety & Compliance team reviews millions of products across dozens of global marketplaces. Each review requires specific domain expertise, regulatory knowledge, and language skills. The allocation decision must consider:

Reviewer expertise: Does this person have the right certification for chemical safety reviews in the EU?
Workload balance: Is this reviewer already at capacity?
SLA requirements: This product has a 4-hour review deadline — can this reviewer complete it in time?
Priority: A product linked to active customer injury reports takes precedence over a routine compliance check.

Design Principles

Explainability Over Optimization

In regulated industries, you can't just optimize for throughput. When a regulator asks "why was this product assigned to this reviewer?", you need a clear, auditable answer. This ruled out ML-based assignment for the initial version — we chose a rules engine with weighted scoring that produces an explainable assignment trace.
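To make the idea of an explainable assignment trace concrete, here is a minimal sketch, not the production rules engine. The rule names, weights, and reviewer fields are all hypothetical; the point is only that every rule's contribution to the final score is recorded next to the decision, so the "why this reviewer?" question can be answered from stored data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    reviewer: str
    rule: str
    weight: float
    value: float  # rule output, assumed normalized to [0, 1]

    @property
    def contribution(self) -> float:
        return self.weight * self.value


# Hypothetical rules: name -> (weight, function(task, reviewer) -> value in [0, 1]).
RULES = {
    "has_required_certification": (
        5.0, lambda task, r: float(task["certification"] in r["certifications"])),
    "speaks_marketplace_language": (
        3.0, lambda task, r: float(task["language"] in r["languages"])),
    "spare_capacity": (
        2.0, lambda task, r: 1.0 - r["open_tasks"] / r["capacity"]),
}


def assign_with_trace(task, reviewers):
    """Score every reviewer and return (winner_id, totals, trace)."""
    if not reviewers:
        raise ValueError("no reviewers to score")
    trace, totals = [], {}
    for r in reviewers:
        entries = [TraceEntry(r["id"], name, weight, fn(task, r))
                   for name, (weight, fn) in RULES.items()]
        trace.extend(entries)
        totals[r["id"]] = sum(e.contribution for e in entries)
    winner = max(totals, key=totals.get)
    return winner, totals, trace


def explain(trace, reviewer_id):
    """Render the audit lines for one reviewer."""
    return [f"{e.rule}: {e.weight} x {e.value:.2f} = {e.contribution:.2f}"
            for e in trace if e.reviewer == reviewer_id]
```

Persisting the trace alongside the assignment record is what turns the score into an audit answer; the scoring itself stays a plain weighted sum.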

Eventual Consistency Is Acceptable for Metrics, Not Assignments

Dashboard metrics showing reviewer workload can be a few seconds stale. But the assignment itself must be strongly consistent — two reviewers must never be assigned the same task. We achieved this with DynamoDB conditional writes using optimistic concurrency control.
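The sketch below illustrates the conditional-write pattern with an in-memory stand-in, so it runs without AWS. It is not the production code: `TaskTable`, `try_assign`, and the `version` attribute are assumptions made for illustration. In DynamoDB the same check would be expressed as a `ConditionExpression` on the write, with the service rejecting the request when the condition does not hold.

```python
import threading


class ConditionalCheckFailed(Exception):
    """Stand-in for the error a database returns when a write condition fails."""


class TaskTable:
    """In-memory stand-in for a table that supports conditional writes."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()  # makes check-and-write atomic, as the database would

    def put_task(self, task_id):
        with self._lock:
            self._items[task_id] = {"assignee": None, "version": 0}

    def get(self, task_id):
        with self._lock:
            return dict(self._items[task_id])

    def assign(self, task_id, reviewer, expected_version):
        """Write only if the task is unassigned and unchanged since it was read."""
        with self._lock:
            item = self._items[task_id]
            if item["assignee"] is not None or item["version"] != expected_version:
                raise ConditionalCheckFailed(task_id)
            item["assignee"] = reviewer
            item["version"] += 1


def try_assign(table, task_id, reviewer):
    """Optimistic attempt: read the version, then write conditionally. True if we won."""
    snapshot = table.get(task_id)
    try:
        table.assign(task_id, reviewer, expected_version=snapshot["version"])
        return True
    except ConditionalCheckFailed:
        return False
```

No lock is held between the read and the write; a reviewer who loses the race simply sees the condition fail and moves on, which is what makes the approach optimistic.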

Graceful Degradation

When the allocation engine is overloaded, it should still assign work — just less optimally. We implemented a tiered degradation strategy:

Normal: Full multi-factor scoring with expertise matching
Degraded: Round-robin with basic expertise filtering
Emergency: FIFO queue with any available reviewer
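The three tiers above could be sketched roughly as follows. The queue-depth thresholds, the `Allocator` class, and the reviewer fields are hypothetical; the source does not say which load signal triggers each tier, so queue depth is used here purely as an assumption.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    EMERGENCY = "emergency"


def select_mode(queue_depth, degraded_at=1_000, emergency_at=5_000):
    """Pick a tier from the backlog size (thresholds are illustrative)."""
    if queue_depth >= emergency_at:
        return Mode.EMERGENCY
    if queue_depth >= degraded_at:
        return Mode.DEGRADED
    return Mode.NORMAL


class Allocator:
    def __init__(self, reviewers, score):
        self.reviewers = reviewers
        self.score = score  # callable(task, reviewer) -> float, used in NORMAL mode only
        self._rr_next = 0   # round-robin cursor for DEGRADED mode

    def pick(self, mode, task):
        """Return a reviewer for the task at the head of the queue, or None."""
        available = [r for r in self.reviewers if r["available"]]
        if not available:
            return None
        if mode is Mode.EMERGENCY:
            return available[0]  # any available reviewer, no expertise check
        qualified = [r for r in available if task["skill"] in r["skills"]]
        if not qualified:
            return None
        if mode is Mode.DEGRADED:
            chosen = qualified[self._rr_next % len(qualified)]
            self._rr_next += 1
            return chosen
        return max(qualified, key=lambda r: self.score(task, r))
```

Each tier does strictly less work per assignment than the one above it, which is what lets the engine keep assigning under load.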

Implementation Details

Priority Scoring

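Each incoming task receives a priority score, computed as a weighted sum:

score = (urgency_weight × urgency) + (risk_weight × product_risk)
      + (expertise_weight × expertise_match) + (sla_weight × time_remaining)

For the SLA term to raise priority as the deadline approaches, time_remaining has to enter as an inverted, normalized value (less time left yields a larger term) rather than as raw hours remaining. Weights are configurable per marketplace and product category, allowing regional teams to tune allocation behavior without code changes.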
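A minimal sketch of the weighted sum with per-marketplace overrides follows. The weight values, the `WEIGHT_OVERRIDES` table, and the task field names are hypothetical. The sketch also assumes every factor is normalized to [0, 1] and that the SLA term is expressed as pressure (the fraction of the SLA window already used), so that tasks closer to their deadline score higher.

```python
DEFAULT_WEIGHTS = {"urgency": 0.4, "risk": 0.3, "expertise": 0.2, "sla": 0.1}

# Hypothetical per-(marketplace, category) overrides; in practice these would
# live in configuration so regional teams can change them without a deploy.
WEIGHT_OVERRIDES = {
    ("DE", "chemicals"): {"risk": 0.5, "urgency": 0.2},
}


def weights_for(marketplace, category):
    """Defaults, with any override for this marketplace and category layered on top."""
    return {**DEFAULT_WEIGHTS, **WEIGHT_OVERRIDES.get((marketplace, category), {})}


def priority_score(task, weights):
    """Weighted sum of normalized factors; higher means more urgent."""
    # Assumption: convert time remaining into pressure, clamped to [0, 1].
    sla_pressure = 1.0 - task["time_remaining_h"] / task["sla_h"]
    sla_pressure = min(1.0, max(0.0, sla_pressure))
    return (weights["urgency"] * task["urgency"]
            + weights["risk"] * task["product_risk"]
            + weights["expertise"] * task["expertise_match"]
            + weights["sla"] * sla_pressure)
```

For example, a task one hour from a four-hour deadline has an SLA pressure of 0.75, whichever weight set applies.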

Work Stealing

A pull-based work stealing mechanism allows idle reviewers to claim tasks from overloaded peers. The implementation uses DynamoDB conditional updates to prevent race conditions:

Reviewer requests work from a peer's queue
System atomically removes the task from the source queue and assigns it to the requesting reviewer
If the conditional update fails (task already claimed), the reviewer retries with a different task
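The steal-and-retry steps above can be sketched with an in-memory ownership map standing in for the task table. `TaskOwnership`, `move_if_owned_by`, and `steal` are hypothetical names, and taking tasks from the tail of the peer's queue is an assumption; the source does not say which task is tried first.

```python
import threading


class TaskOwnership:
    """In-memory stand-in for a table mapping task_id -> current owner."""

    def __init__(self, owners):
        self._owner = dict(owners)  # insertion order doubles as queue order
        self._lock = threading.Lock()

    def queue_of(self, reviewer):
        with self._lock:
            return [t for t, o in self._owner.items() if o == reviewer]

    def move_if_owned_by(self, task_id, expected_owner, new_owner):
        """Conditional update: reassign only if the task still belongs to expected_owner."""
        with self._lock:
            if self._owner.get(task_id) != expected_owner:
                return False  # someone else claimed it first
            self._owner[task_id] = new_owner
            return True


def steal(tasks, thief, victim):
    """Try to claim one task from the victim's queue; return its id, or None."""
    # The snapshot may be stale by the time we write, so every claim is
    # conditional and a failed claim falls through to the next candidate.
    for task_id in reversed(tasks.queue_of(victim)):
        if tasks.move_if_owned_by(task_id, victim, thief):
            return task_id
    return None
```

Because removal from the source queue and assignment to the new owner are a single conditional update on one record, there is no window in which a task belongs to two reviewers or to none.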

SLA Tracking

SLA tracking runs in real time, with escalation triggers at configurable thresholds. When 75% of a task's SLA window has elapsed and the task is still unassigned, it is automatically escalated to a senior reviewer. At 90%, the system pages the on-call team lead.
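As an illustration, the threshold check might look like the sketch below. The action names and function signatures are hypothetical, and the sketch assumes both thresholds apply only to tasks that are still unassigned, measured as the elapsed fraction of the window between task creation and deadline.

```python
from datetime import datetime, timedelta, timezone

# (elapsed fraction of the SLA window, action); configurable in practice.
THRESHOLDS = [
    (0.90, "page_on_call_lead"),
    (0.75, "escalate_to_senior_reviewer"),
]


def elapsed_fraction(created_at, deadline, now):
    """Fraction of the SLA window that has elapsed (may exceed 1.0 when overdue)."""
    window = (deadline - created_at).total_seconds()
    if window <= 0:
        raise ValueError("deadline must be after created_at")
    return (now - created_at).total_seconds() / window


def escalation_action(created_at, deadline, now, assigned, thresholds=THRESHOLDS):
    """Return the most severe action whose threshold has been crossed, or None."""
    if assigned:
        return None
    fraction = elapsed_fraction(created_at, deadline, now)
    for threshold, action in sorted(thresholds, reverse=True):
        if fraction >= threshold:
            return action
    return None
```

With a 4-hour deadline, the first escalation fires 3 hours after creation and the page fires at 3 hours 36 minutes.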

Lessons Learned

Start with manual overrides: No allocation algorithm is perfect. Give team leads the ability to manually reassign work, and use that override data to improve the algorithm.
Measure fairness, not just throughput: Optimizing for speed alone can overload top performers. Track per-reviewer workload variance.
Time zones are hard: A "24-hour SLA" means different things when the task crosses time zones and weekends. Compute deadlines in business hours, not wall-clock time.
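To show what the business-hours point means in practice, here is a minimal sketch of a deadline calculation. It assumes a 09:00 to 17:00, Monday to Friday working week and a start time already expressed in the reviewing team's local time; public holidays and time zone conversion are deliberately left out, and `add_business_hours` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta


def add_business_hours(start, hours, open_at=time(9), close_at=time(17)):
    """Return the moment at which `hours` of business time have elapsed after `start`."""
    remaining = timedelta(hours=hours)
    current = start
    while True:
        # Weekend, or at/after closing time: jump to the next day's opening.
        if current.weekday() >= 5 or current.time() >= close_at:
            current = datetime.combine(current.date() + timedelta(days=1), open_at)
            continue
        # Before opening on a working day: wait until the office opens.
        if current.time() < open_at:
            current = datetime.combine(current.date(), open_at)
        end_of_day = datetime.combine(current.date(), close_at)
        available = end_of_day - current
        if remaining <= available:
            return current + remaining
        remaining -= available
        current = end_of_day
```

Under these assumptions, a 24-hour SLA opened on a Friday at 15:00 falls due the following Wednesday at 15:00, whereas a wall-clock deadline would land on Saturday afternoon when nobody is reviewing.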