AWS DynamoDB Parallel Scan Calculator: Number of Threads and RCU Planning
Estimate how many parallel scan workers you should run, how many RCUs your scan needs, and whether your target completion time is realistic without overwhelming a DynamoDB table.
How to calculate AWS DynamoDB parallel scan threads and RCU correctly
Planning a DynamoDB parallel scan is not just about choosing a random number of worker threads and hoping the job finishes quickly. A scan touches every item in the target table or index, so the operation can consume a meaningful portion of your read capacity and can interfere with live application traffic if it is not controlled. The practical question most teams ask is simple: how many threads should I run, and how many RCUs will the scan require? The correct answer depends on data volume, item size, consistency level, target completion time, and the amount of capacity you can safely dedicate to the scan.
This calculator uses the standard DynamoDB read model. For billing and throughput math, DynamoDB meters reads in 4 KB units. One RCU supports one strongly consistent read per second for an item up to 4 KB, or two eventually consistent reads per second for the same size. A Scan is metered on the total size of the items it evaluates in each request, rounded up to the next 4 KB, so the fastest way to approximate read demand for a full-table scan is to work from total bytes read rather than from individual request counts. That gives you a clean capacity estimate before you fine-tune workers, pagination, retries, and adaptive backoff.
The formula behind the calculator
The calculator treats table size as the total amount of data to read. It converts gigabytes to kilobytes, then divides by the amount of data a single RCU can read per second:
- Strongly consistent reads: 1 RCU reads 4 KB per second
- Eventually consistent reads: 1 RCU reads 8 KB per second
From there, the calculation is:
- Convert table size from GB to KB.
- Convert target scan time from minutes to seconds.
- Determine throughput per RCU based on consistency: 4 KB/s for strong, 8 KB/s for eventual.
- Compute baseline required RCU = total KB / (KB per RCU per second × seconds).
- Apply a safety overhead multiplier to account for uneven segment distribution, retries, network overhead, and throttling.
- Estimate thread count = adjusted required RCU / target RCU per thread, rounded up to a whole number of workers.
This is a practical engineering estimate, not a guarantee of exact production behavior. Real scans are influenced by hot partitions, storage layout, application retry policies, and the fact that partitions may not be perfectly balanced. That is why the overhead multiplier matters. For small or well distributed tables, 1.10 can be reasonable. For large or skewed datasets, 1.20 to 1.30 often gives a more realistic budget.
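The steps above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the calculator's actual source: the function name `estimate_scan_plan`, the `ScanPlan` container, the default overhead of 1.15, the default of 200 RCU per thread, and the use of binary units (1 GB = 1,048,576 KB) are all choices made for this sketch.

```python
import math
from dataclasses import dataclass

KB_PER_GB = 1024 * 1024  # assumption: binary units


@dataclass
class ScanPlan:
    baseline_rcu: float
    adjusted_rcu: float
    threads: int


def estimate_scan_plan(table_gb, target_minutes, strongly_consistent=False,
                       overhead=1.15, rcu_per_thread=200):
    """Follow the listed steps: KB to read, seconds available, RCU, threads."""
    total_kb = table_gb * KB_PER_GB
    seconds = target_minutes * 60
    # 4 KB/s per RCU for strong reads, 8 KB/s per RCU for eventual reads.
    kb_per_rcu_second = 4 if strongly_consistent else 8
    baseline = total_kb / (kb_per_rcu_second * seconds)
    adjusted = baseline * overhead
    # Round up so the workers can collectively cover the adjusted budget.
    threads = max(1, math.ceil(adjusted / rcu_per_thread))
    return ScanPlan(baseline, adjusted, threads)


plan = estimate_scan_plan(table_gb=50, target_minutes=120)
print(f"baseline={plan.baseline_rcu:.2f} "
      f"adjusted={plan.adjusted_rcu:.2f} threads={plan.threads}")
```

Passing `strongly_consistent=True` halves the per-RCU throughput and therefore doubles both RCU figures for the same table size and window.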
Why thread count and RCU are related but not identical
A common mistake is to assume that adding more threads automatically creates more throughput. It does not. Throughput is limited by the amount of read capacity your table can deliver and by the amount of that capacity you are willing to let the scan use. Threads are simply a way to parallelize the work across segments so that a scan can keep more of the table busy at once. If your RCU budget is low, adding many extra threads may only increase contention and retries. If your RCU budget is high and your table is large enough, too few threads can underutilize available capacity and make the job take much longer than necessary.
In practice, good planning means picking a thread count that is high enough to saturate the scan budget without pushing each worker into aggressive retry loops. That is why the calculator asks for a target RCU per thread. It gives you a manageable way to think about worker intensity. For example, if you estimate the scan needs 2,400 RCU and you prefer workers around 200 RCU each, then a good starting point is 12 threads. If the same scan only has a safe budget of 1,000 RCU, then 12 threads will not help much because the table cannot sustainably feed them all at the intended rate.
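The example in the paragraph above can be checked with a short sketch. The helper name `plan_workers` is hypothetical; it simply caps the usable RCU at the safe budget before dividing by the per-thread target.

```python
import math


def plan_workers(required_rcu, safe_budget_rcu, target_rcu_per_thread):
    """Return (threads, RCU each thread can sustain) under a budget cap."""
    # Threads cannot consume more than the table is allowed to give them.
    usable_rcu = min(required_rcu, safe_budget_rcu)
    threads = max(1, math.ceil(usable_rcu / target_rcu_per_thread))
    return threads, usable_rcu / threads


print(plan_workers(2400, 2400, 200))  # (12, 200.0)
print(plan_workers(2400, 1000, 200))  # (5, 200.0)

# Forcing 12 threads onto a 1,000 RCU budget leaves each one far below target.
print(round(1000 / 12, 1))  # 83.3
```

With only 1,000 RCU available, five workers at the intended intensity already saturate the budget; the remaining seven would mostly wait or retry.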
Reference numbers every DynamoDB scan planner should know
| DynamoDB metric | Value | Why it matters for parallel scan |
|---|---|---|
| Read metering unit | 4 KB | Reads are metered in 4 KB units. A Scan rounds up the total size of the items evaluated in each request, so larger items consume more read units. |
| Strongly consistent read throughput | 1 RCU = 4 KB/s | Use this when accuracy matters and you need the latest committed value on every read. |
| Eventually consistent read throughput | 1 RCU = 8 KB/s | For the same data volume, eventual reads need about half the RCU of strong reads. |
| Maximum item size | 400 KB | Large items can dramatically increase scan cost because one item may consume many 4 KB read chunks. |
| Scan page size | Up to 1 MB per request page | Application level pagination affects request rate, memory use, and how evenly workers progress through segments. |
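To connect the page size and the 4 KB unit, the following sketch estimates the RCU consumed by a single scan page. It assumes the page is metered on the total bytes evaluated, rounded up to whole 4 KB units, and that eventually consistent reads cost half as much; the function name `scan_page_rcu` is made up for illustration, and actual consumed capacity should be read from the `ConsumedCapacity` field of the response.

```python
import math


def scan_page_rcu(bytes_evaluated, strongly_consistent=False):
    """Approximate RCU for one Scan page of the given size."""
    units_4kb = math.ceil(bytes_evaluated / 4096)
    return units_4kb if strongly_consistent else units_4kb / 2


one_mb = 1024 * 1024
print(scan_page_rcu(one_mb, strongly_consistent=True))  # 256
print(scan_page_rcu(one_mb))                            # 128.0
```

A worker targeting 200 RCU per second with eventually consistent reads would therefore issue roughly one or two full 1 MB pages per second.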
Example scenarios using capacity math
The table below illustrates how the same data volume behaves under different completion targets and consistency choices. These values follow from the formula above, taking 1 GB as 1,048,576 KB, and are useful for back-of-the-envelope planning before you run a load test.
| Table size | Target time | Consistency | Baseline RCU needed | RCU with 15% overhead |
|---|---|---|---|---|
| 100 GB | 60 min | Eventually consistent | 3,640.89 | 4,187.02 |
| 100 GB | 30 min | Eventually consistent | 7,281.78 | 8,374.04 |
| 100 GB | 30 min | Strongly consistent | 14,563.56 | 16,748.09 |
| 500 GB | 45 min | Eventually consistent | 24,272.59 | 27,913.48 |
Best practices for choosing the number of parallel scan threads
1. Start with a capacity budget, not with a thread count
The safest process begins by deciding how much of the table’s read capacity can be reserved for the scan without harming user traffic. Many teams initially allocate 20% to 50% of the table’s available read capacity to background work. If the application workload is predictable and there is a maintenance window, you might go higher. If the table supports customer facing requests with bursty patterns, stay conservative.
2. Use eventual consistency when the job allows it
If your use case is analytics, migration validation, archival export, or backfill discovery, eventually consistent reads are usually the right choice. They effectively double the amount of data scanned per RCU compared with strongly consistent reads. That can cut your scan budget in half for the same completion time, or let you finish in half the time on the same budget.
3. Treat worker count as a tuning knob
A recommended thread count from a calculator is a starting point, not a sacred number. Measure real throughput, retry frequency, and table throttling. If workers are consistently under the target RCU per thread, add a few more. If retries spike or the table starts throttling foreground traffic, reduce workers or lower the page rate.
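One way to make the worker count and the per-thread rate adjustable is to pace each segment worker by the capacity it actually consumed. The sketch below is a minimal illustration, not a production scanner: `scan_segment`, `parallel_scan`, and `handle_item` are hypothetical names, `client` is assumed to be a boto3 DynamoDB client (for example `boto3.client("dynamodb")`), and the pacing rule (a page that consumed N RCU should occupy at least N / target seconds) is one simple choice among several. `handle_item` must be safe to call from multiple threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments,
                 target_rcu_per_thread, handle_item):
    """Scan one segment, pausing after each page to hold the RCU target."""
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
        "ConsistentRead": False,           # eventually consistent reads
        "ReturnConsumedCapacity": "TOTAL",  # needed for pacing below
    }
    consumed_total = 0.0
    while True:
        started = time.monotonic()
        page = client.scan(**kwargs)
        for item in page.get("Items", []):
            handle_item(item)
        consumed = page.get("ConsumedCapacity", {}).get("CapacityUnits", 0.0)
        consumed_total += consumed
        # Stretch this page's wall time so the average stays near the target.
        min_duration = consumed / target_rcu_per_thread
        elapsed = time.monotonic() - started
        if elapsed < min_duration:
            time.sleep(min_duration - elapsed)
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return consumed_total
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, threads, target_rcu_per_thread,
                  handle_item):
    """Run one worker per segment and return the total RCU consumed."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, threads,
                        target_rcu_per_thread, handle_item)
            for seg in range(threads)
        ]
        return sum(f.result() for f in futures)
```

Lowering `target_rcu_per_thread` slows every worker without changing the segment layout, while changing `threads` changes how the table is split. Throttling retries are left to the client's own retry configuration in this sketch.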
4. Expect skew in real tables
Parallel scan distributes work by segment, but segment completion is rarely perfectly uniform. Some workers finish early while others drag because the underlying storage distribution is uneven or because item sizes vary materially. That is why production teams often apply 10% to 30% extra capacity when planning SLA driven scans.
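The effect of skew on completion time can be estimated from per-segment data volumes, for instance collected during a small pilot run. The sketch below assumes one worker per segment, every worker running at the same pace, and hypothetical segment sizes; the function names are illustrative.

```python
def skew_overhead(segment_kb):
    """Ratio of the largest segment to the mean segment size."""
    mean_kb = sum(segment_kb) / len(segment_kb)
    return max(segment_kb) / mean_kb


def completion_seconds(segment_kb, rcu_per_thread, kb_per_rcu_second=8):
    """The job ends when the worker holding the largest segment finishes."""
    return max(segment_kb) / (rcu_per_thread * kb_per_rcu_second)


pilot_kb = [90_000, 100_000, 110_000, 100_000]  # hypothetical sizes in KB
print(skew_overhead(pilot_kb))            # 1.1
print(completion_seconds(pilot_kb, 200))  # 68.75
```

A ratio of 1.1 here corresponds to the 10% end of the overhead range; a more uneven pilot would argue for a larger multiplier.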
5. Monitor before, during, and after the run
Capacity planning should be validated with runtime metrics. Track consumed read capacity, throttled requests, average latency, retry counts, and business traffic health. If the scan is part of a one time migration, test on a representative sample first. If it is a recurring batch process, iterate until you find the lowest cost setting that still meets your window.
Step by step method for production planning
- Measure or estimate total data volume to scan.
- Choose eventual or strong consistency based on business correctness needs.
- Define the completion window in minutes.
- Set a safe percentage of table read capacity that the batch job may consume.
- Calculate required RCU and compare it with available scan budget.
- Pick an initial worker intensity, such as 100 to 300 RCU per thread.
- Launch a small test and inspect consumed capacity and throttling.
- Adjust thread count and pacing until the job reaches stable throughput.
Common mistakes that lead to bad scan performance
- Ignoring item size: two tables with the same item count can have very different read costs if average item size differs.
- Using too many threads on a low capacity table: more workers do not create new capacity and may only amplify retries.
- Running scans at 100% of provisioned capacity: this often harms customer facing reads and creates noisy alarm conditions.
- Assuming strong consistency is always necessary: many background jobs can safely use eventual consistency.
- Skipping a safety factor: perfect balance rarely happens in large production datasets.
How to interpret the calculator output
After clicking calculate, you will see the required scan RCU, the safe scan budget based on your allocated percentage, the recommended threads, and the estimated completion time at your current budget. If required RCU is below the safe budget, your target is feasible and your thread recommendation mainly helps you reach that throughput efficiently. If required RCU is above the safe budget, the scan can still be run, but not in the target time unless you increase capacity, widen the maintenance window, or switch to eventually consistent reads if you are not already using them.
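The feasibility check and the completion estimate can be approximated as follows. This is a sketch under assumptions: the name `interpret_plan` is hypothetical, the safe budget is taken as a plain fraction of provisioned RCU, 1 GB is treated as 1,048,576 KB, and the overhead multiplier is applied to the estimated time, which may differ from how the calculator itself computes it.

```python
def interpret_plan(required_rcu, provisioned_rcu, scan_share, table_gb,
                   strongly_consistent=False, overhead=1.15):
    """Compare required RCU with the safe budget and estimate run time."""
    safe_budget = provisioned_rcu * scan_share
    kb_per_rcu_second = 4 if strongly_consistent else 8
    total_kb = table_gb * 1024 * 1024
    # Time to read everything, padded by the overhead, at the safe budget.
    seconds = total_kb * overhead / (safe_budget * kb_per_rcu_second)
    return {
        "safe_budget_rcu": safe_budget,
        "feasible": required_rcu <= safe_budget,
        "estimated_minutes": seconds / 60,
    }


result = interpret_plan(required_rcu=1046.76, provisioned_rcu=4000,
                        scan_share=0.4, table_gb=50)
print(result["feasible"], round(result["estimated_minutes"], 1))
```

When `feasible` is false, `estimated_minutes` shows how far the window would need to widen at the current budget.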
Final recommendation
For most real world DynamoDB batch jobs, the best approach is to calculate the RCU required for your target time, cap the scan to a safe fraction of table capacity, and then choose a moderate number of workers that can collectively consume that budget. Do not optimize thread count in isolation. Optimize for the combination of table safety, predictable completion time, and low retry pressure. This calculator is designed for that exact planning workflow.