Pulse x reMKTR

Cut Build Times with EC2 C8id, M8id, R8id Instances

Written by Jacob Heinz | Feb 5, 2026 8:24:15 PM

Your bottleneck isn’t CPU. It’s your scratch disk.

You’ve felt it before: fast CPUs, big memory, then everything crawls during shuffles, spills, staging. That’s the classic EBS tax for jobs that live and die on local I/O.

Here’s the switch flip: AWS just launched Amazon EC2 C8id, M8id, and R8id with up to 22.8 TB of local NVMe SSD attached to the host. Translation: less time watching spinners, more time shipping. They run custom Intel Xeon 6 processors tuned for AWS, with DDR5 and AVX-512, and tie into Nitro with up to 100 Gbps networking and EFA for low-latency HPC moves.

If your jobs need fast temporary space—Spark shuffles, in-memory databases with spill, video encoding, scientific scratch, CPU inference—this is your new favorite button. You’re not just getting more storage. You’re getting up to 3x the vCPUs, memory, and local storage of prior-gen families, and R8id specifically posts a 43% performance jump over R6id.

Use the instance store right, and you can squeeze >7 GB/s per drive in RAID setups, crush tail latencies, and stop overpaying for IOPS you don’t use.

TLDR

  • New: Amazon EC2 C8id, M8id, and R8id with up to 22.8 TB local NVMe SSD.
  • Powered by custom Intel Xeon 6, DDR5, AVX-512; up to 100 Gbps networking and EFA.
  • R8id: +43% vs. R6id; C8id: 15–60% faster on web/AI/cache benchmarks.
  • Ideal for Spark, HANA/Redis, video encode, CPU inference, EDA, risk modeling.
  • Local NVMe slashes scratch I/O latency; keep state ephemeral and back it up smart.
  • Up to 15% better price-performance than prior gen; stack Savings Plans and Spot to push costs lower.

Your real bottleneck

Why local NVMe matters

When your workload spills or shuffles, EBS (even io2 Block Express) can choke. With instance store NVMe physically on the host, you trade network hops for PCIe lanes. Net result: huge sequential throughput and microsecond access that speeds your “throwaway” data paths.

AWS’ launch pitch is clear: “up to 22.8 TB of local NVMe SSD storage” on C8id, M8id, and R8id. In practice, RAID 0 across multiple NVMe devices gives near-linear throughput gains, often pushing beyond 7 GB/s per drive for sequential reads and writes. If your Spark jobs spend half their time on shuffle I/O, that lever moves your wall-clock.

Think of it this way: moving shuffle, temp, and cache reads from network storage to on-host NVMe cuts a round trip. EBS is great, but it’s still a network hop with per-volume throughput and IOPS limits. Local NVMe gives microseconds not milliseconds, and wide lanes you don’t reserve ahead of time. For big sequential spills (ETL, sorting, encoding) and high-entropy shuffles, that’s the gap between CPUs idling and CPUs sprinting.

To be clear, io2 Block Express can deliver serious performance—hundreds of thousands of IOPS and multi-GB/s per volume when provisioned right. But if your pattern is “generate, chew, discard,” the provisioning tax and network hop often become the bottleneck under concurrency. Instance store flips that script.

Pro tip: don’t judge by a single-thread test. NVMe shines under concurrency. Use a multi-threaded workload (or a tool like fio with parallel jobs and queue depth) to measure the real curve your app will see.
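
Here’s one way to run that sweep: a minimal sketch that shells out to fio from Python. It assumes fio is installed, and /mnt/nvme is a placeholder for your instance store mount.

    # Sweep fio queue depths on the NVMe scratch mount to find the knee of the
    # throughput curve. Assumes fio is on PATH; /mnt/nvme is a placeholder.
    import json
    import subprocess

    MOUNT = "/mnt/nvme"  # hypothetical instance store mount point

    for iodepth in (1, 4, 16, 32, 64):
        cmd = [
            "fio", "--name=seqread", f"--directory={MOUNT}",
            "--rw=read", "--bs=1M", "--size=4G", "--numjobs=8",
            f"--iodepth={iodepth}", "--ioengine=libaio", "--direct=1",
            "--runtime=30", "--time_based", "--group_reporting",
            "--output-format=json",
        ]
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        bw_kib = json.loads(out)["jobs"][0]["read"]["bw"]  # fio reports KiB/s
        print(f"iodepth={iodepth:>2}  ~{bw_kib / 1024 / 1024:.1f} GiB/s aggregate")

Stop raising iodepth where the curve flattens; that’s the concurrency worth mirroring in your application’s thread pools.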

Where EBS still fits

EBS stays gold for durable state, database volumes that need snapshots, and workloads that like decoupled storage and compute scaling. Instance store is ephemeral. If the instance stops or fails, data is gone. The winning pattern: keep hot scratch on NVMe and persist data you care about to EBS or S3.

The AWS docs don’t mince words about this trade-off. Use instance store deliberately, and you get both speed and safety.

"An instance store is temporary block-level storage for your instance." — AWS EC2 Documentation

Use this split approach:

  • Put logs, checkpoints, model artifacts, and job outputs on S3 or EBS. Automate the copy.
  • Keep temp tables, intermediate Parquet files, caches, and shuffle on instance store.
  • If you must mirror scratch (rare), RAID 1 across NVMe tolerates a single drive failure—but it won’t survive an instance stop or termination. RAID is not a backup.

Two common patterns:

  • Build farms: pull dependencies and containers into NVMe-backed caches to avoid re-downloading. Persist the final build to S3 or ECR.
  • Analytics clusters: point spark.local.dir at the instance store mount for shuffle and spill. Periodically checkpoint to S3, and write final tables to durable storage.
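
For the Spark pattern, a minimal PySpark sketch might look like this. The paths, bucket, and app name are placeholders; it assumes your cluster already has an S3 connector wired up, and note that YARN and EMR can override spark.local.dir with their own local-dir settings.

    # Scratch on instance store, durable outputs on S3. All names are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("nvme-scratch-demo")
        # shuffle and spill files land on the NVMe RAID mount, not EBS
        # (self-managed clusters; YARN/EMR may override via their local dirs)
        .config("spark.local.dir", "/mnt/nvme/spark-local")
        .getOrCreate()
    )

    # checkpoints and final tables go to durable storage
    spark.sparkContext.setCheckpointDir("s3a://my-bucket/checkpoints/")  # placeholder bucket

    df = spark.range(100_000_000).repartition(200)
    df.write.mode("overwrite").parquet("s3a://my-bucket/curated/demo/")  # durable output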

Pick your weapon

C8id: CPU-bound

C8id extends the compute-optimized C8i family and adds local NVMe. You still get the tight 2:1 memory-to-vCPU profile for CPU-heavy tasks—web proxies, encoding, CPU inference—and now you feed them fast scratch. AWS benchmarks show 60% faster NGINX, 40% faster AI recommendation inference (e.g., TensorFlow Serving), and 35% faster Memcached vs. prior gen C7id. The c8id.96xlarge tops out at 384 vCPUs, 768 GiB memory, and 22.8 TB NVMe.

If you’ve been running inference with batch pre- and post-processing that spills to disk, this is a clean win. Keep transient tensors, feature caches, and logs on NVMe to shave tail latency.

Try this mapping to get started fast:

  • Set your model server to write temporary transforms to an NVMe-backed path.
  • Keep model weights and config on EBS for persistence, but copy them to NVMe on boot for faster cold starts.
  • If you run NGINX or Envoy sidecars, point their cache directories (TLS session state, OCSP stapling, content cache) at NVMe to smooth p99s under load.
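
A boot-time warm-up for the first two items could look like the sketch below. Every path and environment variable here is illustrative, not any particular model server’s API.

    # Copy weights from the durable EBS path to NVMe and point temp output at
    # scratch. Paths are placeholders; adapt to your server's config knobs.
    import os
    import shutil

    EBS_WEIGHTS = "/data/models/recsys-v3"       # durable copy on EBS (placeholder)
    NVME_WEIGHTS = "/mnt/nvme/models/recsys-v3"  # fast local copy (placeholder)
    NVME_TMP = "/mnt/nvme/tmp"

    os.makedirs(NVME_TMP, exist_ok=True)
    if not os.path.isdir(NVME_WEIGHTS):
        # instance store starts empty after a stop/start, so copy on every boot
        shutil.copytree(EBS_WEIGHTS, NVME_WEIGHTS)

    # many runtimes respect TMPDIR for intermediate transforms and spill files
    os.environ["TMPDIR"] = NVME_TMP
    print(f"serving from {NVME_WEIGHTS}, temp files in {NVME_TMP}")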

M8id: general purpose

M8id balances compute and memory while layering in big local storage. Think mid-tier databases with temp tables, ERP workloads that burst during ETL windows, CI/CD pipelines that fan out builds and cache artifacts locally. Sizes scale from m8id.large with 0.95 TB of NVMe up to m8id.96xlarge with 22.8 TB of NVMe and 3,072 GiB of RAM. It’s the versatile choice when your architecture has many moving parts but you don’t want to overfit a specialized family.

The twist: you can keep durable data on EBS, but move temp tables, package caches, and build artifacts to NVMe for immediate gains.

Real-world examples that map well to M8id:

  • CI fleets that pull dozens of language runtimes and node_modules: warm the cache on NVMe and watch build times drop.
  • ETL systems that run nightly: sort, aggregate, and join on instance store; write the curated datasets to S3.
  • Web apps that do heavy image or video transforms at request time: stage inputs and outputs on NVMe and only persist the final asset.

R8id: memory monster

R8id is memory-optimized plus local NVMe. It’s built for in-memory databases (SAP HANA, Redis), real-time analytics (Spark), big caches, simulation workloads, and EDA. AWS reports up to 43% higher performance than R6id, 3.3x higher memory bandwidth, and triple the local storage—topping out at 22.8 TB on r8id.96xlarge with 3 TB of DDR5.

If your HANA box spills during delta merges or your Redis snapshot pipeline thrashes, R8id shrinks the penalty by keeping temporary I/O on-device. Pair with EFA for tightly coupled HPC patterns when you need low-latency east-west traffic.

"R8id provides up to 43% higher performance than R6id." — AWS launch materials

Best practices for memory-first stacks:

  • Redis: keep RDB or AOF on EBS for durability, but stream snapshot staging to NVMe to avoid blocking. Use faster local fsync paths and copy completed snapshots out.
  • HANA: place temp and trace directories on NVMe; keep data and log volumes on EBS. This reduces merge pain and restart times.
  • Spark: on big joins that exceed RAM, many NVMe lanes keep spill from becoming a parking lot.
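
For the Redis item above, here’s one hedged way to copy completed snapshots out: stage the RDB on NVMe, wait for the background save to finish, then push it to durable storage. The bucket, paths, and polling interval are placeholders.

    # Wait for BGSAVE to finish, then copy the snapshot off the ephemeral disk.
    # Assumes redis-cli is on PATH and Redis's `dir` points at the NVMe mount.
    import time
    import subprocess
    import boto3

    NVME_RDB = "/mnt/nvme/redis/dump.rdb"  # placeholder path
    BUCKET = "my-redis-backups"            # placeholder bucket
    KEY = f"snapshots/dump-{int(time.time())}.rdb"

    while True:
        info = subprocess.run(
            ["redis-cli", "info", "persistence"],
            capture_output=True, text=True, check=True,
        ).stdout
        if "rdb_bgsave_in_progress:0" in info:
            break
        time.sleep(5)

    boto3.client("s3").upload_file(NVME_RDB, BUCKET, KEY)
    print(f"copied {NVME_RDB} to s3://{BUCKET}/{KEY}")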

Architecture patterns

Treat instance store as ephemeral

Instance store is ephemeral by design. Great for scratch, terrible for irreplaceable state. The playbook:

  • Use RAID 0 across multiple NVMe devices for throughput on temporary datasets.
  • Persist checkpoints, logs, and outcomes to S3 or EBS on a schedule.
  • After a stop or termination, expect a clean slate. Automate provisioning via cloud-init or user data to rehydrate caches and directories on every launch.

AWS says it plainly in docs: instance store is temporary block-level storage. That’s not a bug; it’s a feature when you’re optimizing throughput with acceptable risk boundaries.

Operational add-ons that save headaches:

  • Use Auto Scaling groups with launch templates that create the RAID, format, and mount on boot. Treat nodes as cattle, not pets.
  • Add lifecycle hooks to drain work and flush outputs to S3 before termination.
  • Tag scratch volumes and write a health check that fails fast if the mount isn’t present. Fail early, not halfway into your ETL.
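
Here’s a rough shape for the lifecycle-hook drain, assuming a termination hook is already attached to your Auto Scaling group. The hook name, group name, bucket, and scratch path are all placeholders.

    # Flush scratch outputs to S3, then release the Auto Scaling lifecycle hook.
    # Uses IMDSv2 to look up the instance ID; all names below are placeholders.
    import pathlib
    import boto3
    import requests

    SCRATCH = pathlib.Path("/mnt/nvme/output")
    BUCKET = "my-job-results"
    ASG = "spark-workers"
    HOOK = "drain-before-terminate"

    token = requests.put(
        "http://169.254.169.254/latest/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
    ).text
    instance_id = requests.get(
        "http://169.254.169.254/latest/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    ).text

    s3 = boto3.client("s3")
    for path in SCRATCH.rglob("*"):
        if path.is_file():
            s3.upload_file(str(path), BUCKET, f"{instance_id}/{path.relative_to(SCRATCH)}")

    boto3.client("autoscaling").complete_lifecycle_action(
        LifecycleHookName=HOOK,
        AutoScalingGroupName=ASG,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )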

Squeeze throughput safely

  • Filesystems: XFS or ext4 often deliver consistent performance for large sequential I/O.
  • Queue depth: tune iodepth (e.g., fio) to find the knee of the curve for your workload.
  • NUMA awareness: pin high-traffic processes to cores local to NVMe controllers when applicable.
  • Networking: use EFA for MPI-style HPC or latency-sensitive east-west traffic; ENA and enhanced networking for everything else.

Don’t forget placement groups for full-bisection bandwidth across nodes. And when you’re mixing EBS and instance store, ensure your EBS bandwidth (up to 80 Gbps on the largest sizes) isn’t your new bottleneck.
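
Creating and using a cluster placement group is a two-call affair with boto3; a minimal sketch, with the AMI, subnet, and group name as placeholders:

    # Launch a tightly coupled set of nodes into one cluster placement group.
    # All IDs and names are placeholders; capacity varies by AZ and size.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.create_placement_group(GroupName="hpc-scratch", Strategy="cluster")

    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",      # placeholder AMI
        InstanceType="r8id.96xlarge",
        MinCount=4,
        MaxCount=4,
        SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
        Placement={"GroupName": "hpc-scratch"},
    )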

"Use RAID 0 on instance store to increase volume size and to provide higher I/O performance." — AWS EC2 User Guide (RAID)

A few more low-effort wins:

  • Set noatime on mounts to cut metadata churn. Consider increasing read-ahead on large sequential jobs.
  • For NVMe, the default I/O scheduler is often optimal, but test mq-deadline versus none for your pattern.
  • Keep block sizes aligned with your access pattern. Large blocks help sequential scans and writes.
  • Warm caches during bootstrap if you know your working set. Pre-fetching avoids a cold-start cliff.
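
The first two items translate into a few lines of node setup, sketched below. The device names and the 4 MiB read-ahead value are placeholders; measure with your own workload before locking anything in.

    # Remount scratch with noatime, raise read-ahead, and set an explicit
    # scheduler on the underlying NVMe drives. Run as root; names are placeholders.
    import pathlib
    import subprocess

    MOUNT = "/mnt/nvme"
    DRIVES = ["nvme1n1", "nvme2n1"]  # instance store devices backing the RAID

    subprocess.run(["mount", "-o", "remount,noatime", MOUNT], check=True)

    for dev in DRIVES:
        queue = pathlib.Path(f"/sys/block/{dev}/queue")
        # larger read-ahead helps big sequential scans; 4096 KiB is a starting point
        (queue / "read_ahead_kb").write_text("4096")
        # 'none' is usually fine for NVMe; compare against 'mq-deadline' for your pattern
        (queue / "scheduler").write_text("none")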

Cost math

Stop paying for idle IOPS

If you’re throwing EBS io2 Block Express at scratch problems, you might be paying a premium for durability and provisioned IOPS you don’t need. Local NVMe gives you raw, attached performance without per-IOPS pricing. For shuffle-heavy analytics or build farms, that really matters.

A simple framing for finance: if your job runtime drops 30–50% by moving scratch to instance store, your compute-hours fall accordingly. Combine that with up to 15% better price-performance on C8id versus C7id and the savings stack fast. Add Savings Plans for baseline capacity and Spot Instances for burst windows.

Two extra levers to remember:

  • Instance store is included in the instance price. No extra per-GB-month for scratch.
  • EBS is fantastic for persistence, but you pay for capacity and, with io2, provisioned IOPS. Keep it for data you must keep.

Back-of-envelope scenario

  • Before: 100-node Spark cluster on instances without local NVMe, heavy on EBS io2 for shuffle. 2-hour jobs, frequent tail latency stalls.
  • After: same node count on R8id with RAID 0 across NVMe for shuffle. Jobs complete in ~70–90 minutes.
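
The arithmetic is short enough to keep next to your capacity plan. The hourly rate below is a made-up placeholder; plug in your real pricing and the runtimes you actually measure.

    # Back-of-envelope: node-hours before and after moving shuffle to NVMe.
    nodes = 100
    rate_per_node_hour = 7.00      # placeholder $/hr; use your real rate

    before_hours = 2.0             # EBS-backed shuffle
    after_hours = 80 / 60          # ~80 minutes on RAID 0 NVMe

    before_cost = nodes * before_hours * rate_per_node_hour
    after_cost = nodes * after_hours * rate_per_node_hour
    print(f"per run: ${before_cost:,.0f} -> ${after_cost:,.0f} "
          f"({1 - after_cost / before_cost:.0%} less compute spend)")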

You didn’t just buy performance. You bought back time. And time is the only non-renewable resource in your roadmap.

"Savings Plans provide flexible pricing for EC2 usage in exchange for a 1- or 3-year term." — AWS Savings Plans

If you want to push savings further:

  • Run canaries and nightly jobs On-Demand for stability; move bursty daytime backfills to Spot with instance diversification.
  • Use instance fleets (multiple sizes in one ASG or EMR cluster) to grab the best availability and price.
  • Right-size aggressively. With faster scratch, you may finish within the same billable hour on fewer nodes.

What this unlocks

Today’s wins

  • AI or ML inference on CPUs: AVX-512 plus fast local feature stores reduce p99 latency.
  • Real-time analytics: Spark shuffle and spill accelerate with on-host NVMe; EMR or self-managed.
  • In-memory DB headroom: faster merges, snapshots, and restarts when paired with durable EBS for state.
  • HPC and scientific workloads: scratch-heavy simulations benefit from NVMe and EFA.

These families snap into AWS Nitro for security isolation, support IMDSv2, and can leverage Nitro Enclaves for confidential compute when you need to segment sensitive workloads on the same host.

Add in the operational niceties:

  • Placement groups to keep nodes close for low-latency east-west chatter.
  • CloudWatch Agent on each node to watch disk throughput and iowait. Alert when p95 latencies creep.
  • S3 lifecycle policies so intermediate outputs age out automatically. Keep storage bills tidy.
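
That last item is a one-call setup with boto3. The bucket name, prefix, and seven-day window are placeholders; match them to how long downstream jobs actually need the intermediates.

    # Expire intermediate outputs automatically so scratch copies don't linger in S3.
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket="my-job-results",  # placeholder bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-intermediate-outputs",
                    "Filter": {"Prefix": "intermediate/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 7},
                }
            ]
        },
    )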

The near future

As the Xeon 6 lineup matures, expect tighter perf-per-watt curves and broader region availability. In parallel, Graviton remains the ARM path for specific price-performance wins—but in x86-heavy stacks (legacy software, proprietary binaries, AVX-accelerated libraries), C8id, M8id, and R8id are the straightforward upgrade. The big story isn’t CPU versus CPU; it’s end-to-end throughput from core to cache to disk to network.

"Elastic Fabric Adapter provides low-latency, high-throughput communications for tightly coupled HPC applications." — AWS Documentation

If you’re modernizing, consider a two-lane approach:

  • For services that can recompile and benefit from ARM, test Graviton for steady-state fleets.
  • For everything x86-bound, move to 8id families now so you stop leaving throughput on the table.

Fast facts

  • Local NVMe on C8id, M8id, R8id turns scratch from a drag into a glide path.
  • R8id posts +43% vs. R6id with 3.3x memory bandwidth; C8id delivers 15–60% gains on web, AI, and cache benchmarks.
  • Use RAID 0 for throughput on ephemeral data; persist outcomes to EBS or S3.
  • Pair with EFA and placement groups for low-latency multi-node work.
  • Price-performance improves when you stop paying for scratch IOPS you don’t need.

Launch plan

  • Pick the family: C8id (CPU-bound), M8id (balanced), R8id (memory-heavy).
  • Right-size: aim for headroom in vCPU and memory; don’t starve your app threads.
  • Choose AMI: latest kernel with NVMe optimizations; enable ENA or EFA as needed.
  • Create RAID 0 across NVMe devices (mdadm) for scratch.
  • Format with XFS or ext4; set noatime for fewer metadata writes.
  • Mount scratch under /mnt or /local; set permissions for app users.
  • Wire your app: point temp dirs, shuffle, and caches to the NVMe mount.
  • Add persistence: periodic checkpoints to S3 or EBS; rotate logs off-host.
  • Tune: iodepth, queue sizes, thread pools; test with fio before production.
  • Automate bootstrap in user data; test stop or start semantics (remember: ephemeral).
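
The RAID, format, and mount steps condense into a short bootstrap you can call from user data. This is a minimal sketch: it assumes the EBS root volume is nvme0n1 and that the remaining NVMe devices are instance store, which you should verify (for example with nvme list) before trusting it on your AMI.

    # Build RAID 0 across the instance store drives, format XFS, mount with noatime.
    # Run as root from user data; device discovery here is deliberately simplified.
    import glob
    import subprocess

    MOUNT = "/mnt/nvme"
    ARRAY = "/dev/md0"

    # assumes nvme0n1 is the EBS root volume; verify on your AMI
    drives = sorted(d for d in glob.glob("/dev/nvme*n1") if d != "/dev/nvme0n1")

    subprocess.run(
        ["mdadm", "--create", ARRAY, "--level=0",
         f"--raid-devices={len(drives)}", *drives],
        check=True,
    )
    subprocess.run(["mkfs.xfs", "-f", ARRAY], check=True)
    subprocess.run(["mkdir", "-p", MOUNT], check=True)
    subprocess.run(["mount", "-o", "noatime", ARRAY, MOUNT], check=True)
    print(f"RAID 0 across {len(drives)} drives mounted at {MOUNT}")

Remember this runs on every fresh launch: the array does not survive a stop, so the bootstrap has to be idempotent enough to rebuild it each time.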

Bonus guardrails that pay off:

  • Put node drains and checkpoint hooks in your autoscaling termination policies.
  • Keep a tiny “canary” job that runs every hour to validate disk speed and mount health.
  • Document recovery: if scratch is empty on boot (it will be), does your node self-heal in under 2 minutes? Make it so.

FAQs

Is instance store durable?

No. Instance store volumes are ephemeral. If the instance stops, fails, or is terminated, data on instance store is lost. Keep irreplaceable data on EBS or S3 and use instance store for caches, shuffles, and other scratch paths.

How fast is the local NVMe?

AWS indicates sequential reads and writes can exceed 7 GB/s per drive. With multiple devices in RAID 0, throughput scales nearly linearly for large sequential workloads. Your actual performance depends on filesystem, queue depth, and workload profile.

What are the largest sizes?

The largest c8id.96xlarge, m8id.96xlarge, and r8id.96xlarge configurations offer up to 22.8 TB of NVMe instance storage, alongside up to 384 vCPUs. R8id also reaches 3 TB of DDR5 memory on its largest sizes.

When should you still choose EBS?

When you need durable, high-IOPS storage with snapshots, encryption at rest, and decoupled scaling from compute. Databases that require persistence across reboots should keep primary volumes on EBS, while offloading temp and spill files to instance store.

Do these instances support EFA and fast networking?

Yes. They integrate with Nitro and support ENA with up to 100 Gbps of networking on larger sizes. Elastic Fabric Adapter is available for HPC-style, low-latency communications. Always confirm instance-size-specific limits in the latest AWS documentation.

How do you control costs?

Use Savings Plans for steady-state capacity and EC2 Spot for bursty or fault-tolerant jobs. Right-size instances, move scratch to NVMe to reduce runtime, and monitor utilization. The combination can deliver up to 15% better price-performance versus prior gens, plus time savings.

Can you snapshot instance store volumes?

Not directly. Instance store doesn’t support snapshots like EBS. If you need to preserve data, copy it to EBS or S3 on a schedule or at job boundaries.

Is instance store encrypted?

On Nitro-based instances, instance store data is encrypted at rest and is securely wiped when the instance is stopped or terminated. You still control application-layer encryption for data written to EBS or S3.

Can Docker use instance store?

Yes. Point your container runtime’s image and cache directories to the NVMe mount. You’ll see faster pulls and layer extraction, especially in CI/CD pipelines.
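
One common way to wire that up is Docker’s data-root setting, written from your bootstrap. The paths are placeholders, and if you already manage daemon.json you’ll want to merge rather than overwrite.

    # Point Docker's images, layers, and build cache at the NVMe mount.
    import json
    import pathlib
    import subprocess

    daemon_json = pathlib.Path("/etc/docker/daemon.json")
    daemon_json.parent.mkdir(parents=True, exist_ok=True)
    daemon_json.write_text(json.dumps({"data-root": "/mnt/nvme/docker"}, indent=2))

    # restart so the new storage location takes effect
    subprocess.run(["systemctl", "restart", "docker"], check=True)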

You don’t need more servers—you need more throughput where it counts. These new EC2 families put fat NVMe lanes right next to your CPUs, so your hot code paths stop tripping over storage. Move scratch, spills, and temp artifacts onto instance store, keep your state safe on EBS or S3, and let the CPUs breathe. Your jobs get shorter, your bills get saner, and your team gets weekends back.

In 2026, the fastest “upgrade” isn’t more cores—it’s fewer I/O hops.

References