Compression is everywhere — from PNG/JPG images to reducing network bandwidth for faster transfers. It’s also one of those fascinating areas where you learn about trade-offs: you can pick a method that’s great at reducing size but painfully slow (gzip level 9), one that’s "okay" but fast (e.g. gzip level 3), or a middle ground between the two (e.g. gzip level 6).
The Need for Faster Compression
Ever since I started working on machine translation, our datasets have been kept compressed. Initially I just used gzip/gunzip/zcat directly or via tar. But the datasets kept growing, and gzip is single-threaded.
Then I found pigz — Parallel Implementation of GZip. It uses multithreading to parallelize compression. Over the past few years I’ve compressed GBs to TBs of datasets with pigz at work. What takes hours to days with gzip can be reduced to minutes on a beefy server with 40+ cores. pigz is great — easy to install, clean CLI, just pipe data in and out.
But then enter Python.
I build data orchestration frameworks in Python (e.g., mtdata).
Python’s stdlib gzip module is convenient but painfully slow at scale.
There are many Python libs promising faster gzip, but figuring out which ones are actually compatible with pigz/gzip output is its own headache.
My go-to has been subprocess pigz — just fork the process and stream I/O.
It works, but there’s overhead from process forking and kernel-level copies of data through STDIN/STDOUT.
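That subprocess approach looks roughly like this — a minimal sketch, assuming a `pigz` binary on `PATH` (it falls back to plain `gzip` when pigz is not installed; the function name is mine, not from any of my projects):

```python
import gzip
import shutil
import subprocess

def compress_stream(data: bytes, threads: int = 4) -> bytes:
    """Compress bytes by piping them through pigz (or gzip as a fallback).

    Forks a child process and copies data through stdin/stdout --
    exactly the process-forking and kernel-level copying overhead
    described above.
    """
    if shutil.which("pigz"):
        cmd = ["pigz", "-c", "-p", str(threads)]
    else:  # pigz not installed; gzip is single-threaded but flag-compatible here
        cmd = ["gzip", "-c"]
    proc = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return proc.stdout

if __name__ == "__main__":
    payload = b"hello pigz " * 10_000
    packed = compress_stream(payload)
    # Output is ordinary gzip, so the stdlib can read it back.
    assert gzip.decompress(packed) == payload
```

It works, and for batch jobs the fork cost amortizes — but every call pays it again.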
I learned how to bind Python APIs to C/C++ code while working on Pymarian.
Naturally I thought — why not do the same for pigz?
But pigz was written as a standalone CLI tool.
It’s not thread-safe.
It has a single global context (struct g, ~60 mutable fields) for all state, so making one pigz API call for reading and another for writing in the same process is asking for trouble.
I looked at refactoring it into a library… and gave up.
Too much work for my brain.
The Agent Benchmark
I keep an eye on coding agents. I refer to benchmarks like Terminal Bench and SWE-bench to assess which is better, but whenever people claim agents will fully automate coding, I want to test the claim on a real task — something I actually need done.
Modernizing pigz became my go-to litmus test. The task is well-defined and easy to verify:
Must produce compatible outputs (compress with pigz, decompress with pigzpp, or vice versa)
Performance must be at least as good as original pigz
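The first criterion is mechanically checkable. As an illustration of the shape of that check, here is a Python roundtrip test using the system gzip CLI against the stdlib gzip module — stand-ins for pigz and pigzpp; any pair of gzip-compatible implementations slots in the same way (this is my sketch, not the actual test harness):

```python
import gzip
import subprocess

def roundtrip_ok(data: bytes) -> bool:
    """Cross-implementation check: compress with one gzip implementation,
    decompress with another, and require the original bytes back."""
    # Direction 1: gzip CLI compresses, stdlib gzip decompresses.
    packed = subprocess.run(["gzip", "-c"], input=data,
                            stdout=subprocess.PIPE, check=True).stdout
    if gzip.decompress(packed) != data:
        return False
    # Direction 2: stdlib gzip compresses, gzip CLI decompresses.
    unpacked = subprocess.run(["gzip", "-dc"], input=gzip.compress(data),
                              stdout=subprocess.PIPE, check=True).stdout
    return unpacked == data

if __name__ == "__main__":
    assert roundtrip_ok(b"the quick brown fox " * 5_000)
```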
I asked agents to rewrite pigz in C++23, using modern language features — threads, RAII, exceptions — and restructure it as a thread-safe library.
2024 (GPT/Codex): Nowhere close. I tried, failed, wasted hours. Commits here.
2025: Agents could rewrite pigz, but performance was degraded. They couldn’t fix the regression — still too much work.
2026: A different story entirely.
The Race
This weekend I launched two parallel Copilot agent sessions in two VS Code workspaces — one with Claude Opus 4.6, and the other with GPT 5.4 — and let them loose on the same requirements document.
Both got the same goals: rewrite pigz in C++23 as a thread-safe library called pigzpp, with at least performance parity and full gzip compatibility.
I shared the requirements, let them plan, clarified their questions, and then let them implement.
How They Planned It
Both agents read the full pigz source (~5,200 lines of C) and produced a detailed plan before writing code.
The plans were architecturally similar — both proposed class-based APIs, std::jthread replacing yarn.c, exceptions replacing try.c’s setjmp/longjmp, and a staged approach.
But they diverged sharply on strategy:
Claude planned a clean-room rewrite: a new pigzpp/ directory, build everything from scratch, with the original pigz kept as a reference for benchmarking only. Direct, bold.
GPT planned an incremental migration: keep the existing C engine buildable as pigz-ref, wrap it in a C++ "facade" with a process-local mutex, and replace internals piece by piece. Conservative, risk-averse.
How They Executed It
Claude went mostly autopilot. It created the directory structure and started writing Config, BufferPool, Compressor, Decompressor, CRC utilities, format helpers, and the CLI — the whole stack from scratch. I only had to nudge it 2 times during the initial implementation. It produced a genuinely new C++23 codebase with no dependency on the original C code.
GPT kept pausing to announce what it would do next and wait for approval. I had to say "continue" or "yes, go ahead" 16 times during the session. Its first milestone was a "bootstrap bridge" — a C++ wrapper that called pigz_call() (the original C entry point) behind a mutex. The pigzpp binary existed, but the actual compression/decompression still ran through the legacy C code. It called this a "migration scaffold" and kept the real rewrite as future work.
After ~70 minutes, both delivered working binaries. Both passed compatibility tests — compress with pigzpp, decompress with gzip, and vice versa.
But the Performance Story Was Different Than It First Appeared
| | Compression (L1) | Compression (L6) | Compression (L9) | Decompression |
|---|---|---|---|---|
| Claude | 34% faster | ~equal | ~7% slower | ~43% slower |
| GPT | ±5% | ±5% | ±5% | ±5% |
Claude’s rewrite was a genuinely new implementation — faster compression at low levels (new code, no legacy overhead), but decompression was initially 43% slower because it hadn’t yet implemented parallel I/O threads for inflate. Real gains, real gaps.
GPT’s ±5% across the board looks like "parity," but it’s misleading.
Its pigzpp binary was routing through the original C pigz_call() behind a facade.
It was benchmarking the old engine through a new wrapper, not a new implementation.
Parity was guaranteed by construction.
Code Quality
Claude’s output was a proper library: Config value type (no globals), BufferPool with RAII, Compressor/Decompressor classes, separate CLI.
You could #include <pigzpp/compress.h> and use it in another program.
GPT’s output was build infrastructure + test scaffolding + a thin C++ API that called the legacy C code. Good engineering practice for a migration — but not what I asked for. The actual rewrite was still TODO.
I picked Claude’s version to push further.
I have captured the full chat histories. Look for the "User:" substring to find my inputs. The histories are sanitized to remove private information such as local paths and URIs, so they are not 100% complete.
Beating pigz
I asked: "We’ve matched pigz. Now can you beat it?"
The agent identified two key optimizations:
Eliminating unnecessary memory copies in the compression pipeline
Replacing zlib with zlib-ng, which exploits SIMD instructions on modern CPUs
Result: up to 1.8x faster compression, 2.4x faster decompression vs original pigz (at default thread count). At single-thread, the gap is even wider — 5.7x faster compression thanks to zlib-ng’s SIMD-optimized DEFLATE.
The final pigzpp:
3,744 lines of C++23 (vs. ~5,200 lines of C in the original)
35 GoogleTest cases (original had zero automated tests)
Zero global mutable state — fully thread-safe, multiple simultaneous compress/decompress operations in one process
Replaces hand-rolled pthreads with std::jthread, setjmp/longjmp with proper exceptions, and manual ref-counting with shared_ptr + RAII
CLI Benchmarks
Environment: 48-core Intel Xeon, Ubuntu 24.04.3 LTS. All tools use default thread counts.
Tools tested:
gzip 1.12 — the classic single-threaded compressor, our baseline
pigz 2.8 — Mark Adler’s parallel gzip (uses zlib, multithreaded compression)
pigzpp 1.0.0 — our C++23 rewrite of pigz (uses zlib-ng, multithreaded compression)
igzip 2.31.0 — Intel ISA-L’s native CLI (apt install isal), using hand-tuned x86 assembly for DEFLATE
All produce gzip-compatible output. Throughput in MB/s, speedup relative to gzip:
Compression:
| Size | gzip | pigz | pigzpp | igzip |
|---|---|---|---|---|
| 16 MB | 145 (1.0x) | 772 (5.3x) | 954 (6.6x) | 1824 (12.6x) |
| 128 MB | 189 (1.0x) | 1597 (8.5x) | 2487 (13.2x) | 2542 (13.5x) |
| 1024 MB | 199 (1.0x) | 1862 (9.4x) | 3365 (16.9x) | 2989 (15.0x) |
Decompression:
| Size | gzip | pigz | pigzpp | igzip |
|---|---|---|---|---|
| 16 MB | 219 (1.0x) | 798 (3.6x) | 1318 (6.0x) | 2107 (9.6x) |
| 128 MB | 299 (1.0x) | 965 (3.2x) | 2132 (7.1x) | 4568 (15.3x) |
| 1024 MB | 301 (1.0x) | 1112 (3.7x) | 2658 (8.8x) | 5325 (17.7x) |
Analysis: pigzpp and igzip are both dramatically faster than pigz. For compression, they’re neck-and-neck at large sizes — pigzpp edges ahead at 1 GB (16.9x vs 15.0x) thanks to multithreaded parallel compression, while igzip leads at 16 MB where its raw single-core speed dominates before threading overhead amortizes. For decompression, igzip wins decisively (up to 17.7x) — Intel’s hand-written assembly for inflate is hard to beat. DEFLATE decompression is inherently sequential, so pigzpp’s multithreading advantage doesn’t help here; its gains come purely from zlib-ng’s SIMD optimizations.
Thread scaling (pigz vs pigzpp):
| Size | Threads | pigz | pigzpp | Speedup |
|---|---|---|---|---|
| 16 MB | 1 | 168 MB/s | 812 MB/s | 4.8x |
| 16 MB | 4 | 445 MB/s | 1561 MB/s | 3.5x |
| 16 MB | 16 | 970 MB/s | 1522 MB/s | 1.6x |
| 128 MB | 1 | 176 MB/s | 1011 MB/s | 5.7x |
| 128 MB | 4 | 535 MB/s | 2628 MB/s | 4.9x |
| 128 MB | 16 | 1797 MB/s | 2695 MB/s | 1.5x |
At single-thread, pigzpp is 4.8-5.7x faster than pigz — this is pure zlib-ng vs zlib. As thread count increases, pigz narrows the gap but never catches up.
Python Bindings
I then had the agent build Python bindings via pybind11. Unlike my old approach of forking a subprocess, pigzpp calls the C++ library directly — zero fork/exec overhead.
I benchmarked against every relevant Python compression library. All speedups below are relative to stdlib gzip (1.0x baseline).
Libraries tested:
gzip — Python stdlib, backed by system zlib (1.3.1 on this machine)
pigz (subprocess) — forking the pigz binary via subprocess.Popen, my old workaround
zlib-ng (pip install zlib-ng) — Python bindings for SIMD-optimized zlib-ng, single-threaded
pigzpp — our pybind11 bindings, calling the C++ library in-process with parallel compression
isal (pip install isal) — Python bindings for Intel ISA-L, single-threaded but with hand-tuned assembly
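The measurement itself is simple; a simplified version of such a harness for the stdlib baseline alone (the function name and payload shape are mine, not from the actual benchmark script):

```python
import gzip
import os
import time

def throughput_mb_s(data: bytes, repeats: int = 3) -> tuple[float, float]:
    """Measure stdlib gzip compression and decompression throughput in MB/s,
    taking the best of a few runs to reduce timer noise."""
    mb = len(data) / 1e6
    best_c = best_d = 0.0
    for _ in range(repeats):
        t0 = time.perf_counter()
        packed = gzip.compress(data)
        best_c = max(best_c, mb / (time.perf_counter() - t0))
        t0 = time.perf_counter()
        gzip.decompress(packed)
        best_d = max(best_d, mb / (time.perf_counter() - t0))
    return best_c, best_d

if __name__ == "__main__":
    # Mixed payload: some incompressible bytes plus a highly repetitive tail.
    data = os.urandom(1_000_000) + b"A" * 15_000_000
    comp, decomp = throughput_mb_s(data)
    print(f"compress: {comp:.0f} MB/s, decompress: {decomp:.0f} MB/s")
```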
Benchmarks on a 48-core Xeon, Ubuntu 24.04.3 LTS, Python 3.14.3 (verified identical on 3.13.11):
Python 3.14 bundles zlib-ng for Windows/macOS binary releases (cpython#91349), which should make stdlib gzip significantly faster on those platforms. On Linux, both 3.13 and 3.14 use the system zlib, so results are identical.
File API (MB/s, speedup vs gzip)
Compression:
| Size | gzip | pigz (subprocess) | zlib-ng | pigzpp | isal |
|---|---|---|---|---|---|
| 16 MB | 150 (1.0x) | 497 (3.3x) | 820 (5.5x) | 1033 (6.9x) | 2313 (15.4x) |
| 128 MB | 151 (1.0x) | 465 (3.1x) | 530 (3.5x) | 748 (5.0x) | 1072 (7.1x) |
| 1024 MB | 150 (1.0x) | 497 (3.3x) | 705 (4.7x) | 749 (5.0x) | 1078 (7.2x) |
Decompression:
| Size | gzip | pigz (subprocess) | zlib-ng | pigzpp | isal |
|---|---|---|---|---|---|
| 16 MB | 359 (1.0x) | 222 (0.6x) | 660 (1.8x) | 444 (1.2x) | 666 (1.9x) |
| 128 MB | 322 (1.0x) | 231 (0.7x) | 448 (1.4x) | 444 (1.4x) | 441 (1.4x) |
| 1024 MB | 327 (1.0x) | 238 (0.7x) | 557 (1.7x) | 483 (1.5x) | 563 (1.7x) |
Bytes API (MB/s, speedup vs gzip)
Compression:
| Size | gzip | zlib-ng | pigzpp | isal |
|---|---|---|---|---|
| 16 MB | 173 (1.0x) | 468 (2.7x) | 1889 (10.9x) | 215 (1.2x) |
| 128 MB | 168 (1.0x) | 459 (2.7x) | 1805 (10.7x) | 213 (1.3x) |
| 1024 MB | 170 (1.0x) | 458 (2.7x) | 1787 (10.5x) | 213 (1.2x) |
Decompression:
| Size | gzip | zlib-ng | pigzpp | isal |
|---|---|---|---|---|
| 16 MB | 1092 (1.0x) | 2937 (2.7x) | 4982 (4.6x) | 2946 (2.7x) |
| 128 MB | 1058 (1.0x) | 2431 (2.3x) | 4184 (4.0x) | 2461 (2.3x) |
| 1024 MB | 584 (1.0x) | 1063 (1.8x) | 1340 (2.3x) | 1071 (1.8x) |
Analysis: The two APIs tell very different stories.
For the file API, each library writes/reads one file at a time. ISA-L dominates compression (up to 15.4x) because its hand-tuned assembly is simply the fastest single-stream DEFLATE implementation available. pigzpp’s multithreading helps less here because the file API triggers one background thread per open file. For decompression, zlib-ng and isal converge around 1.4-1.9x — DEFLATE inflate is sequential, so all single-threaded SIMD libraries perform similarly.
For the bytes API, pigzpp pulls far ahead on compression (10-11x) because it fires up all 48 cores to compress the in-memory buffer in parallel chunks. zlib-ng and isal are limited to single-threaded compression in their Python bindings, so despite having faster per-core DEFLATE, they can’t compete with 48 cores working in parallel. Same story for decompression: pigzpp at 4.6x vs isal/zlib-ng at 2.7x.
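The trick behind parallel gzip compression can be sketched in pure Python: split the buffer into chunks, compress each chunk as an independent gzip member in a thread pool, and concatenate. This is a simplification — real pigz primes each block with a dictionary from the preceding input to preserve the compression ratio, which independent members here do not — but concatenated members are valid gzip, and the stdlib reader handles them transparently:

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 8) -> bytes:
    """Compress chunks as independent gzip members in parallel, then join.

    CPython's zlib releases the GIL during compression, so threads give
    real parallelism here. The concatenation of gzip members is itself
    a valid gzip stream.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = pool.map(gzip.compress, chunks)
    return b"".join(members)

if __name__ == "__main__":
    payload = b"parallel deflate " * 500_000   # ~8.5 MB, 9 chunks
    packed = parallel_gzip(payload)
    # Multi-member streams decompress transparently with the stdlib.
    assert gzip.decompress(packed) == payload
```

Chunking costs a little ratio (each member restarts its dictionary and carries its own header), which is part of why pigzpp's in-process, dictionary-aware implementation matters.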
Full interoperability verified: pigzpp.open() ↔ gzip.open(), pigzpp.compress() ↔ gzip.decompress(), and vice versa.
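The Python-side usage pattern is a drop-in for stdlib gzip. A sketch, assuming the pigzpp module mirrors the stdlib surface per the interop claims above (the fallback import lets the snippet run even where pigzpp is not installed):

```python
import gzip
import os
import tempfile

try:
    import pigzpp as fastgz   # pybind11 bindings, parallel, in-process
except ImportError:
    # pigzpp not installed here; stdlib gzip exposes the same call
    # pattern, so the snippet still demonstrates the intended usage.
    import gzip as fastgz

payload = b"interoperability check " * 100_000

# Bytes API interop: either side can produce or consume the stream.
assert gzip.decompress(fastgz.compress(payload)) == payload
assert fastgz.decompress(gzip.compress(payload)) == payload

# File API interop: write with one implementation, read with the other.
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with fastgz.open(path, "wb") as f:
    f.write(payload)
with gzip.open(path, "rb") as f:
    assert f.read() == payload
```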
Lessons Learned
Agents can modernize real tools — and make them faster. This isn’t a toy demo. pigz is a widely-used utility written by Mark Adler, co-creator of zlib. A coding agent rewrote it from scratch in 70 minutes, matched performance, then beat it by 2x with targeted optimizations. That’s remarkable.
The Python ecosystem has quietly gotten fast.
Python 3.14 will bundle zlib-ng in stdlib for Windows/macOS (cpython#91349).
ISA-L is even faster on Intel CPUs — its igzip CLI hits 5.3 GB/s decompression, and the Python isal library dominates single-threaded file compression.
The case for pigzpp Python bindings is strongest for the bytes API where multithreading gives a 10x advantage — but for file-at-a-time use, isal is the better choice.
The CLI pigzpp remains useful as a faster drop-in replacement for pigz.
Agent style matters. Claude’s autopilot approach — just doing the work without asking permission at every step — was noticeably more productive than GPT’s pause-and-announce style. For a task with clear requirements, I want an agent that executes, not one that narrates.
Well-defined tasks make agents shine. The key was having clear, verifiable success criteria: compatibility and performance. If I couldn’t objectively measure whether the output was correct and fast, I wouldn’t have trusted either agent’s result.
pigzpp is open source: github.com/thammegowda/pigzpp