VRIG Memory Allocator Project

2026-03-01

Project Description

This was a mini competition run by the Vulnerability Research Interest Group (VRIG) at RITSEC at the start of Spring 2026. We studied modern allocator implementations (specifically Mimalloc, PartitionAlloc, TCMalloc, Scalloc, Scudo, and bmalloc), then built our own from scratch. The benchmark harness was created by Oleg and tests basic allocator behavior, memory fragmentation, throughput, tail latency, and a suite of security edge cases.

The benchmark covers correctness (malloc/free, realloc, calloc, alignment, size boundaries), stress and threaded suites (long realloc chains, 1M+ ops, producer-consumer, mixed sizes), fragmentation ("Swiss cheese" patterns, peak live set cycles), edge and security cases (invalid frees, double frees, stack frees / house of spirit, heap poisoning / house of lore, top chunk abuse / house of force), and realistic workloads replaying traces from Redis YCSB, SQLite TPC-C, and Firefox page loads.

This post covers my entry: Zialloc. The full project blog including entries from other VRIG members is published at blog.ritsec.club.

Zialloc Design

I'm still working on this project on and off, so this design doc may lag behind the latest security features and execution paths. The overall layout won't change, though.

Size Model

Zialloc uses fixed size classes for regular allocations plus a directly-mapped XL path.

Reserved heap virtual address space (default): 100 GB
Segment size / alignment: 128 MiB
Page classes: SMALL (1 MiB), MEDIUM (8 MiB), LARGE (16 MiB), XL (direct mapping)
Chunk size thresholds: 512 KiB – 16 B (small), 4 MiB – 16 B (medium), 8 MiB – 16 B (large); above is XL
Regular requests are normalized: round up to at least 16 bytes, then up to the next power of two, then clamp to the class cap
Final stride = align_up(normalized_request + inline_header, 16)
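As a rough sketch of the normalization rules above (not Zialloc's actual code): the thresholds and the rounding order come from the list, while the constant names, the classify and stride_for helpers, and the 16-byte inline header size are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Thresholds from the list above; the names are hypothetical.
constexpr std::size_t kMinRequest = 16;
constexpr std::size_t kAlign      = 16;
constexpr std::size_t kSmallCap   = 512u * 1024u - 16u;
constexpr std::size_t kMediumCap  = 4u * 1024u * 1024u - 16u;
constexpr std::size_t kLargeCap   = 8u * 1024u * 1024u - 16u;

enum class SizeClass { Small, Medium, Large, XL };

// Pick the class whose chunk-size threshold covers the request.
inline SizeClass classify(std::size_t n) {
    if (n <= kSmallCap)  return SizeClass::Small;
    if (n <= kMediumCap) return SizeClass::Medium;
    if (n <= kLargeCap)  return SizeClass::Large;
    return SizeClass::XL;
}

// Round n up to a multiple of a (a must be a power of two).
inline std::size_t align_up(std::size_t n, std::size_t a) {
    return (n + a - 1) & ~(a - 1);
}

// Smallest power of two that is >= n.
inline std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Normalize a non-XL request and compute the slot stride.
// header_size is an assumption; the source does not state the header size.
inline std::size_t stride_for(std::size_t request, std::size_t class_cap,
                              std::size_t header_size) {
    assert(request <= class_cap);              // XL requests take another path
    std::size_t n = request < kMinRequest ? kMinRequest : request;
    n = next_pow2(n);
    if (n > class_cap) n = class_cap;          // clamp to the class cap
    return align_up(n + header_size, kAlign);
}
```

Under the assumed 16-byte header, a request that clamps to the small cap ends up with a stride of exactly 512 KiB, which may be why the caps sit 16 bytes below a power of two.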

Heap Layout

At initialization, zialloc reserves a large vmem region (100 GB) and commits 128 MiB segments from it on demand. It immediately seeds one segment each for small, medium, and large classes.
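The reserve-then-commit pattern described here can be sketched with POSIX mmap and mprotect. This is a minimal illustration, not the code in zialloc/os.cpp: the Reservation struct and both function names are hypothetical, and it assumes a Linux-like system where MAP_ANONYMOUS and MAP_NORESERVE are available and where size and align are multiples of the OS page size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

struct Reservation {
    unsigned char* base;  // aligned start of the reserved range
    std::size_t size;     // reserved bytes
};

// Reserve `size` bytes of address space aligned to `align` (a power of two)
// without committing memory. Over-reserves by `align`, then trims the ends.
inline Reservation reserve_aligned(std::size_t size, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const std::size_t padded = size + align;
    void* raw = mmap(nullptr, padded, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (raw == MAP_FAILED) return {nullptr, 0};

    const auto addr = reinterpret_cast<std::uintptr_t>(raw);
    const std::uintptr_t aligned =
        (addr + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);

    // Give back the unaligned head and the unused tail.
    if (aligned > addr) munmap(raw, aligned - addr);
    const std::size_t tail = (addr + padded) - (aligned + size);
    if (tail != 0) munmap(reinterpret_cast<void*>(aligned + size), tail);

    return {reinterpret_cast<unsigned char*>(aligned), size};
}

// Make segment `index` readable and writable; the kernel backs it with
// physical pages lazily on first touch.
inline bool commit_segment(const Reservation& r, std::size_t index,
                           std::size_t seg_size) {
    if ((index + 1) * seg_size > r.size) return false;  // outside the reservation
    return mprotect(r.base + index * seg_size, seg_size,
                    PROT_READ | PROT_WRITE) == 0;
}
```

With the defaults listed above, this would correspond to reserving 100 GB once with a 128 MiB alignment and committing one segment per class at startup.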

Hierarchy: Each segment is classed by size — all pages in a segment have the same page size. Within one page, all slots have identical stride/usable size and allocation state is tracked with a bitmap. XL allocations bypass the class system and are mmapped as standalone mappings with inline headers.

Metadata model (out-of-line (OOL), allocator-owned):

A chunk's owning page and slot index are resolved by pointer arithmetic on the chunk's own address
Per-page: bitmap, used counts, owner TID, deferred-free ring
Per-segment: class, page array, full-page count, chunk-geometry lock-in, integrity key/canary
XL metadata is inline in front of the returned pointer (XLHeader)
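One plausible way the pointer-arithmetic lookup could work, given 128 MiB-aligned segments with a uniform page size and stride, is sketched below. The ChunkLocation struct and the locate function are hypothetical, and the sketch assumes pages start at the segment base, which the source does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint64_t kSegmentSize = 128ull * 1024ull * 1024ull;  // 128 MiB

struct ChunkLocation {
    std::uint64_t segment_base;  // start of the owning segment
    std::size_t   page_index;    // index of the page inside the segment
    std::size_t   slot_index;    // index of the slot inside the page
};

// Derive segment, page, and slot from an address alone.
// page_size and stride are fixed per segment, so they would normally be
// read from the segment's out-of-line metadata.
inline ChunkLocation locate(std::uint64_t addr, std::size_t page_size,
                            std::size_t stride) {
    assert(page_size != 0 && stride != 0);
    const std::uint64_t seg = addr & ~(kSegmentSize - 1);  // mask off low bits
    const std::uint64_t off = addr - seg;
    return {seg,
            static_cast<std::size_t>(off / page_size),
            static_cast<std::size_t>((off % page_size) / stride)};
}
```

Because the division truncates, an interior pointer resolves to the same slot as the pointer returned to the user.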

Allocation Workflow

Allocation enters through API wrappers (malloc, calloc, realloc) and dispatches to HeapState::allocate(size).

Validate request size and compute size class (SM/MD/LG/XL)
XL has a second chance: if it fits a large-page chunk geometry, reroute to large class
Fast path: search thread-local cached pages by class
Next path: search thread-local preferred segment
Next path: shard queue of known non-full segments
Next path: bounded scan of same-class segments
Slow path: carve another segment from the pre-reserved virtual address space
Fallback: mmap a new segment-aligned mapping
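The ordering of these paths can be pictured as a chain of tiers tried from cheapest to most expensive. The sketch below shows only that control flow; AllocResult, allocate_tiered, and the use of std::function are illustrative assumptions and do not reflect how HeapState::allocate is actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// One allocation tier: returns a pointer on a hit, nullptr on a miss.
using Tier = std::function<void*(std::size_t)>;

struct AllocResult {
    void* ptr;   // nullptr if every tier missed
    int   tier;  // index of the tier that served the request, or -1
};

// Try each tier in order and stop at the first one that produces memory.
// Cheaper tiers (thread-local caches) go first, slower ones (new segments,
// fresh mappings) go last.
inline AllocResult allocate_tiered(const std::vector<Tier>& tiers,
                                   std::size_t size) {
    for (std::size_t i = 0; i < tiers.size(); ++i) {
        if (void* p = tiers[i](size)) {
            return {p, static_cast<int>(i)};
        }
    }
    return {nullptr, -1};
}
```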

Bitmap/chunk behavior: The chunk allocator inside a page is bitmap-driven. Allocation searches from a hint, finds the first zero bit, marks it, writes a chunk header, and returns a pointer after the header. Free validates the header/magic/owner/slot, clears the bit, decrements used count, and updates first_hint. Double-free detection works by checking if a bitmap bit is already clear.
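A minimal sketch of that bitmap behavior follows. The SlotBitmap class is hypothetical, it scans one bit at a time for clarity (a real implementation would more likely scan a word at a time), and it reports a double free through a return value where Zialloc aborts. The exact first_hint update policy is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class SlotBitmap {
public:
    explicit SlotBitmap(std::size_t slots)
        : bits_((slots + 63) / 64, 0), slots_(slots) {}

    // Search from the hint, wrapping around, and claim the first clear bit.
    // Returns the slot index, or -1 if the page is full.
    long allocate() {
        for (std::size_t n = 0; n < slots_; ++n) {
            const std::size_t i = (first_hint_ + n) % slots_;
            if (!test(i)) {
                bits_[i / 64] |= (std::uint64_t{1} << (i % 64));
                ++used_;
                first_hint_ = (i + 1) % slots_;
                return static_cast<long>(i);
            }
        }
        return -1;
    }

    // Clear the bit for slot i. A bit that is already clear means the slot
    // was not allocated, i.e. a double free or an invalid free.
    bool release(std::size_t i) {
        if (i >= slots_ || !test(i)) return false;
        bits_[i / 64] &= ~(std::uint64_t{1} << (i % 64));
        --used_;
        if (i < first_hint_) first_hint_ = i;  // next search starts lower
        return true;
    }

    std::size_t used() const { return used_; }

private:
    bool test(std::size_t i) const {
        return (bits_[i / 64] >> (i % 64)) & 1u;
    }

    std::vector<std::uint64_t> bits_;
    std::size_t slots_;
    std::size_t used_ = 0;
    std::size_t first_hint_ = 0;
};
```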

Free Workflow

Free enters through free() and dispatches to HeapState::free_ptr(ptr, usable_out). Note: "freeing" here means undoing physical mappings, not necessarily returning memory to the OS.

Null free is ignored
Inline ChunkHeader is parsed and CHUNK_MAGIC is validated — invalid state aborts
If freeing thread is not the page owner, allocator attempts a deferred enqueue into the page-local lock-free ring
If enqueue fails (queue full/contention), falls back to direct free
Deferred frees are drained by the owner-side allocation path opportunistically when queue pressure is high
Chunk free itself is a bitmap bit clear + used count decrement (+ optional zero-on-free)
XL: checks XL_MAGIC, optionally zeroes payload, unmaps the entire mapping
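The source does not say which ring algorithm Zialloc uses, so the sketch below swaps in a well-known design, Dmitry Vyukov's bounded queue with per-cell sequence numbers, to illustrate how a bounded lock-free ring can reject a push when full. DeferredRing and its method names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
class DeferredRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "capacity must be a power of two");

    struct Cell {
        std::atomic<std::size_t> seq;  // tells producers/consumers whose turn it is
        void* ptr;
    };

public:
    DeferredRing() {
        for (std::size_t i = 0; i < N; ++i)
            cells_[i].seq.store(i, std::memory_order_relaxed);
    }

    // Remote thread: returns false when the ring is full, in which case the
    // caller falls back to a direct free.
    bool try_push(void* p) {
        std::size_t pos = head_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& c = cells_[pos & (N - 1)];
            const std::size_t seq = c.seq.load(std::memory_order_acquire);
            const auto diff = static_cast<std::intptr_t>(seq) -
                              static_cast<std::intptr_t>(pos);
            if (diff == 0) {
                // Cell is free for this position; try to claim it.
                if (head_.compare_exchange_weak(pos, pos + 1,
                                                std::memory_order_relaxed)) {
                    c.ptr = p;
                    c.seq.store(pos + 1, std::memory_order_release);
                    return true;
                }
            } else if (diff < 0) {
                return false;  // full
            } else {
                pos = head_.load(std::memory_order_relaxed);
            }
        }
    }

    // Owner thread: drains one deferred pointer, returns false when empty.
    bool try_pop(void*& out) {
        std::size_t pos = tail_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& c = cells_[pos & (N - 1)];
            const std::size_t seq = c.seq.load(std::memory_order_acquire);
            const auto diff = static_cast<std::intptr_t>(seq) -
                              static_cast<std::intptr_t>(pos + 1);
            if (diff == 0) {
                if (tail_.compare_exchange_weak(pos, pos + 1,
                                                std::memory_order_relaxed)) {
                    out = c.ptr;
                    c.seq.store(pos + N, std::memory_order_release);
                    return true;
                }
            } else if (diff < 0) {
                return false;  // empty
            } else {
                pos = tail_.load(std::memory_order_relaxed);
            }
        }
    }

private:
    Cell cells_[N];
    std::atomic<std::size_t> head_{0};  // next enqueue position
    std::atomic<std::size_t> tail_{0};  // next dequeue position
};
```

In this picture, a remote free calls try_push and performs the direct free itself when it returns false, while the owning thread calls try_pop in a loop during allocation to drain the backlog.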

Deferred-Free Ring — Unintended Bonus

The deferred-free queue is a bounded per-page ring that lets a thread defer a free on a page it doesn't own instead of mutating that page directly. This gives a cheeky capability for detecting UAFs (if checks are enabled) and can delay reuse, acting as a pseudo temporal quarantine by preventing writes to pointers currently in the queue.

Security Strategy

Zialloc uses several integrity checks plus optional hardening toggles:

Pointer/header ownership checks before free and usable-size operations
Abort-on-corruption for invalid headers, bad transitions, and detected double frees
Segment integrity key/canary check in validation path
Optional zero-on-free memory scrubbing
Optional UAF check path in usable_size (aborts if slot is no longer marked allocated)
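To make the header and UAF checks concrete, here is a small sketch of validating an inline header before trusting a pointer, plus a zero-on-free scrub. The header layout, the magic value, and the Verdict enum are all invented for illustration, and the sketch returns a verdict where Zialloc would abort.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kChunkMagic = 0xC0FFEE01u;  // placeholder, not CHUNK_MAGIC's real value

// Hypothetical inline header placed directly in front of the user pointer.
struct ChunkHeader {
    std::uint32_t magic;
    std::uint32_t slot;
    std::uint64_t owner_tid;
};
static_assert(sizeof(ChunkHeader) == 16, "sketch assumes a 16-byte header");

enum class Verdict { Ok, BadMagic, SlotMismatch, NotAllocated };

// Write a header at the start of a slot and return the user pointer after it.
inline void* stamp(void* slot_base, std::uint32_t slot, std::uint64_t tid) {
    const ChunkHeader h{kChunkMagic, slot, tid};
    std::memcpy(slot_base, &h, sizeof h);
    return static_cast<unsigned char*>(slot_base) + sizeof h;
}

// Check a user pointer before free or usable_size.
// expected_slot comes from pointer arithmetic; slot_allocated from the bitmap.
inline Verdict validate(const void* user_ptr, std::uint32_t expected_slot,
                        bool slot_allocated) {
    ChunkHeader h;
    std::memcpy(&h, static_cast<const unsigned char*>(user_ptr) - sizeof h, sizeof h);
    if (h.magic != kChunkMagic) return Verdict::BadMagic;      // e.g. a stack pointer
    if (h.slot != expected_slot) return Verdict::SlotMismatch; // forged or shifted header
    if (!slot_allocated) return Verdict::NotAllocated;         // UAF or double free
    return Verdict::Ok;
}

// Zero-on-free; volatile writes keep the compiler from eliding the scrub.
inline void scrub(void* user_ptr, std::size_t n) {
    volatile unsigned char* p = static_cast<volatile unsigned char*>(user_ptr);
    for (std::size_t i = 0; i < n; ++i) p[i] = 0;
}
```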

Known Limits

Heap layout is not optimal; metadata lookup/access isn't as efficient as a radix-tree design
XL allocations are direct-mapped and behave differently from class-segmented allocations
Segment classing and fixed chunk geometry per segment trade memory efficiency for predictable behavior
Deferred ring for cross-thread frees is capped and may fall back to direct page free

Source Map

API entrypoints, init/teardown, stats: zialloc/alloc.cpp
Core allocator internals (heap/segment/page/cache/deferred free): zialloc/segments.cpp
OS mapping/protection/reservation wrappers: zialloc/os.cpp
Shared constants/macros/enums: zialloc/types.h, zialloc/mem.h
Memory interface declarations: zialloc/zialloc_memory.hpp

API Surface

Supported: malloc, free, realloc, calloc, usable_size, print_stats, validate_heap, get_stats, init, teardown

Not yet implemented: memalign, aligned_alloc, free_sized, realloc_array, bulk_free

Other Entries

Two other VRIG members submitted allocators for the competition:

share-rdAlloc — A 3-tiered allocator inspired by TCMalloc. Reserves 1 TB of virtual memory at init with a 512 GB small heap, 512 GB medium heap, and mmap-backed massive allocations (>8 MB). Uses thread-local free lists with batch refills of 256 objects and a per-class mutex for medium tiers. Named after an inside joke.
dualalloc — Supports two modes: a simple per-allocation mmap/munmap default mode, and a GAMBLE_MODE arena-based design with multiple arenas, spans, and block metadata including magic value validation. By Dylan Pachan.

The full technical details for all three allocators are on the RITSEC blog.