How I Built Iteratione: Scaling Untrusted Code Execution with Docker & Redis

How do you safely run hundreds of untrusted student programs without taking down your server? A deep dive into the architecture behind Iteratione, featuring real-time WebSocket streaming, Redis queues, and Docker-out-of-Docker isolation.

Let's be honest. Building an online judge platform is terrifying. You are literally inviting college students to run their arbitrary code on your servers. As developers, we spend most of our time trying to keep malicious actors out of our infrastructure. With Iteratione, I had to do the exact opposite.

I needed a way for computer science students to write, run, and auto-grade their Java code in real time without letting a single bad loop take down the whole system. Here is a look at how I handled 100+ concurrent code executions using Fastify, Redis, BullMQ, and a Docker-out-of-Docker setup.

1. Defending Against the "Billion Laughs"

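If you execute student code directly on your main backend process, you are exactly one while(true) loop away from a total server blackout. But infinite loops are actually the easy part. (The original Billion Laughs is an XML entity-expansion attack, but the goal here is the same: exhaust the host's resources.)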

When you open the execution floodgates, you have to defend against memory bombs that try to allocate massive amounts of RAM, fork bombs that endlessly spawn child processes, and directory traversal scripts trying to read your local environment files. To survive this, I had to completely separate the code submission step from the actual execution step.

2. Worker Pools and Message Queues

Iteratione is broken down into four distinct microservices orchestrated via Coolify. This separation guarantees that if the execution engine crashes and burns, the web platform stays online.

  • The Fastify Backend: Handles Better Auth, PostgreSQL database queries via Prisma, and routing. It never actually touches raw code execution.
  • The WebSocket Runner: Provides the live interactive terminal experience.
  • The BullMQ Grader: A background worker that pulls from Redis, runs the code against hidden tests, and records the final grade.
  • The Execution Nodes: Ephemeral Alpine Linux Docker containers where the code is executed and then immediately destroyed.

My API basically acts as a traffic cop. When a student submits code, the Fastify server pushes the payload to a Redis queue, returns a tracking ID, and moves on. A background worker picks up the job from Redis, spins up a container, runs the code, and pushes the result back.
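
To make the "traffic cop" flow concrete, here is a minimal sketch of the submit-then-process split. It deliberately swaps Redis and BullMQ for an in-memory queue so it runs on its own; the `SubmissionQueue` class, the job shape, and the `run` callback are all hypothetical names for illustration, not Iteratione's actual code.

```typescript
import { randomUUID } from 'node:crypto'

// Hypothetical job shape; the real payload fields are not shown in this post.
interface SubmissionJob {
  id: string
  source: string
}

type JobStatus = 'queued' | 'done'

// In-memory stand-in for the Redis-backed queue, for illustration only.
class SubmissionQueue {
  private pending: SubmissionJob[] = []
  private status = new Map<string, JobStatus>()
  private results = new Map<string, string>()

  // What the API does: enqueue, hand back a tracking ID, return immediately.
  submit(source: string): string {
    const id = randomUUID()
    this.pending.push({ id, source })
    this.status.set(id, 'queued')
    return id
  }

  // What a worker does: take the oldest job, run it, store the result.
  async processNext(run: (job: SubmissionJob) => Promise<string>): Promise<boolean> {
    const job = this.pending.shift()
    if (!job) return false
    this.results.set(job.id, await run(job))
    this.status.set(job.id, 'done')
    return true
  }

  getStatus(id: string): JobStatus | undefined {
    return this.status.get(id)
  }

  getResult(id: string): string | undefined {
    return this.results.get(id)
  }
}
```

The point of the split is that `submit` never waits on `run`. The caller gets an ID back right away and polls (or gets notified) later, so a slow or crashing execution cannot stall the request path.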

3. The Real Time WebSocket Bridge

For the interactive Runner, a standard Redis queue was way too slow. Students expect a console where they can type inputs mid-execution and see responses instantly. I used @fastify/websocket to build a persistent, two-way bridge between the browser and the container.

When a user clicks "Run", the Runner service spawns a Docker container with its stdin held open (the -i flag). I pipe the Docker stdout straight to the WebSocket, and I funnel the incoming WebSocket payloads right into the Docker stdin.

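```typescript
import { spawn } from 'node:child_process'

// Spawning the jailed container
const dockerProcess = spawn('docker', [
  'run',
  '-i',
  '--rm',
  '--network',
  'none', // Kill the internet
  '-m',
  '256m', // Cap the memory
  '--cpus',
  '0.5', // Cap the processing power
  '-v',
  `${tempDir}:/app`,
  '-w',
  '/app', // Run from the mounted workspace so java can find the class
  'eclipse-temurin:21-jdk-alpine',
  'java',
  mainClass,
])

// Streaming stdout to the frontend in real-time
dockerProcess.stdout.on('data', (chunk) =>
  socket.send(JSON.stringify({ type: 'stdout', data: chunk.toString() })),
)

// Piping frontend keystrokes into the container.
// The socket delivers a raw Buffer, so decode it before reading fields.
// Assumes the client sends JSON shaped like { data: string }.
socket.on('message', (raw) => {
  try {
    const msg = JSON.parse(raw.toString())
    dockerProcess.stdin.write(msg.data + '\n')
  } catch {
    // Ignore frames that are not valid JSON
  }
})
```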
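
Since those incoming frames come straight from a browser, it can help to validate them before anything reaches stdin. Below is a small sketch of one way to do that. The `{ type: 'stdin', data }` frame shape, the `parseStdinFrame` name, and the 4096-byte cap are assumptions made for this example, not part of the original Runner.

```typescript
// Hypothetical limit; pick a cap that suits your terminal UI.
const MAX_INPUT_BYTES = 4096

// Turn one raw WebSocket message into a line for the container's stdin.
// Returns null for anything malformed so the caller can simply ignore it.
function parseStdinFrame(raw: Buffer | string): string | null {
  const text = typeof raw === 'string' ? raw : raw.toString('utf8')
  if (Buffer.byteLength(text, 'utf8') > MAX_INPUT_BYTES) return null

  let frame: unknown
  try {
    frame = JSON.parse(text)
  } catch {
    return null // not JSON at all
  }

  if (typeof frame !== 'object' || frame === null) return null
  const { type, data } = frame as { type?: unknown; data?: unknown }
  if (type !== 'stdin' || typeof data !== 'string') return null

  // One frame is one line of input: drop embedded newlines, add exactly one.
  return data.replace(/[\r\n]+/g, '') + '\n'
}
```

A message handler would then write the returned string to stdin only when it is not null, which keeps oversized or malformed frames away from the student's program.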

4. Security via Docker out of Docker

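To orchestrate all of this inside my own Coolify environment, I used the Docker-out-of-Docker pattern.

Unlike Docker-in-Docker, which runs a nested daemon and usually needs a privileged container, this pattern shares the host machine's Docker daemon. The Runner and Grader Node.js containers just have the Docker CLI installed. They issue commands to the host through a Unix socket mounted at /var/run/docker.sock. The trade-off is that anything with access to that socket effectively has root on the host, so only the trusted Runner and Grader services should ever get it, never the containers running student code.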

By talking directly to the host daemon, I use Linux cgroups and namespaces to enforce strict jail conditions.

  • CPU: --cpus 0.5 caps each container at half a CPU core, so a single runaway thread cannot peg the host processor and starve other executions.
  • Memory: -m 256m imposes a hard cap. If a student allocates a massive array, the JVM throws an OutOfMemoryError or, if the container as a whole exceeds the cap, the kernel's OOM killer terminates that container instead of my server.
  • Network: --network none unplugs the virtual ethernet cable so the code cannot make HTTP requests or map the internal network.
  • Timeout: A 120-second watchdog timer sends a SIGKILL to the container to destroy infinite loops.
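
The limits above can be collected into one argument builder with a watchdog on top. This is a sketch under stated assumptions, not the Runner's real code: `SandboxOptions`, `buildRunArgs`, and `runWithWatchdog` are hypothetical names, and `--name`, `-w /app`, and `--pids-limit` are additions that do not appear in the original command. `--pids-limit` is included because it is the standard `docker run` flag for blunting fork bombs; the value 128 is a guess to tune, since the JVM needs a fair number of threads for itself.

```typescript
import { spawn } from 'node:child_process'

// Hypothetical option bag; the defaults mirror the limits listed above.
interface SandboxOptions {
  name: string // unique container name, so the watchdog can target it
  workspace: string // host path holding the student's files
  mainClass: string
  memory?: string
  cpus?: string
  pidsLimit?: number // assumption: not part of the original setup
}

function buildRunArgs(o: SandboxOptions): string[] {
  return [
    'run', '-i', '--rm',
    '--name', o.name,
    '--network', 'none',
    '-m', o.memory ?? '256m',
    '--cpus', o.cpus ?? '0.5',
    '--pids-limit', String(o.pidsLimit ?? 128),
    '-v', `${o.workspace}:/app`,
    '-w', '/app',
    'eclipse-temurin:21-jdk-alpine',
    'java', o.mainClass,
  ]
}

// Start the container and arm a watchdog that force-kills it by name.
function runWithWatchdog(o: SandboxOptions, timeoutMs = 120_000) {
  const child = spawn('docker', buildRunArgs(o))
  const timer = setTimeout(() => {
    // `docker kill` sends SIGKILL to the container by default.
    spawn('docker', ['kill', o.name])
  }, timeoutMs)
  // Disarm the watchdog once the container exits on its own.
  child.on('exit', () => clearTimeout(timer))
  return child
}
```

The watchdog targets the container by name rather than killing the local `docker run` process, because killing only the client can leave the container itself running on the host.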

This setup introduced a massive headache known as the Volume Trap. When my Node container told the host Docker daemon to mount a temporary workspace, the container booted up completely empty. The host daemon was looking for that folder on its own filesystem instead of inside the Node container's isolated file system. I fixed this by creating a global /iteratione-workspaces volume mounted at the same absolute path on the host and in the workers.
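
As a rough illustration of why the symmetric mount works, the sketch below derives a per-job directory under the shared root and the matching -v flag. The `workspaceFor` helper and the job ID format are hypothetical; only the /iteratione-workspaces root comes from the setup described above.

```typescript
import * as path from 'node:path'

// Shared root named in this post; it must resolve to the same absolute
// path on the host and inside every worker container.
const WORKSPACE_ROOT = '/iteratione-workspaces'

// Hypothetical helper: derive the per-job directory and the matching -v flag.
function workspaceFor(jobId: string): { dir: string; volumeArgs: string[] } {
  // Only allow simple IDs so a crafted value cannot escape the shared root.
  if (!/^[A-Za-z0-9_-]+$/.test(jobId)) {
    throw new Error(`invalid job id: ${jobId}`)
  }
  const dir = path.posix.join(WORKSPACE_ROOT, jobId)
  // The left side of -v is resolved by the HOST daemon, so it has to be a
  // path the host can see, which the symmetric mount provides.
  return { dir, volumeArgs: ['-v', `${dir}:/app`] }
}
```

Because the worker writes files to `dir` and the host daemon resolves the very same string, both sides end up looking at the same folder.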

5. The Nightmare of Auto Grading Java

Building a grader for Python is easy since you just run the file. Java is a logistical nightmare because of its strict package rules. A file declaring package com.iteratione.math; has to sit in a matching com/iteratione/math/ folder for the compiler and the JVM to resolve it reliably.

I did not want to force students to upload complex zip files. Instead, my Grader accepts flat files and reconstructs the directory tree on the fly. Before compilation, I use a regex to read the package declaration of every uploaded file, generate the necessary folders, and write the files into place.

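```typescript
import { promises as fs } from 'node:fs'
import * as path from 'node:path'

interface UploadedFile {
  name: string
  content: string
}

async function prepareJavaProject(files: UploadedFile[], tempDir: string) {
  for (const file of files) {
    // Match the declaration only at the start of a line, allowing spaces before ';'
    const packageMatch = file.content.match(/^\s*package\s+([\w.]+)\s*;/m)

    // Convert 'com.iteratione.math' to 'com/iteratione/math'
    const pkgPath = packageMatch
      ? path.join(tempDir, packageMatch[1].replace(/\./g, '/'))
      : tempDir

    // Drop any directory part of the uploaded name so '../' cannot escape tempDir
    const safeName = path.basename(file.name)
    await fs.mkdir(pkgPath, { recursive: true })
    await fs.writeFile(path.join(pkgPath, safeName), file.content)
  }
}
```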

This abstraction allows Iteratione to support complex, multi-package object-oriented submissions while students still just upload plain files.
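
Once packages are involved, the java launcher needs the fully qualified class name rather than the bare file name. A minimal sketch of deriving it follows; `qualifiedMainClass` is a hypothetical helper, and like the code above it relies on a simple regex rather than a real Java parser, so unusual sources (for example a package line inside a block comment) can fool it.

```typescript
// Hypothetical helper: build the fully qualified class name for `java`.
function qualifiedMainClass(source: string, fileName: string): string {
  // 'Calculator.java' -> 'Calculator'
  const simpleName = fileName.replace(/\.java$/, '')

  // Same idea as the package regex above: declaration at the start of a line.
  const match = source.match(/^\s*package\s+([\w.]+)\s*;/m)

  // 'com.iteratione.math' + 'Calculator' -> 'com.iteratione.math.Calculator'
  return match ? `${match[1]}.${simpleName}` : simpleName
}
```

For a file Calculator.java that declares package com.iteratione.math, this returns com.iteratione.math.Calculator. The launcher would be given that name with the classpath rooted at the top of the reconstructed tree.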

6. Wrapping Up the Beta

Deploying Iteratione to production taught me a lot about modern DevOps. Moving from localhost to a microservice environment forced me to deal with stateless architecture, message brokers, and kernel-level resource limits.

The end result is a platform that feels as fast as a native IDE but runs with the security of a batch processor. Looking ahead, I am exploring Firecracker MicroVMs to reduce execution boot times to milliseconds and looking into eBPF for deeper security monitoring.

Iteratione is officially in Beta right now. The labs are open. Let the code run.