AI Engineer & Full Stack Developer

ETH Arena — AI Exam Prep Platform

Hybrid RAG exam preparation platform with in-browser PDF intelligence and AI-powered tutoring

Hybrid RAG Pipeline · Local ONNX Embedding · 4-Container Services
The Problem

ETH Zurich students preparing for exams are forced to manually transcribe questions from PDF exam papers, lack targeted practice material for specific topics, and receive no structured feedback on their attempts. Existing study tools treat exam preparation as a static, one-directional content consumption problem rather than an interactive, AI-augmented learning loop.

Constraints

  • Zero-transcription workflow: students must be able to visually select questions directly from rendered PDFs without manual text entry
  • Hybrid intelligence model: semantic search must run locally for latency and cost control, while reasoning-heavy tasks leverage cloud LLMs
  • Retrieval-first architecture: the system must prefer semantically similar existing questions over generation, falling back to LLM-generated content only when retrieval quality is insufficient
  • Multi-format answer support: the platform must handle numeric, true/false, multiple-choice, and open-ended responses with type-specific validation and UI
  • Production-ready multi-service orchestration with isolated persistence layers and reverse-proxy routing
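The multi-format answer constraint could be handled with a small set of type-specific validators. A minimal sketch, assuming hypothetical function names and a tolerance value not taken from the project; open-ended answers would instead be delegated to AI-backed evaluation, as described below.

```python
# Sketch of type-specific validation for objective answer formats.
# Names and the numeric tolerance are illustrative assumptions, not the
# platform's real API. Open-ended answers are evaluated by the LLM instead.

def check_numeric(submitted: str, expected: float, tol: float = 1e-6) -> bool:
    """Numeric answers: parse the submission and compare within a tolerance."""
    try:
        return abs(float(submitted) - expected) <= tol
    except ValueError:
        return False

def check_true_false(submitted: str, expected: bool) -> bool:
    """True/false answers: accept common spellings, case-insensitively."""
    truthy, falsy = {"true", "t", "yes"}, {"false", "f", "no"}
    return submitted.strip().lower() in (truthy if expected else falsy)

def check_multiple_choice(submitted: str, expected: str) -> bool:
    """Multiple choice: compare option letters, ignoring case and whitespace."""
    return submitted.strip().upper() == expected.strip().upper()
```

Keeping these checks pure and dependency-free is what makes it possible to run them client-side for instant feedback, as the platform does.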

Approach

Designed a visual-first interaction model where users upload exam PDFs, render pages client-side via PDF.js, and drag-select question regions that are rasterized to PNG before upload. The backend extracts structured question representations using Gemini's vision capabilities, embeds the question text locally with ONNX Runtime (embeddinggemma-300m, 768-dimensional vectors), and queries Qdrant for semantically similar practice questions. When retrieval candidates fall below similarity thresholds or are too close to the source, the system generates novel practice questions via Gemini and persists them back into the vector store for future reuse. Progressive hinting is context-aware, building on previously given hints to become incrementally more specific without revealing the answer. Answer evaluation uses a mixed validation strategy: objective answers are checked client-side for instant feedback while open-ended responses are delegated to AI-backed evaluation with structured comparison against expected answers.
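The retrieval-or-generate decision at the heart of this flow can be sketched as follows. The threshold values and function names here are illustrative assumptions, not the project's actual configuration: a lower bound rejects weak matches, and an upper bound rejects hits that are essentially the source question itself.

```python
# Sketch of the retrieval-first decision described above. Thresholds and
# names are illustrative assumptions, not the project's actual values.

MIN_SIMILARITY = 0.75   # below this, retrieved questions are not relevant enough
MAX_SIMILARITY = 0.98   # above this, the hit is essentially the source question

def select_practice_question(hits, generate_fn, persist_fn):
    """Prefer a semantically similar stored question; generate only as fallback.

    `hits` is a list of (question, score) pairs sorted by descending score,
    as a vector store like Qdrant would return them.
    """
    for question, score in hits:
        if MIN_SIMILARITY <= score <= MAX_SIMILARITY:
            return question                  # serve from the existing corpus
    new_question = generate_fn()             # corpus lacks a quality match
    persist_fn(new_question)                 # store it for future retrieval
    return new_question
```

Persisting generated questions back into the vector store means the generation path becomes rarer over time as the corpus grows.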

Architecture

The frontend is a Next.js 15 App Router application built with React 19, featuring TipTap rich text editing for open-ended answers, KaTeX-powered math rendering via remark-math and rehype-katex, and a thin explicit service layer for backend communication. The backend is a FastAPI service with Pydantic v2 contracts, asyncpg for PostgreSQL access, and lazy-loaded service initialization (both the Gemini client and ONNX embedding model load on first use, preventing boot-time fragility). Qdrant stores 768-dimensional question vectors with rich metadata payloads (source path, question type, point value, normalized JSON, validated answers). PostgreSQL serves as the transactional system of record for answer submissions. The production topology uses Docker Compose with Traefik as a reverse proxy: root routes to the frontend, /api routes strip-prefix to the backend. Both Qdrant and PostgreSQL startup paths include retry behavior for container orchestration resilience.
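The lazy-loaded service initialization mentioned above can be reduced to a small pattern. This is a generic sketch, not the project's code; `load_onnx_model` is a hypothetical loader named only for illustration.

```python
class LazyService:
    """Defer an expensive factory call (model load, API client) to first use.

    Mirrors the lazy-initialization pattern described above: nothing heavy
    runs at import or boot time, so container start order cannot break it.
    """
    def __init__(self, factory):
        self._factory = factory
        self._instance = None

    def get(self):
        if self._instance is None:        # first call pays the load cost
            self._instance = self._factory()
        return self._instance

# e.g. embedder = LazyService(lambda: load_onnx_model("embeddinggemma-300m"))
# where `load_onnx_model` is a hypothetical loader, not a real API.
```

The same wrapper applies to both the Gemini client and the ONNX embedding model: a failed dependency surfaces on the first request that needs it, rather than crashing the container at boot.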

System Architecture
Input → Processing → LLM + Tools → Output → Store

Results

Delivered a fully operational AI-powered exam preparation platform that eliminates manual question transcription entirely, generates contextually relevant practice material through a hybrid RAG pipeline that minimizes unnecessary LLM calls, and provides a complete tutoring feedback loop with progressive hints and structured answer evaluation. The local embedding strategy removes external API dependencies for search operations while the lazy initialization pattern ensures reliable service startup across development and production environments. The multi-service Docker orchestration supports reproducible deployments with a single command.
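The routing topology behind that single-command deployment could look roughly like the Compose fragment below. Service names, ports, and build paths are assumptions for illustration; the labels use standard Traefik v2 syntax for path-prefix routing and prefix stripping.

```yaml
# Illustrative Compose fragment (service names and ports are assumptions):
# Traefik routes / to the frontend and strips /api before proxying to the
# FastAPI backend, matching the topology described above.
services:
  traefik:
    image: traefik:v2.11
    command: ["--providers.docker=true", "--entrypoints.web.address=:80"]
    ports: ["80:80"]
    volumes: ["/var/run/docker.sock:/var/run/docker.sock:ro"]
  frontend:
    build: ./frontend
    labels:
      - "traefik.http.routers.frontend.rule=PathPrefix(`/`)"
  backend:
    build: ./backend
    labels:
      - "traefik.http.routers.backend.rule=PathPrefix(`/api`)"
      - "traefik.http.routers.backend.middlewares=api-strip"
      - "traefik.http.middlewares.api-strip.stripprefix.prefixes=/api"
```

Traefik prioritizes longer rules by default, so `/api` requests reach the backend even though `/` matches everything.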

Learnings

  • Separating local embeddings (ONNX) from cloud reasoning (Gemini) is a powerful architectural pattern for controlling both latency and cost: semantic search runs without network overhead, while expensive LLM calls are reserved for generation and evaluation.
  • A retrieval-first pipeline with generation fallback dramatically reduces unnecessary AI calls: most practice sessions can be served from the vector store, with LLM generation triggered only when the existing corpus lacks quality matches.
  • Lazy service initialization (deferring model loading and API client creation until first use) significantly improves startup reliability in containerized multi-service environments where dependency ordering is non-deterministic.
  • In-browser PDF rendering and client-side image rasterization keep the interaction loop immediate and reduce backend payload sizes, but require careful coordination between PDF.js worker threads and the React component lifecycle.
