Projects
Things I’ve built.
Systems, networks, architecture, and a few side projects, mostly from coursework at IIT Delhi. Listed newest first. Most are in C/C++ and Python.
Apr — May 2026
Deep Learning
Representation Learning, Vision–Language Modeling, and Diffusion
CLIP-style contrastive image–text pretraining, DINO self-supervised distillation, a two-stage vision–language model with chain-of-thought QA, and latent-diffusion generation — all from scratch on CLEVR-style data.
COL775 · A2 · w/ Yash Bansal, Tarang Shah
- CLIP from scratch — ViT-S/16 image encoder, causal-masked text transformer, dual-projection (CLS + GAP) with auxiliary contrastive loss; DropPath and bicubic positional-embedding interpolation for higher-resolution probing.
- DINO self-supervised distillation with multi-crop augmentation and centring; no text supervision.
- Two-stage vision–language model: image–text alignment, then chain-of-thought QA with explicit numerical-stability treatment of the CoT logits.
- VAE + Latent Diffusion Model trained end-to-end; quantitative comparison via linear probes, t-SNE, and cross-modal retrieval, plus qualitative generation analysis.
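As a rough illustration of the contrastive objective behind CLIP-style pretraining, here is a minimal pure-Python sketch of the symmetric InfoNCE loss. It is not the project code: the function names, the default temperature of 0.07, and the assumption that embeddings arrive already L2-normalised are all illustrative choices.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched (image, text) embedding pairs.

    Assumes both lists hold L2-normalised vectors and that pair i is the
    positive for row i; the temperature value is an illustrative default.
    """
    n = len(image_emb)
    # logits[i][j]: similarity of image i and text j, scaled by 1/temperature.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature
               for txt in text_emb] for img in image_emb]
    # Image -> text direction: row i should put its mass on column i.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image direction: the same check on the transposed matrix.
    cols = [list(col) for col in zip(*logits)]
    t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (i2t + t2i)
```

With two orthogonal unit vectors paired with themselves the loss is close to zero; swapping the two text embeddings pushes it above 10, since every positive pair now has the lowest similarity in its row.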
Apr — May 2026
Machine Learning
Visual Question Answering
A multimodal VQA system fusing a frozen ResNet-101 image encoder with a Transformer text encoder via multi-head cross-attention, finished with an MLP classifier.
COL774 · A4 · w/ Vipul Vaibhav
- ResNet-101 backbone with the final pooling/FC layers removed; projected to a joint embedding via a trainable linear layer.
- Transformer text encoder with learnable positional embeddings and padded-token masking; multi-head cross-attention from the [CLS] question token to image features for cross-modal fusion.
- Progressive training schedule: frozen backbone → fine-tuned image encoder → further regularization; standard cross-entropy with Adam.
- Zero-shot evaluation on held-out question types to probe compositional generalization.
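The fusion step can be pictured with a single attention head: one question vector queries a set of image-region features. The sketch below is a simplification under stated assumptions. It uses one head instead of several, omits the learned query/key/value projections, and the name cross_attend is hypothetical.

```python
import math

def cross_attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    query:  the [CLS] question embedding (list of floats)
    keys:   one vector per image region
    values: one vector per image region
    Returns the fused vector and the attention weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Stable softmax over the image regions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    fused = [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(len(values[0]))]
    return fused, weights
```

When every key is identical the weights are uniform and the fused vector is the mean of the values; a key aligned with the query pulls most of the weight toward its region.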
Neural Machine Translation with Cross-Lingual Transfer
Four progressively stronger English → Hindi translation models, then transferred to a Hindi → Marathi task to exploit shared script and morphology.
COL775 · A1.2
- Implemented four modes side-by-side: GloVe + BiLSTM without attention, GloVe + BiLSTM + Bahdanau attention, frozen-BERT encoder + attention, and fine-tuned-BERT encoder + attention.
- Teacher-forcing schedule with annealed ratio, label smoothing, and beam-search decoding. Full hyperparameter sweep over learning rate, teacher-forcing ratio, and beam width.
- Cross-lingual transfer: the English → Hindi model was adapted to Hindi → Marathi, exploiting the shared Devanagari script and overlapping morphology, then analysed qualitatively.
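To show why beam width is worth sweeping, here is a small beam-search sketch over a toy next-token table. Everything in it is hypothetical (the toy_model table, the token names, the absence of length normalisation); it only illustrates that a wider beam can recover a sequence that greedy decoding misses.

```python
import math

def beam_search(step_fn, bos, eos, beam_width=3, max_len=10):
    """Return (tokens, log_prob) of the best hypothesis found.

    step_fn(seq) must return a dict {token: probability} with positive
    probabilities. Scores are summed log-probabilities, unnormalised.
    """
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, prob in step_fn(seq).items():
                candidates.append((seq + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # Hypotheses ending in EOS leave the beam and are kept aside.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

def toy_model(seq):
    """Hypothetical next-token table in which the greedy first step is a trap."""
    table = {
        ("<s>",): {"a": 0.6, "b": 0.4},
        ("<s>", "a"): {"x": 0.5, "y": 0.5},
        ("<s>", "b"): {"z": 0.9, "w": 0.1},
    }
    return table.get(tuple(seq), {"</s>": 1.0})
```

With beam_width=1 the search commits to "a" and ends with probability 0.30; with beam_width=2 it keeps "b" alive and finds the 0.36 path through "z".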
ResNet-18 from Scratch, with Seven Normalization Schemes
Built ResNet-18 from scratch on a 100-class ImageNet subset, then trained it under seven different normalization variants for a controlled comparison.
COL775 · A1.1
- Every normalization implemented from scratch as nn.Module subclasses — Batch / Instance / Batch-Instance / Layer / Group / a custom BN / a no-norm baseline.
- Full modern training recipe: SGD with cosine annealing, MixUp (α = 0.4), RandAugment, label smoothing, mixed-precision, gradient clipping — 100 epochs across all seven variants.
- Sanity-checked the custom BN against PyTorch’s built-in (84.2% vs 84.5%; gap explained by Bessel-corrected variance).
- Grad-CAM analysis across visually similar classes (snakes, reptiles) to interrogate what each normalization actually learned to attend to.
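The Bessel correction mentioned above changes only the divisor of the variance (n - 1 instead of n), which rescales every normalised activation by a constant factor. The sketch below shows that effect on a single feature; the function name and the unbiased flag are illustrative, and it does not claim which estimator the custom layer used at which stage.

```python
def batch_norm_1d(xs, eps=1e-5, unbiased=False):
    """Normalise one feature across a batch (no affine scale or shift).

    unbiased=False divides the squared deviations by n (biased estimator);
    unbiased=True divides by n - 1 (Bessel-corrected estimator).
    """
    n = len(xs)
    mean = sum(xs) / n
    sq_dev = sum((x - mean) ** 2 for x in xs)
    var = sq_dev / (n - 1 if unbiased else n)
    return [(x - mean) / (var + eps) ** 0.5 for x in xs]
```

For a batch of n values and eps = 0 the two outputs differ by exactly sqrt(n / (n - 1)), so the discrepancy shrinks as the batch grows.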
Feb — Apr 2025
Operating Systems
Kernel Extensions in xv6
Five subsystems added to the xv6 teaching OS — authentication, syscall access control, custom interrupt handling, a priority-boosted scheduler, and disk-backed page swapping.
Prof. Smruti R. Sarangi
- Login authentication via Makefile macros with a retry-limited username/password system.
- Syscall-level access control (block / unblock other syscalls) and a persistent syscall-history mechanism.
- Custom interrupt handler supporting background, foreground, and user-defined modes.
- Modified scheduler: priority-boosted, with delayed-fork execution and per-process time limits.
- Page-swapping subsystem backed by a real disk swap partition with adaptive replacement.
Mar 2025
Parallel Programming
Parallel Matrix Modification on CUDA
A CUDA implementation that rearranges every element of an N×M matrix to the maximum of its top-left sub-matrix, in O(log n) parallel steps.
COL380 · A3
- Reduced the problem to counting sort + parallel prefix sum + binary search, with all three phases on the GPU.
- Implemented a work-efficient Blelloch scan for the prefix-sum step — O(log n) parallel steps versus O(n) sequential.
- Pinned host memory and multiple CUDA streams to overlap host ↔ device transfers with kernel execution.
- Benchmarked on matrices up to 10⁵ × 10⁵ with element ranges to 10⁸; reported execution-time scaling against matrix size and batch count.
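The Blelloch scan can be sketched sequentially to show its two phases. This is a Python simulation, not the CUDA kernel: each inner for loop touches disjoint elements and stands in for one parallel step, and the choice of an exclusive scan with zero padding to a power of two is an assumption of this sketch.

```python
def blelloch_scan(values):
    """Exclusive prefix sum via up-sweep and down-sweep.

    The outer while loops run O(log n) times; each inner loop is the work
    that one parallel step would distribute across threads.
    """
    n = 1
    while n < len(values):
        n *= 2
    a = list(values) + [0] * (n - len(values))  # pad to a power of two

    # Up-sweep (reduce): build partial sums in place, tree-style.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    a[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2
    return a[:len(values)]
```

For the input [1, 2, 3, 4] the result is [0, 1, 3, 6]: each position holds the sum of everything strictly before it.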
Oct — Nov 2024
Computer Networks
TCP-like Reliability over UDP
A reliable file-transfer protocol built on UDP — application-layer ACKs, retransmission, TCP Reno + CUBIC congestion control, evaluated on Mininet.
Prof. Tarun Mangla
- ACK-based reliability over UDP: cumulative ACKs, retransmissions, and fast recovery driven by sequence numbers.
- TCP Reno congestion control — slow start, congestion avoidance, and timeout behaviour for throughput optimisation.
- TCP CUBIC implementation; head-to-head fairness / efficiency comparison against Reno under low- and high-latency regimes.
- Mininet + Ryu controller experiments quantifying throughput against packet loss and RTT.
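The difference between the two congestion controllers comes down to how the window grows. The sketch below pairs a per-ACK Reno update with the CUBIC window function; the function names are hypothetical, windows are counted in segments, and the constants C = 0.4 and beta = 0.7 follow RFC 8312 (whether the project used these exact values is an assumption).

```python
def reno_on_ack(cwnd, ssthresh):
    """Reno growth per new ACK: +1 segment in slow start, +1/cwnd afterwards."""
    if cwnd < ssthresh:
        return cwnd + 1.0          # slow start: doubles roughly every RTT
    return cwnd + 1.0 / cwnd       # congestion avoidance: about +1 per RTT

def reno_on_timeout(cwnd):
    """On a retransmission timeout: halve the threshold, restart from 1 segment."""
    ssthresh = max(cwnd / 2.0, 2.0)
    return 1.0, ssthresh

def cubic_window(t, w_max, c=0.4, beta=0.7):
    """CUBIC target window t seconds after the last loss event.

    w_max is the window just before the loss; the curve starts at
    beta * w_max, flattens near w_max at t = K, then probes beyond it.
    """
    k = (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)
    return c * (t - k) ** 3 + w_max
```

Because the CUBIC curve depends on elapsed time and not on the number of ACKs received, its growth is largely independent of RTT, which is one reason the low- and high-latency comparison against Reno is informative.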
Sep — Oct 2024
Computer Networks
Software-Defined Networking with Ryu
Four OpenFlow controllers on the Ryu framework — from a self-learning switch to a congestion-aware shortest-path router — evaluated on Mininet topologies.
Prof. Tarun Mangla · w/ Anubhav Pandey
- Self-learning switch + hub controller; verified the learned flow tables and measured 35.6 Gbps throughput on the learning path.
- Spanning Tree Protocol implementation to prevent loops on meshed topologies.
- Shortest-path routing via Dijkstra over the discovered topology.
- Congestion-aware routing using LLDP-based link-delay measurement and dynamic link-cost re-evaluation.
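The routing step can be sketched as Dijkstra over a weighted adjacency map in which the weights stand in for measured link delays. The graph layout, switch names, and the function name shortest_path are illustrative; the sketch assumes non-negative costs and leaves out everything controller-specific.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over graph[node] = {neighbour: cost}.

    Returns (path, total_cost), or (None, inf) if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None, float("inf")
    # Walk predecessors back from the destination.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]
```

Congestion awareness then amounts to updating the costs and re-running the search: raising the s1 to s2 cost in a three-switch triangle from 1 to 10 moves the s1 to s3 route off the two-hop path and onto the direct link.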
Aug — Sep 2024
Computer Networks
Concurrent Sockets and Server Scheduling
A client–server suite in C probing the effects of packet sizing, concurrency, and centralized versus decentralized request scheduling.
Prof. Tarun Mangla · w/ Anubhav Pandey
- Measured completion time as a function of packet size and client concurrency (1 – 32 concurrent clients).
- Implemented a “grumpy server” using multiple decentralized access protocols.
- Centralized scheduling algorithms (FCFS, round-robin variants) with logged completion-time analysis.
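The effect of the scheduling policy on completion time can be reproduced with a toy simulation. The sketch below is in Python for brevity and is not the C suite itself; it assumes all requests arrive at t = 0, ignores network and context-switch costs, and uses hypothetical function names.

```python
from collections import deque

def fcfs(jobs):
    """jobs: list of (name, service_time). Returns {name: completion_time}."""
    t, done = 0, {}
    for name, service in jobs:
        t += service
        done[name] = t
    return done

def round_robin(jobs, quantum):
    """Serve each job for at most `quantum` time units per turn."""
    t, done = 0, {}
    queue = deque(jobs)
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        t += run
        if remaining > run:
            queue.append((name, remaining - run))  # back of the line
        else:
            done[name] = t
    return done
```

For jobs a, b, c with service times 5, 1, 2, FCFS finishes them at 5, 6, 8, while round-robin with a quantum of 1 finishes b at 2 and c at 5 and leaves a at 8: short requests stop waiting behind the long one.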
Mar — Apr 2024
Computer Architecture
Cache Hierarchy Simulator
A cycle-accurate cache simulator in C++ supporting every standard cache organisation with configurable replacement and write policies.
Prof. Kolin Paul
- Direct-mapped, set-associative, and fully-associative caches with LRU and FIFO replacement.
- Write-Through / Write-Back × Write-Allocate / Write-No-Allocate configurations for end-to-end efficiency analysis.
- CPU-cycle penalties modelled on cache misses to approximate real memory-system behaviour.
- Matplotlib sweeps across cache configurations over real load / store traces.
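The three organisations differ only in how many sets and ways the cache has, so one model covers all of them. The sketch below is a Python approximation, not the C++ simulator: it handles reads only (no write policies), uses LRU replacement, and the class name, default latencies, and field names are assumptions.

```python
from collections import OrderedDict

class Cache:
    """Set-associative cache with LRU replacement.

    ways=1 gives a direct-mapped cache; num_sets=1 gives a fully
    associative one. Latencies are illustrative cycle counts.
    """

    def __init__(self, num_sets, ways, block_size, hit_cycles=1, miss_penalty=100):
        self.num_sets, self.ways, self.block_size = num_sets, ways, block_size
        self.hit_cycles, self.miss_penalty = hit_cycles, miss_penalty
        # One ordered map per set: oldest (least recently used) tag first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = self.misses = self.cycles = 0

    def access(self, address):
        """Simulate one read; returns True on a hit."""
        block = address // self.block_size
        index, tag = block % self.num_sets, block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # mark as most recently used
            self.hits += 1
            self.cycles += self.hit_cycles
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)       # evict the LRU line
        lines[tag] = True
        self.misses += 1
        self.cycles += self.hit_cycles + self.miss_penalty
        return False
```

The address sequence 0, 8, 0 misses three times in a two-set direct-mapped cache with 4-byte blocks, because both blocks map to set 0, but hits on the third access in a two-way fully associative cache of the same capacity.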
Jan — Feb 2024
Side project
Stock Trading Platform
A Flask web app for stock screening, multi-equity charting, and pluggable algorithmic-strategy backtesting.
Prof. Huzur Saran
- Screening by P/E, PEG, EPS, and other fundamentals; per-ticker RSI and VWAP charting.
- Multi-stock comparison on a single chart, with both relative and absolute pricing modes.
- Strategy framework with MACD, DMA, DMA++, Linear Regression, RSI, and ADX — run concurrently using threads.
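A pluggable strategy framework of this kind can be sketched as a dictionary of signal functions run on a thread pool. The sketch is illustrative only: the names STRATEGIES and run_all, the 30/70 RSI thresholds, the 5/20 moving-average windows, and the simple-average RSI variant (no Wilder smoothing) are all assumptions, and the price list must hold at least period + 1 values.

```python
from concurrent.futures import ThreadPoolExecutor

def sma(prices, window):
    """Simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window

def rsi(prices, period=14):
    """RSI from plain averages of gains and losses over `period` changes."""
    recent = prices[-(period + 1):]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(d for d in deltas if d > 0) / period
    avg_loss = sum(-d for d in deltas if d < 0) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def rsi_signal(prices):
    value = rsi(prices)
    return "buy" if value < 30 else "sell" if value > 70 else "hold"

def sma_cross_signal(prices):
    return "buy" if sma(prices, 5) > sma(prices, 20) else "sell"

# Hypothetical registry: adding a strategy means adding one entry here.
STRATEGIES = {"rsi": rsi_signal, "sma_cross": sma_cross_signal}

def run_all(strategies, prices):
    """Evaluate every strategy on the same price series, one thread each."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, prices) for name, fn in strategies.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

On a steadily rising series the two strategies disagree: the moving-average crossover says "buy" while the RSI reads 100 and says "sell", which is the kind of divergence a side-by-side backtest is meant to surface.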