Projects · Ankit Dhandharia

SOFTWARE · FULL-STACK 2021 · PRODUCTION

Blogging Platform — Full-Stack CMS

Engineered the backend architecture for a content-management blogging platform as PHP Programmer at Writin. Built RESTful APIs for user authentication, post CRUD, and commenting systems. Designed the MySQL schema, integrated with the frontend team, and wrote responsive HTML/CSS/JS. Optimized database queries for real-world traffic patterns.

PHP · MySQL · REST APIs · Auth · HTML/CSS/JS · Responsive Design

SOFTWARE · FULL-STACK 2021

Dynamic Web App Platform

As PHP Web Developer at Notebooknb, built and maintained dynamic web applications contributing to the company's core product. Designed normalized MySQL schemas optimized for data storage and retrieval, and implemented responsive UIs using HTML5, CSS3, and JavaScript with cross-browser compatibility.

PHP · MySQL · HTML5 · CSS3 · JavaScript

SOFTWARE · DATA · API 2023–24

BI Reporting REST APIs

Built REST APIs at Rajlaxmi Solutions that powered the company's BI tooling — serving analytics data to internal dashboards and 40+ client-facing reports. Designed for throughput and pagination, handled auth, rate limiting, and consistent error responses. Contributed to Rajlaxmi winning an Industry Excellence Award for client analytics and insights.

Python · REST APIs · SQL · Auth · JSON · BI Integration
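A minimal sketch of the pagination and consistent-error-response idea described above. The function names, field names, and the page-size cap are hypothetical illustrations, not the production API.

```python
from typing import Any


def error_response(status: int, code: str, message: str) -> dict[str, Any]:
    """Every failure uses the same envelope so clients parse errors one way."""
    return {"status": status, "error": {"code": code, "message": message}}


def paginate(rows: list[Any], page: int, page_size: int,
             max_page_size: int = 100) -> dict[str, Any]:
    """Return one page of rows plus the metadata needed to fetch the next page."""
    if page < 1 or page_size < 1:
        return error_response(400, "invalid_pagination",
                              "page and page_size must be >= 1")
    page_size = min(page_size, max_page_size)  # assumed cap to protect throughput
    start = (page - 1) * page_size
    total_pages = (len(rows) + page_size - 1) // page_size
    return {
        "status": 200,
        "data": rows[start:start + page_size],
        "page": page,
        "page_size": page_size,
        "total": len(rows),
        "has_next": page < total_pages,
    }
```

Returning `has_next` and `total` alongside the data lets a dashboard page through a report without guessing where the result set ends.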

SOFTWARE · FRONT-END 2026

Personal Portfolio Website

This site — hand-coded from a blank canvas with no frontend framework. Custom design system in vanilla CSS (Fraunces + JetBrains Mono + Instrument Serif), responsive across mobile/tablet/desktop, scroll-reveal animations via IntersectionObserver, interactive SVG illustrations for project cards, client-side form handling. Demonstrates attention to craft: typography, motion, spacing.

HTML5 · CSS3 · Vanilla JS · SVG · Responsive · A11y

SOFTWARE · ALGORITHMS CS 600 · 2025

Advanced Algorithms & Data Structures

Graduate-level algorithm design implementations from CS 600 at Stevens. Heap traversal orders, Huffman coding, Prim-Jarník MST optimization to O(n²), and dynamic programming variants. Implementations in Python with time-complexity analysis, Big-O proofs, and benchmarks comparing naive vs optimized versions.

Python · C++ · Big-O Analysis · Graph Algorithms · DP
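One common way to get Prim-Jarník to O(n²) is to drop the heap and scan a plain array on an adjacency matrix, which suits dense graphs. The sketch below assumes that reading; `prim_dense` and its input format are illustrative, not the coursework code.

```python
import math


def prim_dense(weights: list[list[float]]) -> tuple[float, list[tuple[int, int]]]:
    """Prim-Jarnik on an adjacency matrix; math.inf marks a missing edge.

    Finding the cheapest fringe vertex is an O(n) array scan, repeated n
    times, so the whole run is O(n^2) with no heap.
    """
    n = len(weights)
    if n == 0:
        return 0.0, []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into each vertex
    parent = [-1] * n
    best[0] = 0.0           # start the tree at vertex 0
    total = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = -1
        for v in range(n):
            if not in_tree[v] and (u == -1 or best[v] < best[u]):
                u = v
        if best[u] == math.inf:
            raise ValueError("graph is not connected")
        in_tree[u] = True
        total += best[u]
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax every edge leaving the newly added vertex.
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return total, edges
```

On sparse graphs a heap-based version at O(m log n) is usually the better choice; the array scan wins when the edge count approaches n².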

AI / ML · GENERATIVE AI SPRING 2025

Scalable RAG Pipeline for Document Analysis

Architected a zero-cost RAG pipeline using LangChain with local HuggingFace embeddings (384-dim) — eliminating cloud API spend while preserving semantic-similarity fidelity on large corpora. Persistent vector storage via ChromaDB + SQLite delivers sub-second retrieval across 400+ page documents. Tuned chunking with RecursiveCharacterTextSplitter.

Python · LangChain · ChromaDB · HuggingFace · SQLite · uv
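The retrieval step boils down to ranking stored chunks by similarity to the query embedding. A standard-library sketch of that idea, assuming cosine similarity as the metric: toy 3-dimensional vectors stand in for the 384-dim embeddings, and `cosine` and `top_k` are hypothetical helpers, not LangChain or ChromaDB calls.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 2) -> list[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

A vector store replaces this brute-force sort with an index, which is what keeps lookups fast on large documents.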

AI / ML · COMPUTER VISION 2025

Image Classification with CNNs

Convolutional neural network for multi-class image classification built in PyTorch. Data augmentation pipeline (random crops, flips, normalization), training loop with learning-rate scheduling, checkpointing, and TensorBoard logging. Benchmarked against scikit-learn classical baselines — a useful lesson in when deep learning is actually worth the complexity.

PyTorch · CNNs · torchvision · NumPy · Matplotlib · TensorBoard

AI / ML · NLP 2024

Sentiment Analysis on Review Data

NLP pipeline for binary and multi-class sentiment on text data. TF-IDF + Logistic Regression baseline, then fine-tuning a pretrained transformer (DistilBERT) for comparison. Proper train/val/test split, cross-validation, confusion matrices, and calibration analysis. Shows understanding of the whole stack — not just the fashionable bit.

Python · HuggingFace · scikit-learn · NLP · TF-IDF · Transformers
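For the evaluation side, a confusion matrix is just a count of (true, predicted) label pairs. A dependency-free sketch with made-up labels follows; in practice scikit-learn provides this, and the helper names here are illustrative.

```python
from collections import Counter


def confusion_matrix(y_true: list[str], y_pred: list[str],
                     labels: list[str]) -> list[list[int]]:
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_recall(matrix: list[list[int]]) -> list[float]:
    """Diagonal count divided by row total for each class."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]
```

Per-class recall is worth reading next to overall accuracy, since a multi-class sentiment model can look fine overall while missing most of one class.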

AI / ML · RECOMMENDATIONS 2024

Collaborative Filtering Recommender

Movie/product recommender system using matrix factorization (SVD, ALS) and item-based collaborative filtering. Evaluated with RMSE, precision@k, and recall@k on a held-out test set. Also implemented a content-based hybrid layer using TF-IDF over product descriptions for cold-start handling.

Python · Surprise · scikit-learn · Pandas · SVD · ALS
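The ranking metrics mentioned above can be sketched in a few lines. This is a per-user version with hypothetical inputs: an ordered recommendation list and a set of held-out relevant items.

```python
def precision_recall_at_k(recommended: list[str], relevant: set[str],
                          k: int) -> tuple[float, float]:
    """precision@k = hits / k; recall@k = hits / number of relevant items."""
    top = recommended[:k]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these per-user values across all test users gives the reported figures; RMSE measures rating error separately and does not reflect ranking quality.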

DATA ENG · STREAMING FALL 2025

End-to-End Pipeline with Streaming Ingestion

Built a Medallion (Bronze/Silver/Gold) pipeline using Kafka & AWS Kinesis for real-time ingestion and Spark for distributed processing. Cut compute costs 40% via incremental materialization. A metadata-driven Gold layer in dbt Core with Jinja consolidated per-table SQL into a single reusable macro. SCD Type 2, dbt tests, and freshness checks ensure point-in-time accuracy.

AWS S3 · Kafka · Kinesis · Snowflake · dbt Core · Spark · Airflow
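SCD Type 2 keeps history by closing the current row and appending a new one whenever an attribute changes. The pipeline itself does this in dbt; the in-memory Python below only sketches the logic, with a hypothetical row shape (`key`, `attrs`, `valid_from`, `valid_to`).

```python
from datetime import date


def apply_scd2(history: list[dict], incoming: dict, as_of: date) -> list[dict]:
    """Merge a snapshot (key -> attrs) into history without overwriting rows."""
    result = [dict(row) for row in history]
    current = {row["key"]: row for row in result if row["valid_to"] is None}
    for key, attrs in incoming.items():
        row = current.get(key)
        if row is not None and row["attrs"] == attrs:
            continue  # unchanged: keep the open row as-is
        if row is not None:
            row["valid_to"] = as_of  # close the old version
        result.append({"key": key, "attrs": attrs,
                       "valid_from": as_of, "valid_to": None})
    return result


def as_of_lookup(history: list[dict], key: str, day: date):
    """Point-in-time read: the version whose validity interval contains `day`."""
    for row in history:
        if (row["key"] == key and row["valid_from"] <= day
                and (row["valid_to"] is None or day < row["valid_to"])):
            return row["attrs"]
    return None
```

Using half-open intervals (`valid_from` inclusive, `valid_to` exclusive) means exactly one version matches any given day.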

DATA ENG · PRODUCTION 2023–24

Multi-Client Warehouse & Analytics Platform

The data platform I built and operated at Rajlaxmi Solutions: ELT/ETL pipelines processing 500k+ daily records across 40+ regional clients, consolidated into Snowflake with dimensional modeling. Parallel extraction and automated scheduling cut pipeline latency by 35%. Schema validation, anomaly detection, and SLA alerting kept company-wide KPIs honest — contributed to an Industry Excellence Award.

Python · Spark · Airflow · Snowflake · Power BI · REST APIs
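Schema validation at ingestion can be as simple as checking each record against an expected field-to-type map before it reaches the warehouse. A hedged sketch: `EXPECTED_SCHEMA` and its fields are invented for illustration and are not the platform's real schema.

```python
# Hypothetical schema: field name -> required Python type.
EXPECTED_SCHEMA = {"client_id": str, "amount": float, "recorded_at": str}


def validate_record(record: dict, schema: dict = EXPECTED_SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems


def split_valid(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate loadable records from rejects so bad rows never reach KPIs."""
    valid, rejected = [], []
    for record in records:
        (rejected if validate_record(record) else valid).append(record)
    return valid, rejected
```

Routing rejects to a separate list instead of dropping them keeps a record of what failed, which is what an alert would report on.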

DATA ENG · MIGRATION 2023

Legacy Database Cloud Migration

Led migration of 200GB+ of legacy databases into modern cloud frameworks at Rajlaxmi. Query performance improved 25% through indexing strategies, query rewriting, and adoption of columnar storage best practices. Built Airflow-scheduled ingestion & transformation that eliminated 15+ hours of manual reporting per month and established KPI definitions still in use today.

SQL · Python · Airflow · Indexing · Columnar Storage

DATA ENG · BIG DATA FALL 2024

NYC Taxi Trip Analytics Pipeline

Scalable big-data pipeline analyzing millions of NYC taxi trips using Hadoop MapReduce and HBase to generate revenue, operational, and customer-behavior insights. Explored distributed data processing patterns, partitioning strategies, and HBase row-key design for efficient time-series lookups. Coursework project that deepened my understanding of batch big-data systems beyond Spark.

Hadoop · MapReduce · HBase · Java · HDFS
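HBase sorts rows lexicographically by key, so time-series lookups depend on how the key is composed. One widely used pattern, sketched here in Python for brevity, is a salt prefix plus an entity id plus a reversed timestamp. The layout, `zone_id`, and bucket count below are assumptions, not the project's actual key design.

```python
import hashlib

MAX_TS = 10**13 - 1  # largest 13-digit epoch-millisecond value


def make_row_key(zone_id: str, pickup_ms: int, buckets: int = 16) -> bytes:
    """Build a key of the form salt|zone_id|reversed_timestamp.

    The salt is derived from the zone, so one zone's rows stay contiguous
    while different zones spread across regions. Reversing the timestamp
    makes the newest trip sort first within a zone.
    """
    salt = int(hashlib.md5(zone_id.encode()).hexdigest(), 16) % buckets
    reversed_ts = MAX_TS - pickup_ms
    return f"{salt:02d}|{zone_id}|{reversed_ts:013d}".encode()
```

Zero-padding the reversed timestamp to a fixed width matters: without it, byte-wise ordering would not match numeric ordering.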

DATA ENG · STREAMING SPRING 2025

Patient Alert ETL — Real-time Vitals

Real-time data pipeline monitoring vital health parameters from simulated IoT devices in hospitals. Apache Kafka for streaming ingestion, Spark Structured Streaming for windowed aggregations, and HBase for low-latency lookups of patient history. Threshold-based alerting on anomalous readings — an exercise in building end-to-end event-driven systems with real latency constraints.

Kafka · Spark Streaming · HBase · Python · Event-driven
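The windowed-aggregation and threshold-alert logic can be illustrated without a cluster. This sketch uses tumbling windows over a finite batch of readings; the window size, the heart-rate threshold, and the tuple layout are assumed values, and the real pipeline does this in Spark Structured Streaming.

```python
from collections import defaultdict


def windowed_alerts(readings, window_s: int = 60, max_avg_hr: float = 120.0):
    """readings: iterable of (patient_id, epoch_seconds, heart_rate).

    Groups readings into tumbling windows per patient and returns
    (patient_id, window_start, average) for windows above the threshold.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (patient, window) -> [total, count]
    for patient_id, ts, heart_rate in readings:
        window_start = ts - ts % window_s
        acc = sums[(patient_id, window_start)]
        acc[0] += heart_rate
        acc[1] += 1
    alerts = []
    for (patient_id, window_start), (total, count) in sorted(sums.items()):
        average = total / count
        if average > max_avg_hr:
            alerts.append((patient_id, window_start, average))
    return alerts
```

Alerting on a window average rather than single readings trades a little latency for fewer false alarms from one noisy sample.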

DATA ENG · EDUCATIONAL 2024

MapReduce Explained — Patterns & Optimizations

Hands-on repository for learning distributed data processing using Python and Hadoop Streaming. Covers real-world MapReduce patterns (word count, inverted index, joins, top-K, secondary sort), optimization techniques (combiners, custom partitioners), and scale-testing notes. Built as a teaching artifact — the one I wish I'd had when I started with Hadoop.

Python · Hadoop Streaming · HDFS · MapReduce Patterns
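To show what a combiner buys you, here is a single-process simulation of word count. It is not Hadoop Streaming code (which reads stdin and writes stdout); `run_job` and the in-memory shuffle are stand-ins for illustration.

```python
from collections import Counter, defaultdict
from itertools import chain


def mapper(line: str):
    """Emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one mapper's output locally, before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reducer(word: str, counts: list[int]):
    return word, sum(counts)


def run_job(splits: list[list[str]]):
    """Return (word counts, number of records sent through the shuffle)."""
    shuffled = defaultdict(list)
    shuffled_records = 0
    for split in splits:  # each split plays the role of one map task
        pairs = chain.from_iterable(mapper(line) for line in split)
        for word, count in combiner(pairs):
            shuffled[word].append(count)
            shuffled_records += 1
    counts = dict(reducer(word, c) for word, c in shuffled.items())
    return counts, shuffled_records
```

Without the combiner, the two splits `["a b a", "a"]` and `["b b"]` would shuffle six records; with it, only three cross the network.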

DATA · ANALYTICS · BUSINESS FALL 2024

ATM Refill Frequency Analysis

Business-driven ETL and analytics project for a Spar Nord Bank case study analyzing refilling frequency of ATMs across Europe. Ingested transactional and demographic data, built a dimensional model for location-over-time analysis, and produced an analytical layer + dashboard supporting operational decisions around cash logistics. End-to-end: business question → data model → insight → recommendation.

Python · PostgreSQL · Airflow · Power BI · Dimensional Modeling · Business Case

ANALYTICS · BUSINESS · DATA SCI 2025

Retail Demand Forecasting & Inventory Analytics

Data-driven sales reporting and demand forecasting system deployed at Barnes & Noble College (Hoboken, NJ). Analyzed historical POS data to predict textbook demand, which improved inventory accuracy by 30%, minimized stockouts & overstock, and reduced checkout times by 20% during peak hours. Customer satisfaction: 95%.

Python · Pandas · Time Series · Forecasting · POS Data · Reporting

ANALYTICS · BI · DASHBOARDS 2023–24

Executive KPI Dashboards

Power BI and Tableau dashboards built during my time at Rajlaxmi, exposing operational and customer-facing KPIs across revenue, conversion, churn, and data quality. Defined canonical metrics, built a semantic layer on top of the Snowflake warehouse, created drill-downs for 40+ clients, and drove adoption with department leads. Saved 15+ hours of manual reporting per month.

Power BI · Tableau · SQL · Snowflake · KPI Design · Storytelling

DATA SCIENCE · EDA 2023

Exploratory Data Analysis — KPI Definition

Rigorous EDA work during my Rajlaxmi internship: distribution analysis, correlation matrices, outlier detection, and hypothesis testing on business data to define critical KPIs. Automated reporting workflows from findings. Solid proof-of-work that applied statistics and careful data inspection translate directly to business impact.

Python · Pandas · NumPy · Seaborn · Matplotlib · Statistics
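Outlier detection in EDA often starts with the interquartile-range rule. The sketch below uses only the standard library and assumes the conventional 1.5 × IQR fences; the actual analysis may have used other methods or thresholds.

```python
import statistics


def iqr_outliers(values: list[float], factor: float = 1.5) -> list[float]:
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]
```

Flagged values are candidates for inspection, not automatic deletions: a spike may be a data error or a real business event worth its own KPI.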

19 projects across every domain I work in.