Enterprise-grade multi-level sales outlier forecasting combining statistical methods, ML algorithms, and hierarchical analysis with 92%+ precision.
About Me

Shaurya Jain
29 years old • New York City
Hey, I'm Shaurya, a passionate Data Scientist and AI Engineer who believes in the transformative power of artificial intelligence. With 4+ years of experience in AI, I design, develop, and deploy cutting-edge software and AI solutions, specializing in production-grade multi-agent systems that solve real-world problems. I've led cross-functional teams, translating business goals into actionable Data & AI strategies that deliver measurable results.
Experience
AI Engineer
UPS - GenAI Systems
Data Science Intern
Novartis - Patient Support, Advance Therapy
ForecastX (Cost-Sensitive Forecasting)
ForecastX is a four-layer, cost-aware forecaster that predicts weekly demand for RLT drugs (Pluvicto/Lutathera) 12 weeks ahead, while ordering happens at T-6 weeks. It fuses a patient-lifecycle engine that projects repeat doses, cohort/seasonal guardrails for stability, a stack of time-series and gradient-boosted models trained at target quantiles, and a residual-correction layer with conformal uncertainty bands. Deployed via Streamlit and PowerBI with drift monitoring and SLA alerts.
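A minimal sketch of two ideas named above, quantile (pinball) loss and split-conformal bands. It is an illustration under stated assumptions, not the ForecastX code: the function names, the calibration residuals, and the alpha level are all hypothetical.

```python
import math

def pinball_loss(y_true, y_pred, q):
    """Average quantile (pinball) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-forecasts cost q per unit; over-forecasts cost (1 - q) per unit.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

def conformal_halfwidth(calib_residuals, alpha):
    """Half-width of a split-conformal band from held-out calibration residuals."""
    scores = sorted(abs(r) for r in calib_residuals)
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * (1 - alpha)); capped at n as a simplification.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]

def conformal_band(point_forecast, calib_residuals, alpha=0.1):
    """Symmetric band around a point forecast with roughly (1 - alpha) coverage."""
    w = conformal_halfwidth(calib_residuals, alpha)
    return point_forecast - w, point_forecast + w
```

With q = 0.9, under-forecasting a unit costs nine times as much as over-forecasting it, which is how a quantile-trained model leans toward availability when stockouts are the expensive error.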
Dose Lifecycle & Adherence System
A sequencing pipeline that turns messy, multi-source dose events into clean patient journeys and reliable adherence KPIs. It uses identity resolution, dose-order reconstruction, and survival analysis (cycle-wise dropout hazards) to estimate projected returns and real-world adherence (mean ~3.8/6 doses). These KPIs feed ForecastX's repeat-demand baseline and power site-level coaching.
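To make "cycle-wise dropout hazards" concrete, here is a small sketch of a discrete-time hazard calculation. It assumes every patient's course is already finished (no censoring), which a real survival analysis would have to handle; the function names and the sample data are hypothetical.

```python
def cycle_hazards(doses_completed, max_cycles=6):
    """Hazard of stopping right after each cycle.

    doses_completed: number of doses each patient received (1..max_cycles).
    hazard[k] = patients who stopped after dose k / patients who reached dose k.
    """
    hazards = {}
    for k in range(1, max_cycles):
        at_risk = sum(1 for d in doses_completed if d >= k)
        stopped = sum(1 for d in doses_completed if d == k)
        hazards[k] = stopped / at_risk if at_risk else 0.0
    return hazards

def expected_doses(hazards, max_cycles=6):
    """Expected doses per patient: sum of the probabilities of reaching each cycle."""
    reach, total = 1.0, 0.0
    for k in range(1, max_cycles + 1):
        total += reach
        reach *= 1.0 - hazards.get(k, 0.0)
    return total
```

Multiplying the number of patients currently at cycle k by (1 - hazard[k]) gives a rough projection of how many return for the next dose, which is the kind of repeat-demand signal a forecaster can consume.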
MDM Semantic Matching (NLP + LLM)
A production canonicalizer for ADAR typos, acronyms, and multilingual HCP/site names. It pairs high-recall retrieval (BM25+fuzzy) with SBERT reranking and specialty rules. Low-confidence matches flow to an LLM adjudicator that outputs schema-constrained JSON. We dedupe via Louvain and keep it sharp with weekly active-learning updates.
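The confidence-routing idea can be sketched with the standard library alone. This stand-in uses `difflib.SequenceMatcher` for fuzzy similarity in place of the BM25 + SBERT stack described above, and the thresholds, route names, and site names are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_name(query, canonical_names, accept=0.90, review=0.60):
    """Return (best_candidate, score, route) for a raw name.

    Routes: 'accept' for confident matches, 'adjudicate' for the gray zone
    (where a second-stage adjudicator would be called), 'no_match' otherwise.
    """
    best = max(canonical_names, key=lambda name: similarity(query, name))
    score = similarity(query, best)
    if score >= accept:
        route = "accept"
    elif score >= review:
        route = "adjudicate"
    else:
        route = "no_match"
    return best, score, route
```

Keeping the gray zone explicit means the expensive adjudication step only runs on the small share of records where cheap similarity is inconclusive.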
Unused Dose Reason Classifier (OCR + Transformer)
The system takes scanned or online "unused dose" forms, runs OCR (Azure Form Recognizer, Tesseract, TrOCR) with light cleanup, and classifies the reason (no-show, logistics, adverse event, prep error, scheduling, or cold-chain) using a fine-tuned BERT/DistilBERT model. We calibrate probabilities and set a clear confidence cutoff. If a case falls below that cutoff, a constrained LLM returns a structured JSON label with a confidence score, and truly ambiguous records go to a brief human review. We log outcomes for drift checks, retrain weekly with reviewer feedback, and surface trends in PowerBI so teams can fix the biggest sources of waste.
Senior Data Scientist
Zomato - Blinkit - Inventory ML
Milk Ordering (Perishables: FEFO + Probabilistic Reorder)
An anomaly-hardened perishables system that hits availability targets with minimal waste. It combines an anomaly-detection layer (STL/Hampel/IsolationForest), true-demand reconstruction, hierarchical probabilistic forecasts with newsvendor-aligned quantiles, and a FEFO policy (reorder + cadence by tier) executed via handheld prompts, with drift/reason-code monitoring for stability.
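"Newsvendor-aligned quantiles" means the forecast quantile used for ordering is set by the cost ratio rather than picked by hand. A minimal sketch, assuming per-unit underage (lost sale) and overage (spoilage) costs and an empirical demand sample; all names and numbers are hypothetical.

```python
import math

def critical_ratio(underage_cost, overage_cost):
    """Newsvendor service level: Cu / (Cu + Co)."""
    return underage_cost / (underage_cost + overage_cost)

def order_quantity(demand_samples, underage_cost, overage_cost):
    """Smallest sampled demand whose empirical CDF reaches the critical ratio."""
    q = critical_ratio(underage_cost, overage_cost)
    xs = sorted(demand_samples)
    k = max(1, math.ceil(q * len(xs)))
    return xs[k - 1]
```

If a stockout costs three times as much as a spoiled unit, the critical ratio is 0.75, so the order targets the 75th percentile of forecast demand instead of the median.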
EventLift (Promo/Holiday/IPL Uplift Modeling)
An event-aware forecaster that learns uplift/decay so peaks are served without post-event overhang. It includes a curated calendar with lead/peak/decay windows, category-wise event embeddings and interactions, asymmetric, cost-aware training for must-win days, and a kill-switch if live sell-through lags.
PO Clubber (Scenario-Robust Purchase-Order Consolidation)
A predict-then-optimize system that turns spiky demand into robust PO batches. It fuses forecast scenarios from quantile models, a mixed-integer optimizer with MOQ/lead-time/capacity constraints, chance-constraint/SAA for high-probability fill, and ERP write-back with human overrides and infeasibility alerts.
Cost-Aware Short-Horizon Throughput
A cost-aware rush-hour control layer that turns 30-minute demand quantiles into simple, shop-floor actions. It fuses short-horizon quantile forecasts (day-part, weather, events, staffing) with a queue-based mapper that converts demand into batch thresholds by store tier. Dashboards track realized cost per 1k units, DOI, and spoilage, optimizing for business cost rather than symmetric error.
BundleRanker (Combos & Bundling to Lift AOV)
A two-stage bundling engine that moves attachable pairs without cannibalizing heroes. It blends FP-Growth + attribute cosine for candidate pairs, a learning-to-rank scorer with margin/price-gap/complementarity features, elasticity-aware price guardrails, and CMS placements with auto-pullback if GM dips.
Data Scientist
StepSetGo - Growth & Product Analytics
RiskShield – Real-time, Cost-Aware Fraud Scoring
Real-time fraud detection model that scores events and makes allow/deny/review decisions under asymmetric loss. It fuses a multi-signal feature layer (velocity, cadence, geolocation, account history), a calibrated classifier with cost-curve thresholding to minimize expected business cost, and gray-zone reviewer routing with drift monitors and threshold auto-tuning. Deployed with reason-coded decisions.
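Decision-making under asymmetric loss can be sketched as picking the action with the lowest expected cost for a calibrated fraud probability. The cost values below are hypothetical placeholders, not figures from RiskShield.

```python
def decide(p_fraud, cost_missed_fraud=100.0, cost_false_block=10.0, cost_review=2.0):
    """Pick allow / deny / review by minimum expected cost.

    p_fraud must be a calibrated probability for the comparison to be meaningful.
    """
    expected_cost = {
        "allow": p_fraud * cost_missed_fraud,        # pay only if it was fraud
        "deny": (1.0 - p_fraud) * cost_false_block,  # pay only if it was legitimate
        "review": cost_review,                       # fixed cost of a human look
    }
    return min(expected_cost, key=expected_cost.get)
```

With these costs the review band falls between p = 0.02 and p = 0.80; changing any cost moves the thresholds automatically, which is the practical benefit over hand-tuned cutoffs.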
GrowthSeg – Hybrid Segmentation
Growth Segmentation turns black-box clusters into launchable rule segments marketers can use directly. It fuses a feature grid across demo/psycho/geo/behavior, clustering with stability checks and CART-based rule distillation for interpretable "who/why," and a governed segment registry with priorities, mutual-exclusion, and burnout caps. Deployed into campaign tools with lift tracking and refresh SLAs.
Personalization Recommendation System
It personalizes activities/offers/products by mixing business eligibility with content signals while enforcing diversity and freshness. It uses TF-IDF vectors with cosine retrieval, a ranker that adds diversity penalties and recency decay, and robust fallbacks for sparse catalogs. Evaluated via offline replay and online A/B testing.
DataOpsHub (Events → Models → Dashboards)
DataOpsHub is an end-to-end analytics layer that puts funnels, cohorts, and KPIs on tap for product and marketing. It fuses event contracts into BigQuery raw, modeled tables for sessions/funnels/cohorts (partitioned/clustered), Metabase/Mixpanel dashboards and scheduled weekly reports. Built for governed self-serve with owner-tagged assets.

Data Scientist
Sociolla - E-commerce & Personalization
Personalized Recommendation System
A "beauty-taste" recommender that ranks products by cosine similarity. It combined item embeddings from TF-IDF + SBERT with price/discount scalars, time-decayed user vectors (buys>carts>views), ANN retrieval with MMR diversification and in-stock/policy filters with A/B testing to tune repetition caps and thresholds.
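MMR (maximal marginal relevance) diversification can be sketched in a few lines. This is an illustration on toy 2-D vectors, not the production ranker; the `lam` trade-off value and item names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(user_vec, items, k, lam=0.7):
    """Greedy MMR: balance relevance to the user against similarity to picks so far.

    items: dict of item_id -> embedding. lam=1.0 is pure relevance ranking.
    """
    selected = []
    remaining = dict(items)
    while remaining and len(selected) < k:
        def score(item_id):
            relevance = cosine(user_vec, remaining[item_id])
            redundancy = max(
                (cosine(remaining[item_id], items[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected
```

Lowering `lam` pushes near-duplicates of already selected products down the list, which is one way a repetition cap can be expressed as a tunable parameter.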
Product Popular Recommendation System
Trending and Look-alike modules that drive discovery. Built from decayed popularity over buys/adds/views by category × price band, item-item neighbors via co-occurrence lift and cosine similarity, Redis-backed precomputed lists with OOS guards and near-duplicate suppression, and switchback tests with coverage for offline gates.
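Decayed popularity with an out-of-stock guard can be sketched as follows. The event weights, the 7-day half-life, and the item names are hypothetical choices for illustration only.

```python
# Hypothetical weights: purchases count more than cart adds, which count more than views.
EVENT_WEIGHTS = {"buy": 3.0, "add": 2.0, "view": 1.0}

def decayed_popularity(events, half_life_days=7.0):
    """Score items from (item_id, event_type, age_days) tuples with exponential decay."""
    scores = {}
    for item_id, event_type, age_days in events:
        decay = 0.5 ** (age_days / half_life_days)  # weight halves every half-life
        scores[item_id] = scores.get(item_id, 0.0) + EVENT_WEIGHTS[event_type] * decay
    return scores

def top_n(scores, in_stock, n=3):
    """Rank in-stock items by score, highest first (the OOS guard)."""
    ranked = sorted(
        (item for item in scores if item in in_stock),
        key=lambda item: scores[item],
        reverse=True,
    )
    return ranked[:n]
```

In practice the same function would be run per category and price band, with the resulting lists precomputed and cached rather than scored at request time.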
"Bought Together" Recommendation System
A market-basket engine that ranks complements by expected profit. It used FP-growth on 90-day baskets with support/confidence/lift thresholds, a profit-aware objective (lift × margin × eligibility) with inventory awareness and light personalization, and in-cart placement with causal attribution to confirm incrementality.
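The support/confidence/lift metrics and the profit-aware score can be sketched directly from their definitions. This counts pairs by brute force rather than running FP-growth, and the baskets and margin value are made up for the example.

```python
def pair_stats(baskets, a, b):
    """Return (support, confidence of a -> b, lift) for an item pair."""
    n = len(baskets)
    supp_a = sum(1 for basket in baskets if a in basket) / n
    supp_b = sum(1 for basket in baskets if b in basket) / n
    supp_ab = sum(1 for basket in baskets if a in basket and b in basket) / n
    confidence = supp_ab / supp_a if supp_a else 0.0
    lift = supp_ab / (supp_a * supp_b) if supp_a and supp_b else 0.0
    return supp_ab, confidence, lift

def profit_score(baskets, a, b, margin, eligible=True):
    """Profit-aware objective: lift x margin x eligibility (eligibility is 0 or 1)."""
    _, _, lift = pair_stats(baskets, a, b)
    return lift * margin * (1.0 if eligible else 0.0)
```

A lift above 1 means the pair co-occurs more often than independent purchases would predict; multiplying by margin then favors complements that are both likely and profitable.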
Churn Model
A retention scorer that flags who's fading and picks the cheapest effective nudge. It combined time-based labels with point-in-time joins, CatBoost on RFM + engagement features with monotone constraints and isotonic calibration, customer aliveness as a meta-feature, and uplift modeling to route users content → reminder → coupon rather than blanket discounts.
Real-time Reporting Framework
Partnered with DE/PM to turn day-old reports into near-real-time truth supporting models and ops. Stack: Kinesis/Glue streaming ETL with S3 Parquet tables for orders/events/inventory (partitioned by dt/hr) queried via Athena/Redshift to generate QuickSight exec/ops/growth views, and freshness SLAs, anomaly alerts, and label-delay monitors.

Software Engineer - Data Science
Netomi - Conversational AI & NLP
IntentRouter – Message/Email Classifier for Precise Queueing
A routing brain for support that turns messy threads into precise message/email queues under tight latency. It combines a BERT/SBERT multi-class classifier with temperature scaling, a collaborative-filter re-ranker trained on historical co-occurrence and time-ordered evaluation plus calibration monitors for stable performance week-to-week.
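Temperature scaling divides the classifier's logits by a single scalar fitted on held-out data so that confidence matches accuracy. The sketch below fits it by grid search over negative log-likelihood; the grid range and validation logits are hypothetical, and a real setup would typically use a gradient-based optimizer.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, y in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[y])
    return total / len(labels)

def fit_temperature(logit_rows, labels):
    """Pick the temperature on a 0.5..5.0 grid that minimizes validation NLL."""
    grid = [0.5 + 0.1 * i for i in range(46)]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))
```

A fitted temperature above 1 indicates the raw model was overconfident. Scaling does not change the predicted class, only how much the router should trust it.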
Intent Classifier – XGBoost + CF Router
A routing layer for support that fixes long-tail misroutes by re-ranking intents with collaborative filtering. It combines an XGBoost multi-class classifier on leak-safe TF-IDF/char n-grams with probability calibration.
Readability Grade System
A style grading system that keeps chatbot replies inside the target reading grade without hurting task success. It combines an XGBoost grade regressor on linguistic features with monotone constraints, dual-threshold soft guidance on high-traffic intents, locale-specific calibration with safe rewrite templates.
User Clustering – Playbooks & SLAs
A segmentation lens that maps sessions/users into value buckets to drive routing, SLAs, and agent playbooks. It combines RFM, issue, channel, and satisfaction features with robust scaling/encoding, MiniBatch k-means with a dedicated bucket for outliers, and a feature-store refresh with stability tests to keep segments durable.
Kafka → Glue Streaming for <5-min Freshness
A streaming backbone that lands chat/email events in analytics within minutes with correctness and lineage. It combines versioned Kafka schemas and compat checks, Spark Structured Streaming on AWS Glue to partitioned S3, idempotent keys, watermarking, and replay for exactly-once semantics in practice, and Airflow orchestration with cost/throughput monitors and blue/green deploys.
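The combination of idempotent keys and watermarking can be illustrated in plain Python. This is a toy in-memory model of the idea, not Spark Structured Streaming code; the class name, event fields, and lateness value are hypothetical.

```python
class IdempotentSink:
    """Toy sink: drops events older than the watermark and ignores replayed keys."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")
        self.rows = {}  # idempotency key -> event

    def write(self, event):
        # The watermark trails the largest event time seen so far.
        self.max_event_time = max(self.max_event_time, event["ts"])
        watermark = self.max_event_time - self.allowed_lateness
        if event["ts"] < watermark:
            return "dropped_late"
        if event["id"] in self.rows:
            return "duplicate"  # replay or retry: writing again would double count
        self.rows[event["id"]] = event
        return "written"
```

Because a replayed batch hits the same keys, re-running it leaves the stored rows unchanged, which is what makes at-least-once delivery behave like exactly-once downstream.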
Weekly Review Dashboards
Dashboards that turn metrics into weekly actions instead of fire-drills. It combines metric contracts with owners, formulas, and tests, notebook narratives with "next best action" and accountable owners, and experiment readouts with CIs, a changelog, and data-quality SLAs.
Skills & Expertise
Languages
Core programming & query languages
ML & AI Frameworks
Machine learning & AI frameworks
Cloud & Infrastructure
Cloud platforms & containerization
Databases & Storage
Data storage & processing systems
Visualization & Analytics
Data visualization & monitoring tools
Development & Ops
Development & workflow tools
Core Concepts & Expertise
Specialized knowledge & methodologies
Featured Projects
Advanced event-driven seasonality platform integrating historical patterns, seasonal trends, and event impact analysis for enterprise demand planning.
Revolutionary forecasting system combining Variational Autoencoders with time series methods for synthetic data generation and improved predictions.
Advanced multi-algorithm anomaly detection system for enterprise sales data with 95%+ precision across hierarchical levels.
Sophisticated Model Context Protocol integration platform with multi-server coordination, tool orchestration, and intelligent agent creation.
Enterprise-grade multi-agent AI system using LangGraph for coordinated search & rescue operations with real-time decision making.
Advanced ML forecasting platform with time-series analysis, demand prediction, and supply chain optimization for enterprise operations.
Sophisticated multi-agent AI platform with LangGraph orchestration, distributed processing, and intelligent task coordination.
Enterprise AI-powered incident response orchestration platform with automated workflows, intelligent alerting, and comprehensive monitoring.
Sophisticated full-stack portfolio platform featuring AI integration, dynamic content management, responsive design, and comprehensive project showcase capabilities.
Comprehensive bike-sharing analytics platform processing 6,000+ bikes with spatial analysis, mobility patterns, and interactive visualizations.