About Candidate


MS in Data Science for Public Policy MAY 2023

(Merit Scholarship, Experiential Learning Award) (GPA: 3.9/4)
Relevant Coursework: Databases; Advanced Statistics; Time Series; Data Science I, II & III; Data Visualization; Social Network Analysis

Bachelor of Technology in Mechanical Engineering MAY 2018

Electives: Econometrics; Probability; Discrete Mathematical Structures; Operations Research; Financial Management

Work & Experience

Data Scientist (RA) 08/2022
CGSP, Georgetown University

* Implemented classification of bill summaries by fine-tuning BERT with the PyTorch framework on data extracted from an API.
* Analyzed data on vocational schools by geocoding addresses, mapping with QGIS, and visualizing data with geopandas and ggplot in Python.

Data Science Research Scholar 12/2021
Massive Data Institute

* Obituaries Project: Assessed obituary texts using Named Entity Recognition, fuzzy matching, and record linkage techniques. (Git)
* Evaluated name-based race prediction models through classification metrics and developed an ensemble model improving accuracy.
* Gestational Diabetes Project: Combined and transformed datasets in R, and analyzed biomarker data through visualizations.
* Medicaid Project (ongoing): Developing an ETL pipeline to process claims of 90 million patients on Databricks with Spark SQL.

Data Science Intern 02/2022 - 08/2022
World Bank DIME, GUI2DE

* Predicted health risks by developing a machine learning pipeline for XGBoost and Random Forest algorithms using the scikit-learn framework.
* Constructed panel datasets from claims data of multiple health providers using R and generated insights through analysis.

Data Analyst 04/2021 - 06/2021

* Enhanced FDR’s 2021 Universal Healthcare proposal through data visualization aids for comparative analysis using Plotly in Python.
* Built a COVID-19 surveillance dashboard in Tableau to analyze case counts, deaths, and oxygen supply requirements in India.

Global Planning Analyst 07/2018 - 04/2021

* Streamlined data flow, created a heat map tool using SQL and Tableau, and identified 4 high-risk chemicals, preventing a loss of $2.5MM p.a.
* Created a SARIMA model on sales volume time series data using R to forecast polyethylene sales, improving accuracy by 8%.
* Developed a customer segmentation tool based on K-means clustering and a dynamic pricing tool using large data sets on costs.
* Established a framework to analyze risk-contributing factors of high-risk assets and was a finalist in the EM Global Analytics challenge.
* Led a cross-functional team of 12 and managed automation of tools, reducing manpower by an estimated 220 hrs/mth at IAC.
* Identified gaps in fertilizer packaging and provided leads for EMCIPL’s entry into an estimated 160k-ton polymer market.