About Me
Hi, my name’s Yash and I’m a recent graduate of the University of Maryland. Thank you for visiting my website, and hit me up to discuss recent happenings in data science and new opportunities.
I am most skilled in: Python, SQL and Machine Learning.
You can reach out to me via email.
Experience
- Led data engineering efforts and built social, financial, and eCommerce datasets ingesting 1M+ data points daily from 10+ sources, yielding insights and forecasts for 1M+ trends and context spaces for Fortune 500 clients.
- Identified new data sources & executed full-cycle development serving 2 clients and long-term business use cases.
- Built event-driven and containerized data pipelines, reducing forecast and context-space generation time by 45s.
- Overhauled monitoring, automation, and validation of BigQuery datasets increasing processing efficiency by 85%.
- Technologies used: Python, SQL, GCP (BigQuery, Pub/Sub, Cloud Run/Functions), ETL pipelines, Airflow, Selenium, Git, HTML/CSS/JS
University of Maryland
Graduate Assistant (Social-Media Research)
September 2019 - December 2020
umd.edu
- Channeled online interactions of 30+ companies across 6 social media platforms through sentiment-analysis models, showing a 25% increase in customer satisfaction and positive feedback.
- Investigated brand perception using statistical analysis, which reflected a 7% positive response.
- Technologies used: Python (pandas/scikit-learn/numpy), Selenium, PyTorch, SQL, OpenCV, AWS
- Designed a data synchronization API between the Inventory & eCommerce platforms, speeding up order fulfillment by 60%.
- Organized data management from 10+ sources using ER diagrams & databases improving data retrieval by 80%.
- Integrated FedEx and UPS APIs in CRM system reducing label generation and shipment times by 3 days.
- Developed affiliate programs with WordPress and HubSpot for CERTIFY website recording monthly sales of $300,000+.
- Executed two-week Google Analytics campaign in collaboration with India and China teams leading to 100,000+ website impressions.
- Technologies used: Python, Excel, SQL, Tableau, PHP, Azure, WordPress, Confluence, Lucidchart
Indian Institute of Information Technology, Sri City
Undergraduate Student Researcher
July 2017 - May 2019
iiits.ac.in
- Surveyed 25+ criterion functions used with Convolutional Neural Networks (CNN) for Face Recognition tasks leading to research publication at NCVPRIPG 2019 (http://bit.ly/2Mdi1NG).
- Conceptualized two deep-learning-based Face Recognition methods with Hard-Mining Loss and Parametric-Sigmoid layers, each showing 97%+ verification accuracy.
- Investigated 10+ dataset and architecture advancements in the Visual Question Answering domain using Deep Learning, with a conference paper in CVIP 2020 (http://bit.ly/3amuFSs).
- Published two papers on face recognition methods at CVIP 2020 (http://bit.ly/2NHT5Or) and CICT 2019 (http://bit.ly/3oy4IEp), receiving the best paper award for Hard-Mining Loss.
- Technologies used: Python (pandas/scikit-learn/numpy), PyTorch, Keras, MXNet, OpenCV, TensorBoard
CERN (European Organization for Nuclear Research)
Software Engineer (Google Summer of Code)
May 2018 - August 2018
home.cern
- Spearheaded the database migration from MySQL to Elasticsearch for the DIRAC open-source project, improving the processing-time efficiency of nuclear experiment data management and analysis by 70%.
- Revamped database access functions to support the Elasticsearch backend, yielding a 50% improvement in response time.
- Wrote 25+ unit and integration tests for execution before production launch.
- Technologies used: Python, SQL, Elasticsearch, pytest, Django, Git, Travis CI, Jenkins
- Built the leaf-detection pipeline from live-farm image feed using CNNs and PlantCV with 70% detection accuracy.
- Surveyed 30+ deep-learning mechanisms for geographical nutrient supply analysis summarizing 3 top solutions for research and development.
- Technologies used: Python (numpy/pandas/scikit-learn/matplotlib), Keras, CouchDB, OpenCV
- Diagnosed 10 functional and performance bugs in the Bobble Keyboard application.
- Programmed test scenarios for 15+ modules of the application covering 20 boundary cases.
- Overhauled automation tests using the Python-based Culebra tool, cutting production testing time by 40%.
- Summarized 5000+ Play Store application reviews with data analytics and visualization and identified 8 customer pain points and preferences.
- Technologies used: Java, Python, UI Automator, Espresso, Android Studio, GitHub
Education
University of Maryland, College Park
M.S. Information Systems (GPA 3.93)
August 2019 - December 2020
Indian Institute of Information Technology, Sri City
B.S. (Honors) Computer Science and Engineering (GPA 3.6)
August 2015 - May 2019
Skills
- Languages: Python, Java, R, HTML, JavaScript
- Data: SQL, NoSQL, CSV/XML/JSON/Avro/Parquet, PyData (NumPy, Pandas, Matplotlib, Jupyter)
- Frameworks: Postgres, BigQuery, Elasticsearch, Hadoop, Spark, Airflow, Scikit-Learn
- Machine Learning: Classification, Regression, Clustering, Forecasting, Decision Trees, Neural Networks
- Tools: Git, Linux, Google Cloud Platform, Docker, Pub/Sub, Selenium, Tableau
Projects
Airbnb Bookings Predictive and Business Analysis
R, Python, SQL, Google Cloud
- Investigated 25K Airbnb properties to determine high booking rate rentals against parameters including amenities and neighborhoods using R visualizations and exploratory analysis.
- Conducted predictive analysis to find significant factors affecting booking rates and obtained Kaggle AUC score of 94.018.
Reddit Flair Detector
Python, Machine Learning, Django, Heroku
- Developed a Python application for flair detection of India subreddit posts using natural language processing and exploratory analysis.
- Trained various machine learning models and achieved an accuracy of 78% with the Random Forest model.
Job Description and Candidate Success Prediction using BERT and Regression
Python, Machine Learning, BERT, TensorFlow
- Designed a description-based Job Classifier using the BERT model and text processing with 70% prediction accuracy.
- Built a Candidate-Success predictor for a given job based on candidate attributes and obtained 63% accuracy using Linear Regression.
Hard-Mining Loss for CNN-based Face Recognition
Python, PyTorch, Keras, Neural Networks
- Conceptualized Hard-Mining loss function to re-model loss distribution for inter-class and intra-class faces.
- Evaluated the function with ResNet and VGG CNN architectures on the PyTorch framework and obtained the best accuracy metric of 99.1% on the LFW dataset.
Visual Question Answering
Python, PyTorch, Keras, Neural Networks
- Studied and implemented various Computer Vision and NLP based deep learning architectures for the VQA task to answer textual questions based on a given image.
- Obtained the best accuracy of 67% over the stacked-attention model on the VQA dataset.
English Premier League Insights
Python, Selenium, Pandas/NumPy/Scikit-Learn/Matplotlib, Google Cloud
- Collected and processed English Premier League data for 11 seasons (2007-08 to 2017-18) using Selenium and Pandas.
- Reported and visualized that good teams are improving at a 5% higher rate than relegation-battling teams, using offensive/defensive statistics, game points, and predicted vs. actual performance.
Terps Charge&Go
SQL, Python, ER Schema, Data Warehousing
- Analysed and designed a database system for the University of Maryland’s charging stations with an ER model, relational schema, and MS SQL.
- Built a web application on top of SQL to provide real-time analytics and visualization of users, vehicles, and stations.
A Little More About Me
Alongside data science and business intelligence, some of my other interests and hobbies are:
- Liverpool Football Club
- Telling and listening to stories
- Table Tennis
- Swimming
- Trying out new cuisines every week