Yash Srivastava

Data Engineer

About Me

Hi, my name’s Yash and I’m a recent graduate of the University of Maryland. Thank you for visiting my website. Feel free to reach out to discuss recent happenings in data science and new opportunities.

I am most skilled in: Python, SQL and Machine Learning.

You can reach out to me via email.

Experience

NWO.AI

Data Engineer

March 2021 - Present

nwo.ai
  • Lead data engineering efforts and build social, financial, and eCommerce datasets, ingesting 1M+ data points daily from 10+ sources to generate insights and forecast 1M+ trends and context spaces for Fortune 500 clients.
  • Identified new data sources & executed full-cycle development serving 2 clients and long-term business use cases.
  • Built event-driven, containerized data pipelines, reducing forecast and context-space generation time by 45 seconds.
  • Overhauled monitoring, automation, and validation of BigQuery datasets increasing processing efficiency by 85%.
  • Technologies used: Python, SQL, GCP (BigQuery, Pub/Sub, Cloud Run/Functions), ETL pipelines, Airflow, Selenium, Git, HTML/CSS/JS

University of Maryland

Graduate Assistant (Social-Media Research)

September 2019 - December 2020

umd.edu
  • Channeled online interactions of 30+ companies from 6 social media platforms into sentiment-analysis models, showing a 25% increase in customer satisfaction and positive feedback.
  • Investigated brand perception using statistical analysis, which reflected a 7% positive response.
  • Technologies used: Python (pandas/scikit-learn/numpy), Selenium, PyTorch, SQL, OpenCV, AWS

CERTIFY Health

Business Intelligence Engineer

June 2020 - August 2020

certify.me
  • Designed a data synchronization API between the Inventory and eCommerce platforms, speeding up order fulfillment by 60%.
  • Organized data management from 10+ sources using ER diagrams & databases improving data retrieval by 80%.
  • Integrated FedEx and UPS APIs in CRM system reducing label generation and shipment times by 3 days.
  • Developed affiliate programs with WordPress and HubSpot for CERTIFY website recording monthly sales of $300,000+.
  • Executed two-week Google Analytics campaign in collaboration with India and China teams leading to 100,000+ website impressions.
  • Technologies used: Python, Excel, SQL, Tableau, PHP, Azure, WordPress, Confluence, Lucidchart

Indian Institute of Information Technology, Sri City

Undergraduate Student Researcher

July 2017 - May 2019

iiits.ac.in
  • Surveyed 25+ criterion functions used with Convolutional Neural Networks (CNN) for Face Recognition tasks leading to research publication at NCVPRIPG 2019 (http://bit.ly/2Mdi1NG).
  • Conceptualized two deep-learning-based Face Recognition methods with Hard-Mining Loss and Parametric-Sigmoid layers, each showing 97%+ verification accuracy.
  • Investigated 10+ dataset and architecture advancements in Visual-Question Answering domain using Deep Learning with a conference paper in CVIP 2020 (http://bit.ly/3amuFSs).
  • Published two papers on face recognition methods at CVIP 2020 (http://bit.ly/2NHT5Or) and CICT 2019 (http://bit.ly/3oy4IEp), receiving the best paper award for Hard-Mining Loss.
  • Technologies used: Python (pandas/scikit-learn/numpy), PyTorch, Keras, MXNet, OpenCV, TensorBoard

CERN (European Organization for Nuclear Research)

Software Engineer (Google Summer of Code)

May 2018 - August 2018

home.cern
  • Spearheaded database migration from MySQL to Elasticsearch for the DIRAC open-source project, improving processing-time efficiency of nuclear experiment data management and analysis by 70%.
  • Revamped database access functions to support the Elasticsearch backend, achieving a 50% improvement in response time.
  • Programmed unit and integration tests producing 25+ tests for execution before production launch.
  • Technologies used: Python, SQL, Elasticsearch, pytest, Django, Git, Travis CI, Jenkins

Physiz

Data Scientist

August 2017 - September 2017

physiz.com
  • Built the leaf-detection pipeline from live-farm image feed using CNNs and PlantCV with 70% detection accuracy.
  • Surveyed 30+ deep-learning mechanisms for geographical nutrient supply analysis summarizing 3 top solutions for research and development.
  • Technologies used: Python (numpy/pandas/scikit-learn/matplotlib), Keras, CouchDB, OpenCV

Bobble AI

Software QA Engineer

May 2017 - June 2017

bobble.ai
  • Diagnosed 10 functional and performance bugs in the Bobble Keyboard application.
  • Programmed test scenarios for 15+ modules of the application covering 20 boundary cases.
  • Overhauled automation tests using the Python-based Culebra tool, speeding up production testing by 40%.
  • Summarized 5000+ Play Store application reviews with data analytics and visualization and identified 8 customer pain points and preferences.
  • Technologies used: Java, Python, UI Automator, Espresso, Android Studio, GitHub

Education

University of Maryland, College Park

M.S. Information Systems (GPA 3.93)

August 2019 - December 2020

Indian Institute of Information Technology, Sri City

B.S. (Honors) Computer Science and Engineering (GPA 3.6)

August 2015 - May 2019

Skills

  • Languages: Python, Java, R, HTML, JavaScript

  • Data: SQL, NoSQL, CSV/XML/JSON/Avro/Parquet, PyData (NumPy, Pandas, Matplotlib, Jupyter)

  • Frameworks: Postgres, BigQuery, Elasticsearch, Hadoop, Spark, Airflow, scikit-learn

  • Machine Learning: Classification, Regression, Clustering, Forecasting, Decision Trees, Neural Networks

  • Tools: Git, Linux, Google Cloud Platform, Docker, Pub/Sub, Selenium, Tableau

Projects

Airbnb Bookings Predictive and Business Analysis

R, Python, SQL, Google Cloud

  • Investigated 25K Airbnb properties to determine high booking rate rentals against parameters including amenities and neighborhoods using R visualizations and exploratory analysis.
  • Conducted predictive analysis to find significant factors affecting booking rates and obtained Kaggle AUC score of 94.018.

Reddit Flair Detector

Python, Machine Learning, Django, Heroku

  • Developed a Python application for flair detection of India subreddit posts using natural language processing and exploratory analysis.
  • Trained various machine learning models and achieved an accuracy of 78% with the Random Forest model.

Job Description and Candidate Success Prediction using BERT and Regression

Python, Machine Learning, BERT, TensorFlow

  • Designed a description-based Job Classifier using the BERT model and text processing with 70% prediction accuracy.
  • Built a Candidate-Success predictor for a given job based on candidate attributes and obtained 63% accuracy using Linear Regression.

Hard-Mining Loss for CNN-based Face Recognition

Python, PyTorch, Keras, Neural Networks

  • Conceptualized Hard-Mining loss function to re-model loss distribution for inter-class and intra-class faces.
  • Evaluated the function with ResNet and VGG CNN architectures on the PyTorch framework and obtained the best accuracy metric of 99.1% on the LFW dataset.

Visual Question Answering

Python, PyTorch, Keras, Neural Networks

  • Studied and implemented various Computer Vision and NLP based deep learning architectures for the VQA task to answer textual questions based on a given image.
  • Obtained the best accuracy of 67% over the stacked-attention model on the VQA dataset.

English Premier League Insights

Python, Selenium, Pandas/NumPy/Scikit-Learn/Matplotlib, Google Cloud

  • Collected and processed English Premier League data for 11 seasons (2007-08 to 2017-18) using Selenium and Pandas.
  • Reported and visualized that good teams are improving at a 5% higher rate than relegation-battling teams, using offensive/defensive statistics, game points, and predicted vs. actual performance.

Terps Charge&Go

SQL, Python, ER Schema, Data Warehousing

  • Analysed and designed database system for the University of Maryland’s charging stations with ER model, relational schema and MS SQL.
  • Built web application over SQL to provide real-time analytics and visualization of users, vehicles, and stations.

A Little More About Me

Alongside my interests in data science and business intelligence, some of my other interests and hobbies are:

  • Liverpool Football Club
  • Telling and listening to stories
  • Table Tennis
  • Swimming
  • Trying out new cuisines every week