Sayantan Satpati

About Me

An aspiring Data Scientist / Engineer with a background in software development and testing, I am most passionate about solving problems at the intersection of technology, science, and domain expertise. I bring newly acquired skills, an insatiable intellectual curiosity, and the ability to uncover patterns in large data sets.

I have always believed that a career integrating my engineering and programming background with my interest in Mathematics, Statistics, and Technology would be the one I would be happiest pursuing. Data Science fit those criteria from every angle and drove me to pursue a part-time Master's program in Information and Data Science (MIDS 2016) at the UC Berkeley School of Information, from which I recently graduated.

A strong believer in continuous education, and eager to keep up with the latest advances in the ever-changing landscape of Data Science / Artificial Intelligence, I constantly try to stay up to date by taking courses, reading research papers, or simply playing with a data set just to have some fun!

In my spare time, I play (and watch) soccer and cricket (the Indian version of baseball, only better!). I love hiking, traveling, and leisure reading, and of late I have been trying my hand at digital photography.

Deep Learning Projects

Neural Network From Scratch

The purpose of this project was to build and train a neural network from scratch (forward and back propagation) to predict the number of bikeshare users on a given day.

Technologies: Python, NumPy, Matplotlib, Jupyter
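The forward and backward passes mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the class name `TinyNetwork`, the single sigmoid hidden layer, the linear output unit, and the default learning rate are all assumptions chosen to keep the example short.

```python
import numpy as np


def sigmoid(x):
    """Logistic activation used by the hidden layer."""
    return 1.0 / (1.0 + np.exp(-x))


class TinyNetwork:
    """Hypothetical regression network: one sigmoid hidden layer, linear output."""

    def __init__(self, n_inputs, n_hidden, learning_rate=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Small random initial weights, scaled by the fan-in of each layer.
        self.w_in = rng.normal(0.0, n_inputs ** -0.5, (n_inputs, n_hidden))
        self.w_out = rng.normal(0.0, n_hidden ** -0.5, (n_hidden, 1))
        self.lr = learning_rate

    def forward(self, X):
        """Forward pass: return hidden activations and predictions."""
        hidden = sigmoid(X @ self.w_in)
        return hidden, hidden @ self.w_out

    def train_step(self, X, y):
        """One full-batch gradient descent step; returns the mean squared error."""
        hidden, y_hat = self.forward(X)
        error = y - y_hat  # shape (n_samples, 1)
        # Backward pass: the output unit is linear, so its error term is the
        # raw error; the hidden error term is scaled by the sigmoid derivative.
        hidden_error = error @ self.w_out.T
        hidden_grad = hidden_error * hidden * (1.0 - hidden)
        n = X.shape[0]
        self.w_out += self.lr * hidden.T @ error / n
        self.w_in += self.lr * X.T @ hidden_grad / n
        return float(np.mean(error ** 2))
```

Calling `train_step` repeatedly on a feature matrix `X` and a column vector of targets `y` (for example, daily rider counts scaled to a small range) should drive the returned loss down over time.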

Find out more

Capstone Project (MIDS)

SmartCam

Video Analytics in the Cloud

The goal of this project was to design an inexpensive, scalable surveillance system that harnesses the latest computer vision and machine learning techniques to power video analytics in the cloud. Videos were recorded on Raspberry Pi devices fitted with camera modules and, once motion was detected, uploaded to the cloud (S3) for further video analytics (Face Detection, Image Classification).

The architecture, illustrated above, consists of three major blocks:

At least one Raspberry Pi with a camera module.
Amazon Web Services (AWS) cloud infrastructure, comprising storage for video files, a database for metadata (e.g., source of video, timestamp, and information about contents), and all of our backend video processing.
An interactive website with a user interface.

Technologies

Raspberry Pi + Camera Module: Webcam running Motion Detection
Python and OpenCV: Image Processing Algorithms (Motion Detection, Face Detection, Image Capture)
S3: Store Video files
DynamoDB: Store Video metadata
EC2: Run Face Detection and Image Classification
TensorFlow: Image Classification using Inception-V3 (Transfer Learning)
Bootstrap and Google Charts: UX
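As a rough illustration of the motion-detection step listed above, the sketch below flags motion by differencing two consecutive grayscale frames. It is not the project's implementation: it uses plain NumPy as a stand-in for OpenCV, and the function name `motion_detected` and both threshold values are hypothetical.

```python
import numpy as np


def motion_detected(prev_frame, frame, pixel_threshold=25, min_changed_fraction=0.01):
    """Return True when enough pixels differ between two grayscale frames.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    # A pixel counts as changed when its intensity moved by more than the threshold.
    changed = diff > pixel_threshold
    return bool(changed.mean() >= min_changed_fraction)
```

In a pipeline like the one described, a `True` result would be the trigger for recording a clip and uploading it for further analysis.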

Hosted on ischool.berkeley.edu

Back my project

Analyzing & Visualizing User Reviews on Yelp

The purpose of this project was to analyze user ratings and reviews from the Yelp challenge dataset (Round 6) and build an analytics dashboard showing businesses what is working for them, where they are going wrong, and how their competitors are doing.

Technologies: Python, Elasticsearch, AWS, iPython, Pandas, NumPy, Bootstrap, Tableau, D3, DC.js, Crossfilter, Leaflet.js

Find out more

Strava Leaderboard and Weather Analysis

Strava is a social fitness app for cyclists and runners that lets athletes track, analyze, and quantify their performance, and compare and compete with other athletes. The goal of this project was to enrich Strava’s leaderboard with external data such as weather (Climatic Data Center's QCLCD).

Technologies: Python, AWS, MongoDB, Map/Reduce, iPython, NumPy, SciPy, Pandas, Scikit Learn, Matplotlib, Seaborn etc.

Find out more

Kaggle - Bike Sharing Demand

Bike sharing systems are a means of renting bicycles where the process of obtaining membership, rental, and bike return is automated via a network of kiosk locations throughout a city. Using these systems, people are able to rent a bike from one location and return it to a different place on an as-needed basis. Currently, there are over 500 bike-sharing programs around the world.

In this competition, participants are asked to combine historical usage patterns with weather data in order to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.

Exploratory data analysis, feature engineering, and supervised machine learning (OLS, SVM, ensembles, etc.). Final rank was 22.

Technologies: iPython, NumPy, SciPy, Pandas, Scikit Learn, Matplotlib, Seaborn etc.

Find out more

Applied Machine Learning Projects

Handwritten digit classification on the MNIST dataset with KNN, Naïve Bayes etc. Find out more
Text classification of newsgroup dataset using KNN, Naive Bayes, and Logistic Regression Find out more
Cluster Analysis of Mushroom Dataset using PCA/GMM Find out more

Technologies: iPython, NumPy, SciPy, Pandas, Scikit Learn, Matplotlib, Seaborn etc.

Topic Modeling on the Enron email dataset

The purpose of this project was to study the Enron email dataset in order to learn the top N topics and the emails and words associated with each.

Technologies:

Hadoop/Spark Cluster Setup using Python and Fabric on IBM Softlayer Cloud
Data cleaning and pre-processing using map/reduce (python)
LDA using Scala, Spark, MLlib, and GraphX

Find out more

Measuring the effects of knowing the Price of a Wine on its Perceived Taste

In this experiment, we aimed to understand whether knowledge of price impacts the enjoyment of wine. In a randomized controlled experiment, three wines of similar varietal and vintage, but at different price points (approximately $10, $20 and $45), were served in random order to participants in treatment and control groups.

We hypothesized that consumers who are exposed to the price of a wine will report higher enjoyment of expensive wines and lower enjoyment of cheaper wines. The results point in the direction of the hypothesis but were not statistically significant, so they suggest, rather than establish, that price affects the enjoyment of wine.

Technologies: Experimentation using RCT on Human Subjects in Milpitas & Sonoma. Statistical Analysis (Linear Regression) using R.

Find out more

Distributed and Scalable Data Mining/ML Projects

Spark: Logistic Regression and SVM

Spark: Logistic Regression and SVM (Python/pyspark/AWS)

View on GitHub

Spark: K-Means and Linear Regression

Spark: K-Means and Linear Regression (Python/pyspark/AWS)

View on GitHub

Scalable Page Rank

Scalable Page Rank using Spark (Python/pyspark/AWS)

View on GitHub

Scalable Page Rank using Map/Reduce (Python/mrjob/AWS)

View on GitHub
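For readers unfamiliar with PageRank, the computation named in the two entries above can be sketched on a single machine as a simple power iteration. This is an assumption-laden illustration, not the pyspark or mrjob code from the repositories: the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are conventional choices made here for brevity.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `graph` maps each node to the list of nodes it links to.
    """
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly.
        dangling = sum(ranks[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}
        # Each node splits its damped rank equally among its out-links,
        # which is the contribution a mapper would emit per edge.
        for src, targets in graph.items():
            if targets:
                share = damping * ranks[src] / len(targets)
                for dst in targets:
                    new_ranks[dst] += share
        ranks = new_ranks
    return ranks
```

In a distributed setting, the inner loop corresponds to the map phase (emit a share per out-link) and the summation into `new_ranks` corresponds to the reduce phase (sum the shares per destination node).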

Graphs

Scalable Graph Algorithms (Shortest Path etc) using Map/Reduce (Python/mrjob/AWS)

View on GitHub

Analysing Google N-Gram Dataset

Analysing Google N-Gram Dataset using Map/Reduce (Python/mrjob/AWS)

View on GitHub

Scalable K-Means

Scalable K-Means using Map/Reduce (Python/mrjob/AWS)

View on GitHub
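The way K-Means decomposes into map and reduce steps can be sketched in plain Python as below. This is a single-process illustration under stated assumptions, not the mrjob job from the repository: the helper names `closest`, `mapper`, `reducer`, and `kmeans` are hypothetical, and the fixed iteration count stands in for a proper convergence check.

```python
def closest(point, centroids):
    """Index of the centroid nearest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def mapper(points, centroids):
    """Map step: emit (cluster index, (point, 1)) for every point."""
    for point in points:
        yield closest(point, centroids), (point, 1)


def reducer(pairs):
    """Reduce step: sum points and counts per cluster, then average."""
    totals = {}
    for key, (point, count) in pairs:
        acc, n = totals.get(key, ([0.0] * len(point), 0))
        totals[key] = ([a + p for a, p in zip(acc, point)], n + count)
    return {key: [a / n for a in acc] for key, (acc, n) in totals.items()}


def kmeans(points, centroids, iterations=10):
    """Repeat map and reduce for a fixed number of iterations."""
    for _ in range(iterations):
        means = reducer(mapper(points, centroids))
        # Keep the old centroid when a cluster received no points.
        centroids = [means.get(i, c) for i, c in enumerate(centroids)]
    return centroids
```

Emitting a `(point, 1)` pair per record lets partial sums and counts be combined before the final average is computed.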

Apriori - Basket Analysis

Scalable Apriori using Map/Reduce (Python/mrjob/AWS)

View on GitHub

Spam Filter - Scalable Naive Bayes

Implement Bernoulli and Multinomial Naive Bayes using Map/Reduce

View on GitHub
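A minimal sketch of how Naive Bayes counting fits the map/reduce pattern follows. It covers only the multinomial variant with add-one (Laplace) smoothing, runs in a single process, and is not the repository's code; the function names and the key layout (`("doc", label)` and `("word", label, word)`) are assumptions made for illustration.

```python
import math
from collections import Counter


def map_counts(documents):
    """Map step: emit one count per document label and one per (label, word)."""
    for label, text in documents:
        yield ("doc", label), 1
        for word in text.lower().split():
            yield ("word", label, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def classify(text, totals):
    """Pick the label with the highest smoothed log-posterior."""
    labels = sorted(key[1] for key in totals if key[0] == "doc")
    vocab = {key[2] for key in totals if key[0] == "word"}
    n_docs = sum(totals[("doc", label)] for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        n_words = sum(v for k, v in totals.items() if k[0] == "word" and k[1] == label)
        score = math.log(totals[("doc", label)] / n_docs)  # log prior
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the likelihood.
            count = totals[("word", label, word)]
            score += math.log((count + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Training then amounts to `totals = reduce_counts(map_counts(documents))`, where `documents` is an iterable of `(label, text)` pairs, and `classify(text, totals)` scores a new message.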

Work Experience

Staff Engineer - eBay (2010 - Present)

Data quality and statistical analysis of Training/Testing Data used by MLR models to train/validate Search Ranking (Search Science).

Data quality and statistical analysis of search query rewrites and recall (Search Science).

Data parity and statistical analysis of large datasets (Impressions, Clicks, Views, Sales) (Search Backend).

Senior Software Engineer - Cognizant Technology Solutions US (2008 - 2010)

Software Product Development (Java/J2EE/Spring/Hibernate/WebApps/Oracle)

Senior Software Engineer - Wipro Technologies (2004 - 2008)

Software Product Development, Enhancement, and Maintenance (Java/J2EE/Spring/Struts/WebApps/Oracle)

Module Lead - i-Flex Solutions (2004 - 2004)

Software Product Development, Enhancement, and Maintenance (Java/J2EE/WebApps)

Senior Software Engineer - Ushacomm India Pvt Ltd (2002 - 2004)

Software Product Development, Enhancement, and Maintenance (Java/C++/Corba)

My GitHub

GitHub Contributions

GitHub Feed

Skills

For an exhaustive list of skills, refer to LinkedIn.

Java, Python, Hadoop, Algo/DS: Expert

R, Pandas, NumPy, SciPy, Scikit: Intermediate

Spark, MLlib: Intermediate

Scala, Data-Viz, Story Telling: Beginner

Testimonials

In the 2 years that Sayantan was in my team, he has been a dedicated and dependable engineer throughout. Always available and always ready to take on more responsibility when asked. He worked hard but played hard as well. I would be glad to hire Sayantan for my team and am sure he will be a valuable asset for any organization.

Ritesh Trivedi
iTunes Engineering at Apple