Sayantan Satpati

Data Scientist by heart, Software Engineer by profession!

Contact Me

About Me

An aspiring Data Scientist / Engineer with a background in software development & testing, I am most passionate about solving problems at the intersection of technology, science, and domain knowledge, drawing on newly acquired skills, an insatiable intellectual curiosity, and the ability to uncover patterns in large data sets.

I always believed that if there were a career that could integrate my engineering and programming background with my interest in Mathematics, Statistics, and Technology, it would be the career I would be happiest pursuing. Data Science seemed to fit my criteria from all angles and drove me to pursue a part-time Master's program in Information and Data Science (MIDS 2016) at the UC Berkeley School of Information, from which I recently graduated.

A strong believer in continuous education, and eager to keep up with the latest advances in the ever-changing landscape of Data Science / Artificial Intelligence, I constantly try to stay up to date by taking courses, reading research papers, or simply playing with a data set just to have some fun!

In my spare time, I play (and watch) soccer and cricket (the Indian version of baseball, only better!). I love hiking, traveling, leisure reading, and, of late, trying my hand at digital photography.

Deep Learning Projects

NN

Neural Network From Scratch

The purpose of this project was to build and train a neural network from scratch (forward and back propagation) to predict the number of bikeshare users on a given day.

Technologies: Python, NumPy, Matplotlib, Jupyter

Find out more
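As a rough illustration of what "from scratch" means here, the sketch below trains a tiny network with one sigmoid hidden layer using hand-written forward and backward passes. It is a minimal sketch, not the project's code: it uses plain Python instead of NumPy to stay dependency-free, and the function name `train_tiny_network`, the hyperparameters, and the toy data are all assumptions made for illustration.

```python
import math
import random


def train_tiny_network(xs, ys, hidden=3, lr=0.1, epochs=2000, seed=0):
    """Train a 1-input, 1-output network with one sigmoid hidden layer.

    Hypothetical helper for illustration; returns the mean squared error
    observed during each epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(xs, ys):
            # Forward pass: sigmoid hidden layer, linear output.
            h = [1 / (1 + math.exp(-(w1[j] * x + b1[j]))) for j in range(hidden)]
            y_hat = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = y_hat - y
            total += err * err
            # Backward pass: gradients of 0.5 * err^2 via the chain rule.
            for j in range(hidden):
                grad_h = err * w2[j] * h[j] * (1 - h[j])  # uses w2 before its update
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err
        losses.append(total / len(xs))
    return losses


# Toy data (made up): a scaled feature and a scaled ridership target.
toy_x = [0.0, 0.25, 0.5, 0.75, 1.0]
toy_y = [0.1, 0.3, 0.5, 0.7, 0.9]
history = train_tiny_network(toy_x, toy_y)
print("first epoch MSE:", history[0], "last epoch MSE:", history[-1])
```

The same structure carries over to a vectorized NumPy version, where the per-unit loops become matrix products.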

Capstone Project (MIDS)


Yelp

Analyzing & Visualizing User Reviews on Yelp

The purpose of this project was to analyze user ratings and reviews from the Yelp Challenge Dataset (Round 6) and build an analytics dashboard that shows businesses what is working for them, where they are going wrong, and how their competitors are doing.

Technologies: Python, Elasticsearch, AWS, iPython, Pandas, NumPy, Bootstrap, Tableau, D3, DC.js, Crossfilter, Leaflet.js

Find out more

Strava

Strava Leaderboard and Weather Analysis

Strava is a social fitness app for runners and cyclists that lets them track, analyze, and quantify their performance, and compare and compete with other athletes. The goal of this project was to enrich Strava’s leaderboard with external data such as weather (Climatic Data Center's QCLCD).

Technologies: Python, AWS, MongoDB, Map/Reduce, iPython, NumPy, SciPy, Pandas, Scikit Learn, Matplotlib, Seaborn etc.

Find out more

Kaggle - Bike Sharing Demand

Bike sharing systems are a means of renting bicycles where the process of obtaining membership, rental, and bike return is automated via a network of kiosk locations throughout a city. Using these systems, people are able to rent a bike from one location and return it to a different place on an as-needed basis. Currently, there are over 500 bike-sharing programs around the world.

In this competition, participants are asked to combine historical usage patterns with weather data in order to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.

Exploratory data analysis, feature engineering, and supervised machine learning (OLS, SVM, ensembles, etc.). Final rank was 22.

Technologies: iPython, NumPy, SciPy, Pandas, Scikit Learn, Matplotlib, Seaborn etc.

Find out more

Applied Machine Learning

Applied Machine Learning Projects

  1. Handwritten digit classification on the MNIST dataset with KNN, Naive Bayes, etc. Find out more
  2. Text classification of the newsgroup dataset using KNN, Naive Bayes, and Logistic Regression. Find out more
  3. Cluster analysis of the Mushroom dataset using PCA/GMM. Find out more

Technologies: iPython, NumPy, SciPy, Pandas, Scikit Learn, Matplotlib, Seaborn etc.

Topic Modeling

Topic Modeling on the Enron email dataset

The purpose of this project was to study the Enron email dataset in order to learn the top N topics and the emails and words associated with each.

Technologies:

  • Hadoop/Spark Cluster Setup using Python and Fabric on IBM Softlayer Cloud
  • Data cleaning and pre-processing using map/reduce (python)
  • LDA using Scala, Spark, MLlib, and GraphX

Find out more

Wine Experiment

Measuring the effects of knowing the Price of a Wine on its Perceived Taste

In this experiment, we aim to understand whether knowledge of price impacts the enjoyment of wine. Using a randomized controlled experiment, three wines of similar varietal and vintage were served to participants (treatment & control) randomly, but with differing price points (approximately $10, $20 and $45).

We hypothesized that consumers who are exposed to the price of a wine will report higher enjoyment of expensive wines and lower enjoyment of cheaper wines. The results, while not statistically significant, suggest that price may affect the enjoyment of wine.

Technologies: Experimentation using RCT on Human Subjects in Milpitas & Sonoma. Statistical Analysis (Linear Regression) using R.

Find out more

Distributed and Scalable Data Mining/ML Projects

Spark: Logistic Regression and SVM

Spark: Logistic Regression and SVM (Python/pyspark/AWS)

View on GitHub

Spark: K-Means and Linear Regression

Spark: K-Means and Linear Regression (Python/pyspark/AWS)

View on GitHub

Scalable Page Rank

Scalable Page Rank using Spark (Python/pyspark/AWS)

View on GitHub

Scalable Page Rank using Map/Reduce (Python/mrjob/AWS)

View on GitHub
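For readers unfamiliar with the algorithm behind the two Page Rank projects, the following is a minimal single-machine sketch of power-iteration PageRank. It is an illustration under stated assumptions, not the Spark or mrjob code from the repositories: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy graph are all hypothetical choices.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of out-links."""
    # Include nodes that only appear as link targets.
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly.
        dangling = sum(ranks[u] for u in nodes if not graph.get(u))
        new_ranks = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        # "Map" step: each node sends an equal share of its rank along each link.
        # "Reduce" step: each node sums the shares it receives.
        for u, outs in graph.items():
            for v in outs:
                new_ranks[v] += damping * ranks[u] / len(outs)
        ranks = new_ranks
    return ranks


# Toy graph (made up): D links to C but nothing links to D.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for node, rank in sorted(pagerank(toy_graph).items()):
    print(node, round(rank, 4))
```

In a distributed setting, the inner loop is what gets parallelized: contributions are emitted per edge and then summed per destination node on each iteration.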

Graphs

Scalable Graph Algorithms (Shortest Path etc) using Map/Reduce (Python/mrjob/AWS)

View on GitHub

Analysing Google N-Gram Dataset

Analysing Google N-Gram Dataset using Map/Reduce (Python/mrjob/AWS)

View on GitHub

Scalable K-Means

Scalable K-Means using Map/Reduce (Python/mrjob/AWS)

View on GitHub

Apriori - Basket Analysis

Scalable Apriori using Map/Reduce (Python/mrjob/AWS)

View on GitHub

Spam Filter - Scalable Naive Bayes

Implement Bernoulli and Multinomial Naive Bayes using Map/Reduce

View on GitHub
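To make the spam-filter idea concrete, here is a minimal single-machine sketch of the multinomial variant with Laplace (add-one) smoothing. It is only an illustration, not the Map/Reduce implementation from the repository: the function names `train_multinomial_nb` and `predict`, the choice to ignore unseen words, and the toy messages are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs):
    """Count documents per class and word occurrences per class.

    docs is a list of (label, list_of_tokens) pairs.
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                # Add-one smoothing avoids log(0) for words unseen in this class.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training messages (made up).
toy_docs = [
    ("spam", "win money now".split()),
    ("spam", "free money offer".split()),
    ("ham", "meeting at noon".split()),
    ("ham", "lunch meeting tomorrow".split()),
]
toy_model = train_multinomial_nb(toy_docs)
print(predict(toy_model, ["free", "money"]))
```

The counting in `train_multinomial_nb` is the part that maps naturally onto Map/Reduce: mappers emit (label, word) counts and reducers sum them. The Bernoulli variant differs in that it models word presence or absence per document rather than word frequency.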

More on GitHub

Work Experience

Staff Engineer - eBay (2010 - Present)

Data quality and statistical analysis of Training/Testing Data used by MLR models to train/validate Search Ranking (Search Science).

Data quality and statistical analysis of search query rewrites and recall (Search Science).

Data parity and statistical analysis of large datasets (Impressions, Clicks, Views, Sales) (Search Backend).

Senior Software Engineer - Cognizant Technology Solutions US (2008 - 2010)

Software Product Development (Java/J2EE/Spring/Hibernate/WebApps/Oracle)

Senior Software Engineer - Wipro Technologies (2004 - 2008)

Software Product Development, Enhancement, and Maintenance (Java/J2EE/Spring/Struts/WebApps/Oracle)

Module Lead - i-Flex Solutions (2004 - 2004)

Software Product Development, Enhancement, and Maintenance (Java/J2EE/WebApps)

Senior Software Engineer - Ushacomm India Pvt Ltd (2002 - 2004)

Software Product Development, Enhancement, and Maintenance (Java/C++/CORBA)

My GitHub

GitHub Contributions


GitHub Feed
