A collection of personal projects that I have worked on. A data(tail) of everything from basic data cleaning, to simple data exploration, to predictive analytics. Feel free to share your feedback or even contribute (GitHub) 😄.

NYC Schools (R)

2021 | 02 | 01 Last compiled: 2021-07-21

This project analyzes the relationship between high school performance and the demographic makeup of schools in New York City. It also analyzes a survey of parents', students' and teachers' perceptions of school quality and compares those perceptions with actual performance.



Music Store (SQL & R)

2021 | 02 | 20 Last compiled: 2021-07-21

This project combines R and SQLite to showcase how robust it is to query data from a database and then visualize the observations in R. While some data cleaning and manipulation is done in SQL (SQLite), the majority of the data work is done in R.
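As a rough illustration of that split, here is a minimal sketch in which SQL does the aggregation and the host language does the follow-up work. It is written in Python with the standard-library sqlite3 module rather than R, and the `genre` and `sale` tables and their rows are invented for the example; they are not the project's actual schema or data.

```python
import sqlite3

# In-memory database with two hypothetical tables (not the project's real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sale (
    sale_id  INTEGER PRIMARY KEY,
    genre_id INTEGER REFERENCES genre(genre_id),
    quantity INTEGER
);
INSERT INTO genre VALUES (1, 'Rock'), (2, 'Jazz');
INSERT INTO sale VALUES (1, 1, 3), (2, 1, 2), (3, 2, 4);
""")

# Step 1: let SQL do the joining and aggregation.
query = """
SELECT g.name, SUM(s.quantity) AS units
FROM sale AS s
JOIN genre AS g ON g.genre_id = s.genre_id
GROUP BY g.name
ORDER BY units DESC, g.name
"""
units_by_genre = conn.execute(query).fetchall()

# Step 2: do the remaining manipulation outside the database,
# here turning unit counts into shares that could feed a chart.
total = sum(units for _, units in units_by_genre)
shares = {name: round(units / total, 2) for name, units in units_by_genre}

print(units_by_genre)
print(shares)
```

In the project itself the second step would happen in R, where the query result would be handed to the plotting code.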



USA Seafood (R)

2021 | 03 | 10 Last compiled: 2021-07-21

This project takes a glimpse at the seafood industry in the USA. As more people move away from a red-meat diet towards plant-based food and seafood, this might be a good time to analyze aquatic food sources. The environmental impact of industrialized red meat production, together with health concerns, has made seafood a favourable alternative; a diet built around it is also termed a “pescatarian diet”.

This project asks two questions. First, how environmentally sustainable is seafood? For example, is over-fishing occurring?

Second, does seafood pack the nutrients needed to replace a meat-based diet?



Baseball Database (SQL & R)

2021 | 03 | 28 Last compiled: 2021-07-21

This project is about using the power of SQL to create a database out of several CSV files that exist individually. The goal is to house those CSV files under one roof, which includes creating a database schema and linking the tables with primary keys and foreign keys. The subject is baseball, America’s favorite pastime: the data holds game information and stats from a record of over 170,000 games. The games are chronologically ordered and occur between 1871 and 2016.
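To make the idea of linking tables with primary and foreign keys concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The `team` and `game` tables, their columns, and the sample rows are hypothetical and only illustrate the technique; the project's real schema is built from its own CSV files and may look quite different.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: each game points at two rows in the team table.
conn.executescript("""
CREATE TABLE team (
    team_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE game (
    game_id      TEXT PRIMARY KEY,
    game_date    TEXT NOT NULL,
    home_team_id TEXT NOT NULL REFERENCES team(team_id),
    away_team_id TEXT NOT NULL REFERENCES team(team_id)
);
""")

# Invented sample rows, standing in for data loaded from CSV files.
conn.executemany(
    "INSERT INTO team (team_id, name) VALUES (?, ?)",
    [("T1", "Team One"), ("T2", "Team Two")],
)
conn.execute("INSERT INTO game VALUES ('G1', '2000-01-01', 'T1', 'T2')")

# A game that references an unknown team is rejected by the foreign key.
try:
    conn.execute("INSERT INTO game VALUES ('G2', '2000-01-02', 'T1', 'XX')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# The keys let one query stitch the tables back together.
rows = conn.execute("""
SELECT g.game_id, h.name, a.name
FROM game AS g
JOIN team AS h ON h.team_id = g.home_team_id
JOIN team AS a ON a.team_id = g.away_team_id
""").fetchall()

print(fk_enforced, rows)
```

The rejected insert is the point of the exercise: once the keys are declared, the database itself guards against games that reference teams it does not know about.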



Spam Filter (ML)

2021 | 04 | 05 Last compiled: 2021-07-21

This project uses the Naive Bayes algorithm, which is built on Bayes' theorem, to create a spam filter for text messages. The model is trained, validated and tested on a dataset of 5,574 SMS messages that have already been classified. The goal is an algorithm that can accurately detect spam messages in an incoming stream of texts.
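For readers new to the technique, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. The four training messages are made up for illustration and are not from the SMS dataset, and the function names `train` and `classify` are my own; the project's actual implementation may differ in tokenization, smoothing and evaluation.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label. messages: list of (label, text), label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def classify(text, model, alpha=1.0):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: share of training messages carrying this label.
        score = math.log(label_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Smoothed log likelihood of the word given the label.
            score += math.log(
                (word_counts[label][word] + alpha) / (n_words + alpha * len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Invented toy messages, only to exercise the code.
toy_messages = [
    ("spam", "win a free prize now"),
    ("spam", "free cash prize claim now"),
    ("ham", "are we still meeting for lunch"),
    ("ham", "see you at lunch tomorrow"),
]
model = train(toy_messages)
label_spam = classify("claim your free prize", model)
label_ham = classify("lunch tomorrow", model)
print(label_spam, label_ham)
```

Working in log space avoids multiplying many small probabilities together, and the smoothing term keeps a word seen under only one label from forcing the other label's probability to zero.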



Fitness Tracking (R)

2021 | 04 | 17 Last compiled: 2021-07-21

This project looks at my past 6 months in Germany under a pandemic lockdown. Using data from my Fitbit collected between October 2020 and March 2021, I explore visually how the pandemic might have affected my activity, sleep and heart rate.



Movie Ratings (R)

2021 | 05 | 05 Last compiled: 2021-07-21

This project explores various movie rating websites. It covers web scraping to extract data from the websites, cleaning and manipulating the data, and finally visualizing it. I analyze the transparency, consistency and variation in movie ratings on these sites.



Mental Health (SQL & ML)

2021 | 05 | 20 Last compiled: 2021-07-21

This project looks into how mental health can be predicted from predictors such as age, family history and gender. I also tried to see whether remote working (before the pandemic) would be a predictor of the mental health of the respondents, who work in the tech industry. I use Logistic Regression, Decision Tree and Random Forest techniques to train and test models that predict the respondents' mental health.