This year was my senior year of high school, and I did a senior seminar on statistical programming.
I followed the curriculum of the Data Science Specialization offered by Johns Hopkins on Coursera, and spent most days working through the course material: lectures and short assignments.
I covered the following subjects:
- The Data Scientist’s Toolbox
- R Programming
- Getting and Cleaning Data
- Exploratory Data Analysis
- Reproducible Research
- Regression Models
Final Project: I used a database of real estate purchases in Massachusetts to try to estimate the value of a home. I fit several regression models that factored in numerous attributes of each property, including lot size, number of bedrooms, and town. Then I realized that one of the fields was already the appraised value of the house, as assessed by a realtor. Bad news: I had spent a lot of time trying to predict something that already existed. Good news: my predictions were actually pretty close! I then used this data to make a very interesting graph. I plotted the expected value of each house against the amount it actually sold for and found that the more you list over the appraised price, the less you receive on average. Take a look in the slides below!
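For anyone curious about the idea behind that graph, here is a minimal sketch in Python. It is not the original project code, and the six rows are made-up numbers for illustration only, not values from the Massachusetts dataset. The `fit_line` helper and the field layout are hypothetical. The sketch also assumes that "how much you receive" is measured as sale price relative to the appraised value.

```python
# Illustrative only: invented data, hypothetical names, not the project code.

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up rows: (appraised_value, list_price, sale_price) in dollars.
homes = [
    (300_000, 300_000, 298_000),
    (300_000, 330_000, 291_000),
    (450_000, 460_000, 452_000),
    (450_000, 540_000, 430_000),
    (250_000, 255_000, 251_000),
    (250_000, 300_000, 238_000),
]

# How far above the appraisal each home was listed (0.10 means 10% over).
list_premium = [lst / appraised - 1 for appraised, lst, _ in homes]

# Sale price as a fraction of the appraised value.
sale_ratio = [sale / appraised for appraised, _, sale in homes]

slope, intercept = fit_line(list_premium, sale_ratio)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A negative slope on this toy data would mirror the pattern in the graph: homes listed further above their appraisal tend to sell for a smaller fraction of it. With real data you would also want to plot the points rather than rely on the fitted line alone.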