# STAT 202

## Lecture 1

## Introduction

Prof. Adam Knapp
### What is _statistics_?

The science of learning through

- collecting,
- organizing,
- analyzing, and
- presenting

*data*.
### Looking at Data

_Example 1_: The top grossing films for the weekend of December 29-31, 2017.
![Top grossing films Dec 29-31, 2017](movies-dec29-31-2017.png)

Source: [http://www.boxofficemojo.com/weekend/chart/?view=main&yr=2017&wknd=52&p=.htm](http://www.boxofficemojo.com/weekend/chart/?view=main&yr=2017&wknd=52&p=.htm)
### Looking at Data

_Example 2_: House sales in Ames, Iowa from 2006 to 2010.
![Sample of housing data from Ames, Iowa 2006-2010](housing-data-sample.png)

Source: [Ames Housing dataset. Dean De Cock, Truman State University](https://ww2.amstat.org/publications/jse/v19n3/decock.pdf)
_Definition_: **Cases** (also known as **observations**) are objects described by a set of data.

_Definition_: A **variable** is some characteristic of a case.

_Definition_: A **label** is a special variable which distinguishes different cases.
### What counts as a case?

Example 1: (Movies) Each top grossing movie (Dec 29-31) and its relevant information.
Example 2: (Houses) Houses sold in Ames, Iowa from 2006-2010 with relevant information from their _last_ sale.

> "Additionally, approximately 100 homes
> changed ownership multiple times during the 4-year time
> period. As this gave a greater weight to these particular
> homes, I elected to keep only the most recent sales data
> on any property." -- Dean De Cock
### Variables

Example 1: (Top Grossing Movies)

Variables: TW (this week's rank), LW (last week's rank), Title, Studio, WE Gross, % Change, Theater Count / Change, Average, Total Gross, Budget, Week #

Label: Title
Example 2: (Ames Housing) Variables (in full dataset):
Id, MSSubClass, MSZoning, LotFrontage, LotArea, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Neighborhood, Condition1, Condition2, BldgType, HouseStyle, OverallQual, OverallCond, YearBuilt, YearRemodAdd, RoofStyle, RoofMatl, Exterior1st, Exterior2nd, MasVnrType, MasVnrArea, ExterQual, ExterCond, Foundation, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1, BsmtFinSF1, BsmtFinType2, BsmtFinSF2, BsmtUnfSF, TotalBsmtSF, Heating, HeatingQC, CentralAir, Electrical, 1stFlrSF, 2ndFlrSF, LowQualFinSF, GrLivArea, BsmtFullBath, BsmtHalfBath, FullBath, HalfBath, BedroomAbvGr, KitchenAbvGr, KitchenQual, TotRmsAbvGrd, Functional, Fireplaces, FireplaceQu, GarageType, GarageYrBlt, GarageFinish, GarageCars, GarageArea, GarageQual, GarageCond, PavedDrive, WoodDeckSF, OpenPorchSF, EnclosedPorch, 3SsnPorch, ScreenPorch, PoolArea, PoolQC, Fence, MiscFeature, MiscVal, MoSold, YrSold, SaleType, SaleCondition, SalePrice
Label: Id
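To make the vocabulary concrete, here is a minimal sketch of cases, variables, and a label in plain Python. The column names (`Id`, `LotArea`, `YrSold`, `SalePrice`) are taken from the Ames variable list above, but the three rows and all of their values are made up purely for illustration; they are not real records from the dataset.

```python
# Each dict is one case (observation); each key is a variable.
# NOTE: the values below are invented for illustration only.
cases = [
    {"Id": 1, "LotArea": 9000, "YrSold": 2007, "SalePrice": 150000},
    {"Id": 2, "LotArea": 12000, "YrSold": 2009, "SalePrice": 200000},
    {"Id": 3, "LotArea": 9000, "YrSold": 2007, "SalePrice": 175000},
]

# The variables are the characteristics recorded for every case.
variables = list(cases[0])

# The label is the variable that distinguishes the cases,
# so it must take a different value for every case.
label = "Id"
label_values = [case[label] for case in cases]
label_is_unique = len(set(label_values)) == len(label_values)

# LotArea repeats across cases, so it could not serve as a label.
lot_areas = [case["LotArea"] for case in cases]
lot_area_is_unique = len(set(lot_areas)) == len(lot_areas)

# A unique label lets us look up a single case directly.
by_label = {case[label]: case for case in cases}

print(variables)
print(label_is_unique, lot_area_is_unique)
print(by_label[2])
```

The uniqueness check is the practical test of whether a variable can act as a label: `Id` passes, while a characteristic such as `LotArea` can repeat and so does not.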
## Types of Variables

Categorical

- Nominal
- Ordinal

Quantitative

- use consistent measurement units (e.g. convert hh:mm to a number of hours or a number of minutes)
- is it measuring what you want? (weekend gross vs average gross)
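The point about consistent units can be sketched in a few lines of Python: a duration written as hh:mm is text until it is converted to a single unit such as minutes. The helper name `to_minutes` and the sample durations are hypothetical and chosen only for illustration.

```python
def to_minutes(hhmm):
    """Convert a duration written as 'h:mm' into a whole number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# Hypothetical durations, not taken from any dataset in the lecture.
durations = ["2:32", "1:45", "0:58"]

# After conversion every value is in the same unit, so the values
# can be compared, sorted, and averaged as a quantitative variable.
durations_min = [to_minutes(d) for d in durations]
mean_minutes = sum(durations_min) / len(durations_min)

print(durations_min)
print(mean_minutes)
```

Once the values share one unit, arithmetic such as the mean is meaningful; on the original hh:mm strings it would not be.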
Distribution of a variable

- frequency count (categorical), e.g. rating: PG, PG-13, R
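A frequency count for a categorical variable can be sketched with the standard library's `collections.Counter`. The list of ratings below is hypothetical and only illustrates the idea; it is not the ratings of the films in Example 1.

```python
from collections import Counter

# Hypothetical ratings, one per case (invented for illustration).
ratings = ["PG-13", "PG", "R", "PG-13", "PG-13", "R", "PG"]

# The distribution of a categorical variable: how often each value occurs.
counts = Counter(ratings)

# Relative frequencies (proportions) are often easier to compare.
total = sum(counts.values())
proportions = {rating: n / total for rating, n in counts.items()}

for rating, n in counts.most_common():
    print(f"{rating}: {n} ({proportions[rating]:.0%})")
```

The counts always add up to the number of cases, and the proportions add up to 1, which is a quick sanity check on any frequency table.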