Indicate the general size of the value
How different are the values in the data set from each other
Discard or insert substitute values.
From reading instruments or recording values
double blind, factorial
representative
The world is full of uncertainty. Law of large numbers: as the number of trials grows, the observed proportion of an outcome gets closer and closer to a particular value (its probability)
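A minimal sketch of the law of large numbers, assuming a fair coin simulated with Python's random module. The function name running_proportion and the fixed seed are illustrative choices, not part of the notes.

```python
import random


def running_proportion(n_tosses, seed=0):
    """Proportion of heads in n_tosses simulated fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The proportion should drift toward 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, running_proportion(n))
```

For small n the proportion can be far from 0.5; for large n it should settle close to it.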
degree of belief:
subjective/personal probability: depends on who is assessing the probability
frequentist interpretation: frequencies/counts => probability (the long-run proportion of times an event occurs in repeated trials)
classical approach: all events are composed of a collection of equally likely elementary events
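independence: two events are independent if the occurrence of one does not change the probability of the other; otherwise they are dependent
joint probability: the probability that two events will both occur
conditional probability: the probability that an event will happen given that another one has occurred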
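A small sketch of joint probability, conditional probability and independence, using the classical approach on two fair dice. The events A ("first die shows 6") and B ("total is at least 10") are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely elementary events for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def first_is_six(o):
    return o[0] == 6


def total_at_least_ten(o):
    return o[0] + o[1] >= 10


p_a = prob(first_is_six)                                          # P(A)
p_b = prob(total_at_least_ten)                                    # P(B)
p_ab = prob(lambda o: first_is_six(o) and total_at_least_ten(o))  # joint P(AB)
p_b_given_a = p_ab / p_a                                          # conditional P(B|A)

print("P(A) =", p_a, " P(B) =", p_b)
print("P(AB) =", p_ab, " P(B|A) =", p_b_given_a)
print("independent:", p_ab == p_a * p_b)
```

Here P(B|A) = 1/2 is much larger than P(B) = 1/6, so the two events are dependent.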
Bayes's theorem: relates the two conditional probabilities P(A|B) and P(B|A)
P(A)*P(B|A)=P(B)*P(A|B)=P(AB)
P(B|A)=P(A|B)*P(B) / ( P(A|B)*P(B)+P(A|~B)*P(~B) )
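A worked sketch of the formula above. The function name bayes_posterior and the numbers (a test for a condition) are hypothetical, chosen only to show the arithmetic.

```python
def bayes_posterior(p_a_given_b, p_b, p_a_given_not_b):
    """P(B|A) = P(A|B)*P(B) / ( P(A|B)*P(B) + P(A|~B)*P(~B) )."""
    p_not_b = 1.0 - p_b
    # Denominator is P(A), expanded over B and ~B.
    p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b
    return p_a_given_b * p_b / p_a


# Hypothetical numbers: B = "has the condition", A = "test is positive".
posterior = bayes_posterior(p_a_given_b=0.99, p_b=0.01, p_a_given_not_b=0.05)
print(round(posterior, 4))
```

With these numbers P(B|A) is about 0.17, far below P(A|B) = 0.99, because B is rare to begin with.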
Sample: subset of the complete 'population' of values
Random Variables: e.g. outcome of a throw of a die
Describe Distribution:
Discrete Random Variables:
Uniform Distribution: the random variable can take values only within some finite interval, and it is equally likely to take any of the values in that interval (e.g. a postman who arrives between 10am and 11am in a totally unpredictable way)
Exponential Distribution: e.g. lifetimes of glass vases (a breakage is equally likely at any moment, however old the vase is)
Normal/Gaussian Distribution: bell shaped
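Central Limit Theorem: the distribution of the sample mean approaches a normal distribution as the sample size grows, whatever the shape of the population distribution (provided it has finite variance). Larger samples also give a better estimate of the population mean, because the spread of the sample mean shrinks.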
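A rough simulation sketch of the central limit theorem, assuming samples drawn from an exponential distribution with mean 1 (a clearly non-normal shape). The helper names sample_means and within_one_sd are illustrative; for a normal distribution about 68% of values lie within one standard deviation of the mean.

```python
import random
import statistics


def sample_means(sample_size, n_samples=2000, seed=0):
    """Means of n_samples independent samples from an exponential(1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def within_one_sd(values):
    """Fraction of values lying within one standard deviation of their mean."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)


for n in (1, 5, 50):
    means = sample_means(n)
    print(
        n,
        round(statistics.fmean(means), 3),  # stays near the population mean, 1
        round(statistics.stdev(means), 3),  # shrinks roughly like 1 / sqrt(n)
        round(within_one_sd(means), 3),     # moves toward about 0.68
    )
```

With sample size 1 the means are just exponential values (skewed); with sample size 50 they should already look close to bell shaped.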