
    Note-Statistics: A Very Short Introduction

    Posted by Ivan Cai on 2015-03-28 17:04:00

    Simple Summary Statistics

    Averages

    Indicate the typical (central) size of the values

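    • Mean: the sum of the values divided by their number; sensitive to extreme values
    • Median: the middle value when the data are sorted
    • Mode: the most frequently occurring value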
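    A minimal Python sketch of the three averages, using the standard-library `statistics` module on made-up data. It shows how a single extreme value drags the mean upward while the median and mode stay put.

```python
import statistics

# Illustrative data: five values, one of them extreme
values = [2, 3, 3, 5, 100]

mean = statistics.mean(values)      # pulled upward by the extreme value 100
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)  # 22.6 3 3
```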

    Dispersion:

    How much the values in the data set differ from one another

    • Range: difference between the largest and smallest values in the data set (ignores most of the data)
    • Variance: the mean squared deviation of the values from their mean
    • Standard deviation: square root of the variance (expressed in the same units as the data)
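    The three dispersion measures can be computed directly from their definitions. This sketch uses an illustrative data set and the population form of the variance (dividing by n), which is what "mean squared deviation" describes; the standard library's `pvariance` and `pstdev` give the same numbers.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

data_range = max(data) - min(data)  # uses only the two extreme values
mean = statistics.mean(data)
# Variance as the mean squared deviation from the mean (population form)
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)

# The standard library computes the same population quantities
assert variance == statistics.pvariance(data)
assert std_dev == statistics.pstdev(data)

print(data_range, variance, std_dev)  # 7 4.0 2.0
```

    A sample-based estimate would divide by n - 1 instead (`statistics.variance`); the notes describe the mean squared deviation, so n is used here.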

    Skewness:

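    • Right skewed: long tail towards large values; the mean is usually greater than the median
    • Left skewed: long tail towards small values; the mean is usually less than the median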
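    The notes don't give a formula for skewness; one common choice, assumed here, is the moment coefficient (mean cubed deviation divided by the cube of the standard deviation). The function name and data are illustrative. The sign tells the direction of the long tail.

```python
import statistics

def moment_skewness(data):
    """Moment coefficient of skewness (hypothetical helper).

    Positive for a long right tail, negative for a long left tail.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # mean squared deviation
    m3 = sum((x - mean) ** 3 for x in data) / n  # mean cubed deviation
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]  # long tail to the right (illustrative)
left_skewed = [-x for x in right_skewed]  # mirror image: long tail to the left

print(moment_skewness(right_skewed) > 0, moment_skewness(left_skewed) < 0)  # True True
# Here the long right tail also pulls the mean above the median
print(statistics.mean(right_skewed) > statistics.median(right_skewed))  # True
```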

    Quantiles:

    • Deciles: divide the sorted data into ten equal parts
    • Percentiles: divide the sorted data into 100 equal parts
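    Python's `statistics.quantiles` (3.8+) returns the cut points between the groups, so ten groups need nine cut points. A sketch on the illustrative values 1 to 100, using the function's default interpolation method:

```python
import statistics

data = list(range(1, 101))  # illustrative: the values 1, 2, ..., 100

deciles = statistics.quantiles(data, n=10)       # 9 cut points -> 10 groups
percentiles = statistics.quantiles(data, n=100)  # 99 cut points -> 100 groups

print(len(deciles), len(percentiles))  # 9 99
print(deciles[4], statistics.median(data))  # the 5th decile is the median
```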

    Collecting Good Data

    Incomplete Data:

    Discard the incomplete records, or insert substitute values for the missing ones.
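    Both options, sketched with hypothetical helper names and `None` marking a gap. Substituting the mean of the observed values is only one simple choice of substitute; it keeps the average unchanged but makes the data look less spread out than they really are.

```python
import statistics

# Illustrative readings; None marks a missing value
readings = [4.0, None, 6.0, 5.0, None, 7.0]

def discard_missing(values):
    """Option 1: drop the incomplete entries."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Option 2: substitute the mean of the observed values for each gap."""
    fill = statistics.mean(discard_missing(values))
    return [fill if v is None else v for v in values]

print(discard_missing(readings))  # [4.0, 6.0, 5.0, 7.0]
print(impute_mean(readings))      # [4.0, 5.5, 6.0, 5.0, 5.5, 7.0]
```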

    Incorrect Data:

    Errors arise when reading instruments or recording values

    Error propagation: errors in the input values carry through to any results calculated from them
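    As an illustration of error propagation, here is one standard rule, assuming the two measurement errors are independent: when two measurements are added, their standard deviations combine in quadrature, sqrt(sd_a^2 + sd_b^2). The helper name is hypothetical, and a quick simulation checks the rule.

```python
import math
import random

def sd_of_sum(sd_a, sd_b):
    """Standard deviation of A + B when A and B have independent errors."""
    return math.sqrt(sd_a ** 2 + sd_b ** 2)

print(sd_of_sum(3.0, 4.0))  # 5.0

# Monte Carlo check: add independent normal measurement errors and measure the spread
random.seed(1)
sums = [random.gauss(0, 3.0) + random.gauss(0, 4.0) for _ in range(100_000)]
mean = sum(sums) / len(sums)
empirical_sd = math.sqrt(sum((s - mean) ** 2 for s in sums) / len(sums))
print(round(empirical_sd, 2))  # close to 5.0
```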

    Preprocessing: checking and cleaning the data before analysis

    Observational versus Experimental Data:

    • Observational: the data are captured without interfering or intervening in the process
    • Experimental: the objects are manipulated in some way (effective at sorting out what causes what)

    Subdisciplines:

    Experimental Design:

    Double-blind trials, factorial designs
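    A full factorial design runs one treatment group for every combination of factor levels. The factors and levels below are hypothetical; `itertools.product` enumerates the combinations, giving 2 x 2 = 4 groups.

```python
import itertools

# Hypothetical two-factor experiment
factors = {
    "dose": ["low", "high"],
    "diet": ["standard", "supplemented"],
}

# One treatment group per combination of levels (full factorial design)
design = [dict(zip(factors, levels)) for levels in itertools.product(*factors.values())]

for group in design:
    print(group)
```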

    Survey Sampling:

    Samples should be representative of the population

    • Law of large numbers
    • Central limit theorem
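    A simple random sample, in which every member of the population has the same chance of selection, is one way to aim for a representative sample. This sketch uses a simulated, hypothetical population and compares the sample mean with the population mean.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 10,000 values we could not afford to survey in full
population = [random.gauss(170, 10) for _ in range(10_000)]

# Simple random sample without replacement
sample = random.sample(population, k=500)

print(round(statistics.mean(population), 1), round(statistics.mean(sample), 1))
```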

    Probability

    The Essence of Chance

    The world is full of uncertainty. Law of large numbers: as the number of trials grows, the proportion of times an event occurs gets closer and closer to a particular value
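    The law of large numbers can be seen by simulating tosses of a fair coin and recording the running proportion of heads at a few checkpoints. The seed and checkpoints are arbitrary choices for this sketch.

```python
import random

random.seed(0)

heads = 0
proportions = {}
for n in range(1, 100_001):
    heads += random.random() < 0.5  # simulate one toss of a fair coin
    if n in (10, 100, 1_000, 10_000, 100_000):
        proportions[n] = heads / n  # running proportion of heads

for n, p in proportions.items():
    print(f"{n:>7} tosses: proportion of heads = {p:.4f}")
```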

    Understand Probability

    degree of belief, measured on a scale from 0 to 1:

    • 1: certain
    • 0: impossible
    • Between 0 and 1: the event may happen; the larger the value, the more likely it is

    subjective/personal probability: depends on who is assessing the probability

    frequentist interpretation: probability is the proportion of times an event occurs in a long series of repetitions (frequencies/counts => probability)

    classical approach: all events are composed of a collection of equally likely elementary events
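    Under the classical approach, a probability is the number of favorable elementary events divided by the total number of equally likely ones. A sketch with two fair dice (an illustrative choice), using exact fractions:

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely elementary outcomes of throwing two fair dice
outcomes = list(product(range(1, 7), repeat=2))

# Classical probability = favorable outcomes / total outcomes
favorable = [o for o in outcomes if sum(o) == 7]
p_seven = Fraction(len(favorable), len(outcomes))

print(p_seven)  # 1/6
```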

    Law of Chance

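    independence: two events are independent if the occurrence of one does not change the probability of the other; otherwise they are dependent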

    joint probability: the probability that two events will both occur

    conditional probability: the probability that an event will happen given that another one has occurred
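    Independence, joint probability and conditional probability can all be checked by counting outcomes of two fair dice. The events and helper names are illustrative; the sketch uses the standard relation P(B|A) = P(AB) / P(A) and tests independence by comparing P(AB) with P(A)P(B).

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # two fair dice, 36 equally likely pairs

def prob(event):
    """Classical probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_six(o):
    return o[0] == 6

def total_at_least_10(o):
    return o[0] + o[1] >= 10

def second_is_even(o):
    return o[1] % 2 == 0

# Joint probability: both events occur
p_joint = prob(lambda o: first_is_six(o) and total_at_least_10(o))
# Conditional probability: P(B | A) = P(A and B) / P(A)
p_cond = p_joint / prob(first_is_six)
print(p_joint, p_cond)  # 1/12 1/2

# Dependent: knowing the first die is 6 changes the chance of a total >= 10
print(p_joint == prob(first_is_six) * prob(total_at_least_10))  # False
# Independent: the first die tells us nothing about the second
print(prob(lambda o: first_is_six(o) and second_is_even(o))
      == prob(first_is_six) * prob(second_is_even))  # True
```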

    Bayes's theorem: relates the two conditional probabilities P(A|B) and P(B|A)

    P(A)*P(B|A)=P(B)*P(A|B)=P(AB)

    P(B|A)=P(A|B)*P(B) / ( P(A|B)*P(B)+P(A|~B)*P(~B) )
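    The second formula translates directly into code, with the denominator P(A) expanded over B and ~B. The screening-test numbers below are hypothetical, chosen only to show that a rare condition plus a modest false-positive rate can give a surprisingly small P(B|A).

```python
def bayes_posterior(p_a_given_b, p_b, p_a_given_not_b):
    """P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|~B)P(~B))."""
    p_not_b = 1 - p_b
    p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b  # total probability of A
    return p_a_given_b * p_b / p_a

# Hypothetical screening test: B = "has the condition", A = "tests positive"
posterior = bayes_posterior(p_a_given_b=0.99, p_b=0.01, p_a_given_not_b=0.05)
print(round(posterior, 4))  # 0.1667
```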

    Random variables and Their Distributions

    Sample: subset of the complete 'population' of values

    Random Variable: a quantity whose value is determined by chance, e.g. the outcome of a throw of a die

    Describing a Distribution:

    • cumulative probability distribution
    • probability density curve
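    For a discrete variable such as a die, the counterpart of a density curve is a list of probabilities for each value, and the cumulative distribution P(X <= k) is their running total. A sketch for a fair die, using exact fractions:

```python
from fractions import Fraction
from itertools import accumulate

faces = range(1, 7)
pmf = {k: Fraction(1, 6) for k in faces}  # fair die: each face has probability 1/6

# Cumulative distribution: P(X <= k) is the running total of the probabilities
cdf = dict(zip(faces, accumulate(pmf[k] for k in faces)))

for k in faces:
    print(k, pmf[k], cdf[k])
```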

    Discrete Random Variables:

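    • Bernoulli distribution: a single trial with two outcomes, e.g. one toss of a coin
    • binomial distribution: the number of successes in a fixed number of trials, e.g. the number of heads in 100 tosses of a coin
    • Poisson distribution: the number of events in a given period, e.g. emails arriving at my computer (no upper limit)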
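    The probability of each value under these three distributions follows from standard formulas. A sketch with hypothetical helper names (`math.comb` needs Python 3.8+); the email rate is an illustrative assumption.

```python
import math

def bernoulli_pmf(k, p):
    """P(X = k) for one trial with success probability p (k is 0 or 1)."""
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials), e.g. k heads in n coin tosses."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, rate):
    """P(k events) when events occur at an average `rate` per period; k has no upper limit."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

print(bernoulli_pmf(1, 0.5))       # 0.5
print(binomial_pmf(50, 100, 0.5))  # chance of exactly 50 heads in 100 tosses
print(poisson_pmf(3, rate=2.0))    # e.g. 3 emails in an hour that averages 2
```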

    Uniform Distribution: the random variable can take values only within some finite interval, and it is equally likely to take any value in that interval (e.g. a postman who arrives between 10am and 11am in a totally unpredictable way)
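    For the postman example, the chance of arrival by a given time is simply the fraction of the interval that has elapsed. A sketch with times measured in hours and a hypothetical `uniform_cdf` helper, checked against `random.uniform`:

```python
import random

def uniform_cdf(x, a, b):
    """P(X <= x) for X uniform on [a, b]: the fraction of the interval below x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Postman arrives uniformly between 10:00 and 11:00
print(uniform_cdf(10.25, 10, 11))  # 0.25: chance of arriving by 10:15

random.seed(7)
arrivals = [random.uniform(10, 11) for _ in range(100_000)]
simulated = sum(t <= 10.25 for t in arrivals) / len(arrivals)
print(simulated)  # close to 0.25
```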

    Exponential Distribution: e.g. the lifetimes of glass vases (the waiting time until an unpredictable breakage)
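    A key property of the exponential distribution is that it is memoryless: an item that has already survived some time is no more likely to fail soon than a new one. This sketch assumes a hypothetical breakage rate and uses the survival function P(T > t) = exp(-rate * t).

```python
import math

def exp_survival(t, rate):
    """P(T > t) for an exponential lifetime with the given breakage rate."""
    return math.exp(-rate * t)

rate = 0.1  # hypothetical: mean lifetime 1 / rate = 10 years

# Memoryless property: surviving the next 3 years is equally likely
# for a new vase and for one that has already lasted 5 years
p_new = exp_survival(3, rate)
p_given_survived_5 = exp_survival(5 + 3, rate) / exp_survival(5, rate)
print(round(p_new, 4), round(p_given_survived_5, 4))
```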

    Normal/Gaussian Distribution: symmetric and bell-shaped
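    Python's `statistics.NormalDist` (3.8+) gives the normal cumulative distribution directly. This sketch, with a hypothetical mean and standard deviation, recovers the familiar rule that about 68% of values lie within one standard deviation of the mean and about 95% within two.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)  # hypothetical: heights in cm

# Probability of falling within one and two standard deviations of the mean
within_1sd = heights.cdf(180) - heights.cdf(160)
within_2sd = heights.cdf(190) - heights.cdf(150)
print(round(within_1sd, 3), round(within_2sd, 3))  # 0.683 0.954
```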

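    Central Limit Theorem: for most distributions of the individual values, the mean of a large sample is approximately normally distributed, and its spread shrinks as the sample size grows, so larger samples give better estimates.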
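    A small simulation of the central limit theorem: the individual values come from a skewed exponential distribution (mean 1, standard deviation 1), yet the means of samples of size 50 cluster in a roughly normal shape around 1, with spread close to 1/sqrt(50). Sample size, replicate count and seed are arbitrary choices for this sketch.

```python
import math
import random
import statistics

random.seed(3)

n = 50             # size of each sample
replicates = 2000  # number of samples drawn

# Each sample mean averages n skewed exponential values
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(replicates)
]

# The means centre on 1 with standard deviation about 1 / sqrt(n)
print(round(statistics.fmean(sample_means), 3))
print(round(statistics.stdev(sample_means), 3), round(1 / math.sqrt(n), 3))

# Roughly 68% of the means fall within one predicted standard deviation of 1
within = sum(abs(m - 1) <= 1 / math.sqrt(n) for m in sample_means) / replicates
print(within)
```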


