Introduction to statistics Models Inference Examples
MTHM501 - Intro to Statistics
Mark Kelson
Introduction to statistics
INTRODUCTION
Statistics is a tool that underpins the scientific method.
The truth is beyond our grasp, but knowledge grows through the gathering of evidence. Concepts are applicable almost universally, for example:
- population decline of leatherback turtles;
- vaccine efficacy;
- economic development, etc.
Statistics is the branch of mathematics that is primarily concerned with the collection, summarisation and interpretation of data.
The heat map shows number of cases per 100,000 people.
http://graphics.wsj.com/infectious-diseases-and-vaccines/
INTRODUCTION TO STATISTICS
Statistics is a tool that underpins the scientific method.
“All models are wrong, but some are useful!”
– George E. P. Box
It helps us make sense of the world through the collection, analysis and interpretation of data.
DETERMINISTIC MODELS
Many valuable mathematical models are purely deterministic. They explain data arising from a real-life situation via mathematical relationships that do not explicitly incorporate randomness or probabilistic behaviour. For example:
Planetary/lunar motion: The motion of the planets around the Sun, and of the Moon around the Earth, is well understood. In 1999, for example, we knew exactly when and where the solar eclipse would occur over parts of Devon/Cornwall.
Mathematically, a deterministic model is one in which a given input always produces the same output, e.g.
y = sin(x)
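As a minimal sketch of what "deterministic" means in practice, the Python snippet below evaluates y = sin(x) several times at the same input. The function name `deterministic_model` and the input value are illustrative choices, not part of the original slides.

```python
import math


def deterministic_model(x: float) -> float:
    """Deterministic model y = sin(x): there is no random component."""
    return math.sin(x)


# Evaluating the model repeatedly at the same input always gives the same output.
x = 7.5  # illustrative input value
outputs = [deterministic_model(x) for _ in range(5)]

print(outputs)
print(len(set(outputs)) == 1)  # only one distinct value among the repeated evaluations
```

However many times the model is evaluated, there is no spread in the outputs to describe, which is the key contrast with the stochastic models introduced later.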
TYPES OF UNCERTAINTY
Many real-world phenomena involve a generating process that is stochastic, i.e. one that involves probabilistic elements.
Aleatoric:
- the phenomenon itself is subject to probabilistic behaviour;
- uncertainty.
Epistemic:
- observation error;
- lack of knowledge;
- ignorance.
TYPES OF UNCERTAINTY
“Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don’t know we don’t know.”
– Donald Rumsfeld
https://www.youtube.com/watch?v=REWeBzGuzCc
STOCHASTIC (STATISTICAL) MODELS
The fundamental concept is that measurable quantities are random variables, in the sense that they will vary within a population of individuals.
Paraphrasing David Spiegelhalter (http://understandinguncertainty.org/):
“Measurements are unpredictable, but in a predictable way.”
[Figure: x-axis: Yield (bushels / unit plot)]
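The idea that measurements are "unpredictable, but in a predictable way" can be illustrated with a small simulation. This is only a sketch: the normal distribution, the mean of 4.5 and the standard deviation of 0.25 are assumed values chosen for illustration, not estimates from any real yield data.

```python
import random
import statistics

# Assumed, illustrative values only (not taken from real yield data).
ASSUMED_MEAN = 4.5   # bushels / unit plot
ASSUMED_SD = 0.25


def simulate_yields(n: int, seed: int) -> list[float]:
    """Draw n simulated yields from a normal distribution (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(ASSUMED_MEAN, ASSUMED_SD) for _ in range(n)]


# Individual measurements vary from plot to plot...
small = simulate_yields(5, seed=1)
print([round(v, 2) for v in small])

# ...but summaries over many measurements are stable and close to the model values.
large = simulate_yields(10_000, seed=1)
print(round(statistics.mean(large), 2), round(statistics.stdev(large), 2))
```

No single simulated yield can be predicted in advance, yet the average and spread of a large sample settle close to the values that define the model.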
STOCHASTIC (STATISTICAL) MODELS
For example:
- Growth rate of wheat: Growth of wheat plants is well understood: plants grow rapidly when young and stop growing at around a certain height. However, not all plants will grow in exactly the same way. Since each wheat plant is different, we would not expect to be able to predict the exact height of each plant at a given time; we can only expect to do well on average.
- Birth weight and mothers’ smoking habits: It is generally (if not unanimously) accepted that mothers who smoke during pregnancy give birth to children with lower birth weights. However, no simple mathematical model will describe this relationship exactly, because we would expect random variation in individual birth weights even if the weights of children whose mothers smoked during pregnancy are lower on average.
PROBABILITY MODELS
To model uncertainty, we need to develop a statistical model that can approximate the real-world phenomena of interest.
We do not know all of the physical properties needed to predict the outcome of a coin toss exactly, e.g.
- gravity;
- air resistance;
- force, etc.
However, we can use a probability model to characterise the behaviour of the system under certain mathematical assumptions.
PROBABILITY MODELS
For example, we could assume each coin toss is independent, with an equal probability of landing as either heads or tails (p = 0.5).
Under these assumptions, the number of heads from n tosses will follow a binomial distribution.
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$$
for $x = 0, 1, 2, \dots, n$. For example, for $p = 0.5$ and $n = 10$:
$$P(X = 2) = \binom{10}{2} (0.5)^2 (0.5)^8 \approx 0.0439$$
[Figure: binomial probabilities P(X = x) for p = 0.5, n = 10]
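The binomial probabilities for this coin-toss model can be checked numerically. The sketch below uses only the Python standard library; the helper name `binomial_pmf` is our own and is not taken from the slides.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.5

# Probability of each possible number of heads, x = 0, 1, ..., n.
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(round(binomial_pmf(2, n, p), 4))  # 0.0439
print(max(range(n + 1), key=lambda x: pmf[x]))  # most likely number of heads: 5
```

The probabilities over x = 0, ..., n sum to one, and for p = 0.5 they are symmetric about n / 2.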
DEDUCTIVE INFERENCE VS. INDUCTIVE INFERENCE
Calculating P(X = 2) for given values of p and n is an example of a deductive argument. That is, we start with a general relationship and derive a specific result.
The general relationship is described by the probability model, from which results of interest can be obtained.
Statistical inference is almost always inductive: we start with specific information (e.g. data) and try to come to some conclusion about the general process giving rise to the data (sometimes known as the data generating process).
DEDUCTIVE INFERENCE VS. INDUCTIVE INFERENCE
For example, consider that we toss a coin 10 times and get the following result:
HHHHHTHTHH
Deductive: If the outcomes arise from a binomial model with n = 10 and p = 0.5, what is the probability of observing two tails?
Inductive: Given that we’ve observed two tails and eight heads, what can we say about the process generating the data? e.g.
- Are the data consistent with a binomial model with n = 10 and p = 0.5?
- What is the probability of getting a head (or, is the coin biased)?
In many real-world applications, we use inductive inference to derive a mathematical description of the underlying physical process, and then use this mathematical model to make deductive statements regarding applications of interest (e.g. prediction).
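The two kinds of question can be sketched in code for the sequence HHHHHTHTHH. The deductive part evaluates the binomial model directly. The inductive part is only one possible approach, chosen here for illustration: it estimates the probability of a head by the observed proportion, and computes an exact two-sided p-value by summing the probabilities (under p = 0.5) of all outcomes no more likely than the one observed. The helper name `binomial_pmf` is our own.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


tosses = "HHHHHTHTHH"
n = len(tosses)
heads = tosses.count("H")
tails = n - heads

# Deductive: assume n = 10 and p = 0.5, then compute the probability of two tails.
p_two_tails = binomial_pmf(tails, n, 0.5)

# Inductive: start from the data and ask about the process that generated them.
p_hat = heads / n  # observed proportion of heads as an estimate of P(head)

# Exact two-sided p-value under p = 0.5 (one illustrative way to assess consistency).
observed = binomial_pmf(heads, n, 0.5)
p_value = sum(
    binomial_pmf(k, n, 0.5)
    for k in range(n + 1)
    if binomial_pmf(k, n, 0.5) <= observed * (1 + 1e-9)
)

print(round(p_two_tails, 4))  # 0.0439
print(p_hat)                  # 0.8
print(round(p_value, 3))      # 0.109
```

With only ten tosses, an outcome at least as extreme as eight heads is not especially unusual for a fair coin, so these data alone say little about whether the coin is biased.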
STATISTICAL INFERENCE
“In order to make such statements, we need first to abstract the essence of the data-producing mechanism into a form that is amenable to mathematical and statistical treatment… This is the statistical model of the system. Fitting a model to a given set of data will then provide a framework for extrapolating the results to a wider context or for predicting future outcomes, and can often lead to an explanation of the system.”
Wojtek J. Krzanowski
STOCHASTIC VS. DETERMINISTIC MODELS
Unlike deterministic models, a stochastic model does not have to predict the observed data values precisely to be valid. Instead, its focus is on specifying the generating process, not just describing the data values.
Induction is by its nature a difficult business: there are often many mathematical relationships that could be used to describe a set of data.
If possible, ‘good’ models should also be justifiable in a broader context (e.g. accord with the background science), as well as fitting the data well.
The aim of stochastic models is not to ‘remove our uncertainty’, but rather to ‘explicitly specify the nature, and quantify the degree, of our uncertainty’.
EXAMPLE: BIRTH RATE DATA
These data consist of a small extract from Mauldin and Berelson (1978) (https://www.jstor.org/stable/1965523), courtesy of Germán Rodríguez.
These data show the decline in the crude birth rate (CBR, the number of births per thousand population) between 1965 and 1975 for different countries in South America.
Family planning effort combines 15 different program indicators into a single numerical score.
Does increased family planning effort lead to declines in crude birth rates?
[Figure: scatter plot of Decline in CBR (y-axis) against Family Planning Effort (x-axis)]
EXAMPLE: BIRTH RATE DATA
- Response variable: this is the variable of interest, the one we wish to make inference about.
- Explanatory (or predictor) variables: these are the ones we wish to use to predict the response.
(Sometimes the response and explanatory variables are known as the dependent and independent variables respectively, but I find this confusing.)
EXAMPLE: BIRTH RATE DATA
We tend to denote the response variable using Y, and explanatory variables using X. Measurements are denoted using lower case letters, e.g.
- $y_1, \dots, y_n$ are n samples of a random variable Y.
- Y is decline in CBR;
- X is family planning effort.
Each point corresponds to a pair of observations $(y_i, x_i)$.
EXAMPLE: BIRTH RATE DATA
A model is typically split into two components:
- A systematic or trend component: refers to the ‘average’ or ‘typical’ behaviour of the response variable y.
- A random or error component: refers to characteristics of the probability distribution other than its mean (e.g. family, type, variability, etc.).
The systematic and random components are both equally important, since we want to know how phenomena behave on average, and also how volatile their behaviour is.
EXAMPLE: BIRTH RATE DATA
After fitting the model to the data:
- the data, $y_i$, are the actual observations;
- the fitted values, $\hat{y}_i$, are the estimates of y from the model;
- the residuals, $\varepsilon_i = y_i - \hat{y}_i$, are the differences between the observed and fitted values.
EXAMPLE: BIRTH RATE DATA
In this case we could build a model that looks like:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$
where $\varepsilon_i \sim N(0, \sigma^2)$.
This assumes the mean line is linear, $y = \beta_0 + \beta_1 x$, and that the residuals are normally distributed around the mean line (with unknown variance $\sigma^2$).
The ‘best-fit’ line is given by the estimates $\hat{\beta}_0 = 2.34$ and $\hat{\beta}_1 = 1.25$ (to 2 d.p.).
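A common way to obtain a ‘best-fit’ line is ordinary least squares, sketched below. The slides do not say how the estimates were computed, so treating them as least-squares estimates is an assumption. The data in the sketch are simulated from the quoted line (intercept 2.34, slope 1.25), not the real birth rate data. The error standard deviation (6.0), the sample size and all names are illustrative choices.

```python
import random


def fit_simple_linear_regression(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares estimates (intercept, slope) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Simulated data only: NOT the Mauldin and Berelson data.
# The error standard deviation and sample size are assumed for illustration.
rng = random.Random(501)
TRUE_B0, TRUE_B1, ASSUMED_SIGMA = 2.34, 1.25, 6.0
x = [rng.uniform(0, 20) for _ in range(200)]
y = [TRUE_B0 + TRUE_B1 * xi + rng.gauss(0, ASSUMED_SIGMA) for xi in x]

b0_hat, b1_hat = fit_simple_linear_regression(x, y)

# Fitted values and residuals, as defined for a fitted model.
fitted = [b0_hat + b1_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(round(b0_hat, 2), round(b1_hat, 2))
print(abs(sum(residuals)) < 1e-8)  # least-squares residuals sum to zero (up to rounding)
```

The estimates recovered from the simulated data are close to, but not exactly equal to, the values used to generate them. This reflects the random component of the model.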
EXAMPLE: BIRTH RATE DATA
The mean line is governed by:
$$\hat{y}_i = 2.34 + 1.25 x_i.$$
According to the model, the average decline in crude birth rate (CBR) for countries with no family planning effort is $\hat{\beta}_0 = 2.34$ births / 1000 population.
Also, for every unit increase in family planning effort, the average decline in CBR increases by $\hat{\beta}_1 = 1.25$ births / 1000 population.
Does this mean that family planning effort results in decreased CBRs?
EXAMPLE: BIRTH RATE DATA
Of course this is not the only stochastic model we could propose for these data. An alternative is the simpler model:
$$y_i = \beta_0 + \varepsilon_i,$$
where $\varepsilon_i \sim N(0, \sigma^2)$ (i.e. $y_i$ does not depend on $x_i$).
According to the model, the average decline in crude birth rate (CBR) for countries is $\hat{\beta}_0 = 14.3$ births / 1000 population.
The model assumes there is no relationship with family planning effort.
Is this a reasonable model, and how does it compare with the more complex one?
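One simple, informal way to compare the two models is to look at how much residual variation each leaves unexplained. The sketch below computes the residual sum of squares (RSS) for the intercept-only model and for the straight-line model, and the proportion of variation explained by the line. This is not necessarily the comparison used in the course, and a formal comparison would need more than this. The data are simulated under assumed values and are not the real birth rate data.

```python
import random


def residual_sum_of_squares(y: list[float], fitted: list[float]) -> float:
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def fit_intercept_only(y: list[float]) -> float:
    """Least-squares estimate of b0 in y = b0 + error: the sample mean."""
    return sum(y) / len(y)


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares estimates (intercept, slope) for y = b0 + b1 * x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Simulated data only (assumed error sd of 6.0 and 20 observations; not the real data).
rng = random.Random(1978)
x = [rng.uniform(0, 20) for _ in range(20)]
y = [2.34 + 1.25 * xi + rng.gauss(0, 6.0) for xi in x]

# Simpler model: a single mean for every country.
b0_null = fit_intercept_only(y)
rss_null = residual_sum_of_squares(y, [b0_null] * len(y))

# More complex model: the mean changes linearly with x.
b0, b1 = fit_line(x, y)
rss_line = residual_sum_of_squares(y, [b0 + b1 * xi for xi in x])

# Proportion of the variation about the mean that the line accounts for.
r_squared = 1 - rss_line / rss_null
print(round(rss_null, 1), round(rss_line, 1), round(r_squared, 2))
```

Because the simpler model is a special case of the straight-line model (slope fixed at zero), the line can never have a larger RSS. The question is whether the reduction is large enough to justify the extra parameter.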
ACKNOWLEDGEMENTS
- Dr T.J. McKinley
- Prof Trevor Bailey
- Dr Dave Hodgson