STA 106 Lecture Note 3: Data Science Perspective

Hsieh Fushing
University of California, Davis
March 1, 2023

1 Reiterating the first word of data!
2 ANOVA and K-sample problem
  K-sample problem from the predictive perspective. Contrasting ANOVA representations.
3 Two-way ANOVA
4 Predictive entropy computations. HC-approach and entropy approaches. Final Project.


Reiterating the first word of data!
Data in Sciences.
Exploring is Learning. Modeling is Hijacking.

ANOVA and K-sample problem
K-sample problem from the predictive perspective.
Data framework of K-sample problem

The 1st sample of 1D data points: $\{z_{1i}\}_{i=1}^{n_1}$, observed from I.I.D. random variables $\{Z_{1i}\}_{i=1}^{n_1}$;
the 2nd sample of 1D data points: $\{z_{2j}\}_{j=1}^{n_2}$, observed from I.I.D. random variables $\{Z_{2j}\}_{j=1}^{n_2}$;
...
the K-th sample of 1D data points: $\{z_{Kh}\}_{h=1}^{n_K}$, observed from I.I.D. random variables $\{Z_{Kh}\}_{h=1}^{n_K}$.

Our scientific questions are centered on potentially interesting and essential "communities" embraced by these K observed samples of data points.

ANOVA and K-sample problem
K-sample problem from the predictive perspective.
Scientific content in K-sample problems

What if the K population labels are categories of a categorical variable, say K?

Our scientific questions can be equivalently transformed into: would we be able to describe characteristics of the distribution shape of the random variable $Z_{kj}$ in comparable and relative fashions with respect to the rest of the $K-1$ distribution shapes of the K populations?

Our scientific questions become even more explicit from the "predictive" perspective: by knowing the information that $K = k$, would we be able to better predict where a not-yet-observed or future datum $Z^*_{kj}$ from the $k$-th population could likely fall, compared with a prediction that does not incorporate such a piece of information?

Under Normality, such a predictive perspective surely leads to the evident focus on the location of the "mean", while beyond Normality, such as for distributions having multiple modes, the locations of means are not necessarily meaningful at all.

ANOVA and K-sample problem
K-sample problem from the predictive perspective.
A unified representation: Contingency Table.

Sample/bar   1        2        3       ···   H        Row-sum
Sample-1     n[1,1]   n[1,2]   n[1,3]  ···   n[1,H]   n_1
Sample-2     n[2,1]   n[2,2]   n[2,3]  ···   n[2,H]   n_2
...
Sample-K     n[K,1]   n[K,2]   n[K,3]  ···   n[K,H]   n_K
Col-sum      n[.,1]   n[.,2]   n[.,3]  ···   n[.,H]   $N (= \sum_{k=1}^{K} n_k)$

Table: A K × H contingency table representing a histogram of pooled data from K samples.

This contingency table approach for comparing K samples works for all data types: categorical, discrete, and continuous.
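As a sketch (not part of the lecture), the K × H contingency table above can be built by pooling the K samples and counting every sample on one shared set of H bins; the sample data and bin count below are illustrative assumptions.

```python
import numpy as np

def contingency_table(samples, H):
    """samples: list of K 1-D arrays; H: number of histogram bars."""
    pooled = np.concatenate(samples)
    # Shared bin edges, so every sample is counted on the same H bars.
    edges = np.histogram_bin_edges(pooled, bins=H)
    # One row of counts per sample; shape (K, H).
    return np.vstack([np.histogram(s, bins=edges)[0] for s in samples])

rng = np.random.default_rng(0)
samples = [rng.normal(0, 1, 50), rng.normal(1, 1, 60), rng.normal(2, 1, 40)]
tab = contingency_table(samples, H=5)
print(tab.sum(axis=1))  # row sums n_k: [50 60 40]
print(tab.sum())        # N = 150
```

Because the edges span the pooled minimum and maximum, every data point lands in some bar, so the row sums recover the sample sizes $n_k$.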

ANOVA and K-sample problem
K-sample problem from the predictive perspective.
An idea underlying this unified representation.

A well-built contingency table is approximately the "sufficient statistics" of the K distribution shapes.

ANOVA and K-sample problem
K-sample problem from the predictive perspective.
A unified representation: K × H matrix of proportions.

Sample/bar   1            2            3           ···   H
Sample-1     n[1,1]/n_1   n[1,2]/n_1   n[1,3]/n_1  ···   n[1,H]/n_1
Sample-2     n[2,1]/n_2   n[2,2]/n_2   n[2,3]/n_2  ···   n[2,H]/n_2
...
Sample-K     n[K,1]/n_K   n[K,2]/n_K   n[K,3]/n_K  ···   n[K,H]/n_K
Col-sum      n[.,1]/N     n[.,2]/N     n[.,3]/N    ···   n[.,H]/N

Table: A K × H matrix of proportions representing the K histograms of the K samples.

Each row of this K × H matrix of proportions offers a probabilistic solution for a prediction, in comparison with the row-vector of column-sum proportions.
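As a sketch (counts below are illustrative assumptions), the matrix of proportions is the row-normalized contingency table, and the column-sum proportions give the "no label information" baseline prediction:

```python
import numpy as np

# An assumed K x H table of counts (K = 3 samples, H = 3 bars).
table = np.array([[10, 25, 15],
                  [30, 20, 10],
                  [ 5, 15, 20]])

row_props = table / table.sum(axis=1, keepdims=True)  # each row: one histogram
baseline = table.sum(axis=0) / table.sum()            # pooled proportions

print(row_props.sum(axis=1))  # each row sums to 1
print(baseline)               # row-vector of column-sum proportions
```

Predicting with the $k$-th row uses the label $K = k$; predicting with `baseline` ignores it.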

ANOVA and K-sample problem
K-sample problem from the predictive perspective.
An idea underlying this K × H matrix of proportions.

Based on a well-built contingency table, this K × H matrix of proportions is equipped with explicit and visible "deterministic and stochastic" structures of the K underlying distribution shapes and randomness revealed through observed data.

ANOVA and K-sample problem
Contrasting ANOVA representations.
The 1st representation of One-way-layout ANOVA framework.

The 1st sample of 1D data points: $\{z_{1i}\}_{i=1}^{n_1}$, observed from I.I.D. Normal $N(\mu_1, \sigma^2)$ random variables $\{Z_{1i}\}_{i=1}^{n_1}$;
the 2nd sample of 1D data points: $\{z_{2j}\}_{j=1}^{n_2}$, observed from I.I.D. Normal $N(\mu_2, \sigma^2)$ random variables $\{Z_{2j}\}_{j=1}^{n_2}$;
...
the K-th sample of 1D data points: $\{z_{Kh}\}_{h=1}^{n_K}$, observed from I.I.D. Normal $N(\mu_K, \sigma^2)$ random variables $\{Z_{Kh}\}_{h=1}^{n_K}$.

Should scientific questions be confined by the Normality and constant-variance assumptions on the K samples?
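As a sketch of this data model (means, sizes, and the common $\sigma$ below are illustrative assumptions), the K samples can be simulated directly:

```python
import numpy as np

# One-way-layout model: K independent samples, each i.i.d. Normal with its
# own mean mu_k and a shared variance sigma^2.
rng = np.random.default_rng(1)
mus = [0.0, 0.5, 2.0]   # mu_1, ..., mu_K (assumed)
ns = [40, 55, 30]       # n_1, ..., n_K (assumed)
sigma = 1.0             # common standard deviation

samples = [rng.normal(mu, sigma, n) for mu, n in zip(mus, ns)]
for k, s in enumerate(samples, start=1):
    print(f"sample {k}: n={s.size}, mean={s.mean():.2f}")
```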

ANOVA and K-sample problem
Contrasting ANOVA representations.
The 1st representation of One-way-layout ANOVA framework.

$Z_{kj} = \mu_k + \varepsilon_{kj}$ with $\varepsilon_{kj} \sim N(0, \sigma^2)$ for all $k = 1, \cdots, K$ and $j = 1, .., n_k$.

Each of the K sample variances $\hat\sigma_k^2 = \frac{1}{n_k - 1}\sum_{i=1}^{n_k}(Z_{ki} - \bar Z_k)^2$ is an estimate of $\sigma^2$.

Therefore, the common parameter $\sigma^2$ across the K samples will be estimated more precisely by:
$$\hat\sigma^2 = \frac{1}{N-K}\sum_{k=1}^{K}(n_k - 1)\hat\sigma_k^2 = \frac{1}{N-K}\sum_{k=1}^{K}\sum_{i=1}^{n_k}(Z_{ki} - \bar Z_k)^2.$$
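The pooled estimate above can be sketched numerically (sample data below are illustrative assumptions):

```python
import numpy as np

def pooled_variance(samples):
    """sigma_hat^2 = (1/(N-K)) * sum_k (n_k - 1) * sigma_hat_k^2,
    i.e. the within-group sum of squares divided by N - K."""
    K = len(samples)
    N = sum(s.size for s in samples)
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    return ss_within / (N - K)

rng = np.random.default_rng(2)
samples = [rng.normal(m, 1.0, n) for m, n in [(0, 30), (1, 40), (2, 50)]]
print(round(pooled_variance(samples), 3))
```

Note that the two forms of the formula agree because $(n_k - 1)\hat\sigma_k^2$ is exactly the $k$-th within-group sum of squares.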

ANOVA and K-sample problem
Contrasting ANOVA representations.
The 1st representation of One-way-layout ANOVA framework.

Under the null hypothesis $H_o: \mu = \mu_1 = \mu_2 = \ldots = \mu_K$, the common parameter $\mu$ is more precisely estimated by:
$$\bar Z_{..} = \frac{1}{N}\sum_{k=1}^{K}\sum_{h=1}^{n_k} Z_{kh} = \frac{1}{N}\sum_{k=1}^{K} n_k \bar Z_{k.},$$
than any individual sample average $\bar Z_{k.}$, which is an estimate of $\mu_k$.
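The identity above (the grand mean as the size-weighted average of the group means) can be checked numerically; the sample data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
samples = [rng.normal(0, 1, n) for n in (20, 35, 45)]
N = sum(s.size for s in samples)

# Grand mean as the n_k-weighted average of the K group means ...
grand = sum(s.size * s.mean() for s in samples) / N
# ... equals the plain mean of the pooled data.
assert np.isclose(grand, np.concatenate(samples).mean())
print(round(grand, 3))
```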

ANOVA and K-sample problem
Contrasting ANOVA representations.
The 2nd representation of One-way-layout ANOVA framework.

$Z_{kj} = \mu + \alpha_k + \varepsilon_{kj}$ with $\sum_{k=1}^{K}\alpha_k = 0$ and $\varepsilon_{kj} \sim N(0, \sigma^2)$ for all $k = 1, \cdots, K$ and $j = 1, .., n_k$.

This is a "linear regression" representation.

Under the null hypothesis $H_o: 0 = \alpha_1 = \alpha_2 = \ldots = \alpha_K$:

Intuitively, each of the K estimates is $\hat\alpha_k^o = \bar Z_k - \bar Z_{..}$, and these are also the ordinary least squares (OLS) estimates of $(\alpha_1, \alpha_2, \ldots, \alpha_K)$ when $n = n_k$ for all $k = 1, .., K$.

However, when $\{n_k \mid k = 1, .., K\}$ are not all equal, the constraint $\sum_{k=1}^{K}\alpha_k = 0$ is not satisfied by $\{\hat\alpha_k^o\}$. Therefore, the OLS estimation in linear regression analysis is needed.
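The balanced-versus-unbalanced point above can be sketched numerically (group means and sizes are illustrative assumptions):

```python
import numpy as np

def naive_effects(samples):
    """alpha_hat_k^o = Zbar_k - Zbar_.. for each group."""
    N = sum(s.size for s in samples)
    grand = sum(s.sum() for s in samples) / N
    return np.array([s.mean() - grand for s in samples])

rng = np.random.default_rng(4)
equal = [rng.normal(m, 1, 30) for m in (0, 1, 2)]
unequal = [rng.normal(m, 1, n) for m, n in [(0, 10), (1, 30), (2, 60)]]

# With equal n_k the grand mean is the plain average of group means,
# so the effects sum to zero (up to floating-point error).
print(round(naive_effects(equal).sum(), 6))
# With unequal n_k the grand mean weights groups by n_k, so the
# constraint sum(alpha_k) = 0 is typically violated.
print(round(naive_effects(unequal).sum(), 6))
```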

ANOVA and K-sample problem
Contrasting ANOVA representations.
The 3rd representation of One-way-layout ANOVA framework.

$Z_{kj} = \mu + \alpha_k + \varepsilon_{kj}$ with $\sum_{k=1}^{K}\alpha_k = 0$ and $\varepsilon_{kj} \sim N(0, \sigma^2)$ for all $k = 1, \cdots, K$ and $j = 1, .., n_k$.

Another representation is a linear regression model: for $i = 1, 2, \ldots, N$,
$$Y_i = \mu + \alpha_1 x_{1,i} + \alpha_2 x_{2,i} + \cdots + \alpha_k x_{k,i} + \cdots + \alpha_{K-1} x_{K-1,i} + \varepsilon_i,$$
where, for data points from the $k(<K)$-th population, $i = \sum_{h=1}^{k-1} n_h + j$ with $j = 1, .., n_k$, and the covariates $x_{k,i}$ encode the population membership of observation $i$.
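As a sketch of one way to build the $N \times K$ design matrix for this regression form (the coding scheme is an assumption, not specified by the slide): sum-to-zero "effect" coding gives observations from group $k < K$ the value $x_{k,i} = 1$, and observations from group $K$ the value $-1$ in every $x$ column, which enforces $\sum_k \alpha_k = 0$.

```python
import numpy as np

def design_matrix(ns):
    """ns: list of group sizes (n_1, ..., n_K). Returns the N x K matrix
    [1, x_1, ..., x_{K-1}] under sum-to-zero (effect) coding."""
    K, N = len(ns), sum(ns)
    X = np.zeros((N, K))
    X[:, 0] = 1.0  # intercept column for mu
    row = 0
    for k, n in enumerate(ns):
        if k < K - 1:
            X[row:row + n, k + 1] = 1.0   # group k: indicator column
        else:
            X[row:row + n, 1:] = -1.0     # last group: -1 in every column
        row += n
    return X

X = design_matrix([2, 2, 3])
print(X.shape)  # (7, 3)
```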
ANOVA and K-sample problem
Contrasting ANOVA representations.
The Least Squares Estimate (LSE) in One-way-layout ANOVA framework.

$$Y = X \cdot \theta + \Xi$$
$$X^T Y = X^T X \cdot \theta + X^T \cdot \Xi$$
$$[X^T X]^{-1} X^T Y = [X^T X]^{-1}[X^T X] \cdot \theta + [X^T X]^{-1} X^T \cdot \Xi = \theta + [X^T X]^{-1} X^T \cdot \Xi$$
$$\hat\theta = [X^T X]^{-1} X^T Y,$$
where $[X^T X]^{-1} X^T Y$ is called the least squares estimate of the K-dim parameter vector $\theta = (\mu, \alpha_1, \alpha_2, \cdots, \alpha_{K-1})$.
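The normal-equations solution above can be sketched numerically (the design matrix and true $\theta$ below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 50, 3
# Assumed design: an intercept column plus p-1 generic covariate columns.
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])
theta = np.array([1.0, 0.5, -0.5])
Y = X @ theta + rng.normal(0, 0.1, N)

# theta_hat = (X^T X)^{-1} X^T Y via the normal equations ...
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
# ... agrees with numpy's numerically preferred least squares solver.
theta_ls = np.linalg.lstsq(X, Y, rcond=None)[0]
assert np.allclose(theta_hat, theta_ls)
print(np.round(theta_hat, 2))
```

Solving the linear system is preferred over forming $[X^T X]^{-1}$ explicitly, for numerical stability.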

ANOVA and K-sample problem
Contrasting ANOVA representations.
The Least Squares Estimate (LSE) in One-way-layout ANOVA framework.

The sampling distribution of the LSE follows from substituting the model into $\hat\theta$:
$$\hat\theta = [X^T X]^{-1} X^T Y = [X^T X]^{-1} X^T \{X \cdot \theta + \Xi\} = \theta + [X^T X]^{-1} X^T \cdot \Xi \sim N(\theta, \sigma^2 \cdot [X^T X]^{-1}).$$

ANOVA and K-sample problem
Contrasting ANOVA representations.
The Least Squares Estimate (LSE) in One-way-layout ANOVA framework.

The estimated error vector, the so-called residual vector, is calculated as follows:
$$\hat\Xi = Y - X[X^T X]^{-1} X^T Y = [I - X[X^T X]^{-1} X^T] Y.$$

ANOVA and K-sample problem
Contrasting ANOVA representations.
The Least Squares Estimate (LSE) in One-way-layout ANOVA framework.

We have the sum of squared errors calculated as:
$$\hat\Xi^T \hat\Xi = Y^T [I - X[X^T X]^{-1} X^T]^T [I - X[X^T X]^{-1} X^T] Y = Y^T [I - X[X^T X]^{-1} X^T] Y = SSE \sim \sigma^2 \cdot \chi^2_{N-K},$$
where the middle equality holds because $I - X[X^T X]^{-1} X^T$ is symmetric and idempotent.
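The residual vector and the SSE identity above can be sketched numerically (the design and data below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 40, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, N)

# Annihilator matrix M = I - X (X^T X)^{-1} X^T; residuals = M Y.
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)
resid = M @ Y

sse_direct = resid @ resid       # Xi_hat' Xi_hat
sse_quadratic = Y @ M @ Y        # Y' M Y
# Equal because M is symmetric and idempotent (M M = M).
assert np.isclose(sse_direct, sse_quadratic)
print(round(sse_direct, 3))
```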

Two-way ANOVA
The 1st representation of Two-way-layout ANOVA framework.

Two ordinal or categorical variables: K has K categories and M has M categories; together they define a collection of K × M populations.

$Z_{kmj} = \mu_{km} + \varepsilon_{kmj}$ with $\varepsilon_{kmj} \sim N(0, \sigma^2)$ for all $k = 1, \cdots, K$, $m = 1, \cdots, M$ and $j = 1, .., n_{km}$.

Each of the K × M sample variances $\hat\sigma_{km}^2 = \frac{1}{n_{km} - 1}\sum_{i=1}^{n_{km}}(Z_{kmi} - \bar Z_{km})^2$ is an estimate of $\sigma^2$.

Two-way ANOVA
The 1st representation of Two-way-layout ANOVA framework.

Therefore, with $N = \sum_{k=1}^{K}\sum_{m=1}^{M} n_{km}$, the common parameter $\sigma^2$ across the K × M samples will be estimated more precisely by:
$$\hat\sigma^2 = \frac{1}{N - K \times M}\sum_{k=1}^{K}\sum_{m=1}^{M}(n_{km} - 1)\hat\sigma_{km}^2 = \frac{1}{N - K \times M}\sum_{k=1}^{K}\sum_{m=1}^{M}\sum_{i=1}^{n_{km}}(Z_{kmi} - \bar Z_{km})^2.$$

What kinds of null hypotheses are of interest? (Note: the subscripts k and m are purely for notational convenience. They are not necessarily numerical, nor even ordinal.)
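The two-way pooled estimate above can be sketched numerically; the cell layout and data below are illustrative assumptions, with `cells` mapping each $(k, m)$ pair to that cell's 1-D data array:

```python
import numpy as np

def pooled_variance_2way(cells):
    """sigma_hat^2 = (1/(N - K*M)) * sum_{k,m} sum_i (Z_kmi - Zbar_km)^2."""
    N = sum(z.size for z in cells.values())
    df = N - len(cells)  # N - K*M when every cell is present
    ss = sum(((z - z.mean()) ** 2).sum() for z in cells.values())
    return ss / df

rng = np.random.default_rng(7)
# Assumed 2 x 3 layout, 25 observations per cell, common sigma = 1.
cells = {(k, m): rng.normal(k + m, 1.0, 25) for k in range(2) for m in range(3)}
print(round(pooled_variance_2way(cells), 3))
```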

Two-way ANOVA
The 1st representation of Two-way-layout ANOVA framework.

Suppose that the collection $\{\mu_{km} \mid k = 1, .., K; m = 1, .., M\}$ can be arranged and presented as a 3D surface, with the k's and m's arranged on the X- and Y-axes and the values of $\mu_{km}$ on the Z-axis:

Two-way ANOVA
The 2nd representation of two-way-layout ANOVA framework.