MAST90083 2021 S2 exam paper

Semester 2 Assessment, 2021
School of Mathematics and Statistics
MAST90083 Computational Statistics & Data Science
Reading time: 30 minutes — Writing time: 3 hours — Upload time: 30 minutes
This exam consists of 4 pages (including this page)
Permitted Materials
• This exam and/or an offline electronic PDF reader, blank loose-leaf paper and a non-programmable calculator.
• No books or other material are allowed. Only one double-sided A4 page of notes (handwritten or printed) is allowed.
Instructions to Students
• If you have a printer, print the exam. If using an electronic PDF reader to read the exam, it must be disconnected from the internet. Its screen must be visible in Zoom. No mathematical or other software on the device may be used. No file other than the exam paper may be viewed.
• Ask the supervisor if you want to use the device running Zoom.
Writing
• There are 6 questions with marks as shown. The total number of marks available is 55.
• Write your answers on A4 paper. Page 1 should only have your student number, the subject code and the subject name. Write on one side of each sheet only. Each question should be on a new page. The question number must be written at the top of each page.
• Put the pages in question order and all the same way up. Use a scanning app to scan all pages to PDF. Scan directly from above. Crop pages to A4. Make sure that you upload the correct PDF file and that your PDF file is readable.
Submitting
• You must submit while in the Zoom room. No submissions will be accepted after you have left the Zoom room.
• Go to the Gradescope window. Choose the Canvas assignment for this exam. Submit your file. Wait for the Gradescope email confirming your submission. Tell your supervisor when you have received it.
©University of Melbourne 2021. Do not place in Baillieu Library.
Student number
Question 1 (10 marks)
Given the model
y = Xβ + ε
where y ∈ R^n, X ∈ R^(n×p) is of full rank p, and ε ∈ R^n ∼ N(0, σ²I_n). Let X = (x1, …, xp) be the column representation of X, where we further assume that the columns are mutually orthogonal.
(a) Derive the expression for β̂j, the jth component of the least squares estimate of β, as a function of xj and y.
(b) Is the least squares estimate of βj modified if any of the other components βl (l ≠ j) are forced to zero?
(c) Provide the expression for the residual sum of squares and discuss how it changes when a component is set to zero, βj = 0.
Assume now that, instead of having orthogonal columns, the xij are standardized so that, for j = 1, …, p,
∑_{i=1}^n xij = 0 and ∑_{i=1}^n xij² = c
(d) Derive the expression for the covariance matrix of the least squares estimator of β.
(e) Derive the expression for ∑_{j=1}^p var(β̂j) as a function of σ² and of λj, j = 1, …, p, the eigenvalues of C = X⊤X.
(f) Use these results to show that ∑_{j=1}^p var(β̂j) is minimized when X is orthogonal.
Note: For parts (a)-(c), the columns of X are assumed orthogonal but not orthonormal; i.e., xi⊤xj = 0 for i ≠ j, but xi⊤xi = ∥xi∥² ≠ 1.
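A minimal numpy sketch (illustrative only; the data, dimensions and coefficients are arbitrary assumptions, not part of the question) of the fact behind parts (a)-(b): with mutually orthogonal columns, the componentwise projections xj⊤y/∥xj∥² coincide with the joint least squares fit, so β̂j is unchanged when the other components are forced to zero.

import numpy as np

rng = np.random.default_rng(0)

# Orthogonal (not orthonormal) columns: an orthonormal Q rescaled so that
# x_j^T x_j = ||x_j||^2 != 1, matching the note above.
Q, _ = np.linalg.qr(rng.normal(size=(50, 3)))
X = Q * np.array([2.0, 3.0, 5.0])
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.normal(size=50)

# Joint least squares fit
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Componentwise fit: beta_j = x_j^T y / ||x_j||^2 (part (a)); it does not
# depend on the other components (part (b)).
beta_coord = (X.T @ y) / np.sum(X**2, axis=0)

print(np.allclose(beta_full, beta_coord))  # True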
Question 2 (9 marks)
Consider a positive sample x1, …, xn from an exponential distribution f(x|θ) = θe^(−θx), x ≥ 0, θ > 0. Suppose we have observed x1 = y1, …, xm = ym and x_{m+1} > c, …, xn > c, where m is given, m < n, and y1, …, ym are given numerical values. This implies that x1, …, xm are completely observed whereas x_{m+1}, …, xn are partially observed in that they are right-censored. We want to use an EM algorithm to find the MLE of θ.
(a) Find the complete-data log-likelihood function l(θ) = log L(θ).
(b) In the E-step, we calculate
Q(θ, θ^(k)) = E[ln L(θ) | x1 = y1, …, xm = ym, x_{m+1} > c, …, xn > c; θ^(k)]
where θ^(k) is the current estimate of θ. Show that
Q(θ, θ^(k)) = n log θ − θ [ ∑_{i=1}^m yi + (n − m)(cθ^(k) + 1) e^{−θ^(k)c} ]
(c) In the M-step, we maximise Q(θ, θ^(k)) with respect to θ to find an update θ^(k+1) from θ^(k). Show that
θ^(k+1) = n [ ∑_{i=1}^m yi + (n − m)(cθ^(k) + 1) e^{−θ^(k)c} ]^{−1}
(d) Suppose the sequence {θ^(k); k = 1, 2, …} converges to the MLE θ̂ as k → ∞. Establish the equation allowing the derivation of θ̂.
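A short numerical sketch of the iteration displayed in part (c), on simulated right-censored data (the values of n, θ and c are arbitrary assumptions). The stopping test illustrates part (d): at convergence θ^(k+1) = θ^(k), which yields the fixed-point equation for θ̂.

import numpy as np

rng = np.random.default_rng(1)

# Simulated data: m fully observed values y_i, and n - m observations
# known only to exceed the censoring point c.
n, theta_true, c = 200, 0.8, 1.5
x = rng.exponential(1 / theta_true, size=n)
y = x[x <= c]            # completely observed values
m = len(y)

# Iterate the update from part (c):
# theta_{k+1} = n / (sum(y_i) + (n - m)(c*theta_k + 1) exp(-theta_k c))
theta = 1.0              # starting value
for _ in range(200):
    theta_new = n / (y.sum() + (n - m) * (c * theta + 1) * np.exp(-theta * c))
    if abs(theta_new - theta) < 1e-10:   # fixed point reached (part (d))
        break
    theta = theta_new

print(theta)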
Question 3 (9 marks)
Consider scatterplot data (xi, yi), 1 ≤ i ≤ n, such that
yi = f(xi) + εi
where yi ∈ R, xi ∈ R, and the εi ∈ R are i.i.d. N(0, σ²). The function f(x) = E(y|x) characterizing the underlying trend in the data is some unspecified smooth function that needs to be estimated from (xi, yi), 1 ≤ i ≤ n. To approximate f we propose to use a quadratic spline basis with the truncated quadratic functions 1, x, x², (x − k1)₊², …, (x − kK)₊².
(a) Provide the quadratic spline model for f and define the set of unknown parameters that need to be estimated
(b) Derive the matrix form of the model and the associated penalized spline fitting criterion
(c) Derive the expression for the penalized least squares estimator for the unknown parameters of the model and the associated expression for the best fitted values.
(d) Find the degrees of freedom of the fit (the effective number of parameters) obtained with the proposed model, and its limiting values as the regularization parameter λ varies from 0 to +∞.
(e) Find the optimism of the fit and its relation to the degrees of freedom.
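A compact numpy sketch of the penalized quadratic spline fit (the knot positions, λ and the simulated trend are assumptions): it builds the truncated power basis, forms the smoother matrix, and reports the effective degrees of freedom, whose limits as λ varies are the subject of part (d).

import numpy as np

rng = np.random.default_rng(2)

# Simulated scatterplot data with a smooth trend (assumed for illustration)
n = 150
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

# Quadratic truncated power basis: 1, x, x^2, (x - k_1)_+^2, ..., (x - k_K)_+^2
knots = np.linspace(0.1, 0.9, 9)
C = np.column_stack([np.ones(n), x, x**2]
                    + [np.clip(x - k, 0, None)**2 for k in knots])

# Penalty matrix: penalize only the truncated (knot) coefficients
D = np.diag([0.0] * 3 + [1.0] * len(knots))

lam = 1e-3                                        # regularization parameter
S = C @ np.linalg.solve(C.T @ C + lam * D, C.T)   # smoother ("hat") matrix
y_hat = S @ y                                     # fitted values
df = np.trace(S)                                  # effective degrees of freedom
print(df)   # between 3 (lambda -> infinity) and 3 + K (lambda -> 0)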
Question 4 (10 marks)
Let Y = (y1,…,yn) be a set of n vector observations of dimension q such that yi = (y1i,…,yqi)⊤ ∈ Rq. For modeling these observations we propose to use the parametric model given by
yi = Φ1yi−1 + Φ2yi−2 + … + Φpyi−p + εi
where the εi are independent, identically distributed normal random vectors with mean vector zero and q × q variance-covariance matrix Σ, modeling the approximation errors, and the Φj, j = 1, …, p, are q × q coefficient (parameter) matrices.
(a) How many vector observations need to be lost to work with this model, and what is the effective number of observations?
(b) Provide a linear matrix form for the model where the parameters are represented in a (pq) × q matrix Φ = [Φ1, …, Φp]⊤, and derive the least squares estimator of Φ and the maximum likelihood estimate of Σ.
(c) What could you describe as an inconvenience of this model? Find the number of parameters involved in the model.
(d) Derive the expression for the log-likelihood of this model.
(e) Use the obtained log-likelihood expression to derive the expressions for AIC and BIC.
(f) What consequences does this model have for selection criteria?
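A brief numpy sketch of the fit in part (b) on simulated data (the dimensions q, p, n and the coefficient matrices are assumptions): the first p vector observations are lost, the regressor rows stack the p lagged vectors, and Σ is estimated from the residuals.

import numpy as np

rng = np.random.default_rng(3)

# Simulated q-dimensional data from the model (illustrative values only)
q, p, n = 2, 2, 300
Phi_true = [np.array([[0.5, 0.1], [0.0, 0.4]]),
            np.array([[0.2, 0.0], [0.1, 0.1]])]
Y = np.zeros((n, q))
for i in range(p, n):
    Y[i] = sum(Phi_true[j] @ Y[i - 1 - j] for j in range(p)) \
           + 0.1 * rng.normal(size=q)

# Linear matrix form: the first p vector observations are lost, leaving
# n - p effective observations. Row i of Z stacks y_{i-1}, ..., y_{i-p}.
Z = np.column_stack([Y[p - 1 - j : n - 1 - j] for j in range(p)])  # (n-p) x (pq)
Yp = Y[p:]                                                         # (n-p) x q

# Least squares estimate of Phi = [Phi_1, ..., Phi_p]^T ((pq) x q) and
# the ML estimate of Sigma from the residuals.
Phi_hat = np.linalg.lstsq(Z, Yp, rcond=None)[0]
resid = Yp - Z @ Phi_hat
Sigma_hat = resid.T @ resid / (n - p)
print(Phi_hat.round(2), Sigma_hat.round(3))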
Question 5 (9 marks)
Let x1, …, xn be a set of independent and identically distributed samples from a population distribution F0 and let
μ = μ(F0) = ∫ x dF0(x)
denote the mean of this population, assumed to be a scalar. We are interested in θ0 = θ(F0) = μ².
(a) Provide the form of the nonparametric estimator θ̂ obtained from the empirical distribution.
(b) Derive the expression for the bias b1 = E(θ̂) − θ0.
(c) Derive the expression for the bootstrap estimate of b1.
(d) Use this expression to derive the bootstrap bias-reduced estimate θ̂1 of θ0.
(e) Derive the expression for the bias b2 = E(θ̂1) − θ0.
(f) Compare b1 and b2.
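A small numpy sketch of parts (a)-(d) (the sample and the number of bootstrap replicates B are assumptions): the plug-in estimator x̄², the bootstrap bias estimate obtained by resampling from the empirical distribution, and the resulting bias-reduced estimate.

import numpy as np

rng = np.random.default_rng(4)

# i.i.d. sample (illustrative); the target is theta_0 = mu^2
x = rng.normal(loc=2.0, scale=1.0, size=30)

theta_hat = x.mean()**2                 # plug-in estimator from part (a)

# Bootstrap estimate of the bias b1 = E(theta_hat) - theta_0: resample
# from the empirical distribution and average theta* - theta_hat.
B = 5000
theta_star = np.array([rng.choice(x, size=len(x), replace=True).mean()**2
                       for _ in range(B)])
b1_hat = theta_star.mean() - theta_hat

theta_1 = theta_hat - b1_hat            # bias-reduced estimate, part (d)
print(theta_hat, theta_1)               # theta_0 is 4 for this simulation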
Question 6 (8 marks)
Suppose we have a two-layer network with r input nodes Xm, m = 1, …, r, a single layer (L = 1) of t hidden nodes Zj, j = 1, …, t, and s output nodes Yk, k = 1, …, s. Let βmj be the weight of the connection Xm → Zj with bias β0j, and let αjk be the weight of the connection Zj → Yk with bias α0k. The functions fj(·), j = 1, …, t, and gk(·), k = 1, …, s, are the activation functions for the hidden and output layer nodes, respectively.
(a) Derive the expression for the value of the kth output node of the network as a function of α0k, gk, αjk, fj, β0j, βmj and Xm.
(b) Derive the matrix form for the vector of outputs of the network.
(c) Under which conditions does this network become equivalent to a single-layer perceptron?
(d) What is the special-case model obtained when the activation functions for the hidden and output nodes are taken to be identity functions?
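A minimal numpy sketch of the forward pass in parts (a)-(b) (the dimensions, weights and the choice f = tanh are assumptions; g is taken as the identity, which also illustrates the special case in part (d)).

import numpy as np

rng = np.random.default_rng(5)

# Dimensions (illustrative): r inputs, t hidden nodes, s outputs
r, t, s = 4, 3, 2
X = rng.normal(size=r)

# Weights and biases: beta[m, j] for X_m -> Z_j, alpha[j, k] for Z_j -> Y_k
beta0, beta = rng.normal(size=t), rng.normal(size=(r, t))
alpha0, alpha = rng.normal(size=s), rng.normal(size=(t, s))

f = np.tanh          # hidden activation f_j (assumed the same for all j)
g = lambda u: u      # output activation g_k (identity for illustration)

# Part (a): Y_k = g_k(alpha_0k + sum_j alpha_jk f_j(beta_0j + sum_m beta_mj X_m))
Z = f(beta0 + X @ beta)        # hidden-layer values
Y = g(alpha0 + Z @ alpha)      # network output, matrix form of part (b)
print(Y)

# With f and g both the identity, Y is an affine (linear) function of X:
# the special case of part (d).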
End of Exam — Total Available Marks = 55