MATH191 hw01

Math 191 Wilkening
Spring 2023
Homework 1
due Sat, Jan. 28, 2:00 PM
(Upload your solutions to Gradescope)
1. (5 points) Find the “tiny-precision” floating-point number encoded by the bit pattern in part (A) below. In (B) and (C), carry out the tiny-precision floating-point arithmetic and then find the binary representation. The variables x and y retain their meaning in subsequent parts. Here ⊕, ⊖, ⊗, ⊘ stand for correctly rounded floating-point arithmetic with ‘round-to-even’ tie breaking. (1.0 is the floating-point number represented by 0 011 00).
(A) (1 point) 0 110 01 → x =
(B) (2 points) → y = 1.0 ⊘ x =
(C) (2 points) → (1.0 ⊘ y) ⊖ x =
[Sketch: tiny-precision number line, marking the normal numbers and the overflow threshold.]
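The following is a minimal sketch for experimenting with the tiny-precision format. It assumes a layout inferred from "1.0 = 0 011 00": 1 sign bit, 3 exponent bits with bias 3, and 2 fraction bits. It also assumes, IEEE-style, that exponent field 000 encodes subnormals and that 111 is reserved for inf/NaN. The helper names `decode`, `finite_nonnegative` and `round_to_even` are hypothetical. Rounding is done by brute force over all 28 finite non-negative patterns, with ties going to the pattern whose last fraction bit is 0.

```python
from fractions import Fraction

# Assumed layout, inferred from "1.0 = 0 011 00": 1 sign bit, 3 exponent bits,
# 2 fraction bits, exponent bias 3, IEEE-style subnormals and reserved 111 exponent.
EXP_BITS, FRAC_BITS, BIAS = 3, 2, 3

def decode(bits: str) -> Fraction:
    """Exact value of a tiny-precision bit string such as "0 011 00" (spaces ignored)."""
    bits = bits.replace(" ", "")
    if len(bits) != 1 + EXP_BITS + FRAC_BITS:
        raise ValueError("expected 6 bits")
    sign = -1 if bits[0] == "1" else 1
    e = int(bits[1:1 + EXP_BITS], 2)
    f = int(bits[1 + EXP_BITS:], 2)
    if e == 2**EXP_BITS - 1:
        raise ValueError("exponent field 111 is reserved for inf/NaN")
    if e == 0:  # subnormal: 0.f x 2^(1 - bias)
        return sign * Fraction(f, 2**FRAC_BITS) * Fraction(2) ** (1 - BIAS)
    return sign * (1 + Fraction(f, 2**FRAC_BITS)) * Fraction(2) ** (e - BIAS)

def finite_nonnegative():
    """All finite, non-negative (value, bit string) pairs of the format."""
    pairs = []
    for e in range(2**EXP_BITS - 1):  # skip the reserved exponent field 111
        for f in range(2**FRAC_BITS):
            s = f"0 {e:0{EXP_BITS}b} {f:0{FRAC_BITS}b}"
            pairs.append((decode(s), s))
    return pairs

def round_to_even(x: Fraction) -> str:
    """Nearest bit string to x >= 0, ties broken toward a 0 last bit.
    Overflow and negative inputs are not handled in this sketch."""
    pairs = finite_nonnegative()
    dist = min(abs(v - x) for v, _ in pairs)
    nearest = [s for v, s in pairs if abs(v - x) == dist]
    return min(nearest, key=lambda s: s[-1])  # prefer last bit '0' on a tie

# Example: 1.0 ⊘ 1.0 = 1.0
one = decode("0 011 00")
print(round_to_even(one / one))  # 0 011 00
```

Under these assumptions, an operation such as a ⊘ b would be emulated as `round_to_even(decode(a) / decode(b))`. Computing the exact quotient first and rounding once is what "correctly rounded" means.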

2. (5 points) (#2.5 p. 58 of Higham’s book.) Show that
$$0.1 = \sum_{i=1}^{\infty} \left(2^{-4i} + 2^{-4i-1}\right)$$
and deduce that 0.1 has the base 2 representation 0.0001100 (repeating the last 4 bits). Let $\hat{x} = \mathrm{fl}(0.1)$ be the rounded version of 0.1 obtained in binary IEEE single precision arithmetic ($u = 2^{-24}$).
Modification: instead of showing that $(x - \hat{x})/x = -\tfrac{1}{4}u$, just work out the single-precision binary representation of fl(0.1). Write your answer in the form $1.f_1 f_2 f_3 \ldots f_{23} \times 2^m$, where you figure out the bits $f_i \in \{0, 1\}$ and $m$.
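A short numerical check of Problem 2 is sketched below. It assumes that Python's `struct` module with the `"f"` format rounds a binary64 value to the nearest IEEE binary32 value, as CPython does on IEEE platforms. Note that 0.1 is rounded twice: first to binary64 by Python, then to binary32. For 0.1 this two-step rounding agrees with rounding the exact value directly, because the discarded tail is far from a halfway case. `to_single` and `single_fields` are hypothetical helper names. The check uses exact `Fraction` arithmetic for the series partial sums and for the identity $(x - \hat{x})/x = -u/4$ from the unmodified exercise.

```python
import struct
from fractions import Fraction

def to_single(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def single_fields(x: float) -> tuple[int, int, int]:
    """Sign, unbiased exponent, and 23-bit fraction field of a normal binary32 value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, ((bits >> 23) & 0xFF) - 127, bits & ((1 << 23) - 1)

# Partial sums of sum_{i>=1} (2^(-4i) + 2^(-4i-1)) approach 1/10 from below.
partial = sum(Fraction(1, 2**(4 * i)) + Fraction(1, 2**(4 * i + 1)) for i in range(1, 20))
print(float(Fraction(1, 10) - partial))  # small positive remainder

u = Fraction(1, 2**24)
x = Fraction(1, 10)
xhat = Fraction(to_single(0.1))  # exact rational value of fl(0.1)
sign, m, frac = single_fields(0.1)
print(f"fl(0.1) = 1.{frac:023b} x 2^{m}")
print((x - xhat) / x == -u / 4)  # True
```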
3. (5 points) (#2.6 p. 58 of Higham’s book.) What is the largest integer p such that all integers in the interval [−p, p] are exactly representable in IEEE double-precision arithmetic? What is the corresponding p for IEEE single-precision arithmetic? Clarification: you should show how each integer in the range is represented for the double-precision part. You can just say “similarly,” and give the result for single precision without giving details.
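The sketch below spot-checks the boundary of Problem 3 by round-tripping integers through each format. It only probes a few values near the boundary, so it is evidence rather than the proof the problem asks for. It assumes that Python's int-to-float conversion and `struct`'s `"f"` format both round to nearest, ties to even. The helper names are hypothetical.

```python
import struct

def exact_in_double(n: int) -> bool:
    """True if converting the integer n to binary64 and back returns n."""
    return int(float(n)) == n

def exact_in_single(n: int) -> bool:
    """True if converting n to binary32 (via struct's 'f' format) and back returns n."""
    as_single = struct.unpack(">f", struct.pack(">f", float(n)))[0]
    return int(as_single) == n

# Expected pattern in each loop: True, True, False.
for n in (2**53 - 1, 2**53, 2**53 + 1):
    print(n, exact_in_double(n))
for n in (2**24 - 1, 2**24, 2**24 + 1):
    print(n, exact_in_single(n))
```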
4. (6 points) (#2.7 p. 58 of Higham’s book.) Which of the following statements is true in IEEE arithmetic, assuming that a and b are normalized floating point numbers and that no exception occurs in the stated operations? (Provide brief justification or a counter-example). Modification: for part 2, just answer true or false without justifying your answer. A careful justification is quite involved.
1. fl(a op b) = fl(b op a), op = +, ∗.
2. fl(b − a) = −fl(a − b).
3. fl(a + a) = fl(2 ∗ a).
4. fl(0.5 ∗ a) = fl(a/2).
5. fl((a + b) + c) = fl(a + (b + c)).
6. a ≤ fl((a + b)/2) ≤ b, given that a ≤ b.
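Some of Problem 4's statements can be probed in Python, whose floats are IEEE binary64 with round-to-nearest. The sketch below gives a counterexample for statement 5 and runs random spot checks of statements 1, 3 and 4. Random testing is evidence, not a justification, and statements 2 and 6 are deliberately left out. The input range and seed are arbitrary choices.

```python
import random

# Statement 5 fails: floating-point addition is not associative.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False
print((a + b) + c, a + (b + c))

# Statements 1, 3, 4: random spot checks (evidence, not a proof).
rng = random.Random(0)
for _ in range(100_000):
    x = rng.uniform(-1e6, 1e6)
    y = rng.uniform(-1e6, 1e6)
    assert x + y == y + x and x * y == y * x  # statement 1
    assert x + x == 2 * x                     # statement 3
    assert 0.5 * x == x / 2                   # statement 4
```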
5. (5 points) (#2.20 p. 59 of Higham’s book.) Two requirements that we might ask of a routine for computing $\sqrt{x}$ in floating point arithmetic are that the identities $\sqrt{x^2} = |x|$ and $\left(\sqrt{|x|}\right)^2 = |x|$ be satisfied. Which, if either, of these is a reasonable requirement? Modification: just do the first part, i.e., show that $\mathrm{fl}\left(\sqrt{x^2}\right) = |x|$. For definiteness, let’s use IEEE single-precision arithmetic
with correct rounding. You may assume $1 \le x < 2$. The algorithm involves starting with an IEEE number $x$, computing $y = x^2$, rounding the result to $\hat{y}$, computing $z = \sqrt{\hat{y}}$, and rounding the result to $\hat{z}$. Your goal is to show that $\hat{z} = x$. You may assume linearization is valid, which is to say that if $\hat{y} = y + \Delta y$, then $z = (y + \Delta y)^{1/2} \approx y^{1/2} + \Delta z$ with $\Delta z = \tfrac{1}{2} y^{-1/2} \Delta y$. How big can $\Delta y$ be in the two regimes $1 \le y < 2$ and $2 \le y < 4$? As a result, how big can $\Delta z$ be? Show that it’s small enough that rounding $z$ brings you back to $x$. (Note that the spacing between consecutive single-precision numbers is $2u$ for $1 \le x \le 2$, $2u$ for $1 \le y \le 2$, and $4u$ for $2 \le y \le 4$, where $u$ is the unit roundoff, equal to $2^{-24}$ for single-precision numbers.)

6. (4 points) (#3.8 p. 78 of Higham’s book.) Which is the more accurate way to compute $x^2 - y^2$: as $x^2 - y^2$, or as $(x + y)(x - y)$? (Note that this computation arises when squaring a complex number.) Modification: in my mind, there is no clear winner. The first method has a marginally better error constant if $x^2 + y^2 \le \tfrac{3}{2}|x^2 - y^2|$, but the second method could be much better if $|x^2 - y^2| \ll x^2 + y^2$. So just derive error estimates for the two approaches and see if you can see why the second method is better when $x^2 \approx y^2$.
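For Problem 5, the sketch below emulates correctly rounded binary32 arithmetic on $[1, 2)$ using exact integers, so no hardware single precision is involved. It writes $x = X/2^{23}$ with an integer $X$. It rounds $y = x^2$ to $\hat{y}$ with ties to even, using spacing $2u$ on $[1, 2)$ and $4u$ on $[2, 4)$. It then rounds $\sqrt{\hat{y}}$ to the nearest multiple of $2^{-23}$ with `math.isqrt` (Python 3.8+). The helper names are hypothetical. The script checks only a sample of significands, so it illustrates the claim $\hat{z} = x$ but does not prove it.

```python
from math import isqrt

P = 23  # fraction bits in IEEE single precision; spacing on [1, 2) is 2**-P = 2u

def round_shift_even(n: int, s: int) -> int:
    """Return n / 2**s rounded to the nearest integer, ties to even (s >= 1)."""
    q, r = divmod(n, 1 << s)
    half = 1 << (s - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q

def sqrt_of_square_roundtrips(X: int) -> bool:
    """For x = X / 2**P in [1, 2), check that fl(sqrt(fl(x*x))) == x."""
    assert (1 << P) <= X < (1 << (P + 1))
    sq = X * X                           # y = sq / 2**(2P), exact
    if sq < (2 << (2 * P)):              # 1 <= y < 2: spacing 2**-P
        Y = round_shift_even(sq, P)      # yhat = Y / 2**P
        N = Y << P                       # N = yhat * 2**(2P), an integer
    else:                                # 2 <= y < 4: spacing 2**-(P-1)
        Y = round_shift_even(sq, P + 1)  # yhat = Y / 2**(P-1)
        N = Y << (P + 1)
    # sqrt(yhat) * 2**P = sqrt(N); round it to the nearest integer.
    k = isqrt(N)
    Z = k + (1 if N > k * k + k else 0)  # no ties: sqrt(N) = k + 1/2 is impossible
    return Z == X

# Check every 997th significand; all 2**23 values is possible but slow in pure Python.
print(all(sqrt_of_square_roundtrips(X) for X in range(1 << P, 1 << (P + 1), 997)))  # True
```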
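For Problem 6, the next sketch measures the two formulas' relative errors against an exact rational reference. It uses Python floats (binary64, so $u = 2^{-53}$ here rather than the single-precision $2^{-24}$). The inputs are illustrative values with $x^2 \approx y^2$. For these inputs $x - y$ is computed exactly, so the factored form's relative error is at most about $2u$. The direct form's error depends on how $x^2$ happens to round, so no particular value is claimed for it.

```python
from fractions import Fraction

def rel_err(computed: float, exact: Fraction) -> Fraction:
    """Exact relative error of a float against a nonzero rational reference."""
    return abs(Fraction(computed) - exact) / abs(exact)

def compare(x: float, y: float) -> tuple[Fraction, Fraction]:
    """Relative errors of x*x - y*y and (x + y)*(x - y) computed in binary64."""
    exact = Fraction(x) ** 2 - Fraction(y) ** 2  # exact value for these float inputs
    direct = x * x - y * y
    factored = (x + y) * (x - y)
    return rel_err(direct, exact), rel_err(factored, exact)

u = Fraction(1, 2**53)  # unit roundoff of binary64
x, y = 1.0000001, 1.0   # illustrative inputs with x^2 close to y^2
e_direct, e_factored = compare(x, y)
print(f"x*x - y*y       : {float(e_direct / u):.3g} u")
print(f"(x + y)*(x - y) : {float(e_factored / u):.3g} u")
```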