CMSC 23500 Section 1 2 (Rehman) Lecture 6 – Database Storage

Introduction to Database Systems
CMSC 23500 Section 1/2 (Rehman) Lecture 6 – Database Storage
Slides from Aaron Elmore, Andy Pavlo and Silberchatz et. al.

Agenda and Updates
• Database Physical Storage
• RecapofStorageSystems,MemoryHierarchy
• Howtoefficientlystoreandmanagebytesondisk • Reading:Chapter13.1–13.3
• HW2 – SQL Due Friday, April 5th at 11:59 AM (Noon) • Slight Schedule Adjustments
• Crusty 1 will be released this Thursday
• Rust Primer 2 shortly (sometime this weekend) • No change to due dates

Code Help, Add WeChat: cstutorcs
Recap: Under the Hood of a Typical DBMS
𝜎!”#$%$&& ⋈
SELECT * FROM table1 JOIN table2 ON table1.key = table2.key WHERE table2.col1 < 100; Query Execution Physical Plan ⋈Hash Join 𝜎!"#$%$&& AST & Logical Plan Query Optimizer Storage Manager Buffer Pool Index Scan Access Control Transaction Management Concurrency Control Logging Recovery Query Components • Storage (now) • Execution • Concurrency Control (Transactions) • Recovery (Logging) • Distributed Databases Operator Execution Access Methods Disk Manager Buffer Pool Manager Query Planning Storage Hierarchy Faster Smaller CPU Registers CPU Caches HDD Network Storage Random Access Byte-Addressable Non-Volatile Sequential Access Block-Addressable Slower Larger Access Times 1.7 days 16.5 weeks 11.4 months 31.7 years 0.5 ns L1 Cache Ref 7 ns L2 Cache Ref 100 ns DRAM 150,000 ns SSD 10,000,000 ns HDD ~30,000,000 ns Network Storage 1,000,000,000 ns Tape Archives Visualizing the Latency Gap Why is the (Magnetic) Disk Slow? NOTE: Diagram is schematic, and simplifies the structure of actual disk drives Spinning disks are disappearing – less relevant to teach every year Flash Storage vs Magnetic Storage Pricing How is Data Stored? • The DB stores data on disk. • Loads it into memory on • Storage Manager orchestrates file access • Keeps track of where data is: • Which file? • Which ”part” of a file (Page)? • Which ”part” of a Page? INSERT INTO student VALUES (1, 'John Smith', 90, 'Comp. Sci.', 5001); AST & Logical Plan Query Execution Storage Manager Physical Plan Query Optimizer OK, Tuple Stored in (File 1, Page 0, Slot 15) Database Pages • Store items in pages, pages in files • Pages are fixed size (typically 4096 bytes = 4KB) • Aligned to OS guarantees that the system can atomically read/write pages • Minimum Access Size for Storage devices is typically one Block* Page Offset = Page # * PAGE_SIZE *We will use the term Block and Page interchangeably. Textbook uses Block Get Page #2 Database Pages in a Single File Computer Science Tutoring
Disk-Oriented Database Systems
Get all pages for table
SELECT * FROM table;
Database File Buffer Pool

File Organization
• Sequential / Sorted Organization (ISAM) • Heap File Organization
• Tree File Organization
• Hashing File Organization

Sequential File Organization
• Suitable for applications that require sequential processing of the entire file
• The records in the file are ordered by a search-key
Srinivasan
Comp. Sci.
Comp. Sci.
Comp. Sci.
Elec. Eng.

CS Help, Email: tutorcs@163.com
Sequential File Organization
• Deletion – use pointer chains*
• Insertion –locate the position where the record is to be inserted
• if there is free space insert there
• if no free space, insert the record in
an overflow block
• In either case, pointer chain must be
• Need to reorganize the file from time to time to restore sequential order
*Note: Pointers are typically file offsets in files, not memory pointers.
Srinivasan
Comp. Sci.
Comp. Sci.
Comp. Sci.
Elec. Eng.

Heap File Organization
• Not related to the Heap Data structure (No ordering property)
• Records can be placed anywhere in the file where there is free space • Records usually do not move once allocated
• It is important to be able to efficiently find free space within file

How to Store Tuples in a Page?
• In Sequence?
• Let the Header store number of tuples
• Any other metadata that maybe needed
• Advantages and Disadvantages?

How to Store Tuples in a Page?
• In Sequence?
• Let the Header store number of tuples.
• Advantages and Disadvantages?
• What about variable-length tuples?

Slotted Pages
• Keep ”Slots” to tuples starting offsets in the page
• Keeps track of the number of slots
• The position of the last slot • Other page metadata
• Each Slot:
• Keeps track of the starting offset for the
• Size of the tuple
• Other tuple/slot metadata

Slotted Pages
• Keep ”Slots” to tuples starting offsets in the page
• Keeps track of the number of used
• The position of the last slot
• Other page metadata
• Each Slot:
• Keeps track of the starting offset for
• Size of the tuple
• Other tuple/slot metadata

Slotted Pages
• How to Delete?

Multitable Clustering File Organization
• Combine records from multiple tables into the same file
• Good if there the tables are frequently joined
• Bad if you have to read just one table
department
instructor
multitable clustering of department and instructor

Tuple Representation
• Why not store as text?

Tuple Representation
• Why not store as text? Reading 10000 floating point numbers from file:
~100x faster

What does a Tuple Look Like?
• All the information required for the tuple data
• Null Bitmap
• Offsets for the various fields

Tuple Organization
• 1. Store Tuples in Row Order (OLTP / Traditional Databases) • 2. Store Tuples as Columns (OLAP / Columnar Stores)

The Buffer Pool
• “A Page Cache”
• Store pages in memory while they are
being read/written to.
• Avoid repeated expensive disk access
• What does this remind you of?
Buffer Pool

Buffer Pool Considerations
• Optimize Page Access for Faster Query Execution
• Pin / Lock pages for efficiency and transactions
• Has more insight into the access patterns than a traditional cache/virtual memory manager
Buffer Pool
How does LRU perform for a typical Join?
for each tuple tr of r do for each tuple ts of s do
Alternate Strategies?
Toss Immediate?
if the tuples tr and ts match

Next Lecture
• Database Indexes I
• Readings: Database System Concepts: Chapter 14.1 – 14.2 and 14.5
• Watch for announcements
• CrustyDB 1 and Rust Primer 2