Introduction to Database Systems
CMSC 23500 Section 1/2 (Rehman) Lecture 6 – Database Storage
Slides from Aaron Elmore, Andy Pavlo and Silberchatz et. al.
Agenda and Updates
• Database Physical Storage
• RecapofStorageSystems,MemoryHierarchy
• Howtoefficientlystoreandmanagebytesondisk • Reading:Chapter13.1–13.3
• HW2 – SQL Due Friday, April 5th at 11:59 AM (Noon) • Slight Schedule Adjustments
• Crusty 1 will be released this Thursday
• Rust Primer 2 shortly (sometime this weekend) • No change to due dates
Code Help, Add WeChat: cstutorcs
Recap: Under the Hood of a Typical DBMS
𝜎!”#$%$&& ⋈
SELECT * FROM table1 JOIN table2 ON table1.key = table2.key WHERE table2.col1 < 100;
Query Execution
Physical Plan
⋈Hash Join 𝜎!"#$%$&&
AST & Logical Plan
Query Optimizer
Storage Manager
Buffer Pool
Index Scan
Access Control Transaction Management Concurrency Control Logging Recovery
Query Components
• Storage (now)
• Execution
• Concurrency Control (Transactions) • Recovery (Logging)
• Distributed Databases
Operator Execution Access Methods
Disk Manager
Buffer Pool Manager
Query Planning
Storage Hierarchy
Faster Smaller
CPU Registers CPU Caches
HDD Network Storage
Random Access
Byte-Addressable
Non-Volatile
Sequential Access
Block-Addressable
Slower Larger
Access Times
1.7 days 16.5 weeks 11.4 months 31.7 years
0.5 ns L1 Cache Ref
7 ns L2 Cache Ref
100 ns DRAM
150,000 ns SSD
10,000,000 ns HDD
~30,000,000 ns Network Storage
1,000,000,000 ns Tape Archives
Visualizing the Latency Gap
Why is the (Magnetic) Disk Slow?
NOTE: Diagram is schematic, and simplifies the structure of actual disk drives
Spinning disks are disappearing
– less relevant to teach every year
Flash Storage vs Magnetic Storage Pricing
How is Data Stored?
• The DB stores data on disk.
• Loads it into memory on
• Storage Manager orchestrates file access
• Keeps track of where data is:
• Which file?
• Which ”part” of a file (Page)? • Which ”part” of a Page?
INSERT INTO student VALUES (1, 'John Smith',
90, 'Comp. Sci.', 5001);
AST & Logical Plan
Query Execution Storage Manager
Physical Plan
Query Optimizer
OK, Tuple Stored in
(File 1, Page 0, Slot 15)
Database Pages
• Store items in pages, pages in files
• Pages are fixed size (typically 4096 bytes = 4KB)
• Aligned to OS guarantees that the system can atomically read/write pages • Minimum Access Size for Storage devices is typically one Block*
Page Offset = Page # * PAGE_SIZE
*We will use the term Block and Page interchangeably. Textbook uses Block
Get Page #2
Database Pages in a Single File
Computer Science Tutoring
Disk-Oriented Database Systems
Get all pages for table
SELECT * FROM table;
Database File Buffer Pool
File Organization
• Sequential / Sorted Organization (ISAM) • Heap File Organization
• Tree File Organization
• Hashing File Organization
Sequential File Organization
• Suitable for applications that require sequential processing of the entire file
• The records in the file are ordered by a search-key
Srinivasan
Comp. Sci.
Comp. Sci.
Comp. Sci.
Elec. Eng.
CS Help, Email: tutorcs@163.com
Sequential File Organization
• Deletion – use pointer chains*
• Insertion –locate the position where the record is to be inserted
• if there is free space insert there
• if no free space, insert the record in
an overflow block
• In either case, pointer chain must be
• Need to reorganize the file from time to time to restore sequential order
*Note: Pointers are typically file offsets in files, not memory pointers.
Srinivasan
Comp. Sci.
Comp. Sci.
Comp. Sci.
Elec. Eng.
Heap File Organization
• Not related to the Heap Data structure (No ordering property)
• Records can be placed anywhere in the file where there is free space • Records usually do not move once allocated
• It is important to be able to efficiently find free space within file
How to Store Tuples in a Page?
• In Sequence?
• Let the Header store number of tuples
• Any other metadata that maybe needed
• Advantages and Disadvantages?
How to Store Tuples in a Page?
• In Sequence?
• Let the Header store number of tuples.
• Advantages and Disadvantages?
• What about variable-length tuples?
Slotted Pages
• Keep ”Slots” to tuples starting offsets in the page
• Keeps track of the number of slots
• The position of the last slot • Other page metadata
• Each Slot:
• Keeps track of the starting offset for the
• Size of the tuple
• Other tuple/slot metadata
Slotted Pages
• Keep ”Slots” to tuples starting offsets in the page
• Keeps track of the number of used
• The position of the last slot
• Other page metadata
• Each Slot:
• Keeps track of the starting offset for
• Size of the tuple
• Other tuple/slot metadata
Slotted Pages
• How to Delete?
Multitable Clustering File Organization
• Combine records from multiple tables into the same file
• Good if there the tables are frequently joined
• Bad if you have to read just one table
department
instructor
multitable clustering of department and instructor
Tuple Representation
• Why not store as text?
Tuple Representation
• Why not store as text? Reading 10000 floating point numbers from file:
~100x faster
What does a Tuple Look Like?
• All the information required for the tuple data
• Null Bitmap
• Offsets for the various fields
Tuple Organization
• 1. Store Tuples in Row Order (OLTP / Traditional Databases) • 2. Store Tuples as Columns (OLAP / Columnar Stores)
The Buffer Pool
• “A Page Cache”
• Store pages in memory while they are
being read/written to.
• Avoid repeated expensive disk access
• What does this remind you of?
Buffer Pool
Buffer Pool Considerations
• Optimize Page Access for Faster Query Execution
• Pin / Lock pages for efficiency and transactions
• Has more insight into the access patterns than a traditional cache/virtual memory manager
Buffer Pool
How does LRU perform for a typical Join?
for each tuple tr of r do for each tuple ts of s do
Alternate Strategies?
Toss Immediate?
if the tuples tr and ts match
Next Lecture
• Database Indexes I
• Readings: Database System Concepts: Chapter 14.1 – 14.2 and 14.5
• Watch for announcements
• CrustyDB 1 and Rust Primer 2