Cyan's Blog

Search

Search IconIcon to open search

CMU15-445_3_Lecture_Note

Last updated Oct 20, 2021 Edit Source

# 03 - Database Storage I

2021-10-20

Tags: #Database

# Outline

Different Layers of the whole system, One layer at a time, from bottom to top.

# Disk-Oriented Architecture

How it works: Syllabus:

# Why not use the OS?

# Virtual Memory?

The OS only sees a bunch of reads and writes, the DBMS (almost) always wants to control things itself and can do a better job at it.

mmap: the way virtual memory works:

Problem #1: How the DBMS represents the database in files on disk:

(Later) Problem #2: How the DBMS manages its memory and move data back-and-forth from disk.

# Problem #1

# File Storage

The DBMS stores a database as one or more files on disk, In the meanwhile, The OS doesn’t know anything about the contents of these files.

# Who do the work?

The storage manager → responsible for maintaining a database’s files. → Some do their own scheduling for reads and writes to improve spatial and temporal locality of pages.

# How?

It organizes the files as a collection of pages.

A page is a fixed-size block of data. → Most systems do NOT mix page types. → Some systems require a page to be self-contained.

Each page is given a unique identifier. → The DBMS uses an indirection layer to map page ids to physical locations.

There are three different notions of “pages” in a DBMS: → Hardware Page (usually 4KB) → OS Page (usually 4KB) → Database Page (512B-16KB)

# Page Storage Architecture

# Heap File Organization
# Linked List

# Page Directory

# Page Layout

# How to organize the data stored inside of the page?

Assume we only store tuples;

Two approaches: → Tuple-oriented → Log-structured

# Tuple-oriented
# Slotted Pages

# Log Structured File Organization

# Tuple Layout

# Tuple header

→ Visibility info (concurrency control) → Bit Map for NULL values.

# Tuple Data

# Denormalized Tuple Dataq

# Record IDs

In order to keep track of individual tuples, each tuple is assigned a unique record identifier. → Most common: page_id + offset/slot → Can also contain file location info.

An application cannot rely on these ids to mean anything.

It changes!