SimpleDB (Part 1): File Manager

Dec 2024

Every time you query a database, a complex series of actions begins behind the scenes. I’d like to peek behind the curtain and understand how databases work internally.

Recently I’ve been reading Edward Sciore’s Database Design and Implementation. In this series, I’ll try to work out how databases work internally by following the book with a Rust implementation of SimpleDB.

What we’ll cover

In this post in particular, we’ll build the foundation of a database system by implementing two core components: file management and page handling.

Please see the repo for the full implementation.

Note: I am a beginner in Rust, so if you see anything that needs improvement, please let me know.

Database storage

There are two ways a database system could potentially access data. If you think of it like a library:

block-level access is like going directly to a specific shelf and picking up a specific volume
file-level access is like working with entire sections of the library at once

In a block-level interface, there is the concept of a block, which is mapped to several sectors of the disk. In order to modify the disk:

the sector contents of the block are read into a page
bytes are modified on the page
OS then writes the page back into the block on disk
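The three steps above can be sketched in a few lines. This is only an illustration: `modify_block` is a hypothetical helper (not from the repo), and `storage` is any seekable byte store standing in for the disk.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Read block `block_number` into an in-memory page, change one byte,
/// and write the whole page back. Hypothetical helper for illustration.
fn modify_block<S: Read + Write + Seek>(
  storage: &mut S,
  block_number: u64,
  block_size: usize,
  index: usize,
  value: u8,
) -> io::Result<()> {
  let offset = block_number * block_size as u64;

  // 1. Read the block's contents into a page
  let mut page = vec![0u8; block_size];
  storage.seek(SeekFrom::Start(offset))?;
  storage.read_exact(&mut page)?;

  // 2. Modify bytes on the page (panics if `index` is outside the page)
  page[index] = value;

  // 3. Write the page back into the block
  storage.seek(SeekFrom::Start(offset))?;
  storage.write_all(&page)?;
  Ok(())
}
```

Even though only one byte changes, the whole block is read and the whole block is written back. That is the defining trait of block-level access.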

On the other hand, a file-level interface is a higher level abstraction. The client views the file as a sequence of bytes, with no notion of a block. You can also read/write any number of bytes starting at any position in the file.

Most database engines use a compromise. They store all their data in one or more OS files and treat each file as a raw ‘disk’. The database engine accesses each ‘disk’ using logical file blocks. A logical block reference tells you where the block is with respect to the file, but not where it is on the disk; a physical block reference, by contrast, tells you where the block is on the disk.

The OS takes on the responsibility of mapping the logical block reference to the corresponding physical block. This gives us the best of both worlds: the convenience of file operations with the precision of block-level control.
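To make the logical block idea concrete, here is a minimal sketch of how a block number translates into a position inside a file. The function names are hypothetical and chosen for illustration only; the 400-byte block size is the one used later in this post.

```rust
/// Byte offset, within its file, at which logical block `block_number` starts.
/// Hypothetical helper for illustration; not part of SimpleDB.
fn block_offset(block_number: u64, block_size: usize) -> u64 {
  block_number * block_size as u64
}

/// Logical block that contains the byte at `position` in the file.
fn block_containing(position: u64, block_size: usize) -> u64 {
  position / block_size as u64
}
```

With 400-byte blocks, block 3 starts at byte 1200 of the file. Where those bytes physically sit on the disk is left entirely to the OS.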

Implementing core components

Database interface

First, let’s create our main database interface.

Here is the test case that we want to pass. We just want to test that the path we pass in exists and is a directory. Note that we’re using 400 as the block size and 8 as the buffer size because Sciore recommends this for learning purposes. Real-world database systems use much larger numbers.

use crate::simpledb::SimpleDB;
use tempfile::TempDir;

#[test]
fn test_simpledb_creation() {
  let temp_dir = TempDir::new().unwrap();
  let temp_path = temp_dir.path();

  let _db = SimpleDB::new(temp_path, 400, 8).unwrap();

  assert!(temp_path.exists());
  assert!(temp_path.is_dir());
}

Our SimpleDB struct will provide the entry point for all database interactions.

use crate::file::FileManager;
use std::path::Path;

pub struct SimpleDB {
  file_manager: FileManager,
}

impl SimpleDB {
  pub const BLOCK_SIZE: usize = 400;
  pub const BUFFER_SIZE: u32 = 8;
  pub const LOG_FILE: &'static str = "simpledb.log";

  pub fn new(
    dirname: impl AsRef<Path>,
    block_size: usize,
    buffer_size: u32, // not used yet
  ) -> std::io::Result<Self> {
    let file_manager = FileManager::new(dirname, block_size)?;

    Ok(SimpleDB { file_manager })
  }

  pub fn file_manager(&self) -> &FileManager {
    &self.file_manager
  }
}

Managing files

The FileManager is our bridge to the operating system. It handles three key responsibilities:

Creating and managing the database directory
Tracking open files
Reading blocks of data into a Page and writing a Page’s contents back to a block

Here’s the basic structure.

use std::{
  collections::HashMap,
  fs::{self, File, OpenOptions},
  io::{self, Read, Seek, SeekFrom, Write},
  path::{Path, PathBuf},
  sync::Mutex,
};

use crate::file::{BlockId, Page};

pub struct FileManager {
  db_directory: PathBuf,
  block_size: usize,
  is_new: bool,
  open_files: Mutex<HashMap<String, File>>,
}

When creating a new FileManager, we need to:

set up the database directory
clean up any temporary files
initialize open files tracking

impl FileManager {
  pub fn new(db_directory: impl AsRef<Path>, block_size: usize) -> io::Result<Self> {
    let db_directory = db_directory.as_ref().to_path_buf();
    let is_new = !db_directory.exists();

    if is_new {
      fs::create_dir_all(&db_directory)?;
    }

    // Clean up temp files
    if let Ok(entries) = fs::read_dir(&db_directory) {
      for entry in entries.flatten() {
        let filename = entry.file_name();
        if filename.to_string_lossy().starts_with("temp") {
          let _ = fs::remove_file(entry.path());
        }
      }
    }

    Ok(Self {
      db_directory,
      block_size,
      is_new,
      open_files: Mutex::new(HashMap::new()),
    })
  }
}

Note that we’re also using Mutex to provide thread-safe access to the open_files HashMap. The FileManager might be accessed from multiple threads in the application, so Mutex ensures that only one thread can access the HashMap at any one time.
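As a rough illustration of that locking behaviour, the sketch below has several threads insert into one shared map. It is not FileManager code: `register_concurrently` is a hypothetical function, the map stores plain numbers instead of File handles, and `Arc` is added here only so the map can be shared between the spawned threads.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawn `threads` workers that each add one entry to a shared map.
/// Returns the number of entries once every worker has finished.
fn register_concurrently(threads: usize) -> usize {
  let open_files: Arc<Mutex<HashMap<String, usize>>> =
    Arc::new(Mutex::new(HashMap::new()));

  let handles: Vec<_> = (0..threads)
    .map(|i| {
      let open_files = Arc::clone(&open_files);
      thread::spawn(move || {
        // lock() blocks until no other thread holds the guard
        let mut map = open_files.lock().unwrap();
        map.insert(format!("file{}.tbl", i), i);
        // the guard is dropped at the end of the closure, releasing the lock
      })
    })
    .collect();

  for handle in handles {
    handle.join().unwrap();
  }

  let len = open_files.lock().unwrap().len();
  len
}
```

Because every insert happens while the lock is held, no update is lost regardless of how the threads interleave.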

Working with Blocks and Pages

To understand how data is stored and retrieved, we need to understand these two concepts:

BlockId: identifies where data lives on disks (files)
Page: holds the actual data in memory

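Here’s how they work together: the FileManager uses a BlockId to locate a block within a file, reads that block into a Page, and later writes the Page’s contents back to the same block.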

Implementing BlockId

pub struct BlockId {
  filename: String,
  number: u64
}

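impl BlockId {
  pub fn new(filename: impl Into<String>, number: u64) -> Self {
    Self {
      filename: filename.into(),
      number,
    }
  }

  // Accessors used by FileManager::read and FileManager::write
  pub fn file_name(&self) -> &str {
    &self.filename
  }

  pub fn number(&self) -> u64 {
    self.number
  }
}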

Implementing Page

The Page will have the following parts:

a buffer (Vec<u8>) to hold the contents of the block
setter functions to convert data into bytes and write them into the buffer
- set_int, set_string, set_bytes
equivalent getter functions to convert bytes back into the appropriate data types
- get_int, get_string, get_bytes
a contents function that returns the buffer as a mutable slice, so the FileManager can read into it and write from it

use std::convert::TryInto;

pub struct Page {
  buffer: Vec<u8>,
}

impl Page {
  pub fn new(block_size: usize) -> Self {
    Self {
      buffer: vec![0; block_size],
    }
  }

  pub fn from_bytes(bytes: Vec<u8>) -> Self {
    Self { buffer: bytes }
  }

  pub fn get_int(&self, offset: usize) -> i32 {
    let bytes = &self.buffer[offset..offset + 4];
    i32::from_be_bytes(bytes.try_into().unwrap())
  }

  pub fn set_int(&mut self, offset: usize, value: i32) {
    let bytes = value.to_be_bytes();
    self.buffer[offset..offset + 4].copy_from_slice(&bytes);
  }

  // Returns a mutable slice for writing
  pub(crate) fn contents(&mut self) -> &mut [u8] {
    &mut self.buffer[..]
  }

  // pub fn get_bytes
  // pub fn set_bytes
  // pub fn get_string
  // pub fn set_string
}
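The four commented-out methods are left to the repo. As a minimal sketch of one way they could work, the code below assumes a length-prefixed layout: a 4-byte big-endian length followed by the raw bytes, with strings stored as their UTF-8 bytes. Both the layout and the `SketchPage` type are assumptions made for this example, not necessarily what the repo does.

```rust
/// Stand-in for the post's Page, reduced to a byte buffer.
pub struct SketchPage {
  buffer: Vec<u8>,
}

impl SketchPage {
  pub fn new(block_size: usize) -> Self {
    Self { buffer: vec![0; block_size] }
  }

  /// Store `bytes` at `offset` as a 4-byte big-endian length followed by the data.
  /// Panics if the value does not fit in the page.
  pub fn set_bytes(&mut self, offset: usize, bytes: &[u8]) {
    assert!(bytes.len() <= i32::MAX as usize, "value too large");
    let len = (bytes.len() as i32).to_be_bytes();
    self.buffer[offset..offset + 4].copy_from_slice(&len);
    self.buffer[offset + 4..offset + 4 + bytes.len()].copy_from_slice(bytes);
  }

  /// Read back a value written by `set_bytes`.
  pub fn get_bytes(&self, offset: usize) -> &[u8] {
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&self.buffer[offset..offset + 4]);
    let len = i32::from_be_bytes(len_bytes) as usize;
    &self.buffer[offset + 4..offset + 4 + len]
  }

  /// Store a string as its UTF-8 bytes, using the same length prefix.
  pub fn set_string(&mut self, offset: usize, value: &str) {
    self.set_bytes(offset, value.as_bytes());
  }

  /// Read back a string, replacing any invalid UTF-8 sequences.
  pub fn get_string(&self, offset: usize) -> String {
    String::from_utf8_lossy(self.get_bytes(offset)).into_owned()
  }
}
```

With this layout a value of n bytes occupies n + 4 bytes on the page, so callers have to leave room for the prefix when choosing offsets.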

Now that we have our BlockId and Page implementations, we have the building blocks to finish the read and write functions in our FileManager.

Reading data

We want read to:

get the filename from BlockId
figure out the offset from BlockId
seek to the correct block position
read the contents into the page’s buffer

impl FileManager {
  // ...
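  pub fn read(&self, block: &BlockId, page: &mut Page) -> io::Result<()> {
    let offset = block.number() * self.block_size as u64;

    // Look up the file handle by name, opening (or creating) the file on first use
    let mut open_files = self.open_files.lock().unwrap();
    if !open_files.contains_key(block.file_name()) {
      let path = self.db_directory.join(block.file_name());
      let file = OpenOptions::new().read(true).write(true).create(true).open(path)?;
      open_files.insert(block.file_name().to_string(), file);
    }
    let file = open_files.get_mut(block.file_name()).unwrap();

    file.seek(SeekFrom::Start(offset))?;

    // Fails with UnexpectedEof if the block has not been written yet
    let buf = page.contents();
    file.read_exact(buf)?;

    Ok(())
  }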
}
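One detail worth knowing: `read_exact` returns an `UnexpectedEof` error when the file ends before the buffer is full, which is what happens when a block has not been written yet. Whether that should count as an error is a design choice. As a sketch of the alternative, the hypothetical helper below treats missing bytes as zeros; it is an illustration under that assumption, not what the repo does.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Fill `buf` with the block at `offset`, zero-filling whatever lies past
/// the end of `source` instead of failing like `read_exact` would.
fn read_block_or_zeros<R: Read + Seek>(
  source: &mut R,
  offset: u64,
  buf: &mut [u8],
) -> io::Result<()> {
  source.seek(SeekFrom::Start(offset))?;

  let mut filled = 0;
  while filled < buf.len() {
    match source.read(&mut buf[filled..]) {
      Ok(0) => break, // end of file
      Ok(n) => filled += n,
      Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
      Err(e) => return Err(e),
    }
  }

  // Anything that could not be read is treated as an all-zero region
  for byte in &mut buf[filled..] {
    *byte = 0;
  }
  Ok(())
}
```

The trade-off is that a genuinely truncated file would also read back as zeros, so the strict `read_exact` behaviour can be the safer default.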

Writing data

And the write function does something similar, but writes to the file using the page’s buffer.

impl FileManager {
  // ...
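  pub fn write(&self, block: &BlockId, page: &mut Page) -> io::Result<()> {
    let offset = block.number() * self.block_size as u64;

    // Look up the file handle by name, opening (or creating) the file on first use
    let mut open_files = self.open_files.lock().unwrap();
    if !open_files.contains_key(block.file_name()) {
      let path = self.db_directory.join(block.file_name());
      let file = OpenOptions::new().read(true).write(true).create(true).open(path)?;
      open_files.insert(block.file_name().to_string(), file);
    }
    let file = open_files.get_mut(block.file_name()).unwrap();

    file.seek(SeekFrom::Start(offset))?;

    file.write_all(page.contents())?;
    // Ask the OS to flush the written data to disk before returning
    file.sync_data()?;
    Ok(())
  }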
}

Testing read and write

Now we can write a test to make sure that read/write work as we expect.

// ...
#[cfg(test)]
mod tests {
  use super::*;
  use tempfile::TempDir;

  fn setup() -> (TempDir, FileManager) {
    let temp_dir = TempDir::new().unwrap();
    let fm = FileManager::new(temp_dir.path(), 400).unwrap();
    (temp_dir, fm)
  }


  #[test]
  fn test_read_write_basic() {
    let (_temp_dir, fm) = setup();
    let block = BlockId::new("test.dat".to_string(), 0);

    // Write some data
    let mut write_page = Page::new(400);
    write_page.contents()[0..5].copy_from_slice(b"hello");
    fm.write(&block, &mut write_page).unwrap();

    // Read it back
    let mut read_page = Page::new(400);
    fm.read(&block, &mut read_page).unwrap();

    assert_eq!(&read_page.contents()[0..5], b"hello");
  }
}

With this, we now have a working implementation of a FileManager that interacts with the OS file system, and a Page that holds the contents of a block of our ‘disk’ (file).

What we’ve built

In this first part, we’ve implemented these fundamental building blocks:

FileManager: handles disk operations, providing an interface between our database engine and the operating system
BlockId: identifies a logical block by file name and block number
Page: holds data content in memory

The next chapter will deal with memory management.
