SimpleDB (Part 1): File Manager
Every time you query a database, a complex series of actions begins behind the scenes. I’d like to peek behind the curtain and understand how databases work internally.
Recently I’ve been reading Edward Sciore’s Database Design and Implementation. In this series, I’ll try to understand how databases work by building a Rust implementation of SimpleDB, the teaching database system described in the book.
What we’ll cover
In this post in particular, we’ll build the foundation of a database system by implementing two core components: file management and page handling.
Please see the repo for the full implementation.
Note: I am a beginner in Rust, so if you see anything that could be improved, please let me know.
Database storage
There are two ways a database system can access data on disk. If you think of it like a library:
- block-level access is like going directly to a specific shelf and picking up a specific volume
- file-level access is like working with entire sections of the library at once
In a block-level interface, the disk is accessed in units called blocks, each of which maps to one or more sectors of the disk. To modify the disk:
- the contents of the block’s sectors are read into a page (an in-memory buffer)
- bytes are modified in the page
- the OS then writes the page back to the block on disk
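To make this read-modify-write cycle concrete, here is a toy sketch. It assumes a “disk” that is just an in-memory byte vector split into contiguous 400-byte blocks; `Disk`, `read_block`, `write_block`, and `update_byte` are hypothetical names, and a real disk would map each block onto fixed-size sectors.

```rust
// Toy in-memory "disk": an assumption for illustration only.
// Each block is BLOCK_SIZE contiguous bytes.
const BLOCK_SIZE: usize = 400;

struct Disk {
    bytes: Vec<u8>,
}

impl Disk {
    fn new(num_blocks: usize) -> Self {
        Self { bytes: vec![0; num_blocks * BLOCK_SIZE] }
    }

    // Step 1: copy the whole block into a page (an in-memory buffer)
    fn read_block(&self, block_num: usize) -> Vec<u8> {
        let start = block_num * BLOCK_SIZE;
        self.bytes[start..start + BLOCK_SIZE].to_vec()
    }

    // Step 3: write the whole page back over the block
    fn write_block(&mut self, block_num: usize, page: &[u8]) {
        let start = block_num * BLOCK_SIZE;
        self.bytes[start..start + BLOCK_SIZE].copy_from_slice(page);
    }
}

// Changing even a single byte goes through a full block read and write
fn update_byte(disk: &mut Disk, block_num: usize, offset: usize, value: u8) {
    let mut page = disk.read_block(block_num);
    page[offset] = value; // Step 2: modify the page in memory
    disk.write_block(block_num, &page);
}
```

Note that updating one byte still costs a full block read and a full block write; that is the price of block-level access.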
On the other hand, a file-level interface is a higher-level abstraction. The client views the file as a sequence of bytes, with no notion of blocks, and can read or write any number of bytes starting at any position in the file.
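For comparison, here is what file-level access looks like with Rust’s standard library: we can seek to any byte and read or write a range of any length. The helpers `write_at` and `read_at` are hypothetical names for this illustration, not part of SimpleDB.

```rust
use std::fs::OpenOptions;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

// Writes `data` starting at byte `pos`, ignoring any block boundaries
fn write_at(path: &Path, pos: u64, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().write(true).create(true).open(path)?;
    file.seek(SeekFrom::Start(pos))?;
    file.write_all(data)
}

// Reads exactly `len` bytes starting at byte `pos`
fn read_at(path: &Path, pos: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut file = OpenOptions::new().read(true).open(path)?;
    file.seek(SeekFrom::Start(pos))?;
    let mut buf = vec![0; len];
    file.read_exact(&mut buf)?;
    Ok(buf)
}
```

Writing `b"rusty"` at byte 7 of a file containing `hello, world` overwrites just those five bytes, leaving `hello, rusty`.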
Most database engines use a compromise. They store all their data in one or more OS files and treat each file as a raw ‘disk’. The database engine accesses each ‘disk’ using logical file blocks. A logical block reference tells you where the block is within the file, but not where the block is on the disk; by contrast, a physical block reference tells you where the block is on the disk.
The OS takes on the responsibility of mapping the logical block reference to the corresponding physical block. This gives us the best of both worlds: the convenience of file operations with the precision of block-level control.
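Because blocks are logical, locating one only takes a file name, a block number, and the block size: block `n` starts at byte `n × block_size` of its file. For example, block 3 of a file with 400-byte blocks starts at byte 3 × 400 = 1200. Below is a minimal sketch; `locate_block` is a hypothetical helper, and `FileManager::read` later in this post performs the same multiplication inline.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: resolves a logical block reference (file name + block
// number) to a file path and a byte offset within that file. Nothing here
// knows where the bytes physically live; the OS handles that mapping.
fn locate_block(
    db_dir: &Path,
    filename: &str,
    block_number: u64,
    block_size: usize,
) -> (PathBuf, u64) {
    let path = db_dir.join(filename);
    let offset = block_number * block_size as u64;
    (path, offset)
}
```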
Implementing core components
Database interface
First, let’s create our main database interface.
Here is the test case that we want to pass. We just want to test that the path we pass in exists and is a directory. Note that we’re using `400` as the block size and `8` as the buffer size because Sciore recommends these values for learning purposes. Real-world database systems use much larger values.
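```rust
use crate::simpledb::SimpleDB;
use tempfile::TempDir;

#[test]
fn test_simpledb_creation() {
    let temp_dir = TempDir::new().unwrap();
    // Point SimpleDB at a subdirectory that doesn't exist yet,
    // so the test checks that SimpleDB actually creates it
    let db_path = temp_dir.path().join("testdb");

    let _db = SimpleDB::new(&db_path, 400, 8).unwrap();

    assert!(db_path.exists());
    assert!(db_path.is_dir());
}
```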
Our `SimpleDB` struct will provide the entry point for all database interactions.
```rust
use crate::file::FileManager;
use std::io;
use std::path::Path;

pub struct SimpleDB {
    file_manager: FileManager,
}

impl SimpleDB {
    pub const BLOCK_SIZE: usize = 400;
    pub const BUFFER_SIZE: u32 = 8;
    pub const LOG_FILE: &str = "simpledb.log";

    pub fn new(
        dirname: impl AsRef<Path>,
        block_size: usize,
        _buffer_size: u32, // not used yet; reserved for the buffer manager
    ) -> io::Result<Self> {
        let file_manager = FileManager::new(dirname, block_size)?;

        Ok(SimpleDB { file_manager })
    }

    pub fn file_manager(&self) -> &FileManager {
        &self.file_manager
    }
}
```
Managing files
The `FileManager` is our bridge to the operating system. It has three key responsibilities:
- Creating and managing the database directory
- Tracking open files
- Reading blocks of data from disk into a `Page`, and writing a `Page` back to disk
Here’s the basic structure.
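```rust
use std::{
    collections::HashMap,
    fs::{self, File, OpenOptions},
    io::{self, Read, Seek, SeekFrom, Write},
    path::{Path, PathBuf},
    sync::Mutex,
};

use crate::file::{BlockId, Page};

pub struct FileManager {
    db_directory: PathBuf,                    // directory holding all database files
    block_size: usize,                        // size of every block, in bytes
    is_new: bool,                             // true if the directory was just created
    open_files: Mutex<HashMap<String, File>>, // file name -> open handle
}
```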
When creating a new `FileManager`, we need to:
- set up the database directory
- clean up any temporary files
- initialize tracking of open files
```rust
impl FileManager {
    pub fn new(db_directory: impl AsRef<Path>, block_size: usize) -> io::Result<Self> {
        let db_directory = db_directory.as_ref().to_path_buf();
        let is_new = !db_directory.exists();

        // Create the database directory if it doesn't exist yet
        if is_new {
            fs::create_dir_all(&db_directory)?;
        }

        // Clean up temp files
        if let Ok(entries) = fs::read_dir(&db_directory) {
            for entry in entries.flatten() {
                let filename = entry.file_name();
                if filename.to_string_lossy().starts_with("temp") {
                    let _ = fs::remove_file(entry.path());
                }
            }
        }

        Ok(Self {
            db_directory,
            block_size,
            is_new,
            open_files: Mutex::new(HashMap::new()),
        })
    }
}
```
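The temp-file cleanup is easier to test in isolation, so the sketch below pulls it out into a hypothetical `remove_temp_files` function. In Sciore’s SimpleDB, files whose names start with `temp` hold temporary tables created during query processing, so any left over from a previous run can be discarded. Unlike the version above, this sketch skips anything that isn’t a regular file and propagates I/O errors instead of ignoring them; that is a design choice, not something SimpleDB requires.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical standalone version of the cleanup step in FileManager::new:
// deletes every regular file in `dir` whose name starts with "temp" and
// returns how many files were removed.
fn remove_temp_files(dir: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let is_temp = entry.file_name().to_string_lossy().starts_with("temp");
        if is_temp && entry.file_type()?.is_file() {
            fs::remove_file(entry.path())?;
            removed += 1;
        }
    }
    Ok(removed)
}
```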
Note that we’re also using `Mutex` to provide thread-safe access to the `open_files` `HashMap`. The `FileManager` might be accessed from multiple threads in the application, so the `Mutex` ensures that only one thread can access the `HashMap` at a time.
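Here is a small, self-contained illustration of that guarantee. It assumes a simplified map from file names to counters instead of `File` handles, and `record_opens` is a hypothetical name. Several threads share the map through an `Arc`, and each one must `lock()` it before making a change.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for FileManager's open_files map:
// file name -> number of times it was "opened".
fn record_opens(num_threads: usize, opens_per_thread: usize) -> HashMap<String, usize> {
    let open_files: Arc<Mutex<HashMap<String, usize>>> = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = (0..num_threads)
        .map(|_| {
            let open_files = Arc::clone(&open_files);
            thread::spawn(move || {
                for _ in 0..opens_per_thread {
                    // lock() blocks until no other thread holds the lock
                    let mut map = open_files.lock().unwrap();
                    *map.entry("student.tbl".to_string()).or_insert(0) += 1;
                    // the guard is dropped at the end of each iteration, releasing the lock
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All threads have finished, so this is the last owner of the Arc
    Arc::try_unwrap(open_files).unwrap().into_inner().unwrap()
}
```

Without the `Mutex`, Rust would not let the threads mutate the shared `HashMap` through the `Arc` at all, so the “one thread at a time” rule is enforced at compile time rather than by convention.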
Working with Blocks and Pages
To understand how data is stored and retrieved, we need to understand these two concepts:
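- `BlockId`: identifies where data lives on disk (a file name plus a block number within that file)
- `Page`: holds the actual data in memory
They work together like this: the `FileManager` uses a `BlockId` to find a block’s position in its file, then copies that block’s bytes into a `Page` when reading, or out of a `Page` when writing.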
Implementing BlockId
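```rust
pub struct BlockId {
    filename: String,
    number: u64,
}

impl BlockId {
    pub fn new(filename: impl Into<String>, number: u64) -> Self {
        Self {
            filename: filename.into(),
            number,
        }
    }

    // Accessors used by FileManager::read and FileManager::write
    pub fn file_name(&self) -> &str {
        &self.filename
    }

    pub fn number(&self) -> u64 {
        self.number
    }
}
```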
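Later components will need to compare `BlockId`s, print them, and possibly use them as map keys. The sketch below is an assumption about how that could look: it derives `PartialEq`, `Eq`, and `Hash`, and implements `Display` in the spirit of the `toString` method in Sciore’s Java `BlockId`. The exact output format here is our own choice.

```rust
use std::fmt;

// Deriving Eq + Hash lets a BlockId be used as a HashMap/HashSet key
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct BlockId {
    filename: String,
    number: u64,
}

impl BlockId {
    pub fn new(filename: impl Into<String>, number: u64) -> Self {
        Self { filename: filename.into(), number }
    }

    pub fn file_name(&self) -> &str {
        &self.filename
    }

    pub fn number(&self) -> u64 {
        self.number
    }
}

// Human-readable form, e.g. "[file test.dat, block 0]"
impl fmt::Display for BlockId {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[file {}, block {}]", self.filename, self.number)
    }
}
```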
Implementing Page
The `Page` will have:
- a buffer (`Vec<u8>`) to hold the contents of a block
- setter functions that convert data into bytes and write them into the buffer: `set_int`, `set_string`, `set_bytes`
- equivalent getter functions that convert bytes back into the appropriate data types: `get_int`, `get_string`, `get_bytes`
- `contents`, which returns a mutable slice of the buffer that the `FileManager` reads into and writes from
```rust
use std::convert::TryInto;

pub struct Page {
    buffer: Vec<u8>,
}

impl Page {
    // A zero-filled page the size of one block
    pub fn new(block_size: usize) -> Self {
        Self {
            buffer: vec![0; block_size],
        }
    }

    // Wraps an existing byte vector as a page
    pub fn from_bytes(bytes: Vec<u8>) -> Self {
        Self { buffer: bytes }
    }

    // Reads a 4-byte big-endian integer at `offset`
    pub fn get_int(&self, offset: usize) -> i32 {
        let bytes = &self.buffer[offset..offset + 4];
        i32::from_be_bytes(bytes.try_into().unwrap())
    }

    // Writes `value` as 4 big-endian bytes at `offset` (needs &mut self)
    pub fn set_int(&mut self, offset: usize, value: i32) {
        let bytes = value.to_be_bytes();
        self.buffer[offset..offset + 4].copy_from_slice(&bytes);
    }

    // Returns a mutable slice for writing
    pub(crate) fn contents(&mut self) -> &mut [u8] {
        &mut self.buffer[..]
    }

    // pub fn get_bytes
    // pub fn set_bytes
    // pub fn get_string
    // pub fn set_string
}
```
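The byte and string methods are left as stubs above. One way to fill them in is sketched below, assuming a length-prefixed encoding: a 4-byte big-endian length followed by the raw bytes, which is how Sciore’s Java `Page` stores byte arrays. Storing strings as UTF-8 is our assumption, and `max_length` is a hypothetical helper for computing how much space a string needs.

```rust
use std::convert::TryInto;

pub struct Page {
    buffer: Vec<u8>,
}

impl Page {
    pub fn new(block_size: usize) -> Self {
        Self { buffer: vec![0; block_size] }
    }

    pub fn get_int(&self, offset: usize) -> i32 {
        i32::from_be_bytes(self.buffer[offset..offset + 4].try_into().unwrap())
    }

    pub fn set_int(&mut self, offset: usize, value: i32) {
        self.buffer[offset..offset + 4].copy_from_slice(&value.to_be_bytes());
    }

    // Reads the 4-byte length prefix, then that many bytes
    pub fn get_bytes(&self, offset: usize) -> &[u8] {
        let len = self.get_int(offset) as usize;
        &self.buffer[offset + 4..offset + 4 + len]
    }

    // Writes the length prefix followed by the raw bytes
    pub fn set_bytes(&mut self, offset: usize, bytes: &[u8]) {
        self.set_int(offset, bytes.len() as i32);
        self.buffer[offset + 4..offset + 4 + bytes.len()].copy_from_slice(bytes);
    }

    // Strings are stored as their UTF-8 bytes (an assumption)
    pub fn get_string(&self, offset: usize) -> String {
        String::from_utf8_lossy(self.get_bytes(offset)).into_owned()
    }

    pub fn set_string(&mut self, offset: usize, s: &str) {
        self.set_bytes(offset, s.as_bytes());
    }

    // Bytes needed to store a string of `byte_len` bytes, including its prefix
    pub fn max_length(byte_len: usize) -> usize {
        4 + byte_len
    }
}
```

With this layout, `"hello"` occupies bytes 0 through 8 (a 4-byte length plus 5 bytes), so the next value can start at offset 9.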
Now that we have our `BlockId` and `Page` implementations, we have the building blocks to finish the `read` and `write` functions in our `FileManager`.
Reading data
We want `read` to:
- get the filename from the `BlockId`
- compute the byte offset from the `BlockId`’s block number
- seek to the correct block position
- read the contents into the page’s buffer
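Both `read` and `write` need the `File` handle for the block’s file, so we first add a private `get_file` helper that looks the handle up in `open_files`, opening (and creating) the file the first time it is used.
```rust
impl FileManager {
    // ...

    // Returns the open handle for `filename` from the (already locked) map,
    // opening and creating the file on first use
    fn get_file<'a>(
        &self,
        open_files: &'a mut HashMap<String, File>,
        filename: &str,
    ) -> io::Result<&'a mut File> {
        if !open_files.contains_key(filename) {
            let file = OpenOptions::new()
                .read(true)
                .write(true)
                .create(true)
                .open(self.db_directory.join(filename))?;
            open_files.insert(filename.to_string(), file);
        }
        Ok(open_files.get_mut(filename).unwrap())
    }

    pub fn read(&self, block: &BlockId, page: &mut Page) -> io::Result<()> {
        // Hold the lock for the whole seek + read so no other thread can move the file cursor
        let mut open_files = self.open_files.lock().unwrap();
        let file = self.get_file(&mut open_files, block.file_name())?;
        let offset = block.number() * self.block_size as u64;

        file.seek(SeekFrom::Start(offset))?;
        file.read_exact(page.contents())?;

        Ok(())
    }
}
```
Writing data
The `write` function does something similar, but writes the page’s buffer to the file.
```rust
impl FileManager {
    // ...
    pub fn write(&self, block: &BlockId, page: &mut Page) -> io::Result<()> {
        let mut open_files = self.open_files.lock().unwrap();
        let file = self.get_file(&mut open_files, block.file_name())?;
        let offset = block.number() * self.block_size as u64;

        file.seek(SeekFrom::Start(offset))?;
        file.write_all(page.contents())?;
        // Flush the written bytes to the storage device before returning
        file.sync_data()?;

        Ok(())
    }
}
```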
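One caveat with `read`: `read_exact` fails with an `UnexpectedEof` error if the file ends before the buffer is full, for example when reading a block that has never been written. If you would rather treat missing bytes as zeros, one option is to read in a loop and zero-fill the remainder. The sketch below assumes that behavior is what you want; `read_block_or_zeros` is a hypothetical helper, not part of the original design.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

// Fills `buf` from `offset`, zero-filling whatever lies past the end of the
// file instead of returning an UnexpectedEof error.
fn read_block_or_zeros(file: &mut File, offset: u64, buf: &mut [u8]) -> io::Result<()> {
    file.seek(SeekFrom::Start(offset))?;
    let mut filled = 0;
    while filled < buf.len() {
        match file.read(&mut buf[filled..]) {
            Ok(0) => break, // reached end of file
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    // Anything we couldn't read is treated as zeros
    buf[filled..].fill(0);
    Ok(())
}
```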
Testing read and write
Now we can write a test to make sure that `read` and `write` work as we expect.
```rust
// ...
#[cfg(test)]
mod tests {
    use super::*;
    use tempfile::TempDir;

    fn setup() -> (TempDir, FileManager) {
        let temp_dir = TempDir::new().unwrap();
        let fm = FileManager::new(temp_dir.path(), 400).unwrap();
        (temp_dir, fm)
    }

    #[test]
    fn test_read_write_basic() {
        // Keep `_temp_dir` alive so the directory isn't deleted mid-test
        let (_temp_dir, fm) = setup();
        let block = BlockId::new("test.dat".to_string(), 0);

        // Write some data
        let mut write_page = Page::new(400);
        write_page.contents()[0..5].copy_from_slice(b"hello");
        fm.write(&block, &mut write_page).unwrap();

        // Read it back
        let mut read_page = Page::new(400);
        fm.read(&block, &mut read_page).unwrap();

        assert_eq!(&read_page.contents()[0..5], b"hello");
    }
}
```
With this, we now have a working implementation of a `FileManager` that interacts with the OS file system, and a `Page` that holds the contents of each block of our ‘disk’ (file).
What we’ve built
In this first part, we’ve implemented these fundamental building blocks:
- `FileManager`: handles disk operations, providing an interface between our database engine and the operating system
- `BlockId`: identifies a logical block by file name and block number (the OS maps it to a physical location)
- `Page`: holds a block’s contents in memory
Next
The next chapter will deal with memory management.
Have some thoughts on this post? Reply with an email.