intro
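This is part two on database storage. Once you have worked through it, you can gradually start writing proj1;
Tuples are organized as described in the previous section; the thing to watch is the growth direction of the slot array and of the tuples;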
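To make the growth directions concrete, here is a minimal sketch, assuming the usual slotted-page arrangement where the slot array grows from the front of the page and tuple data grows from the back. The class name `SlottedPage`, the 4-byte slot cost, and keeping the slot array in a Python list rather than in the page bytes are all simplifications for illustration.

```python
class SlottedPage:
    """Toy slotted page: slot array grows forward, tuple data grows backward."""

    def __init__(self, size=4096):
        self.size = size
        self.data = bytearray(size)
        self.slots = []        # (offset, length) per tuple; stands in for the on-page slot array
        self.free_end = size   # tuple data is written just below this offset
        self.slot_bytes = 4    # assumed space taken by one slot entry

    def free_space(self):
        # Free space is the gap between the end of the slot array and the start of tuple data.
        return self.free_end - len(self.slots) * self.slot_bytes

    def insert(self, payload: bytes):
        if len(payload) + self.slot_bytes > self.free_space():
            return None  # page is full
        self.free_end -= len(payload)
        self.data[self.free_end:self.free_end + len(payload)] = payload
        self.slots.append((self.free_end, len(payload)))
        return len(self.slots) - 1  # slot id

    def read(self, slot_id):
        offset, length = self.slots[slot_id]
        return bytes(self.data[offset:offset + length])
```

Each new tuple lands at a lower offset than the previous one, while each new slot entry takes space at the front, so the page is full when the two regions meet.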
log-structured file organization
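Unlike the tuple-based approach, a DBMS using this file organization does not store tuples; it stores log records instead;
The system appends log records to the end of the file, and these records describe how the database was modified: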
- inserts store the entire tuple;
- deletes mark the tuple as deleted;
- updates contain the data of just the attributes that were modified;
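With this file organization, to read a tuple the DBMS scans the log backward (that is, it reads the most recently appended record first) and reconstructs the data;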
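The append-then-scan-backward idea can be sketched as follows. This is a toy in-memory model, not how any particular DBMS implements it; the class name `LogStore` and the record shape `(op, key, payload)` are made up for illustration.

```python
class LogStore:
    """Toy log-structured store: writes only append, reads scan newest to oldest."""

    def __init__(self):
        self.log = []  # append-only; the newest record is last

    def insert(self, key, tuple_):
        self.log.append(("INSERT", key, dict(tuple_)))   # whole tuple

    def update(self, key, changes):
        self.log.append(("UPDATE", key, dict(changes)))  # only the modified attributes

    def delete(self, key):
        self.log.append(("DELETE", key, None))           # just a deletion marker

    def read(self, key):
        result = {}
        for op, k, payload in reversed(self.log):
            if k != key:
                continue
            if op == "DELETE":
                return None
            # Newer values win, so only fill attributes we have not seen yet.
            for attr, value in payload.items():
                result.setdefault(attr, value)
            if op == "INSERT":
                return result  # the INSERT holds the full tuple, so we can stop
        return None  # key never inserted
```

For example, after `insert(1, {"name": "a", "val": 10})` and `update(1, {"val": 20})`, `read(1)` picks up `val` from the update and `name` from the original insert.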
The DBMS can also build indexes so that it can jump directly to specific locations in the log;
In addition, the log needs to be compacted periodically;
Compaction: coalesces larger log files into smaller files by removing unnecessary records.
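As a rough illustration of what "removing unnecessary records" means, the sketch below replays a log and keeps a single record per live tuple. The function name `compact` and the `(op, key, payload)` record shape are assumptions for this example; real systems compact files on disk and are considerably more involved.

```python
def compact(log):
    """Return a shorter log with one INSERT per live key and no dead records."""
    latest = {}  # key -> current tuple contents
    for op, key, payload in log:  # replay oldest to newest
        if op == "INSERT":
            latest[key] = dict(payload)
        elif op == "UPDATE" and key in latest:
            latest[key].update(payload)
        elif op == "DELETE":
            latest.pop(key, None)
    return [("INSERT", key, tup) for key, tup in latest.items()]


log = [
    ("INSERT", 1, {"v": 1}),
    ("UPDATE", 1, {"v": 2}),
    ("INSERT", 2, {"v": 9}),
    ("DELETE", 2, None),
]
compacted = compact(log)  # only key 1 survives, with its latest value
```

Four records shrink to one here: the update is folded into the insert for key 1, and everything about the deleted key 2 is dropped.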
The DBMS’s catalogs contain the schema information about tables that the system uses to figure out the tuple’s layout.
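The lecture then covered how a bunch of value types are stored: int, bool, and so on;
large value storage
Most DBMSs do not allow a tuple to exceed the size of a single page (4 KB in the case of SQLite). To store values that are larger than a page, the DBMS uses separate overflow storage pages.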
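A minimal sketch of the overflow-page idea, assuming the large value is chopped into page-sized chunks that are chained together. The function names and the dict-based page representation are hypothetical, and a real page would also reserve space for a header, so the usable payload per page would be smaller than the page size.

```python
PAGE_SIZE = 4096  # assumed page size


def split_into_overflow_pages(value: bytes, page_size: int = PAGE_SIZE):
    """Chop a large value into page-sized chunks, each pointing at the next one."""
    chunks = [value[i:i + page_size] for i in range(0, len(value), page_size)]
    pages = []
    for i, chunk in enumerate(chunks):
        next_page = i + 1 if i + 1 < len(chunks) else None
        pages.append({"data": chunk, "next": next_page})
    return pages


def read_overflow_value(pages, first=0):
    """Follow the chain starting at page `first` and reassemble the value."""
    if not pages:
        return b""
    out = bytearray()
    page_id = first
    while page_id is not None:
        page = pages[page_id]
        out += page["data"]
        page_id = page["next"]
    return bytes(out)
```

In this picture the tuple itself would only keep a pointer to the first overflow page (index `0` here) instead of the value.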
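With the approach in the figure above (keeping the value in an external file), the DBMS cannot manipulate the data in the external file, because there are no durability protections (atomicity cannot be guaranteed) and no transaction protections;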
workload
The DBMS can store tuples in different ways that are better for either OLTP or OLAP workloads.
We have been assuming the n-ary storage model (aka “row storage”) so far this semester.
Pros and cons of NSM:
- Pros:
  - Fast inserts, updates, and deletes
  - Good for queries that need the entire tuple
- Cons:
  - Not good for scanning large portions of the table and/or a subset of the attributes.
DSM (decomposition storage model, aka "column store"):
Pros and cons:
- Pros:
  - Reduces the amount of wasted I/O because the DBMS only reads the data that it needs.
  - Better query processing and data compression
- Cons:
  - Slow for point queries, inserts, updates, and deletes because of tuple splitting/stitching.
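The trade-off between the two layouts can be seen in a small in-memory sketch. The helper names and the "values touched" counter are made up for illustration; the counter is only a crude stand-in for I/O.

```python
rows = [  # NSM: each tuple's attributes are stored together
    {"id": 1, "name": "a", "balance": 10},
    {"id": 2, "name": "b", "balance": 20},
    {"id": 3, "name": "c", "balance": 30},
]


def to_columns(rows):
    """DSM: split tuples so each attribute's values are stored together."""
    columns = {}
    for row in rows:
        for attr, value in row.items():
            columns.setdefault(attr, []).append(value)
    return columns


def scan_sum_rows(rows, attr):
    total, touched = 0, 0
    for row in rows:
        touched += len(row)  # a row store pulls in the whole tuple
        total += row[attr]
    return total, touched


def scan_sum_columns(columns, attr):
    col = columns[attr]      # a column store reads only the needed attribute
    return sum(col), len(col)


def fetch_tuple_columns(columns, i):
    """Point query on the column layout: stitch one tuple back together."""
    return {attr: col[i] for attr, col in columns.items()}


columns = to_columns(rows)
```

Summing `balance` touches every attribute of every tuple in the row layout but only one column in the column layout, while fetching a single tuple from the column layout has to visit every column and stitch the values back together.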