A great talk by Dan Luu on files. My key takeaways.
Files are literally impossible to use safely and correctly on modern computers.
If you really care about data integrity, you should put the data in a database instead of in plain files on disk. SQLite is particularly good. (Maybe link to their test / design doc.)
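As a rough illustration of what "put the data in a database" can look like, here is a minimal sketch using Python's standard sqlite3 module. The function names save_settings and load_settings and the settings table are hypothetical, made up for this example; they are not from the talk.

```python
import sqlite3


def save_settings(db_path, settings):
    """Store key/value string pairs; the inserts run in one transaction."""
    conn = sqlite3.connect(db_path)
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises, so a failure part-way
        # through does not leave half of the new values behind.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS settings "
                "(key TEXT PRIMARY KEY, value TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                list(settings.items()),
            )
    finally:
        conn.close()


def load_settings(db_path):
    """Read every key/value pair back into a dict."""
    conn = sqlite3.connect(db_path)
    try:
        return dict(conn.execute("SELECT key, value FROM settings"))
    finally:
        conn.close()
```

The point of the sketch is that the application never decides when to flush or in what order to write; that is delegated to SQLite's commit logic.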
One big problem is fsync, which serves the dual purpose of working as a fence instruction to prevent reordering, and also of flushing all caches. Programmers often need to prevent reordering, but rarely want to flush all caches. The result is that code needs to liberally call fsync in order to be correct, but code that liberally calls fsync often has poor performance.
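To make the "fence" role concrete, here is a minimal Python sketch, assuming a POSIX-like system. The names append_record_then_commit and _write_all and the COMMIT marker are hypothetical, invented for illustration. The first fsync is there only for ordering (the record should reach storage before the marker), yet it also pays the full cost of a flush. How much fsync actually guarantees varies by OS, filesystem, and hardware.

```python
import os


def _write_all(fd, data):
    """Call os.write until every byte is handed to the OS (writes may be partial)."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def append_record_then_commit(path, record):
    """Append a record, then a commit marker, with fsync used as a barrier."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        _write_all(fd, record)
        # Wanted: "do not let the marker reach disk before the record".
        # Available: fsync, which asks the OS to flush this file's
        # buffered data to the storage device before returning.
        os.fsync(fd)
        _write_all(fd, b"COMMIT\n")
        # A second flush so the marker itself is pushed to storage.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Every logical write here costs two flushes, which is the correctness versus performance tension described above.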
Personally, I (Jon Shea, not Dan Luu) have found the SQLite “Technical and Design Documentation” to be exceptional. The doc for “Atomic Commit In SQLite” is the only in-depth tutorial I have ever found on designing practical on-disk data structures, and “SQLite Query Optimizer Overview” is an outstanding introduction to database query optimizers.
Two misguided workarounds that are commonly suggested are 1) write your data to a new file and rename it into place, or 2) only append to existing files. Neither of these is safe. rename is not atomic during system crashes, and append neither protects against reordering nor is atomic.
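For reference, the first workaround usually looks something like the Python sketch below, assuming a POSIX-like system (the directory fsync does not work on Windows). The function name replace_file is hypothetical. The fsync calls are the ones commonly recommended alongside this pattern; they narrow the failure window, but per the talk the pattern should still not be treated as crash-safe.

```python
import os
import tempfile


def replace_file(path, data):
    """Write data to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so that the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python's buffer -> OS
            os.fsync(tmp.fileno())  # OS cache -> storage, before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Best-effort cleanup of the temp file if anything above failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry as well, so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Even with all three steps in place, whether a crash leaves the old file, the new file, or something else depends on the filesystem and its mount options, which is why the notes above call this a misguided workaround.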