A great talk by Dan Luu on files. My key takeaways.
Files are literally impossible to use safely and correctly on modern computers.
If you really care about data integrity, you should put the data in a database instead of managing files on disk yourself.
SQLite is particularly good. (Maybe link to their test / design doc.)
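As a rough illustration of the "put it in a database" advice, here is a minimal sketch using Python's built-in sqlite3 module. The settings table and the save_settings / load_settings helpers are hypothetical names made up for this example. The point is only that the write happens inside a transaction, so SQLite is responsible for making it all-or-nothing.

```python
import sqlite3


def save_settings(db_path, settings):
    """Store key/value pairs in a single transaction (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    try:
        # "with conn" commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS settings "
                "(key TEXT PRIMARY KEY, value TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                settings.items(),
            )
    finally:
        conn.close()


def load_settings(db_path):
    """Read every key/value pair back into a dict."""
    conn = sqlite3.connect(db_path)
    try:
        return dict(conn.execute("SELECT key, value FROM settings"))
    finally:
        conn.close()
```

Calling save_settings("app.db", {"theme": "dark"}) and then load_settings("app.db") should return the stored pairs. How durable a commit is still depends on SQLite's configuration and on the underlying storage.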
One big problem is fsync, which serves the dual purpose of acting as a fence that prevents reordering and of flushing all caches. Programmers often need to prevent reordering, but rarely want to flush all caches. The result is that code needs to call fsync liberally in order to be correct, but code that calls fsync liberally often has poor performance.
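For reference, this is roughly what a single fsync-protected write looks like in Python. It is a minimal sketch: write_durably is a hypothetical helper name, and whether the data really reaches stable storage still depends on the OS, the filesystem, and the hardware.

```python
import os


def write_durably(path, data):
    """Write bytes to path, then ask the OS to flush them to storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push its cached data to the device
```

Every call pays for the flush, which is where the performance cost comes from. Note that this only covers the file's contents. On many systems the directory entry for a newly created file needs its own fsync.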
Personally, I (Jon Shea, not Dan Luu) have found the SQLite “Technical and Design Documentation” to be exceptional. The doc for “Atomic Commit In SQLite” is the only in-depth tutorial I have ever found on designing practical on-disk data structures, and “SQLite Query Optimizer Overview” is an outstanding introduction to database query optimizers.
Two misguided workarounds that are commonly suggested are 1) write your data to a new file and rename it into place, or 2) only append to existing files. Neither of these is safe. rename is not atomic during system crashes, and append does not protect against reordering, nor is it atomic.
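To make the first workaround concrete, here is a sketch of write-then-rename with the fsync calls that are often left out. The replace_file name and the ".tmp" suffix are hypothetical, and the directory fsync assumes a POSIX system. This does not make the pattern safe in the sense of the talk. It mainly shows how many steps the "simple" version skips.

```python
import os


def replace_file(path, data):
    """Write data to a temp file, fsync it, then rename it over path."""
    tmp_path = path + ".tmp"  # hypothetical temp-file naming scheme

    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # flush file contents before the rename

    os.replace(tmp_path, path)  # rename over the destination

    # Flush the directory entry too (POSIX only; assumption for this sketch).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Even with all three steps, crash behavior still varies by filesystem and mount options, which is the argument for leaving this to a database.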