On 2015-10-19 02:19, Erkki Seppala wrote:
> Hugo Mills writes:
>> It has to be disabled because if you enable it, there's a race
>> condition: since you're overwriting existing data (rather than CoWing
>> it), you can't update the checksums atomically. So, in the interests
>> of consistency, checksums are disabled.
>
> I suppose this has been suggested before, but couldn't it store both the
> new and the old checksums and be satisfied if either of them match?

Actually, I don't think that's been suggested before. Read on, however, for an explanation of why we don't do that.

>
> The user is probably not happy that a partial write is going to be
> difficult to read from the device due to a checksum error, but there is
> no promise of recently-overwritten data state with traditional
> filesystems either in case of sudden powerdown, assuming there is no
> data journaling..

And that is exactly how things are now: when something is marked NOCOW, it has essentially zero guarantee of data consistency after a crash. As things are now, though, there is a guarantee that you can still read the file. Using checksums the way you suggest would result in it being unreadable most of the time, because it's statistically unlikely that we wrote the _whole_ block (IOW, we can't guarantee without COW that the data was completely written), because:

a. While some disks do atomically write single sectors, most don't, and if the power dies while the disk is writing a single sector, there is no certainty about what that sector will read back as.

b. Assuming that item a is not an issue, one block in BTRFS is usually multiple sectors on disk, and a majority of disks have volatile write caches, so it is not unlikely that the power will die partway through writing the block.

c. In the event that neither item a nor item b is an issue (for example, you have a storage controller with a non-volatile write cache, have write caching turned off on the disks, and it's a smart enough storage controller that it only removes writes from the cache after they return), there is still the small but distinct possibility that the crash will cause either corruption in the write cache or some other hardware-related issue.
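
To make item b concrete, here is a rough Python sketch of the "accept either checksum" idea applied to a block that was only partly overwritten. It is a toy model, not how BTRFS is implemented: the 512-byte sector and 4096-byte block sizes are assumptions, zlib.crc32 stands in for the CRC32C that BTRFS uses by default, and the names torn_write and readable are made up for illustration.

```python
import zlib

SECTOR = 512   # assumed on-disk sector size
BLOCK = 4096   # assumed filesystem block size (8 sectors)


def checksum(data: bytes) -> int:
    # Stand-in checksum: plain CRC32 rather than the CRC32C btrfs uses.
    return zlib.crc32(data)


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    # Model a crash after only the first `sectors_written` sectors of the
    # new data reached the disk; the rest of the block still holds old data.
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


def readable(on_disk: bytes, old_sum: int, new_sum: int) -> bool:
    # The proposed rule: accept the block if it matches either checksum.
    return checksum(on_disk) in (old_sum, new_sum)


old = bytes([0xAA]) * BLOCK
new = bytes([0x55]) * BLOCK
old_sum, new_sum = checksum(old), checksum(new)

# For each possible crash point, record whether the block would be readable.
results = {
    n: readable(torn_write(old, new, n), old_sum, new_sum)
    for n in range(BLOCK // SECTOR + 1)
}

for n, ok in results.items():
    print(f"{n} of {BLOCK // SECTOR} sectors written: "
          f"{'readable' if ok else 'checksum error'}")
```

In this model only the two extremes (nothing written, or everything written) pass the check. Every crash point in between leaves a block that matches neither the old nor the new checksum, so the block would be reported as a checksum error even though a NOCOW file without checksums would still be readable.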