* read checksum verification
@ 2023-07-12 18:32 David Arendt
[not found] ` <174f995c-e794-74c4-24d6-52451f3f3f28-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: David Arendt @ 2023-07-12 18:32 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi,
I recently had a bad experience with NILFS (not the fault of NILFS).
I used NILFS over iSCSI. I had random block corruption during one week,
silently destroying data until NILFS finally crashed. At first, I
thought it was a NILFS bug, so I created a BTRFS volume and restored the
backup from one week earlier to it. After minutes, the BTRFS volume gave
checksum errors, so the culprit was found: the iSCSI server. For now I
will use BTRFS on my iSCSI volumes so that I don't run into the same
situation again, even though I would prefer using NILFS due to its
continuous checkpointing. If I remember correctly, NILFS creates
checksums on block writes. It would really be a good addition to verify
these checksums on read, so corruption of this type would be noticed
within minutes instead of days, or possibly never if it is rare enough.
I think it has been mentioned earlier that the NILFS checksums are not
suitable for file verification but only for block verification. I think
the most important thing is to know that something nasty is going on,
even if the details aren't known, so it would be a good addition to have
some sort of data checksum verification on read in NILFS.
Bye,
David Arendt
^ permalink raw reply	[flat|nested] 5+ messages in thread

[parent not found: <174f995c-e794-74c4-24d6-52451f3f3f28-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>]
* Re: {WHAT?} read checksum verification
  [not found] ` <174f995c-e794-74c4-24d6-52451f3f3f28-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2023-07-12 22:29 ` Peter Grandi
  [not found]   ` <25775.10549.41499.886957-Lv72GqZ7opur1GY8YIlKTvXRex20P6io@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Grandi @ 2023-07-12 22:29 UTC (permalink / raw)
To: list Linux fs NILFS

> I used NILFS over iSCSI. I had random block corruption during
> one week, silently destroying data until NILFS finally
> crashed. At first, I thought it was a NILFS bug, so I
> created a BTRFS volume

I use both for main filesystem and backup for "diversity", and I
value NILFS2 because it is very robust (I don't really use
either filesystem's snapshotting features).

> and restored the backup from one week earlier to it. After
> minutes, the BTRFS volume gave checksum errors, so the
> culprit was found: the iSCSI server.

There used to be a good argument that checksumming (or
compressing) data should be end-to-end and that checksumming (or
compressing) in the filesystem is a bit too much, but when LOGFS
and NILFS/NILFS2 were designed I guess CPUs were too slow to
checksum everything. Even excellent recent filesystems like F2FS
don't do data integrity checking, for various reasons.

In theory your iSCSI target or its host adapter should have told
you about errors... Many can enable after-write verification
(even if it's quite expensive). Alternatively, some people
regularly run silent-corruption-detecting daemons if their
hardware does not report corruption or it escapes the relevant
checks for various reasons:

https://indico.desy.de/event/257/contributions/58082/attachments/37574/46878/kelemen-2007-HEPiX-Silent_Corruptions.pdf
https://storagemojo.com/2007/09/19/cerns-data-corruption-research/

> [...] NILFS creates checksums on block writes. It would really
> be a good addition to verify these checksums on read [...]

It would be interesting to have data integrity checking or
compression in NILFS2, and a log-structured filesystem makes that
easier (the Btrfs code is rather complex instead), but modifying a
mature and stable filesystem is a risky thing...

My understanding is that these checksums are not quite suitable
for data integrity checks but are designed for log-sequence
recovery, a bit like journal checksums for journal-based
filesystems.

https://www.spinics.net/lists/linux-nilfs/msg01063.html
"nilfs2 store checksums for all data. However, at least the
current implementation does not verify it when reading.
Actually, the main purpose of the checksums is recovery after
unexpected reboot; it does not suit for per-file data
verification because the checksums are given per ``log''."

^ permalink raw reply	[flat|nested] 5+ messages in thread
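(For hardware that reports nothing by itself, even a very simple
user-space scrub can play the role of such a daemon. A minimal sketch,
assuming the data is mounted at /mnt/mydata and does not change between
runs -- the paths and the hash choice are only placeholders:

$ # build a checksum manifest once, e.g. right after restoring a known-good backup
$ cd /mnt/mydata && find . -type f -print0 | xargs -0 sha256sum > /var/lib/scrub/manifest.sha256
$ # later, re-read everything and compare; any mismatch points at silent corruption
$ cd /mnt/mydata && sha256sum --quiet -c /var/lib/scrub/manifest.sha256

This only tells you that something changed, not which layer corrupted
it, which matches the "know that something nasty is going on" goal
stated above.)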
[parent not found: <25775.10549.41499.886957-Lv72GqZ7opur1GY8YIlKTvXRex20P6io@public.gmane.org>]
* Re: {WHAT?} read checksum verification
  [not found] ` <25775.10549.41499.886957-Lv72GqZ7opur1GY8YIlKTvXRex20P6io@public.gmane.org>
@ 2023-07-13  4:23 ` David Arendt
  [not found]   ` <b99d2029-9b96-a016-875e-09b208c0ab9c-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  2023-07-14 19:22   ` David Arendt
  1 sibling, 1 reply; 5+ messages in thread
From: David Arendt @ 2023-07-13  4:23 UTC (permalink / raw)
To: Peter Grandi, list Linux fs NILFS

On 7/13/23 00:29, Peter Grandi wrote:
>> I used NILFS over iSCSI. I had random block corruption during
>> one week, silently destroying data until NILFS finally
>> crashed. At first, I thought it was a NILFS bug, so I
>> created a BTRFS volume
> I use both for main filesystem and backup for "diversity", and I
> value NILFS2 because it is very robust (I don't really use
> either filesystem's snapshotting features).

So do I, therefore I said it was not NILFS's fault.

>> and restored the backup from one week earlier to it. After
>> minutes, the BTRFS volume gave checksum errors, so the
>> culprit was found: the iSCSI server.
> There used to be a good argument that checksumming (or
> compressing) data should be end-to-end and that checksumming (or
> compressing) in the filesystem is a bit too much, but when LOGFS
> and NILFS/NILFS2 were designed I guess CPUs were too slow to
> checksum everything. Even excellent recent filesystems like F2FS
> don't do data integrity checking, for various reasons.
>
> In theory your iSCSI target or its host adapter should have told
> you about errors... Many can enable after-write verification
> (even if it's quite expensive). Alternatively, some people
> regularly run silent-corruption-detecting daemons if their
> hardware does not report corruption or it escapes the relevant
> checks for various reasons:

The host adapter can return errors if the underlying disk itself returns
them. If bits randomly flip on disk after being written, the host
adapter can't know (at least not in non-RAID scenarios).

> https://indico.desy.de/event/257/contributions/58082/attachments/37574/46878/kelemen-2007-HEPiX-Silent_Corruptions.pdf
> https://storagemojo.com/2007/09/19/cerns-data-corruption-research/
>
>> [...] NILFS creates checksums on block writes. It would really
>> be a good addition to verify these checksums on read [...]
> It would be interesting to have data integrity checking or
> compression in NILFS2, and a log-structured filesystem makes that
> easier (the Btrfs code is rather complex instead), but modifying a
> mature and stable filesystem is a risky thing...
>
> My understanding is that these checksums are not quite suitable
> for data integrity checks but are designed for log-sequence
> recovery, a bit like journal checksums for journal-based
> filesystems.
>
> https://www.spinics.net/lists/linux-nilfs/msg01063.html
> "nilfs2 store checksums for all data. However, at least the
> current implementation does not verify it when reading.
> Actually, the main purpose of the checksums is recovery after
> unexpected reboot; it does not suit for per-file data
> verification because the checksums are given per ``log''."

I think exactly this would be interesting: if checksums per log already
exist, it would be good to verify them on read. As already said, I am
not expecting to know in which file the corruption occurred, but it
would be nice to know that something nasty is going on.

^ permalink raw reply	[flat|nested] 5+ messages in thread
[parent not found: <b99d2029-9b96-a016-875e-09b208c0ab9c-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>]
* Re: {WHAT?} read checksum verification
  [not found] ` <b99d2029-9b96-a016-875e-09b208c0ab9c-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2023-07-13 20:29 ` Ryusuke Konishi
  0 siblings, 0 replies; 5+ messages in thread
From: Ryusuke Konishi @ 2023-07-13 20:29 UTC (permalink / raw)
To: David Arendt; +Cc: Peter Grandi, list Linux fs NILFS

On Thu, Jul 13, 2023 at 1:44 PM David Arendt wrote:
>
> On 7/13/23 00:29, Peter Grandi wrote:
> >> I used NILFS over iSCSI. I had random block corruption during
> >> one week, silently destroying data until NILFS finally
> >> crashed. At first, I thought it was a NILFS bug, so I
> >> created a BTRFS volume
> > I use both for main filesystem and backup for "diversity", and I
> > value NILFS2 because it is very robust (I don't really use
> > either filesystem's snapshotting features).
> So do I, therefore I said it was not NILFS's fault.
> >> and restored the backup from one week earlier to it. After
> >> minutes, the BTRFS volume gave checksum errors, so the
> >> culprit was found: the iSCSI server.
> > There used to be a good argument that checksumming (or
> > compressing) data should be end-to-end and that checksumming (or
> > compressing) in the filesystem is a bit too much, but when LOGFS
> > and NILFS/NILFS2 were designed I guess CPUs were too slow to
> > checksum everything. Even excellent recent filesystems like F2FS
> > don't do data integrity checking, for various reasons.
> >
> > In theory your iSCSI target or its host adapter should have told
> > you about errors... Many can enable after-write verification
> > (even if it's quite expensive). Alternatively, some people
> > regularly run silent-corruption-detecting daemons if their
> > hardware does not report corruption or it escapes the relevant
> > checks for various reasons:
>
> The host adapter can return errors if the underlying disk itself returns
> them. If bits randomly flip on disk after being written, the host
> adapter can't know (at least not in non-RAID scenarios).
>

I recommend replacing unreliable block layers first.

The reliability of the filesystem depends heavily on that of the block
layer, and the block device must be sufficiently reliable. In general,
the various checks and reliability measures that filesystems and
operating systems have are insufficient to compensate for defective or
unreliable block devices. Problem-prone devices are difficult to use
regularly, even if errors can be detected.

Putting that premise aside for a moment, if you want to take advantage
of both the retroactive snapshotting (or robustness) that nilfs2
provides and data integrity checking, a short-term solution might be to
try dm-integrity [1] and nilfs2 together.

[1] https://docs.kernel.org/admin-guide/device-mapper/dm-integrity.html

The block device provided by dm-integrity will return an I/O error when
there is a problem with integrity, so nilfs2 should be able to detect
it.

For example, dm-integrity and nilfs2 can be used together as follows:

$ sudo integritysetup format /dev/<your-device>
$ sudo integritysetup open /dev/<your-device> mydata
$ sudo mkfs -t nilfs2 /dev/mapper/mydata
$ sudo mount -t nilfs2 /dev/mapper/mydata /mnt/mydata

(It might be worth mentioning this in the FAQ on the NILFS project web
site.)

Since dm-integrity is a dedicated function, you can specify detailed
options according to the integrity requirements you want to achieve. It
seems to work stably even when combined with nilfs2.
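As a quick way to see whether dm-integrity has already caught
something, the mapped device can also be queried directly (a sketch
based on my reading of the dm-integrity documentation, not something I
have verified here):

$ sudo dmsetup status mydata

If I read the documentation correctly, the first number in the
integrity-specific part of that status line is the count of checksum
mismatches detected so far, so a non-zero value there is an early hint
even before nilfs2 runs into an I/O error.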
I don't know of a convenient way to periodically check for device bit
rot or sector data corruption, but I think that a somewhat forcible
method is to read the whole block device with the dd command:

$ sudo dd if=/dev/mapper/mydata of=/dev/null bs=8M

> > https://indico.desy.de/event/257/contributions/58082/attachments/37574/46878/kelemen-2007-HEPiX-Silent_Corruptions.pdf
> > https://storagemojo.com/2007/09/19/cerns-data-corruption-research/
> >
> >> [...] NILFS creates checksums on block writes. It would really
> >> be a good addition to verify these checksums on read [...]
> > It would be interesting to have data integrity checking or
> > compression in NILFS2, and a log-structured filesystem makes that
> > easier (the Btrfs code is rather complex instead), but modifying a
> > mature and stable filesystem is a risky thing...
> >
> > My understanding is that these checksums are not quite suitable
> > for data integrity checks but are designed for log-sequence
> > recovery, a bit like journal checksums for journal-based
> > filesystems.
> >
> > https://www.spinics.net/lists/linux-nilfs/msg01063.html
> > "nilfs2 store checksums for all data. However, at least the
> > current implementation does not verify it when reading.
> > Actually, the main purpose of the checksums is recovery after
> > unexpected reboot; it does not suit for per-file data
> > verification because the checksums are given per ``log''."
>
> I think exactly this would be interesting: if checksums per log already
> exist, it would be good to verify them on read. As already said, I am
> not expecting to know in which file the corruption occurred, but it
> would be nice to know that something nasty is going on.
>

It's tricky to use the CRCs in logs for integrity checking on file data
reads, so I think we should think of other ways if we implement it. On
the other hand, this might be useful for background checks; I mean that
block device anomaly detection could be implemented as a background or
one-shot function, in user space or as a kernel module.

In any case, features that are not suitable for everyday use may end up
being turned off on a regular basis, resulting in unused features that
only add patterns for regression testing. I would like to avoid this.

Also, I don't know about the future, but I'm still focused on solving
the problems reported by syzbot, and even after that is over, I'd like
to review and fix various implementations that have become obsolete for
the latest Linux kernel, so honestly I don't have much energy to start
on such a function right now.

However, I think it's good to have discussions like this from time to
time, as it is a good confirmation of the current situation.

Thank you.
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 5+ messages in thread
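(Building on the dd read-back suggested above, a periodic scrub could be
wrapped in a small script and scheduled; a rough sketch, with the device
path, log tag, and schedule all being arbitrary placeholders:

#!/bin/sh
# scrub-mydata.sh -- read the whole dm-integrity mapped device so that any
# sector failing its checksum is returned as an I/O error (visible in dmesg)
if ! dd if=/dev/mapper/mydata of=/dev/null bs=8M status=none; then
    logger -t scrub-mydata "read-back scrub hit I/O errors, check dmesg"
fi

# example crontab entry, running the scrub every Sunday at 03:00:
# 0 3 * * 0 /usr/local/sbin/scrub-mydata.sh

dd exits non-zero when it hits a read error, so the script only logs a
message when the scrub actually found something.)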
* Re: {WHAT?} read checksum verification
  [not found] ` <25775.10549.41499.886957-Lv72GqZ7opur1GY8YIlKTvXRex20P6io@public.gmane.org>
  2023-07-13  4:23 ` David Arendt
@ 2023-07-14 19:22 ` David Arendt
  1 sibling, 0 replies; 5+ messages in thread
From: David Arendt @ 2023-07-14 19:22 UTC (permalink / raw)
To: list Linux fs NILFS

Hi,

Quoting was not possible as I deleted the original mail too fast, sorry
for this. This reply concerns Ryusuke Konishi's mail.

First of all, you are 100% right: the underlying block layer should
never return corrupted data, but bad things unfortunately happen. In
this case a bug in the iSCSI server itself was the culprit and it is now
fixed.

I did several tests by untarring the contents of an elastalert container
in order to compare real-world performance, if someone is interested:

nilfs2 directly on iSCSI:
tar -xpf /tmp/elastalert.tar  0.91s user 5.19s system 6% cpu 1:25.71 total

nilfs2 with underlying dm-integrity in journal mode:
tar -xpf /tmp/elastalert.tar  1.04s user 5.23s system 3% cpu 3:18.90 total

nilfs2 with underlying dm-integrity in bitmap mode:
tar -xpf /tmp/elastalert.tar  1.00s user 5.17s system 4% cpu 2:15.04 total

nilfs2 with underlying dm-integrity in direct mode:
tar -xpf /tmp/elastalert.tar  1.10s user 5.33s system 7% cpu 1:26.27 total

Read performance in all four tests after an unmount/remount:
tar -cf /dev/null .  1.11s user 1.80s system 5% cpu 51.300 total

Another test was writing 1024 bytes of random garbage onto the device
underlying dm-integrity and doing again a tar -cf /dev/null .

Result:

[ 6005.098464] device-mapper: integrity: dm-0: Checksum failed at sector 0x2200f
[ 6005.098484] NILFS error (device dm-0): nilfs_readdir: bad page in #4031
[ 6005.170770] Remounting filesystem read-only

So dm-integrity effectively seems to be a good choice.

I don't know, Ryusuke, if you still remember me: I was the contributor of
the "allow cleanerd to suspend GC based on the number of free segments"
patches a long time ago :-)

Bye,
David Arendt

^ permalink raw reply	[flat|nested] 5+ messages in thread
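(The mail does not show the exact command used to inject the garbage; an
assumed reconstruction of that kind of test -- the offset is arbitrary,
and it will of course destroy data on the target device -- could look
like this:

$ # overwrite 1024 bytes somewhere in the middle of the backing device
$ sudo dd if=/dev/urandom of=/dev/<underlying-device> bs=1024 count=1 seek=70000 conv=fsync
$ # then re-read the filesystem until the damaged sectors are hit
$ cd /mnt/mydata && tar -cf /dev/null .

With dm-integrity in between, the read-back should fail with an I/O
error like the one quoted above instead of silently returning the
garbage.)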
end of thread, other threads:[~2023-07-14 19:22 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-12 18:32 read checksum verification David Arendt
[not found] ` <174f995c-e794-74c4-24d6-52451f3f3f28-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2023-07-12 22:29 ` {WHAT?} " Peter Grandi
[not found] ` <25775.10549.41499.886957-Lv72GqZ7opur1GY8YIlKTvXRex20P6io@public.gmane.org>
2023-07-13 4:23 ` David Arendt
[not found] ` <b99d2029-9b96-a016-875e-09b208c0ab9c-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2023-07-13 20:29 ` Ryusuke Konishi
2023-07-14 19:22 ` David Arendt