linux-nilfs.vger.kernel.org archive mirror
* read checksum verification
@ 2023-07-12 18:32 David Arendt
       [not found] ` <174f995c-e794-74c4-24d6-52451f3f3f28-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: David Arendt @ 2023-07-12 18:32 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

I recently had a bad experience with NILFS (not the fault of NILFS).

I used NILFS over iSCSI. I had random block corruption during one week,
silently destroying data until NILFS finally crashed. At first I thought
it was a NILFS bug, so I created a BTRFS volume and restored the backup
from one week earlier onto it. Within minutes, the BTRFS volume gave
checksum errors, so the culprit was found: the iSCSI server. For now I
will use BTRFS on my iSCSI volumes so as not to end up in the same
situation again, even though I would prefer NILFS because of its
continuous checkpointing. If I remember correctly, NILFS creates
checksums on block writes. It would really be a good addition to verify
these checksums on read, so that corruption of this type would be
noticed within minutes instead of days, or possibly never if it is rare
enough. I think it has been mentioned earlier that NILFS checksums are
not suitable for file verification but only for block verification. The
most important thing is to know that something nasty is going on, even
if the details aren't known, so I think it would be a good addition to
have some sort of data checksum verification on read in NILFS.

Bye,

David Arendt



* Re: {WHAT?} read checksum verification
       [not found] ` <174f995c-e794-74c4-24d6-52451f3f3f28-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2023-07-12 22:29   ` Peter Grandi
       [not found]     ` <25775.10549.41499.886957-Lv72GqZ7opur1GY8YIlKTvXRex20P6io@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Grandi @ 2023-07-12 22:29 UTC (permalink / raw)
  To: list Linux fs NILFS

> I used NILFS over iSCSI. I had random block corruption during
> one week, silently destroying data until NILFS finally
> crashed. First of all, I thought about a NILFS bug, so I
> created a BTRFS volume

I use both, for main filesystem and backup, for "diversity", and I
value NILFS2 because it is very robust (I don't really use either
filesystem's snapshotting features).

> and restored the backup from one week earlier to it. After
> minutes, the BTRFS volume gave checksum errors, so the
> culprit was found: the iSCSI server.

There used to be a good argument that checksumming (or compressing)
data should be end-to-end and that checksumming (or compressing) in the
filesystem is a bit too much, but when LOGFS and NILFS/NILFS2 were
designed I guess CPUs were too slow to checksum everything. Even
excellent recent filesystems like F2FS don't do data integrity checking,
for various reasons.

In theory your iSCSI server or its host adapter should have told you
about errors... Many can enable after-write verification (even if it's
quite expensive). Alternatively, some people regularly run
silent-corruption-detecting daemons if their hardware does not report
corruption or it escapes the relevant checks for various reasons:

https://indico.desy.de/event/257/contributions/58082/attachments/37574/46878/kelemen-2007-HEPiX-Silent_Corruptions.pdf
https://storagemojo.com/2007/09/19/cerns-data-corruption-research/

> [...] NILFS creates checksums on block writes. It would really
> be a good addition to verify these checksums on read [...]

It would be interesting to have data integrity checking or compression
in NILFS2, and a log-structured filesystem makes that easier (Btrfs
code is rather complex instead), but modifying mature and stable
filesystems is a risky thing...

My understanding is that these checksums are not quite suitable
for data integrity checks but are designed for log-sequence
recovery, a bit like journal checksums for journal-based
filesystems.

https://www.spinics.net/lists/linux-nilfs/msg01063.html
"nilfs2 store checksums for all data. However, at least the
current implementation does not verify it when reading.
Actually, the main purpose of the checksums is recovery after
unexpected reboot; it does not suit for per-file data
verification because the checksums are given per ``log''."


* Re: {WHAT?} read checksum verification
       [not found]     ` <25775.10549.41499.886957-Lv72GqZ7opur1GY8YIlKTvXRex20P6io@public.gmane.org>
@ 2023-07-13  4:23       ` David Arendt
       [not found]         ` <b99d2029-9b96-a016-875e-09b208c0ab9c-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  2023-07-14 19:22       ` David Arendt
  1 sibling, 1 reply; 5+ messages in thread
From: David Arendt @ 2023-07-13  4:23 UTC (permalink / raw)
  To: Peter Grandi, list Linux fs NILFS

On 7/13/23 00:29, Peter Grandi wrote:
>> I used NILFS over iSCSI. I had random block corruption during
>> one week, silently destroying data until NILFS finally
>> crashed. First of all, I thought about a NILFS bug, so I
>> created a BTRFS volume
> I use both for main filesystem and backup for "diversity", and I
> value NILFS2 because it is very robust (I don't really use
> either filesystems snapshotting features).
So do I, therefore I said it was not NILFS's fault.
>> and restored the backup from one week earlier to it. After
>> minutes, the BTRFS volume gave checksum errors, so the
>> culprit was found: the iSCSI server.
> There used to be a good argument that checksumming (or
> compressing) data should be end-to-end and checksumming (or
> compressing) in the filesystem is a bit too much, but when LOGFS
> and NILFS/nILFS2 were designed I guess CPUs were too slow to
> checksum everything. Even excellent recent filesystems like F2FS
> don't do data integrity checking for various reasons though.
>
> In theory your iSCSI or its host-adapter should have told you
> about errors... Many can enable after-write verification (even
> if it's quite expensive). Alternatively some people run regularly
> silent-corruption detecting daemons if their hardware does not
> report corruption or it escapes the relevant checks for various
> reasons:

The host adapter can return errors if the underlying disk itself returns
them. If bits randomly flip on disk after being written, the host
adapter can't know (at least not in non-RAID scenarios).

> https://indico.desy.de/event/257/contributions/58082/attachments/37574/46878/kelemen-2007-HEPiX-Silent_Corruptions.pdf
> https://storagemojo.com/2007/09/19/cerns-data-corruption-research/
>
>> [...] NILFS creates checksums on block writes. It would really
>> be a good addition to verify these checksums on read [...]
> It would be interesting to have data integrity checking or
> compression in NILFS2, and log-structured filesystem makes that
> easier (Btrfs code is rather complex instead), but modifying
> mature and stable filesystems is a risky thing...
>
> My understanding is that these checksums are not quite suitable
> for data integrity checks but are designed for log-sequence
> recovery, a bit like journal checksums for journal-based
> filesystems.
>
> https://www.spinics.net/lists/linux-nilfs/msg01063.html
> "nilfs2 store checksums for all data. However, at least the
> current implementation does not verify it when reading.
> Actually, the main purpose of the checksums is recovery after
> unexpected reboot; it does not suit for per-file data
> verification because the checksums are given per ``log''."

I think exactly this would be interesting: if checksums per log already
exist, it would be good to verify them on read. As already said, I am
not expecting to know in which file the corruption occurred, but it
would be nice to know that something nasty is going on.



* Re: {WHAT?} read checksum verification
       [not found]         ` <b99d2029-9b96-a016-875e-09b208c0ab9c-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2023-07-13 20:29           ` Ryusuke Konishi
  0 siblings, 0 replies; 5+ messages in thread
From: Ryusuke Konishi @ 2023-07-13 20:29 UTC (permalink / raw)
  To: David Arendt; +Cc: Peter Grandi, list Linux fs NILFS

On Thu, Jul 13, 2023 at 1:44 PM David Arendt wrote:
>
> On 7/13/23 00:29, Peter Grandi wrote:
> >> I used NILFS over iSCSI. I had random block corruption during
> >> one week, silently destroying data until NILFS finally
> >> crashed. First of all, I thought about a NILFS bug, so I
> >> created a BTRFS volume
> > I use both for main filesystem and backup for "diversity", and I
> > value NILFS2 because it is very robust (I don't really use
> > either filesystems snapshotting features).
> So do I, therefore I said it was not NILFS's fault.
> >> and restored the backup from one week earlier to it. After
> >> minutes, the BTRFS volume gave checksum errors, so the
> >> culprit was found: the iSCSI server.
> > There used to be a good argument that checksumming (or
> > compressing) data should be end-to-end and checksumming (or
> > compressing) in the filesystem is a bit too much, but when LOGFS
> > and NILFS/nILFS2 were designed I guess CPUs were too slow to
> > checksum everything. Even excellent recent filesystems like F2FS
> > don't do data integrity checking for various reasons though.
> >
> > In theory your iSCSI or its host-adapter should have told you
> > about errors... Many can enable after-write verification (even
> > if it's quite expensive). Alternatively some people run regularly
> > silent-corruption detecting daemons if their hardware does not
> > report corruption or it escapes the relevant checks for various
> > reasons:
>
> The host adapter can return errors if the underlying disk itself returns
> them. If bits randomly flip on disk after being written, the host
> adapter can't know (at least not in non-RAID scenarios).
>

I recommend replacing unreliable block layers first.
The reliability of the filesystem depends heavily on that of the block
layer, and the block device must be sufficiently reliable.

In general, the various checks and reliability measures that
filesystems and operating systems have are insufficient to compensate
for defective or unreliable block devices.  Problem-prone devices are
difficult to use regularly, even if errors can be detected.

Putting that premise aside for a moment, if you want to take advantage
of both the retroactive snapshotting (and robustness) that nilfs2
provides and data integrity checking, a short-term solution might be to
try dm-integrity[1] and nilfs2 together.

[1]  https://docs.kernel.org/admin-guide/device-mapper/dm-integrity.html

The block device provided by dm-integrity will return an I/O error
when there is a problem with integrity, so nilfs2 should be able to
detect it.

For example, dm-integrity and nilfs2 can be used together as follows:

$ sudo integritysetup format /dev/<your-device>
$ sudo integritysetup open /dev/<your-device> mydata
$ sudo mkfs -t nilfs2 /dev/mapper/mydata
$ sudo mount -t nilfs2 /dev/mapper/mydata /mnt/mydata
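
(To double-check how the mapping was set up, e.g. tag size and mode,
something like "integritysetup status mydata" or "dmsetup table mydata"
should show it; the exact output depends on the version.)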

(It might be worth mentioning this in the FAQ on the NILFS project web site.)

Since dm-integrity is a dedicated function, you can specify detailed
options according to the integrity requirements you want to achieve.
It seems to work stably even when combined with nilfs2.
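
For example (this is just a sketch from memory, so please check the man
page of your integritysetup version for the exact option names), a
stronger tag than the default crc32c can be chosen at format time; as
far as I remember the algorithm is not stored in the superblock in
standalone mode, so it has to be passed again on open:

$ sudo integritysetup format --integrity sha256 /dev/<your-device>
$ sudo integritysetup open --integrity sha256 /dev/<your-device> mydata

The journal behavior can also be tuned (e.g. --integrity-no-journal or
--integrity-bitmap-mode), trading crash consistency of the tags for
write performance.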

I don't know of a convenient way to periodically check for device bit
rot or sector data corruption, but I think a somewhat brute-force
method is to read the whole block device with the dd command:

$ sudo dd if=/dev/mapper/mydata of=/dev/null bs=8M
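
If you want to run that periodically, a trivial wrapper (only a sketch;
the device path, log tag and schedule are placeholders) could be run
from cron so that a failing read at least shows up in syslog:

#!/bin/sh
# Read the whole dm-integrity mapping; dd exits non-zero if any read
# fails, which with dm-integrity underneath includes checksum mismatches.
DEV=/dev/mapper/mydata
if ! dd if="$DEV" of=/dev/null bs=8M status=none; then
    logger -t integrity-scrub "read errors on $DEV, check dmesg for details"
fi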

> > https://indico.desy.de/event/257/contributions/58082/attachments/37574/46878/kelemen-2007-HEPiX-Silent_Corruptions.pdf
> > https://storagemojo.com/2007/09/19/cerns-data-corruption-research/
> >
> >> [...] NILFS creates checksums on block writes. It would really
> >> be a good addition to verify these checksums on read [...]
> > It would be interesting to have data integrity checking or
> > compression in NILFS2, and log-structured filesystem makes that
> > easier (Btrfs code is rather complex instead), but modifying
> > mature and stable filesystems is a risky thing...
> >
> > My understanding is that these checksums are not quite suitable
> > for data integrity checks but are designed for log-sequence
> > recovery, a bit like journal checksums for journal-based
> > filesystems.
> >
> > https://www.spinics.net/lists/linux-nilfs/msg01063.html
> > "nilfs2 store checksums for all data. However, at least the
> > current implementation does not verify it when reading.
> > Actually, the main purpose of the checksums is recovery after
> > unexpected reboot; it does not suit for per-file data
> > verification because the checksums are given per ``log''."
>
> I think exactly this would be interesting: if checksums per log already
> exist, it would be good to verify them on read. As already said, I am
> not expecting to know in which file the corruption occurred, but it
> would be nice to know that something nasty is going on.
>

It's tricky to use the CRCs in logs for integrity checking on file data
reads, so I think we should consider other approaches if we implement it.

On the other hand, this might be useful for background checks.
I mean, block device data anomaly detection could be implemented as a
background or one-shot function, either in user space or as a kernel
module.
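
As a very rough user-space sketch of the one-shot variant (the device
path and chunk size are placeholders, not an existing tool), a scan
that narrows a failure down to an approximate offset could look like
this:

#!/bin/sh
# Read the device 8 MiB at a time so a failing read can be reported
# with an approximate byte offset instead of the whole pass simply failing.
DEV=/dev/mapper/mydata
BS=$((8 * 1024 * 1024))
SIZE=$(blockdev --getsize64 "$DEV")
CHUNKS=$(( (SIZE + BS - 1) / BS ))
i=0
while [ "$i" -lt "$CHUNKS" ]; do
    if ! dd if="$DEV" of=/dev/null bs="$BS" skip="$i" count=1 status=none; then
        echo "read error around byte offset $((i * BS))" >&2
    fi
    i=$((i + 1))
done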

In any case, features that are not suitable for everyday use may end
up being turned off on a regular basis, resulting in unused features
that only add patterns for regression testing.  I would like to avoid
this.

Also, I don't know about the future, but I'm still focused on solving
the problems reported by syzbot, and even after that is over, I'd like
to review and fix various implementations that have become obsolete
for the latest Linux kernel, so honestly I don't have much energy to
start such a function right now.   However, I think it's good to have
discussions like this from time to time as it will be a good
confirmation of the current situation.

Thank you.

Ryusuke Konishi


* Re: {WHAT?} read checksum verification
       [not found]     ` <25775.10549.41499.886957-Lv72GqZ7opur1GY8YIlKTvXRex20P6io@public.gmane.org>
  2023-07-13  4:23       ` David Arendt
@ 2023-07-14 19:22       ` David Arendt
  1 sibling, 0 replies; 5+ messages in thread
From: David Arendt @ 2023-07-14 19:22 UTC (permalink / raw)
  To: list Linux fs NILFS

Hi,

Quoting was not possible as I deleted the original mail too fast, sorry
for this. This reply concerns Ryusuke Konishi's mail.

First of all, you are 100% right, the underlying block layer should
never return corrupted data, but bad things unfortunately happen. In
this case a bug in the iSCSI server itself was the culprit, and it is
now fixed.

I did several tests by untarring the contents of an elastalert container
in order to compare real-world performance, in case someone is
interested:

nilfs2 directly on iscsi: tar -xpf /tmp/elastalert.tar  0.91s user 5.19s 
system 6% cpu 1:25.71 total

nilfs2 with underlying dm-integrity in journal mode: tar -xpf 
/tmp/elastalert.tar  1.04s user 5.23s system 3% cpu 3:18.90 total

nilfs2 with underlying dm-integrity in bitmap mode: tar -xpf 
/tmp/elastalert.tar  1.00s user 5.17s system 4% cpu 2:15.04 total

nilfs2 with underlying dm-integrity in direct mode: tar -xpf 
/tmp/elastalert.tar  1.10s user 5.33s system 7% cpu 1:26.27 total

Read performance in all four tests after an unmount/remount: tar -cf
/dev/null .  1.11s user 1.80s system 5% cpu 51.300 total

Another test was writing 1024 bytes of random garbage onto the device
underlying dm-integrity and then doing a tar -cf /dev/null . again.
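
For anyone wanting to reproduce this kind of test, the garbage can be
written with something along these lines (device and offset are
placeholders; 2 x 512-byte blocks = 1024 bytes):

$ sudo dd if=/dev/urandom of=/dev/<your-device> bs=512 count=2 seek=<some-sector> conv=notrunc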

Result:

[ 6005.098464] device-mapper: integrity: dm-0: Checksum failed at sector 
0x2200f
[ 6005.098484] NILFS error (device dm-0): nilfs_readdir: bad page in #4031
[ 6005.170770] Remounting filesystem read-only

So dm-integrity indeed seems to be a good choice.

I don't know, Ryusuke, if you still remember me; I was the contributor
of the "allow cleanerd to suspend GC based on the number of free
segments" patches a long time ago :-)

Bye,

David Arendt



Thread overview: 5+ messages
2023-07-12 18:32 read checksum verification David Arendt
     [not found] ` <174f995c-e794-74c4-24d6-52451f3f3f28-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2023-07-12 22:29   ` {WHAT?} " Peter Grandi
     [not found]     ` <25775.10549.41499.886957-Lv72GqZ7opur1GY8YIlKTvXRex20P6io@public.gmane.org>
2023-07-13  4:23       ` David Arendt
     [not found]         ` <b99d2029-9b96-a016-875e-09b208c0ab9c-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2023-07-13 20:29           ` Ryusuke Konishi
2023-07-14 19:22       ` David Arendt
