From: Hubert Kario <hka@qbs.com.pl>
To: Paul Millar <paul.millar@desy.de>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: A couple of questions
Date: Thu, 27 May 2010 16:56:00 +0200
Message-ID: <201005271656.00398.hka@qbs.com.pl>
In-Reply-To: <201005271539.55644.paul.millar@desy.de>

On Thursday 27 May 2010 15:39:54 Paul Millar wrote:
> Hi,
>
> I've been looking at Btrfs and have a couple of naive questions that don't
> seem to be answered on the wiki or in the articles I've read on the
> filesystem.
>
>
> First: discovering a file's checksum value.
>
> Here's the scenario: software is writing some data as a fresh file.  This
> software happens to know (a priori) the checksum of this data; for example,
> a storage server receives the file's data and checksum independently.
>
> I've some confidence that, once the data is stored in btrfs, any corruption
> (from the storage fabric) will be spotted; however, the data may have
> become corrupt before being stored (e.g., from the network).  To catch
> this, the checksum of the stored data needs to be calculated and checked.
>
> One approach is to calculate the checksum (in user-space) after the data is
> stored.  This adds extra IO- and CPU-load and there's also the possibility
> of false-negative results due to the filesystem cache (although btrfs may
> remove this risk).
>
> Another approach would be to ask btrfs for the checksum.  It seems that
> it's possible to combine multiple CRC-32C values to figure out the
> checksum of the combined data [e.g., zlib's crc32_combine() function].
> So, obtaining a file's checksum might be a light-weight operation.
>
> Yet another possibility would be to push the desired checksum value (via
> fcntl?) and have btrfs compare the desired checksum with the file's actual
> checksum on close(2), failing that call if the checksums don't match.
>
> Would any of this be possible (without an awful lot of work)?
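The CRC-32C combining idea mentioned in the question can be illustrated with a small sketch. The Python below is a plain bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) plus a naive combine in the spirit of zlib's crc32_combine(): it advances crc(A) through len(B) zero bytes using the raw register update and XORs in crc(B). This version is O(len(B)); zlib achieves the same result in logarithmic time with GF(2) matrix techniques. The function names are hypothetical, and nothing here reflects an actual btrfs interface or how btrfs lays out its checksums.

```python
# Reflected CRC-32C (Castagnoli) polynomial.
CRC32C_POLY = 0x82F63B78


def _crc32c_raw(reg: int, data: bytes) -> int:
    """Feed bytes through the CRC-32C shift register, no pre/post inversion."""
    for byte in data:
        reg ^= byte
        for _ in range(8):
            if reg & 1:
                reg = (reg >> 1) ^ CRC32C_POLY
            else:
                reg >>= 1
    return reg


def crc32c(data: bytes) -> int:
    """Standard CRC-32C: initial value and final XOR are both 0xFFFFFFFF."""
    return _crc32c_raw(0xFFFFFFFF, data) ^ 0xFFFFFFFF


def crc32c_combine(crc_a: int, crc_b: int, len_b: int) -> int:
    """Return crc32c(A + B) given only crc32c(A), crc32c(B) and len(B).

    Naive O(len_b) version: shift crc_a through len_b zero bytes with the
    raw register update, then XOR in crc_b. Because the CRC is affine over
    GF(2), the 0xFFFFFFFF init/final-XOR constants cancel out, so no extra
    correction term is needed.
    """
    return _crc32c_raw(crc_a, bytes(len_b)) ^ crc_b


if __name__ == "__main__":
    a, b = b"12345", b"6789"
    combined = crc32c_combine(crc32c(a), crc32c(b), len(b))
    print(hex(crc32c(a + b)), hex(combined))  # prints 0xe3069283 twice
```

Only the two 32-bit checksums and the length of the second piece are needed, which is why obtaining a whole-file value from per-block CRCs could in principle be cheap.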

IMO, if an application receives data with a checksum, it can calculate the
checksum of the data on the fly as it writes it to the disk. It won't add any
additional IO to the storage subsystem. It won't detect in-memory corruption,
though; if you want to be resilient to that, you should be looking at ECC
RAM, as subsequent checks can be affected by it too.
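The on-the-fly approach could look roughly like the Python sketch below. It assumes the sender supplies both a hashlib algorithm name and the expected hex digest, so the checksum is chosen by the protocol rather than by the filesystem; the function name write_and_verify is hypothetical. The digest is computed from the same buffers handed to write(), so no read-back of the stored file is required. As said above, it cannot catch corruption that happens in RAM after the digest is computed.

```python
import hashlib
import os
import tempfile
from typing import Iterable


def write_and_verify(chunks: Iterable[bytes], path: str,
                     expected_hex: str, algorithm: str = "sha256") -> bool:
    """Write chunks to path while hashing them; return True if digests match.

    Hypothetical helper: on mismatch the partially trusted file is removed.
    """
    hasher = hashlib.new(algorithm)  # algorithm chosen by the protocol
    with open(path, "wb") as f:
        for chunk in chunks:
            hasher.update(chunk)  # hash exactly the bytes being written
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data to stable storage
    if hasher.hexdigest() != expected_hex.lower():
        os.remove(path)
        return False
    return True


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "received.bin")
    expected = hashlib.sha256(b"hello world").hexdigest()
    print(write_and_verify([b"hello ", b"world"], target, expected))
```

Keeping the algorithm a parameter means the application's integrity check stays independent of whichever CRC the filesystem happens to use internally.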

Second, you shouldn't tie the application or network protocol to the CRC scheme
used by the filesystem on the server! Especially when other CRC algorithms can
be used, not only CRC-32C.

If the checksum algorithm used by the FS were set in stone, then userspace could
employ it somehow, but if different CRCs can be used, I see no reason to
allow userspace to read them.


-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

Quality Management System compliant with ISO 9001:2000

