From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
linux-scsi@vger.kernel.org,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Bernd Schubert <bernd.schubert@fastmail.fm>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
Sven Breuner <sven.breuner@itwm.fraunhofer.de>,
Chuck Lever <chuck.lever@oracle.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Gregory Farnum <gregory.farnum@dreamhost.com>,
lsf-pc@lists.linux-foundation.org,
Chris Mason <chris.mason@oracle.com>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] end-to-end data and metadata corruption detection
Date: Thu, 02 Feb 2012 23:52:42 +0100
Message-ID: <4F2B13BA.1080804@itwm.fraunhofer.de>
In-Reply-To: <20120202192643.GC5873@redhat.com>

On 02/02/2012 08:26 PM, Andrea Arcangeli wrote:
> On Thu, Feb 02, 2012 at 10:04:59AM +0100, Bernd Schubert wrote:
>> I think the point for network file systems is that they can reuse the
>> disk-checksum for network verification. So instead of calculating a
>> checksum for network and disk, just use one for both. The checksum also
>> is supposed to be cached in memory, as that avoids re-calculation for
>> other clients.
>>
>> 1)
>> client-1: sends data and checksum
>>
>> server: Receives those data and verifies the checksum -> network
>> transfer was ok, sends data and checksum to disk
>>
>> 2)
>> client-2 ... client-N: Ask for those data
>>
>> server: send cached data and cached checksum
>>
>> client-2 ... client-N: Receive data and verify checksum
>>
>>
>> So the whole point of caching checksums is to spare the server from
>> recalculating them for dozens of clients. Recalculating checksums simply
>> does not scale with an increasing number of clients that want to read
>> data processed by another client.
>
> This makes sense indeed. My argument was only about the exposure of
> the storage hw format cksum to userland (through some new ioctl for
> further userland verification of the pagecache data in the client
> pagecache, done by whatever program is reading from the cache). The
> network fs client lives in kernel, the network fs server lives in
> kernel, so no need to expose the cksum to userland to do what you
> described above.
>
> I meant if we can't trust the pagecache to be correct (after the
> network fs client code already checked the cksum cached by the server
> and sent to the client along the server cached data), I don't see much
> value added through a further verification by the userland program
> running on the client and accessing pagecache in the client. If we
> can't trust client pagecache to be safe against memory bitflips or
> software bugs, we can hardly trust the anonymous memory either.

Well, now it gets a bit troublesome: not all file systems live in kernel
space. FhGFS uses kernel clients, but has user space daemons. I think
Ceph does it similarly. I'm not sure about the roadmap of Gluster or
whether data verification is planned at all, but if it wanted to do
that, even the clients would need access to the checksums in user
space.
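
To make the scheme quoted above concrete, here is a minimal sketch in
Python of a server that verifies a checksum once on write, caches it,
and hands the cached value to every reader. All names (Server,
client_read, checksum) are hypothetical, and zlib.crc32 is only a
stand-in for whatever checksum the storage format would really use.

```python
import zlib


class ChecksumMismatch(Exception):
    """Raised when data does not match the checksum that came with it."""


def checksum(data: bytes) -> int:
    # Stand-in checksum (assumption): real storage would use its own format.
    return zlib.crc32(data)


class Server:
    """Hypothetical server that caches data together with its checksum."""

    def __init__(self):
        self.cache = {}          # name -> (data, checksum)
        self.checksum_calls = 0  # how often the server had to compute one

    def write(self, name: str, data: bytes, cksum: int) -> None:
        # Step 1: verify once on receive, i.e. the network transfer was ok.
        self.checksum_calls += 1
        if checksum(data) != cksum:
            raise ChecksumMismatch(name)
        # Data and checksum would go to disk here; we only cache them.
        self.cache[name] = (data, cksum)

    def read(self, name: str):
        # Step 2: return cached data and cached checksum, no recalculation.
        return self.cache[name]


def client_read(server: Server, name: str) -> bytes:
    # The reading client verifies what it received against the checksum.
    data, cksum = server.read(name)
    if checksum(data) != cksum:
        raise ChecksumMismatch(name)
    return data


# client-1 writes once, client-2 ... client-N read and verify.
server = Server()
payload = b"data written by client-1"
server.write("file", payload, checksum(payload))
results = [client_read(server, "file") for _ in range(50)]
```

However many clients read the file, the server computes a checksum only
once, on the initial write. The verification cost moves to the readers.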

Now let's assume we ignore user space clients for now. What about using
the splice interface to also send checksums? As a basic concept, file
system servers are not interested in the actual data at all; they only
manage the transfer between disk and network. So a possible way to
avoid exposing checksums to user space daemons is to simply not expose
the data to the servers at all. In that case, however, the server-side
kernel would need to do the checksum verification, even for user space
daemons.

The remaining issue with splice is that it does not work with
InfiniBand ibverbs, due to the missing socket fd.

Another solution that might also work is to expose checksums to user
space read-only.

Cheers,
Bernd