From: Neil Brown <neilb@suse.de>
To: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Cc: Steven Haigh <netwiz@crc.id.au>, Bill Davidsen <davidsen@tmr.com>,
Bryan Mesich <bryan.mesich@ndsu.edu>,
Jon@eHardcastle.com, linux-raid@vger.kernel.org
Subject: Re: Why does one get mismatches?
Date: Sat, 20 Feb 2010 09:02:08 +1100
Message-ID: <20100220090208.06c1130f@notabene.brown>
In-Reply-To: <20100219151809.GB4995@lazy.lzy>
On Fri, 19 Feb 2010 16:18:09 +0100
Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> wrote:
> Hi,
>
> > When memory changes between being written to one device and to another, this
> > does not cause corruption, only inconsistency. Either the block will be
> > written again consistently soon, or it will never be read.
>
> well, is this for sure?
> I mean, by design of the md subsystem.
>
> Or it is like that because we trust the filesystem?
It is because we trust the filesystem.
>
> And why it is like that? Why not to use the good old
> readers-writer mechanism to make sure all blocks are
> the same, when they're are written (namely lock).
md is not in a position to lock the page - there is simply no way it can stop
the filesystem from changing it.
The only thing it could do would be to make a copy, then write the copy out.
This would incur a performance cost.
>
> It seems to me, maybe I'm wrong, not a so safe design.
I think you are wrong.
>
> I assume, it should not be possible to cause this
> situation, unless there is a crash or a bug in the
> md layer.
I'm not sure what situation you are referring to...
>
> What if a new filesystem will write a block, changing
> on the fly, i.e. during RAID-1 writes, and then, later,
> reading this block again?
>
> It will get, maybe, not the correct data.
This is correct. However it would be equally correct if you were talking
about a normal disk drive rather than a RAID1 pair.
If the filesystem changes the page (or allows it to change) while a write is
pending, then it cannot know what actual data was written. So it must write
the block out again before it ever reads it in.
RAID1 is no different to any other device in this respect.
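To make the point concrete, here is a toy model of the situation, not md code.
Two bytearrays stand in for the RAID1 members, and write_mirrored() and
mismatches() are hypothetical helpers invented for this sketch. It assumes the
two member writes read the live page one after the other, so a change made in
between lands on only one member.

```python
# Toy model only: two bytearrays stand in for the RAID1 members, and
# write_mirrored() is a hypothetical stand-in for md issuing one write per member.

def write_mirrored(page, mirrors, offset, mutate_between=None):
    """Write `page` to each mirror in turn, reading the live buffer each time."""
    for i, mirror in enumerate(mirrors):
        mirror[offset:offset + len(page)] = page
        if mutate_between is not None and i == 0:
            mutate_between(page)  # the "filesystem" changes the page mid-write


def mismatches(mirrors):
    """Count byte positions where the two members differ (like a 'check' pass)."""
    a, b = mirrors
    return sum(x != y for x, y in zip(a, b))


def scribble(p):
    """Modify the first half of the page while the write is still pending."""
    p[0:4] = b"BBBB"


mirrors = [bytearray(8), bytearray(8)]
page = bytearray(b"AAAAAAAA")

# First write: the page changes after member 0 is written, before member 1.
write_mirrored(page, mirrors, 0, mutate_between=scribble)
first_pass = mismatches(mirrors)   # the members now disagree

# The page is still dirty, so the filesystem writes it again, unmodified this time.
write_mirrored(page, mirrors, 0)
second_pass = mismatches(mirrors)  # the members agree again

print(first_pass, second_pass)
```

The mismatch exists only between the two writes. A check run in that window
would report it, but no reader ever depends on the stale contents.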
>
> In other words, would it be better, for the md layer,
> to be robust against these kind of threats?
>
Possibly, but at what cost?
There are two ways that I can imagine to 'solve' this issue.
1/ always copy the page before writing. This would incur a significant
overhead, both in the complexity of pre-allocating memory and in the
delay taken to perform the copy. And it would very rarely be actually
needed.
2/ Have the filesystem protect the page from changes while it is being
written. This is quite possible for the filesystem to do (while it
is impossible for md to do). There could be some performance
cost with memory-mapped pages as they would need to be unmapped,
but there would be no significant cost for reads, writes, and filesystem
metadata operations.
Further, any filesystem that wants to make use of the integrity checks
that newer drives provide (where the filesystem provides a 'checksum' for
the block which gets passed all the way down and written to storage, and
returned on a read) will need to do this anyway. So it is likely that in
the near future all significant filesystems will provide all the
guarantees md needs in order to simply do nothing different.
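A rough sketch of why integrity checking forces the page to stay stable. This
is a toy model: zlib.crc32 is only a stand-in for whatever checksum the
storage stack really uses, and submit() and device_accepts() are hypothetical
names. The snapshot flag models a page that cannot change during the write; it
does so with a copy purely for simplicity, whereas a filesystem would more
likely block modification.

```python
import zlib


def submit(page, snapshot):
    """Return (checksum, buffer handed to the device).

    snapshot=True models a page that is held stable for the whole write.
    """
    buf = bytes(page) if snapshot else page
    return zlib.crc32(buf), buf


def device_accepts(checksum, buf):
    """The device recomputes the checksum over the data it actually received."""
    return zlib.crc32(buf) == checksum


# Unstable page: it changes after the checksum was computed.
page = bytearray(b"old data")
csum, buf = submit(page, snapshot=False)
page[0:3] = b"NEW"                       # modified while the write is in flight
unstable_ok = device_accepts(csum, buf)  # checksum no longer matches the data

# Stable page: the data handed to the device cannot change underneath it.
page = bytearray(b"old data")
csum, buf = submit(page, snapshot=True)
page[0:3] = b"NEW"
stable_ok = device_accepts(csum, buf)    # checksum still matches

print(unstable_ok, stable_ok)
```

A filesystem that keeps its pages stable for this reason gives md consistent
mirrors as a side effect, with no change needed in md.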
So my feeling is that md is doing the best thing already.
I believe 'swap' will always be an issue as unmapping swap pages during write
could be a serious performance cost. It might be that the best thing to do
with swap is to somehow mark the area of an array used for swap as "don't
care" so md never bothers to resync it, and never reports inconsistencies
there, as they really are not an issue.
NeilBrown