From: Michael Evans <mjevans1983@gmail.com>
To: linux-raid@vger.kernel.org
Subject: Re: Why does one get mismatches?
Date: Tue, 2 Mar 2010 02:04:47 -0800
Message-ID: <4877c76c1003020204r477e942fo8ada66e1e9426295@mail.gmail.com>
In-Reply-To: <20100302073624.GA28827@maude.comedia.it>
On Mon, Mar 1, 2010 at 11:36 PM, Luca Berra <bluca@comedia.it> wrote:
> On Tue, Mar 02, 2010 at 04:01:00PM +1100, Neil Brown wrote:
>>
>> On Sun, 28 Feb 2010 09:09:49 +0100
>> Luca Berra <bluca@comedia.it> wrote:
>>
>>> On Thu, Feb 25, 2010 at 08:39:36AM +1100, Neil Brown wrote:
>>> >On Wed, 24 Feb 2010 11:12:09 -0500
>>> >"Martin K. Petersen" <martin.petersen@oracle.com> wrote:
>>> >
>>> >> So realistically both disk blocks are wrong and there's a window until
>>> >> the new, correct block is written. That window will only cause problems
>>> >> if there is a crash and we'll need to recover. My main concern here is
>>> >> how big the discrepancy between the disks can get, and whether we'll end
>>> >> up corrupting the filesystem during recovery because we could
>>> >> potentially be matching metadata from one disk with journal entries from
>>> >> another.
>>> >
>>> >After a crash, md will only read from one of the devices (the first) until a
>>> >resync has completed. So there should be no room for more confusion than you
>>> >would expect on a single device.
>>>
>>> After thinking more about this i could come up with another concern
>>> about write ordering.
>>>
>>> example
>>> app writes block A, B, C
>>> md writes A on both disks
>>> md writes B on disk1
>>> app writes B again (B')
>>> md writes B' on disk2
>>> now md would write B' again on both disks, but the system crashes
>>> (note, C is never written due to crash)
>>>
>>> Disk 1 contains A and B in the correct order, it is missing C and B' but we
>>> dont care, app should be able to recover from a crash
>>>
>>> Disk 2 contains A and B', but they are wrongly ordered because C is missing
>>>
>>> If in the above case A and C are data blocks and B contains a journal
>>> related to A and C, booting from disk 2 could result in inconsistent
>>> data.
>>>
>>> can the above really happen?
>>> would using barriers remove the above concern?
>>> am i missing something else?
>>
>> There is no inconsistency here that a filesystem would not equally expect
>> from a single device.
>> After the crash-while-writing B', it should expect to see either B or B',
>> and it does, depending on which device is primary.
>>
>> Nothing to see here.
>
> I will try to explain better,
> the problem is not related to the confusion between B or B'
>
> the problem is that on one disk we have B' _without_ C.
>
> Regards,
> L.
>
> --
> Luca Berra -- bluca@comedia.it
> Communication Media & Services S.r.l.
> /"\
> \ / ASCII RIBBON CAMPAIGN
> X AGAINST HTML MAIL
> / \
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
You're demanding full atomic commits; this is precisely what journals
and /barriers/ are for.
Are you bypassing them in a quest for performance and paying for
it on crashes?
Or is this a hardware bug?
Or is it some glitch in the block device layering leading to barrier
requests not being honored?
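
To make the ordering argument concrete, here is a rough toy model of the
example quoted above. It is only a sketch: the block names come from the
example, but replay(), journal_ahead_of_data() and the "with barrier" write
order are my own assumptions for illustration and say nothing about what md
actually does internally.

```python
def replay(ops, crash_after):
    """Apply the first `crash_after` device writes and return each disk's contents."""
    disks = {"disk1": {}, "disk2": {}}
    for disk, block, version in ops[:crash_after]:
        disks[disk][block] = version
    return disks


def journal_ahead_of_data(disk):
    """Assumed invariant: journal block B' must not be on disk unless data block C is."""
    return disk.get("B") == "B'" and "C" not in disk


# Device writes in the order given in the quoted example (no barrier):
# A reaches both disks, B reaches disk1, B' reaches disk2, then the crash.
NO_BARRIER = [
    ("disk1", "A", "A"), ("disk2", "A", "A"),
    ("disk1", "B", "B"), ("disk2", "B", "B'"),
]

# Hypothetical order if a barrier sits between C and B':
# B' is not issued to either disk until C has reached both.
WITH_BARRIER = [
    ("disk1", "A", "A"), ("disk2", "A", "A"),
    ("disk1", "B", "B"), ("disk2", "B", "B"),
    ("disk1", "C", "C"), ("disk2", "C", "C"),
    ("disk1", "B", "B'"), ("disk2", "B", "B'"),
]

# State after the crash in the quoted example.
crashed = replay(NO_BARRIER, crash_after=len(NO_BARRIER))

# With the barrier, check every possible crash point, not just the last one.
barrier_always_safe = all(
    not journal_ahead_of_data(disk)
    for n in range(len(WITH_BARRIER) + 1)
    for disk in replay(WITH_BARRIER, n).values()
)

if __name__ == "__main__":
    for name, contents in crashed.items():
        print(name, contents, "B' without C:", journal_ahead_of_data(contents))
    print("barrier order safe at every crash point:", barrier_always_safe)
```

In this model disk 2 ends up with B' but no C, which is the case Luca
describes, while the barrier ordering never exposes B' ahead of C no matter
where the crash lands.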