From: Helge Hafting <helge.hafting@aitel.hist.no>
To: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	device-mapper development <dm-devel@redhat.com>,
	agk@redhat.com, mingo@redhat.com, neilb@suse.de
Subject: Re: Data corruption on software RAID
Date: Tue, 08 Apr 2008 12:22:54 +0200
Message-ID: <47FB477E.40502@aitel.hist.no>
In-Reply-To: <Pine.LNX.4.64.0804080033430.13352@artax.karlin.mff.cuni.cz>

Mikulas Patocka wrote:
> Hi
>
> During source code review, I found an improbable but possible data 
> corruption on RAID-1 and on DM-RAID-1. (I'm not sure about RAID-4,5,6.)
>
> The RAID code was enhanced with bitmaps in 2.6.13.
>
> The bitmap tracks regions on the device that may be out of sync. 
> Its purpose is to avoid resynchronizing the whole array after a 
> crash. DM-RAID uses a similar bitmap too.
>
> The write sequence is usually:
> 1. turn on the bit in the bitmap (if it wasn't on already).
> 2. update the data.
> 3. when the writes to all devices finish, the bit may be turned off.
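The three-step sequence above can be modeled in a few lines. This is a toy Python sketch under stated assumptions, not the md or dm-raid1 code: the class `Raid1`, its methods and `REGION_SIZE` are hypothetical. Note that each mirror's completion carries its own copy of the data, and nothing in step 3 checks that the copies agree.

```python
# Toy model of a RAID-1 write-intent bitmap. Class and method names and
# REGION_SIZE are hypothetical; this is not the md or dm-raid1 code.
REGION_SIZE = 4  # blocks per bitmap region (assumed value)


class Raid1:
    def __init__(self, nblocks, nmirrors=2):
        self.mirrors = [[None] * nblocks for _ in range(nmirrors)]
        self.bitmap = set()   # regions that may be out of sync
        self.pending = {}     # region -> outstanding per-mirror writes

    def region_of(self, block):
        return block // REGION_SIZE

    def start_write(self, block):
        region = self.region_of(block)
        # 1. Set the region's bit before any data is written.
        self.bitmap.add(region)
        self.pending[region] = self.pending.get(region, 0) + len(self.mirrors)
        return region

    def mirror_write_done(self, region, mirror, block, data):
        # 2. One mirror has stored its copy; `data` is whatever that mirror saw.
        self.mirrors[mirror][block] = data
        self.pending[region] -= 1
        # 3. When every mirror has finished, the region is assumed to be in sync.
        if self.pending[region] == 0:
            del self.pending[region]
            self.bitmap.discard(region)

    def regions_to_resync(self):
        # After a crash, only regions still marked in the bitmap are resynced.
        return sorted(self.bitmap)


raid = Raid1(nblocks=8)
r = raid.start_write(5)               # block 5 lives in region 1
raid.mirror_write_done(r, 0, 5, "A")
print(raid.regions_to_resync())       # [1]
raid.mirror_write_done(r, 1, 5, "A")
print(raid.regions_to_resync())       # []
```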
>
> The developers assume that when all writes to the region finish, the 
> region is in-sync.
>
> This assumption is wrong.
>
> In many places, the kernel writes data that may be modified while the 
> write is in progress. For example, the pdflush daemon periodically writes 
> pages and buffers without locking them. Similarly, pages may be written 
> while they are mapped writable into processes.
>
> Normally, there is no problem with modify-while-write. The write sequence 
> is something like:
> * turn off the Dirty bit
> * write the buffer or page
> --- and if the buffer or page is modified while it's being written, the 
> Dirty bit is turned on again and the correct data are written later.
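On a single disk this sequence is self-correcting, which a small simulation makes concrete. The sketch below assumes a hypothetical `Buffer` class and `writeback()` helper, and models the device as reading the buffer just before the kernel changes it. The stale write is harmless because the re-set Dirty bit forces a second write with the correct data.

```python
# Toy model of write-back with a Dirty bit on a single disk (names hypothetical).
class Buffer:
    def __init__(self, data):
        self.data = data
        self.dirty = True

    def modify(self, data):
        self.data = data
        self.dirty = True  # any modification turns the Dirty bit back on


def writeback(buf, disk, concurrent_update=None):
    buf.dirty = False                  # turn off the Dirty bit
    snapshot = buf.data                # the device reads the current contents...
    if concurrent_update is not None:
        buf.modify(concurrent_update)  # ...while the kernel changes the buffer
    disk["block"] = snapshot           # the now-stale contents reach the disk


disk = {}
buf = Buffer("old")
writeback(buf, disk, concurrent_update="new")
print(disk["block"], buf.dirty)        # old True
while buf.dirty:                       # a later pass writes the correct data
    writeback(buf, disk)
print(disk["block"])                   # new
```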
>
> But with RAID (since 2.6.13), it can produce corruption because when the 
> buffer is modified while being written, different versions of data can be 
> written to devices in the RAID array. For example:
>
> 1. pdflush turns off the dirty bit on an Ext2 bitmap buffer and starts 
> writing the buffer to RAID-1.
> 2. The kernel allocates some blocks in that Ext2 bitmap. One of the RAID-1 
> devices gets the new data, the other one gets the old data.
> 3. The kernel turns on the buffer's dirty bit, so this buffer is scheduled 
> for the next write.
> 4. The RAID-1 subsystem sees that both writes finished, so it concludes 
> that this region is in sync, turns off the region's bit in its bitmap and 
> writes the bitmap to disk.
>   
Would this help:
RAID-1 sees that both writes have finished. It then checks the dirty bits
on all relevant buffers/pages. If none were re-dirtied, it is OK to
turn off the dirty bit in the region bitmap and write the bitmap out.
Otherwise, it is not!
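A minimal sketch of that check, assuming the caller can enumerate every buffer/page that maps to the region (modeled here as a plain list). `REGION_SIZE`, `region_of()` and `complete_region_write()` are hypothetical names. Whether the kernel can find those pages cheaply is exactly the open cost question, and this sketch does not settle it.

```python
# Sketch of the proposed check (names hypothetical): after all mirror writes
# for a region complete, clear the region's bit only if no buffer/page in that
# region was re-dirtied while the write was in flight.
REGION_SIZE = 4  # blocks per bitmap region (assumed value)


def region_of(block):
    return block // REGION_SIZE


def complete_region_write(region, pages, region_bitmap):
    redirtied = any(p["dirty"] for p in pages if region_of(p["block"]) == region)
    if not redirtied:
        region_bitmap.discard(region)
    # Otherwise keep the bit set: the page will be written again, and if the
    # system crashes first, resync still covers this region.


pages = [
    {"block": 1, "dirty": False},
    {"block": 2, "dirty": True},   # modified during the write
    {"block": 9, "dirty": False},
]
region_bitmap = {0, 2}
complete_region_write(0, pages, region_bitmap)  # block 2 re-dirtied: bit 0 stays
complete_region_write(2, pages, region_bitmap)  # block 9 clean: bit 2 cleared
print(sorted(region_bitmap))                    # [0]
```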

Or is such a check too time-consuming?

Helge Hafting

