* interesting failure scenario
From: Michael Tokarev @ 2005-04-03 21:59 UTC
To: linux-raid
I just came across an interesting situation; here's the
scenario.
0. Have a RAID1 array composed of two components, d1 and d2.
The array was running, clean, event counter was 10.
1. d1 failed (e.g., hotplug-removed).
2. On d2's superblock we now have event=11, and d1 is marked
as failed.
3. The array is running in degraded mode.  Some writes happened.
4. Stop the array => d2 event counter = 12, clean.
5. Hotplug-remove d2, hotplug-add d1.
6. Start the array.  It now starts from d1, which is clean with
event count = 10.  Since d2 is inaccessible, it is marked as faulty
in d1's superblock.  The whole operation bumps the event count to 11.
7. Do some writes to the array, which is now running in degraded
mode (writing to d1).
8. Stop the array. Event count on d1 is set to 12.
9. Hotplug-add d2 back.
Now we have an interesting situation.  The superblocks on d1 and d2
look equally valid: the event counts are the same, and both are
marked clean.  What differs:
utime - on d1 it is "more recent" (provided we haven't touched
the system clock, of course);
on d1, d2 is marked as faulty;
on d2, d1 is marked as faulty.
Neither of these conditions is checked by mdadm.
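
To make the state concrete, here is a tiny illustrative C sketch.  The
struct, field names and values are made up for the example (this is not
the real md superblock layout or mdadm code); it just shows the kind of
cross-check that would catch the situation:

#include <stdio.h>

/* Only the fields that matter for the example; NOT the real layout. */
struct sb {
	unsigned events;	/* event counter */
	unsigned utime;		/* superblock update time */
	int      peer_faulty;	/* "the other mirror is faulty" flag */
};

/* Same event count, yet each half claims the other one died first --
 * they cannot both be right, so assembly should not proceed silently. */
static int mutually_faulty(const struct sb *a, const struct sb *b)
{
	return a->events == b->events && a->peer_faulty && b->peer_faulty;
}

int main(void)
{
	struct sb d1 = { .events = 12, .utime = 1112565549, .peer_faulty = 1 };
	struct sb d2 = { .events = 12, .utime = 1112561949, .peer_faulty = 1 };

	if (mutually_faulty(&d1, &d2))
		printf("refuse to auto-assemble: superblocks disagree about "
		       "which mirror failed (utime %u vs %u)\n",
		       d1.utime, d2.utime);
	return 0;
}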
So mdadm just starts a clean RAID1 array composed of two drives
with different data on them.  And no one notices this fact (fsck,
which reads from one disk, goes fine), until some time later some
app reports data corruption (reading from the other disk); you go
check what's going on, see no corruption (reading from the first
disk), suspect bad memory, and... it's quite a long list of
possible bad stuff which can go on here... ;)
The above scenario is just a theory, but one with a decidedly
non-zero probability.  Instead of hotplugging the disks, one can do
a reboot with flaky IDE/SCSI cables or whatnot, so that the disks
get detected or missed more or less at random...
It is probably a good idea to test utime too, in addition to the
event counters, in mdadm's Assemble.c (as a comment there suggests,
but the code disagrees).  Maybe compare the list of faulty components
recorded in each superblock as well, and refuse to assemble the
array if an inconsistency like this is detected (unless --force is
specified)...  But the logic becomes quite... problematic...
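
Very roughly, the kind of tie-break I have in mind (again made-up code
reusing the toy struct sb and mutually_faulty() from the sketch above,
not the actual Assemble.c logic):

/* When event counts are equal, prefer the more recently written
 * superblock; refuse outright on a mutual-faulty conflict unless
 * the caller asked for --force.  Returns NULL for "do not assemble". */
static const struct sb *pick_start_disk(const struct sb *a,
					const struct sb *b, int force)
{
	if (a->events != b->events)
		return a->events > b->events ? a : b;	/* higher count wins */
	if (mutually_faulty(a, b) && !force)
		return NULL;				/* inconsistent: bail out */
	return a->utime >= b->utime ? a : b;		/* newer utime as tie-break */
}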
/mjt
* Re: interesting failure scenario
From: Molle Bestefich @ 2005-04-04 7:22 UTC
To: linux-raid
Michael Tokarev wrote:
> I just came across an interesting situation; here's the
> scenario.
[snip]
> Now we have an interesting situation.  The superblocks on d1 and d2
> look equally valid: the event counts are the same, and both are
> marked clean.  What differs:
> utime - on d1 it is "more recent" (provided we haven't touched
> the system clock, of course);
> on d1, d2 is marked as faulty;
> on d2, d1 is marked as faulty.
>
> Neither of these conditions is checked by mdadm.
>
> So mdadm just starts a clean RAID1 array composed of two drives
> with different data on them.  And no one notices this fact (fsck,
> which reads from one disk, goes fine), until some time later some
> app reports data corruption (reading from the other disk); you go
> check what's going on, see no corruption (reading from the first
> disk), suspect bad memory, and... it's quite a long list of
> possible bad stuff which can go on here... ;)
>
> The above scenario is just a theory, but one with a decidedly
> non-zero probability.  Instead of hotplugging the disks, one can do
> a reboot with flaky IDE/SCSI cables or whatnot, so that the disks
> get detected or missed more or less at random...
>
> It is probably a good idea to test utime too, in addition to the
> event counters, in mdadm's Assemble.c (as a comment there suggests,
> but the code disagrees).
Hmm, please don't.
I rely on MD assembling arrays whose event counters match but whose
utimes don't, all the time.  It happens quite often that a controller
fails or something like that and you accidentally lose two disks in
a RAID5.
I still want to be able to force the array to be assembled in these cases.
I'm still on 2.4, by the way; I don't know if there's a better way
to do it in 2.6 than manipulating the event counters.
(Thinking about it, it would be perfect if the array instantly went
into read-only mode whenever it is degraded to a non-redundant state.
That way there would be a higher chance of assembling a working array
afterwards?)
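
Something like the following is what I mean -- a toy sketch with
made-up types, nothing like the actual md code:

#include <stdio.h>

/* Made-up, simplified array state; not the kernel's md structures. */
struct toy_array {
	int level;		/* 1 = RAID1, 5 = RAID5 */
	int raid_disks;		/* members in a healthy array */
	int active_disks;	/* members currently in sync */
	int read_only;
};

static int has_redundancy(const struct toy_array *a)
{
	if (a->level == 1)
		return a->active_disks >= 2;		/* RAID1: a second mirror */
	return a->active_disks >= a->raid_disks;	/* RAID5: needs all members */
}

/* The idea: the moment a failure leaves no redundancy, stop accepting
 * writes so the surviving data cannot diverge any further. */
static void member_failed(struct toy_array *a)
{
	a->active_disks--;
	if (!has_redundancy(a))
		a->read_only = 1;
}

int main(void)
{
	struct toy_array r5 = { .level = 5, .raid_disks = 4, .active_disks = 4 };

	member_failed(&r5);	/* one disk gone: degraded, no redundancy left */
	printf("read_only after first failure: %d\n", r5.read_only);
	return 0;
}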
* Re: interesting failure scenario
From: Luca Berra @ 2005-04-04 22:15 UTC
To: linux-raid
On Mon, Apr 04, 2005 at 01:59:09AM +0400, Michael Tokarev wrote:
>I just came across an interesting situation; here's the
>scenario.
>
C'mon, there are plenty of ways for you to shoot yourself in the foot. :)
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ /  ASCII RIBBON CAMPAIGN
 X   AGAINST HTML MAIL
/ \