Re: interesting failure scenario

linux-raid.vger.kernel.org archive mirror

From: Molle Bestefich <molle.bestefich@gmail.com>
To: linux-raid@vger.kernel.org
Subject: Re: interesting failure scenario
Date: Mon, 4 Apr 2005 09:22:55 +0200
Message-ID: <62b0912f05040400227dab7428@mail.gmail.com>
In-Reply-To: <62b0912f050404001813448d3d@mail.gmail.com>

Michael Tokarev wrote:
> I just came across an interesting situation; here's the
> scenario.

[snip]

> Now we have an interesting situation.  Both superblocks in d1
> and d2 are identical, event counts are the same, both are clean.
> Things which are different:
>    utime - on d1 it is "more recent" (provided we haven't touched
>      the system clock, of course)
>    on d1, d2 is marked as faulty
>    on d2, d1 is marked as faulty.
> 
> Neither of these conditions is checked by mdadm.
> 
> So, mdadm just starts a clean RAID1 array composed of two drives
> with different data on them.  And no one notices this fact (fsck,
> which reads from one disk, goes OK) until some time later, when
> some app reports data corruption (reading from the other disk); you
> go check what's going on, notice there's no data corruption (reading
> from the 1st disk), suspect memory and... it's quite a long list of
> possible bad stuff which can go on here... ;)
> 
> The above scenario is just a theory, but a theory with a quite
> non-zero probability.  Instead of hotplugging the disks, one can
> reboot with flaky IDE/SCSI cables or whatnot, so that disks are
> detected on/off randomly...
>
> It is probably a good idea to test utime too, in addition to event
> counters, in mdadm's Assemble.c (as the comments say, though the code disagrees).
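The check Michael describes can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not mdadm's Assemble.c logic or its on-disk format: `Superblock`, `events_agree`, `find_split_brain` and `newest` are made-up names, the superblock is reduced to the fields discussed above (event counter, utime, and the set of members this copy marks faulty), and the example values are invented. It shows why an events-only comparison accepts the d1/d2 pair while a mutual-faulty cross-check flags it, with utime used only as a hint.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Superblock:
    """Hypothetical, simplified superblock; not mdadm's real structure."""
    device: str
    events: int      # event counter
    utime: int       # last update time (seconds since the epoch)
    faulty: frozenset = field(default_factory=frozenset)  # members this copy marks faulty


def events_agree(sbs):
    """Events-only check: every superblock carries the same event count."""
    return len({sb.events for sb in sbs}) == 1


def find_split_brain(sbs):
    """Pairs with equal event counts where each copy marks the other faulty."""
    pairs = []
    for i, a in enumerate(sbs):
        for b in sbs[i + 1:]:
            if a.events == b.events and b.device in a.faulty and a.device in b.faulty:
                pairs.append((a.device, b.device))
    return pairs


def newest(sbs):
    """utime as a hint only: the copy written most recently (assumes a sane clock)."""
    return max(sbs, key=lambda sb: sb.utime)


# The scenario from the thread: same event count, each disk thinks the other failed.
d1 = Superblock("d1", events=42, utime=1112562000, faulty=frozenset({"d2"}))
d2 = Superblock("d2", events=42, utime=1112561000, faulty=frozenset({"d1"}))

print(events_agree([d1, d2]))      # True: an events-only check would assemble
print(find_split_brain([d1, d2]))  # [('d1', 'd2')]: the two copies have diverged
print(newest([d1, d2]).device)     # d1
```

Flagging the pair, rather than silently trusting the newer utime, leaves the decision (and any forced assembly) to the administrator.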

Hmm, please don't.
 
I rely on MD, all the time, to assemble arrays whose event counters
match even though their utimes don't.  It happens quite often that a
controller fails, or something like that, and you accidentally lose 2
disks in a raid5.

I still want to be able to force the array to be assembled in these cases.
I'm still on 2.4, btw, and don't know if there's a better way to do it in
2.6 than manipulating the event counters.
 
(Thinking about it, it would be perfect if the array instantly went
into read-only mode whenever it is degraded to a non-redundant
state.  Wouldn't that give a higher chance of assembling a working
array afterwards?)
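The read-only idea above can be written down as a simple policy. The sketch below is hypothetical and does not describe how md behaves in 2.4 or 2.6: `remaining_redundancy` and `Array` are made-up names, and it assumes RAID1 survives until one member is left, RAID5 tolerates one failed member and RAID6 two. As soon as no redundancy is left, the array stops accepting writes, so a second member lost shortly afterwards (e.g. the same failing controller) still holds the same data as the survivors.

```python
from dataclasses import dataclass


def remaining_redundancy(level: str, raid_disks: int, working: int) -> int:
    """Further member failures the array can survive; 0 = non-redundant, <0 = failed."""
    failed = raid_disks - working
    if level == "raid1":
        return working - 1
    if level == "raid5":
        return 1 - failed
    if level == "raid6":
        return 2 - failed
    raise ValueError(f"unsupported level: {level}")


@dataclass
class Array:
    """Hypothetical array state; not md's in-kernel structure."""
    level: str
    raid_disks: int
    working: int
    read_only: bool = False

    def member_failed(self) -> None:
        self.working -= 1
        # Proposed policy: freeze writes as soon as redundancy is gone.
        if remaining_redundancy(self.level, self.raid_disks, self.working) <= 0:
            self.read_only = True


md0 = Array("raid5", raid_disks=4, working=4)
md0.member_failed()
print(md0.read_only)  # True: 3 of 4 RAID5 members left, no redundancy
```

The trade-off is availability: a degraded array that goes read-only stops serving writes until it is repaired, which is why this would likely need to be an opt-in policy.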


Thread overview: 3+ messages
2005-04-03 21:59 interesting failure scenario Michael Tokarev
     [not found] ` <62b0912f050404001813448d3d@mail.gmail.com>
2005-04-04  7:22   ` Molle Bestefich [this message]
2005-04-04 22:15 ` Luca Berra
