Re: What the heck happened to my array?

From: Brad Campbell <lists2009@fnarfbargle.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: What the heck happened to my array?
Date: Tue, 05 Apr 2011 17:02:43 +0800
Message-ID: <4D9ADAB3.7040205@fnarfbargle.com>
In-Reply-To: <20110405161043.00d54901@notabene.brown>

On 05/04/11 14:10, NeilBrown wrote:
 >> - Reboot required to get system back.
 >> - Restarted reshape with 9 drives.
 >> - sdl suffered IO error and was kicked
 >
 > Very sad.

I'd say pretty damn unlucky actually.

 >> - Array froze all IO.
 >
 > Same thing...
 >
 >> - Reboot required to get system back.
 >> - Array will no longer mount with 8/10 drives.
 >> - Mdadm 3.1.5 segfaults when trying to start reshape.
 >
 > Don't know why it would have done that... I cannot reproduce it easily.

No. I tried numerous incantations. The system version of mdadm is Debian 
3.1.4. This segfaulted so I downloaded and compiled 3.1.5 which did the 
same thing. I then composed most of this E-mail, made *really* sure my 
backups were up to date and tried 3.2.1 which to my astonishment worked. 
It's been ticking along _slowly_ ever since.

 >>     Naively tried to run it under gdb to get a backtrace but was unable
 >> to stop it forking
 >
 > Yes, tricky .... an "strace -o /tmp/file -f mdadm ...." might have been
 > enough, but too late to worry about that now.

I wondered about using strace but for some reason got it into my head 
that a gdb backtrace would be more useful. Then of course I got it 
started with 3.2.1 and have not tried again.
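
For anyone landing here with the same problem, both approaches can be sketched roughly as below. The function names are made up for illustration, and the actual mdadm arguments (elided in the thread) are left for the caller to supply. The only tool features assumed are strace's -f and -o options and gdb's "follow-fork-mode" setting.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of mdadm.

# Log the system calls of a command and of every child it forks.
# usage: trace_forking OUTFILE COMMAND [ARGS...]
trace_forking() {
    out=$1
    shift
    # -f follows forked children, -o writes the trace to a file
    strace -f -o "$out" "$@"
}

# Run a command under gdb and stay attached to the child after fork(),
# so a crash in the forked process can be backtraced with "bt".
# (gdb's default is to follow the parent instead.)
# usage: debug_forking COMMAND [ARGS...]
debug_forking() {
    gdb -ex 'set follow-fork-mode child' --args "$@"
}
```

Usage would be along the lines of "trace_forking /tmp/file mdadm ...." with whatever arguments triggered the crash.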

 >> - Got array started with mdadm 3.2.1
 >> - Attempted to re-add sdd/sdl (now marked as spares)
 >
 > Hmm... it isn't meant to do that any more.  I thought I fixed it so that
 > if a device looked like part of the array it wouldn't add it as a spare...
 > Obviously that didn't work.  I'd better look into it again.

Now the chain of events that led up to this was along these lines.
- Rebooted machine.
- Tried to --assemble with 3.1.4
- mdadm told me it did not really want to continue with 8/10 devices and 
I should use --force if I really wanted it to try.
- I used --force
- I did a mdadm --add /dev/md0 /dev/sdd and the same for sdl
- I checked and they were listed as spares.

So this was all done with Debian's mdadm 3.1.4, *not* 3.1.5
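
The "listed as spares" check in the steps above can be done by eye in /proc/mdstat, where spare members carry an "(S)" suffix. A minimal sketch that pulls those names out, assuming the usual one-line-per-array mdstat layout; the function name list_spares is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the members of an md array that are marked
# as spares, i.e. entries of the form "sdd[10](S)" on the array's line.
# usage: list_spares MDNAME [MDSTAT_FILE]
list_spares() {
    md=$1
    mdstat=${2:-/proc/mdstat}
    # Take the "md0 : active ..." line, split it into one word per line,
    # and keep only the device name of words ending in "[N](S)".
    grep "^$md :" "$mdstat" \
        | tr ' ' '\n' \
        | sed -n 's/^\(.*\)\[[0-9]*](S)$/\1/p'
}
```

Running "list_spares md0" after the --add would show whether the re-added drives came back as array members or only as spares.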

 >
 > No, you cannot give it extra redundancy.
 > I would suggest:
 >    copy anything that you need off, just in case - if you can.
 >
 >    Kill the mdadm that is running in the background.  This will mean that
 >    if the machine crashes your array will be corrupted, but you are thinking
 >    of rebuilding it anyway, so that isn't the end of the world.
 >    In /sys/block/md0/md
 >       cat suspend_hi > suspend_lo
 >       cat component_size > sync_max
 >
 >    That will allow the reshape to continue without any backup.  It will be
 >    much faster (but less safe, as I said).

Well, I have nothing to lose, but I've just picked up some extra drives 
so I'll make second backups and then give this a whirl.
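
For reference, the two sysfs writes quoted above could be wrapped with a basic sanity check, roughly as follows. This is only a sketch: the function name is made up, /sys/block/md0/md is the path from Neil's mail, and it does not kill the background mdadm, which still has to be done by hand first.

```shell
#!/bin/sh
# Hypothetical wrapper around the two writes suggested above.
# It does NOT stop the background mdadm; do that first.
# usage: drop_reshape_backup [MD_SYSFS_DIR]
drop_reshape_backup() {
    md=${1:-/sys/block/md0/md}
    # Refuse to touch anything unless all four attributes are present.
    for f in suspend_hi suspend_lo component_size sync_max; do
        [ -e "$md/$f" ] || { echo "missing $md/$f" >&2; return 1; }
    done
    # cat suspend_hi > suspend_lo
    cat "$md/suspend_hi" > "$md/suspend_lo" || return 1
    # cat component_size > sync_max
    cat "$md/component_size" > "$md/sync_max" || return 1
}
```

As Neil says, after this the reshape runs without a backup, so a crash mid-reshape means rebuilding the array and restoring from the copies.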

 >    If something goes wrong, you will need to scrap the array, recreate it, and
 >    copy data back from where-ever you copied it to (or backups).

I did go into this with the niggling feeling that something bad might 
happen, so I made sure all my backups were up to date before I started. 
No biggie if it does die.

The very odd thing is I did a complete array check, plus SMART long 
tests on all drives literally hours before I started the reshape. Goes 
to show how ropey these large drives can be in big(ish) arrays.

 > If anything there doesn't make sense, or doesn't seem to work - please ask.
 >
 > Thanks for the report.  I'll try to get those mdadm issues addressed -
 > particularly if you can get me the mdadm file which caused the segfault.
 >

Well, luckily I preserved the entire build tree then. I was planning on 
running nm over the binary and have a two thumbs type of look into it 
with gdb, but seeing as you probably have a much better idea what you 
are looking for I'll just send you the binary!

Thanks for the help Neil. Much appreciated.
