From: Brad Campbell <lists2009@fnarfbargle.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: What the heck happened to my array?
Date: Tue, 05 Apr 2011 17:02:43 +0800
Message-ID: <4D9ADAB3.7040205@fnarfbargle.com>
In-Reply-To: <20110405161043.00d54901@notabene.brown>

On 05/04/11 14:10, NeilBrown wrote:
 >> - Reboot required to get system back.
 >> - Restarted reshape with 9 drives.
 >> - sdl suffered IO error and was kicked
 >
 > Very sad.

I'd say pretty damn unlucky actually.

 >> - Array froze all IO.
 >
 > Same thing...
 >
 >> - Reboot required to get system back.
 >> - Array will no longer mount with 8/10 drives.
 >> - Mdadm 3.1.5 segfaults when trying to start reshape.
 >
 > Don't know why it would have done that... I cannot reproduce it easily.

No. I tried numerous incantations. The system version of mdadm is Debian 
3.1.4. This segfaulted so I downloaded and compiled 3.1.5 which did the 
same thing. I then composed most of this E-mail, made *really* sure my 
backups were up to date and tried 3.2.1 which to my astonishment worked. 
It's been ticking along _slowly_ ever since.

 >>     Naively tried to run it under gdb to get a backtrace but was unable
 >> to stop it forking
 >
 > Yes, tricky .... an "strace -o /tmp/file -f mdadm ...." might have been
 > enough, but too late to worry about that now.

I wondered about using strace but for some reason got it into my head 
that a gdb backtrace would be more useful. Then of course I got it 
started with 3.2.1 and have not tried again.
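For anyone who hits the same segfault, a trace captured the way Neil suggests can be narrowed down to the crashing process. This is a hypothetical sketch (the function name `last_calls_before_segv` is made up), assuming that `strace -f -o FILE` prefixes every line with the PID and records the crash as a `+++ killed by SIGSEGV +++` line.

```shell
#!/bin/sh
# Hypothetical helper: print the last N trace lines of the process that
# died with SIGSEGV, from a file produced by `strace -o /tmp/file -f mdadm ...`.
# Assumes each line starts with the PID (strace does this for -f with -o FILE).
last_calls_before_segv() {
    trace=$1
    n=${2:-20}
    # PID of the first process whose exit is logged as "killed by SIGSEGV"
    pid=$(awk '/killed by SIGSEGV/ { print $1; exit }' "$trace")
    [ -n "$pid" ] || return 1   # no segfault recorded in this trace
    # Keep only that PID's lines and show the tail leading up to the crash
    awk -v p="$pid" '$1 == p' "$trace" | tail -n "$n"
}
```

Usage would be something like `last_calls_before_segv /tmp/file 40`. Filtering by PID matters here because mdadm forks, so the parent's lines would otherwise interleave with the child that actually crashed.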

 >> - Got array started with mdadm 3.2.1
 >> - Attempted to re-add sdd/sdl (now marked as spares)
 >
 > Hmm... it isn't meant to do that any more.  I thought I fixed it so that
 > if a device looked like part of the array it wouldn't add it as a spare...
 > Obviously that didn't work.  I'd better look into it again.

Now the chain of events that led up to this was along these lines.
- Rebooted machine.
- Tried to --assemble with 3.1.4
- mdadm told me it did not really want to continue with 8/10 devices and 
I should use --force if I really wanted it to try.
- I used --force
- I did an mdadm --add /dev/md0 /dev/sdd, and the same for sdl.
- I checked and they were listed as spares.

So this was all done with Debian's mdadm 3.1.4, *not* 3.1.5
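Before an --add that may turn a former member into a spare, it can help to compare the event counters recorded in each device's superblock. A minimal sketch, assuming `mdadm --examine DEV` prints a line of the form `Events : N`; the helper names `parse_events` and `show_events` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: read the superblock event counter of each device so
# stale members stand out before anything is --add'ed back.
# Assumes `mdadm --examine DEV` prints a line of the form "Events : N".

# Extract the value of the first "Events :" line from --examine output on stdin
parse_events() {
    awk -F ':' '/^[ \t]*Events[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Print "DEVICE EVENTS" for every device given on the command line
show_events() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" 2>/dev/null | parse_events)"
    done
}
```

For example, `show_events /dev/sdd /dev/sdl` followed by the remaining members would list them side by side; a device whose count lags behind the rest is typically the one that was kicked out of the array.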

 >
 > No, you cannot give it extra redundancy.
 > I would suggest:
 >    copy anything that you need off, just in case - if you can.
 >
 >    Kill the mdadm that is running in the background.  This will mean that
 >    if the machine crashes your array will be corrupted, but you are thinking
 >    of rebuilding it anyway, so that isn't the end of the world.
 >    In /sys/block/md0/md
 >       cat suspend_hi > suspend_lo
 >       cat component_size > sync_max
 >
 >    That will allow the reshape to continue without any backup.  It will be
 >    much faster (but less safe, as I said).

Well, I have nothing to lose, but I've just picked up some extra drives 
so I'll make second backups and then give this a whirl.
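Once the reshape is running without the backup, its progress can be read from the same sysfs directory. A minimal sketch, assuming `sync_completed` holds `DONE / TOTAL` in sectors, or a single word such as `none` when nothing is running; the `reshape_progress` name and its directory argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report reshape/resync progress for an md array.
# Argument: the array's md sysfs directory (default /sys/block/md0/md).
# Assumes sync_completed reads "DONE / TOTAL" in sectors, or a single word
# such as "none" when no resync/reshape is active.
reshape_progress() {
    md_dir=${1:-/sys/block/md0/md}
    read -r cur _ total < "$md_dir/sync_completed" || return 1
    case $total in
        ''|0) echo "$cur"; return 0 ;;   # e.g. "none": no percentage to compute
    esac
    awk -v d="$cur" -v t="$total" 'BEGIN { printf "%.1f%%\n", 100 * d / t }'
}
```

Something like `while sleep 60; do reshape_progress; done` would then print a percentage once a minute.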

 >    If something goes wrong, you will need to scrap the array, recreate it, and
 >    copy data back from wherever you copied it to (or backups).

I did go into this with the niggling feeling that something bad might 
happen, so I made sure all my backups were up to date before I started. 
No biggie if it does die.

The very odd thing is that I did a complete array check, plus SMART long
tests on all drives, literally hours before I started the reshape. Goes
to show how ropey these large drives can be in big(gish) arrays.
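The pre-reshape checks mentioned above can be scripted roughly as follows. A sketch only, assuming the md sysfs attribute `sync_action` accepts `check` and reads `idle` again when it finishes, that `mismatch_cnt` then holds the result, and that smartmontools' `smartctl -t long` is installed; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-reshape checks.
# start_checks MD_DIR DEV...: kick off an md consistency check and a SMART
# extended self-test on each member; both then run in the background.
start_checks() {
    md_dir=$1; shift
    echo check > "$md_dir/sync_action"
    for dev in "$@"; do
        smartctl -t long "$dev"
    done
}

# check_result MD_DIR: once sync_action reads "idle" again, print mismatch_cnt
check_result() {
    md_dir=$1
    read -r action < "$md_dir/sync_action"
    if [ "$action" != idle ]; then
        echo "still running: $action"
        return 1
    fi
    cat "$md_dir/mismatch_cnt"
}
```

As the experience above shows, a clean result from both only describes the drives at that moment; it does not guarantee they survive the sustained load of a reshape.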

 > If anything there doesn't make sense, or doesn't seem to work - please ask.
 >
 > Thanks for the report.  I'll try to get those mdadm issues addressed -
 > particularly if you can get me the mdadm file which caused the segfault.
 >

Well, luckily I preserved the entire build tree then. I was planning on
running nm over the binary and having a two-thumbs look into it
with gdb, but seeing as you probably have a much better idea of what you
are looking for, I'll just send you the binary!

Thanks for the help Neil. Much appreciated.

