linux-raid.vger.kernel.org archive mirror
From: Neil Brown <neilb@suse.de>
To: Mike Hartman <mike@hartmanipulation.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: New RAID causing system lockups
Date: Sun, 12 Sep 2010 06:43:08 +1000
Message-ID: <20100912064308.46d96742@notabene>
In-Reply-To: <AANLkTimTvPupaPuFbfpcCGC=0Ys9fQWihAgaADsrowqq@mail.gmail.com>

On Sat, 11 Sep 2010 14:20:40 -0400
Mike Hartman <mike@hartmanipulation.com> wrote:

> PART 3:
> 
> Update:
> 
> I'm even more concerned about this now, because I just started the
> newest reshaping to add a new drive with:
> 
> mdadm --grow -c 256 --raid-devices=5 --backup-file=/grow_md0.bak /dev/md0
> 
> And the system output:
> 
> mdadm: Need to backup 768K of critical section..
> 
> cat /proc/mdstat shows the reshaping is proceeding,
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdi1[0] sdf1[5] md1p1[4] sdj1[3] sdh1[1]
>       2929691136 blocks super 1.2 level 6, 128k chunk, algorithm 2 [5/5] [UUUUU]
>       [>....................]  reshape =  0.0% (56576/1464845568) finish=2156.9min speed=11315K/sec
> 
> md1 : active raid0 sdg1[0] sdk1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> but I've checked for /grow_md0.bak and it's not there. So it looks
> like for some reason it ignored my backup file option.

It didn't.

When you are making an array larger, you only need the backup file for a small
'critical region' at the beginning of the reshape - 768K worth in your case.

Once that is complete the backup-file is not needed and so is removed.

So your current situation is no worse than before.

[When making an array smaller, the critical section happens at the very end,
so mdadm keeps the backup file around - unused - until then.  Then it uses it
quickly and completes.  When reshaping an array without changing the size, the
'critical section' lasts for the entire time, so a backup file is needed and
is very heavily used.]
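
As a rough sketch only, the three cases above can be written down as a small
lookup.  The names BackupWindow and backup_window are made up for this
illustration and are not part of mdadm or the kernel; the sizes can be in any
consistent unit.

```python
from enum import Enum


class BackupWindow(Enum):
    """When the backup file is actually in use during a reshape."""
    START = "start of reshape"
    END = "end of reshape"
    WHOLE = "entire reshape"


def backup_window(old_size: int, new_size: int) -> BackupWindow:
    """Classify a reshape by how the array size changes.

    Hypothetical helper for illustration; it only encodes the rule of
    thumb described in the text, not mdadm's real logic.
    """
    if new_size > old_size:
        # Growing: critical section is a small region at the beginning.
        return BackupWindow.START
    if new_size < old_size:
        # Shrinking: critical section comes at the very end.
        return BackupWindow.END
    # Same size: the critical section lasts for the whole reshape.
    return BackupWindow.WHOLE


if __name__ == "__main__":
    for old, new in [(2, 3), (3, 2), (3, 3)]:
        print(old, new, backup_window(old, new).value)
```

For a grow like the one in this thread, the sketch gives the "start" case, so
a missing backup file once the reshape is under way is expected.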

I don't know yet what is causing the lock-up.  A quick look at your logs
suggests that it could be related to the barrier handling.  Maybe trying to
handle a barrier during a reshape is prone to races of some sort - I wouldn't
be very surprised by that.

I'll have a look at the code and see what I can find.

Thanks for the report,
NeilBrown


> 
> This scares me, because if I experience the lockup again and am forced
> to reboot, without a backup file I'm afraid my array will be hosed.
> I'm also afraid to stop it cleanly right now for the same reason.
> 
> So in addition to fixing the lockup itself, does anyone know if
> there's a way to either cancel this reshaping or belatedly add the
> backup file in a different way so it will be recoverable? It's only at
> 1% and says it will take another 2193 minutes.
> 
> Mike

