From: Neil Brown <neilb@suse.de>
To: Mike Hartman <mike@hartmanipulation.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: New RAID causing system lockups
Date: Tue, 21 Sep 2010 12:26:44 +1000
Message-ID: <20100921122644.517fcab0@notabene>
In-Reply-To: <AANLkTinYaE07MzZ7LSPY18jtrsSdvj5sf7rnDqGEQP=z@mail.gmail.com>

On Wed, 15 Sep 2010 17:49:44 -0400
Mike Hartman <mike@hartmanipulation.com> wrote:

> >> Hmmm..
> >>  Can you try mounting with
> >>    -o barrier=0
> >>
> >> just to see if my theory is at all correct?
> >>
> >> Thanks,
> >> NeilBrown
> >>
> >
> 
> Progress report:
> 
> I made the barrier change shortly after sending my last message (about
> 40 hours ago). With that in place, I was able to finish emptying one
> of the non-assimilated drives onto the array, after which I added that
> drive as a hot spare and started the process to grow the array onto it
> - the same procedure I've been applying since I created the RAID the
> other week. No problems so far, and the reshape is at 46%.
> 
> It's hard to be positive that the barrier deactivation is responsible
> yet though - while the last few lockups have only been 1-16 hours
> apart, I believe the first two had at least 2 or 3 days between them.
> I'll keep the array busy to enhance the chances of a lockup though -
> each one so far has been during a reshape or a large batch of writing
> to the array's partition. If I make it another couple days (meaning
> time for this reshape to complete, another drive to be emptied onto
> the array, and another reshape at least started) I'll be pretty
> confident the problem has been identified.

Thanks for the update.

> 
> Assuming the barrier is the culprit (and I'm pretty sure you're right)
> what are the consequences of just leaving it off? I gather the idea of
> the barrier is to prevent journal corruption in the event of a power
> failure or other sudden shutdown, which seems pretty important, but it
> also doesn't seem like it was enabled by default in ext3/4 until 2008,
> which makes it seem less critical.

Correct.  Without barriers the chance of corruption during a power failure is
higher.  I don't really know how much higher; it depends a lot on the
filesystem design and the particular implementation.  I think ext4 tends to
be fairly safe - after all some devices don't support barriers and it has to
do best-effort on those too.
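
For anyone following along who wants to confirm which filesystems are
currently mounted with barriers off, here is a minimal sketch that scans
/proc/mounts-style text for the relevant options.  The device and mount
point names in the sample are hypothetical, and treating "barrier=0" and
"nobarrier" as the only spellings is an assumption; check the mount
documentation for your filesystem and kernel.

```python
# Sketch: find mounts whose options disable write barriers.
# Assumption: barriers are disabled via "barrier=0" or "nobarrier".
BARRIER_OFF = {"barrier=0", "nobarrier"}


def mounts_without_barriers(mounts_text):
    """Return (device, mountpoint) pairs whose options disable barriers."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        if BARRIER_OFF & set(options.split(",")):
            found.append((device, mountpoint))
    return found


# Hypothetical sample in /proc/mounts format; on a real system you could
# pass open("/proc/mounts").read() instead.
sample = """\
/dev/md0 /mnt/array ext4 rw,relatime,barrier=0 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""
print(mounts_without_barriers(sample))
```

A mount that does not list either option is simply not reported; that says
nothing about whether the underlying device actually honours barriers.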

> 
> Even if the ultimate solution for me is to just leave it disabled I'm
> happy to keep trying patches if you want to get it properly fixed in
> md. We may have to come up with an alternate way to work the array
> hard enough to trigger the lockups though - my last 1.5TB drive is
> what's being merged in now. After that completes I only have one more
> pair of 750GBs (that will have to be shoehorned in using RAID0 again).
> I do have a single 750GB left over, so I'll probably find a mate for
> it and get it added too. After that we're maxed out on hardware for a
> while.
> 
> Mike

I'll stare at the code a bit more and see if anything jumps out at me.

Thanks,
NeilBrown


