From: Brad Campbell <brad@wasp.net.au>
To: Brad Campbell <brad@wasp.net.au>
Cc: lkml <linux-kernel@vger.kernel.org>,
RAID Linux <linux-raid@vger.kernel.org>
Subject: Re: Raid-6 hang on write.
Date: Fri, 25 Feb 2005 19:37:13 +0400
Message-ID: <421F4629.5080309@wasp.net.au>
In-Reply-To: <421DE9A9.4090902@wasp.net.au>
Brad Campbell wrote:
> G'day all,
>
> I have a painful issue with a RAID-6 box. It only manifests itself on a
> fully complete and synced up array, and I can't reproduce it on an array
> smaller than the entire drives which means after every attempt at
> debugging I have to endure a 12 hour resync before I try again.
>
Having done a dodgy and got hold of a clean set of superblocks to play with, I have now managed
to narrow it down a lot more.
It's going to lunch in get_active_stripe at:

    wait_event_lock_irq(conf->wait_for_stripe,
                        !list_empty(&conf->inactive_list) &&
                        (atomic_read(&conf->active_stripes) < (NR_STRIPES * 3/4)
                         || !conf->inactive_blocked),
                        conf->device_lock,
                        unplug_slaves(conf->mddev);
        );
Feb 25 16:50:06 storage1 kernel: conf->active_stripes 256
Feb 25 16:50:06 storage1 kernel: conf->inactive_blocked 1
Feb 25 16:50:06 storage1 kernel: list_empty(conf->inactive_list) 1
Feb 25 16:50:06 storage1 kernel: NR_STRIPES *3/4 192
So it appears it's unable to get a free stripe, and it just sits with the device_lock held, which
prevents anything else from happening.
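To make that concrete, here is the wait condition written out as a plain predicate (the helper
name is invented purely for illustration; the type and fields are the ones used in raid6main.c),
evaluated against the values logged above:

    /* Illustration only: the condition get_active_stripe() is waiting on,
     * pulled out into a hypothetical helper. */
    static int stripes_available(raid6_conf_t *conf)
    {
            return !list_empty(&conf->inactive_list) &&
                   (atomic_read(&conf->active_stripes) < (NR_STRIPES * 3/4) ||
                    !conf->inactive_blocked);
    }

With the logged values the inactive list is empty, so the first term is false, and in any case
active_stripes (256) is over the 192 threshold while inactive_blocked is set, so the condition
cannot become true until someone releases a stripe back to the inactive list.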
On further investigation, it occurs when raid6d calls md_check_recovery and this never returns,
preventing raid6d from handling any stripes.
Turning on debugging in raid6main.c and md.c makes it much harder to hit, so I'm assuming it's
something timing related.
raid6d --> md_check_recovery --> generic_make_request --> make_request --> get_active_stripe
We are now out of stripes and we deadlock here. I put some debugging counters in md_check_recovery
and it calls generic_make_request ~244 times and then deadlocks (depending, of course, on how many
stripes were free beforehand).
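The counters were roughly along these lines (names and placement here are illustrative, not the
actual diff):

    /* Illustrative only: count trips down the md_check_recovery ->
     * generic_make_request path before the hang. */
    static atomic_t gmr_calls = ATOMIC_INIT(0);

    /* ...bumped each time a request is pushed down that path... */
    printk(KERN_DEBUG "md_check_recovery: request %d\n",
           atomic_inc_return(&gmr_calls));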
I have tried just increasing the number of stripes to 2048, but that only took longer to hit, and
when it did the machine hard-locked (whereas with 256 it at least only locks up the md subsystem).
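The bump was nothing more than the NR_STRIPES define in raid6main.c; 256 is the stock value, which
is where the NR_STRIPES*3/4 == 192 in the log above comes from.

    /* raid6main.c: size of the stripe cache; stock value is 256 */
    #define NR_STRIPES              2048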
I'm now at a loss.
I guess the main issue is lots of drives on a slow IO bus, but there must be something I'm missing.
Pointers or thumps with the clue bat would be appreciated.
Regards,
Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams