From: Jeff Lessem <Jeff@Lessem.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "BERTRAND Joël" <joel.bertrand@systella.fr>,
	"Justin Piszcz" <jpiszcz@lucidpixels.com>,
	"Neil Brown" <neilb@suse.de>,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state
Date: Tue, 06 Nov 2007 22:00:03 -0700
Message-ID: <47314653.80905@Lessem.org>
In-Reply-To: <1194398700.2970.18.camel@dwillia2-linux.ch.intel.com>

Dan Williams wrote:
 > The following patch, also attached, cleans up cases where the code looks
 > at sh->ops.pending when it should be looking at the consistent
 > stack-based snapshot of the operations flags.

I tried this patch (against a stock 2.6.23), and it did not work for
me.  Not only did I/O to the affected RAID5 & XFS partition stop, but
so did I/O to all other disks.  I was not able to capture any debugging
information, but I should be able to do that tomorrow when I can hook
a serial console to the machine.

I'm not sure if my problem is identical to these others, as mine only
seems to manifest with RAID5+XFS.  The RAID rebuilds with no problem,
and I've not had any problems with RAID5+ext3.

 >
 >
 > ---
 >
 >  drivers/md/raid5.c |   16 +++++++++-------
 >  1 files changed, 9 insertions(+), 7 deletions(-)
 >
 > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
 > index 496b9a3..e1a3942 100644
 > --- a/drivers/md/raid5.c
 > +++ b/drivers/md/raid5.c
 > @@ -693,7 +693,8 @@ ops_run_prexor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 >  }
 >
 >  static struct dma_async_tx_descriptor *
 > -ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 > +ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
 > +		 unsigned long pending)
 >  {
 >  	int disks = sh->disks;
 >  	int pd_idx = sh->pd_idx, i;
 > @@ -701,7 +702,7 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 >  	/* check if prexor is active which means only process blocks
 >  	 * that are part of a read-modify-write (Wantprexor)
 >  	 */
 > -	int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
 > +	int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
 >
 >  	pr_debug("%s: stripe %llu\n", __FUNCTION__,
 >  		(unsigned long long)sh->sector);
 > @@ -778,7 +779,8 @@ static void ops_complete_write(void *stripe_head_ref)
 >  }
 >
 >  static void
 > -ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 > +ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
 > +		unsigned long pending)
 >  {
 >  	/* kernel stack size limits the total number of disks */
 >  	int disks = sh->disks;
 > @@ -786,7 +788,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 >
 >  	int count = 0, pd_idx = sh->pd_idx, i;
 >  	struct page *xor_dest;
 > -	int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
 > +	int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
 >  	unsigned long flags;
 >  	dma_async_tx_callback callback;
 >
 > @@ -813,7 +815,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 >  	}
 >
 >  	/* check whether this postxor is part of a write */
 > -	callback = test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending) ?
 > +	callback = test_bit(STRIPE_OP_BIODRAIN, &pending) ?
 >  		ops_complete_write : ops_complete_postxor;
 >
 >  	/* 1/ if we prexor'd then the dest is reused as a source
 > @@ -901,12 +903,12 @@ static void raid5_run_ops(struct stripe_head *sh, unsigned long pending)
 >  		tx = ops_run_prexor(sh, tx);
 >
 >  	if (test_bit(STRIPE_OP_BIODRAIN, &pending)) {
 > -		tx = ops_run_biodrain(sh, tx);
 > +		tx = ops_run_biodrain(sh, tx, pending);
 >  		overlap_clear++;
 >  	}
 >
 >  	if (test_bit(STRIPE_OP_POSTXOR, &pending))
 > -		ops_run_postxor(sh, tx);
 > +		ops_run_postxor(sh, tx, pending);
 >
 >  	if (test_bit(STRIPE_OP_CHECK, &pending))
 >  		ops_run_check(sh);
 >
 >
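
For anyone following along, the idea in the patch description (helpers
should test the stack-based snapshot rather than the live sh->ops.pending
field) can be illustrated with a small userspace sketch.  This is not the
kernel code: struct stripe, the OP_* bit numbers, flag_set(), run_pass()
and both helper names are hypothetical stand-ins, and the "concurrent"
update is simulated with a plain assignment.

```c
/* Hypothetical stand-ins for the kernel's STRIPE_OP_* bit numbers. */
enum { OP_PREXOR = 0, OP_BIODRAIN = 1, OP_POSTXOR = 2 };

/* Hypothetical stand-in for the stripe state; the flags word is "live",
 * meaning another context may change it at any time. */
struct stripe {
	unsigned long ops_pending;
};

/* Userspace substitute for test_bit(): is 'bit' set in 'word'? */
static int flag_set(int bit, unsigned long word)
{
	return (int)((word >> bit) & 1UL);
}

/* Old style: the helper re-reads the live field, so its answer depends
 * on when it happens to run. */
static int prexor_from_live(const struct stripe *sh)
{
	return flag_set(OP_PREXOR, sh->ops_pending);
}

/* Patched style: the helper is handed the caller's snapshot, so every
 * helper in the same pass sees the same set of flags. */
static int prexor_from_snapshot(unsigned long pending)
{
	return flag_set(OP_PREXOR, pending);
}

/* One simulated pass: take a snapshot, let the live flags change, then
 * ask both helpers whether prexor is part of this pass. */
static void run_pass(int *live, int *snap)
{
	struct stripe sh = { (1UL << OP_PREXOR) | (1UL << OP_BIODRAIN) };
	unsigned long pending = sh.ops_pending;	/* stack-based snapshot */

	/* Simulated update from another context clearing the prexor bit. */
	sh.ops_pending &= ~(1UL << OP_PREXOR);

	*live = prexor_from_live(&sh);
	*snap = prexor_from_snapshot(pending);
}
```

Calling run_pass(&live, &snap) leaves live at 0 and snap at 1: the helper
that re-reads the live field disagrees with the decision the caller
already made from its snapshot, which is the kind of inconsistency the
extra 'pending' argument is meant to rule out.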


