From: NeilBrown <neilb@suse.de>
To: Dan Williams <djbw@fb.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
	Vinod Koul <vinod.koul@intel.com>,
	Tomasz Figa <t.figa@samsung.com>,
	Kyungmin Park <kyungmin.park@samsung.com>
Subject: Re: [PATCH] raid5: panic() on dma_wait_for_async_tx() error
Date: Tue, 20 Nov 2012 09:18:24 +1100
Message-ID: <20121120091824.58ed4565@notabene.brown>
In-Reply-To: <84A937D219C2B44EB8EA44831ACA1E49166C741F@SC-MBX02-3.TheFacebook.com>


On Mon, 19 Nov 2012 05:22:25 +0000 Dan Williams <djbw@fb.com> wrote:

> 
> 
> On 11/18/12 5:06 PM, "NeilBrown" <neilb@suse.de> wrote:
> 
> >
> >Hi Dan,
> > could you comment on this please?  Would it make sense to arrange for
> > errors to propagate up?  Or should we arrange to do a software-fallback
> > if the dma engine is a problem?  What sort of things can cause an error
> > here anyway?
> 
> Propagating up is missing a reliable "dma abort" operation.
> 
> In these cases the engine failed to complete due to hardware hang / driver
> bug, or has hit a memory error (uncorrectable even with software
> fallback).  This originally should have been using async_tx_quiesce()
> which also does the panic.
> 
> The engines that I have worked with have either lacked support for
> aborting, or were otherwise unable to recover from a hardware hang.
> However, for engines that do support error recovery they should be able to
> hide the failure from the upper layers.
>

So maybe I could:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ac09fa4..ffbf0ca 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3268,7 +3268,7 @@ static void handle_stripe_expansion(struct r5conf *conf, struct stripe_head *sh)
 	/* done submitting copies, wait for them to complete */
 	if (tx) {
 		async_tx_ack(tx);
-		dma_wait_for_async_tx(tx);
+		async_tx_quiesce(&tx);
 	}
 }
 


and then the panic would be somebody else's problem?

I note that handle_stripe_expansion has:

		async_tx_ack(tx);
		dma_wait_for_async_tx(tx);

while async_tx_quiesce() has:

		if (dma_wait_for_async_tx(*tx) == DMA_ERROR)
			panic("DMA_ERROR waiting for transaction\n");
		async_tx_ack(*tx);


i.e. the same two functions called in the reverse order.  Is the order
important?  Is handle_stripe_expansion wrong?   Should the patch I apply
actually be:


diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ac09fa4..e51d903 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3266,10 +3266,7 @@ static void handle_stripe_expansion(struct r5conf *conf, struct stripe_head *sh)
 
 		}
 	/* done submitting copies, wait for them to complete */
-	if (tx) {
-		async_tx_ack(tx);
-		dma_wait_for_async_tx(tx);
-	}
+	async_tx_quiesce(&tx);
 }
 
 /*


because async_tx_quiesce() does the NULL test too???
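
To make the comparison concrete, here is a minimal user-space sketch of the
wait-then-ack pattern from the async_tx_quiesce() excerpt above.  It is not
the kernel code: struct mock_tx, mock_wait(), mock_ack() and mock_quiesce()
are hypothetical stand-ins, the error case returns -1 where the real helper
panics, and clearing the caller's pointer after the ack is an assumption
about why the helper takes &tx rather than tx.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum mock_status { MOCK_DMA_SUCCESS, MOCK_DMA_ERROR };

struct mock_tx {
	enum mock_status result;	/* what waiting on this descriptor reports */
	int acked;			/* set once the client has released it */
};

/* Stand-in for dma_wait_for_async_tx(): just report the stored result. */
static enum mock_status mock_wait(struct mock_tx *tx)
{
	return tx->result;
}

/* Stand-in for async_tx_ack(): mark the descriptor as released. */
static void mock_ack(struct mock_tx *tx)
{
	tx->acked = 1;
}

/*
 * Wait first, ack second, and tolerate a NULL descriptor.
 * Returns 0 when there was nothing to wait for or the wait succeeded,
 * and -1 on error (the quoted kernel helper calls panic() here instead).
 */
static int mock_quiesce(struct mock_tx **tx)
{
	if (!*tx)
		return 0;		/* the NULL test lives in the helper */

	if (mock_wait(*tx) == MOCK_DMA_ERROR)
		return -1;		/* descriptor is left un-acked on error */

	mock_ack(*tx);
	*tx = NULL;			/* assumption: drop the caller's reference */
	return 0;
}
```

With a helper of this shape the caller needs no "if (tx)" guard of its own,
which is what the second version of the patch relies on.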

Thanks,
NeilBrown

