From: NeilBrown <neilb@suse.de>
To: Dan Williams <djbw@fb.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
	Vinod Koul <vinod.koul@intel.com>,
	Tomasz Figa <t.figa@samsung.com>,
	Kyungmin Park <kyungmin.park@samsung.com>
Subject: Re: [PATCH] raid5: panic() on dma_wait_for_async_tx() error
Date: Tue, 20 Nov 2012 14:13:43 +1100
Message-ID: <20121120141343.25d84c71@notabene.brown>
In-Reply-To: <1353378237.26735.11.camel@localhost.localdomain>
On Mon, 19 Nov 2012 18:23:57 -0800 Dan Williams <djbw@fb.com> wrote:
> On Tue, 2012-11-20 at 09:18 +1100, NeilBrown wrote:
> > On Mon, 19 Nov 2012 05:22:25 +0000 Dan Williams <djbw@fb.com> wrote:
> >
> > >
> > >
> > > On 11/18/12 5:06 PM, "NeilBrown" <neilb@suse.de> wrote:
> > >
> > > >
> > > >Hi Dan,
> > > > could you comment on this please? Would it make sense to arrange for
> > > > errors to propagate up? Or should we arrange to do a software-fallback
> > > > if the dma engine is a problem? What sort of things can cause an error
> > > > here anyway?
> > >
> > > Propagating up is missing a reliable "dma abort" operation.
> > >
> > > In these cases the engine failed to complete due to hardware hang / driver
> > > bug, or has hit a memory error (uncorrectable even with software
> > > fallback). This originally should have been using async_tx_quiesce()
> > > which also does the panic.
> > >
> > > The engines that I have worked with have either lacked support for
> > > aborting, or were otherwise unable to recover from a hardware hang.
> > > However, for engines that do support error recovery they should be able to
> > > hide the failure from the upper layers.
> > >
> >
> > So maybe I could:
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index ac09fa4..ffbf0ca 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -3268,7 +3268,7 @@ static void handle_stripe_expansion(struct r5conf *conf, struct stripe_head *sh)
> >  	/* done submitting copies, wait for them to complete */
> >  	if (tx) {
> >  		async_tx_ack(tx);
> > -		dma_wait_for_async_tx(tx);
> > +		async_tx_quiesce(&tx);
> >  	}
> >  }
> >
> >
> >
> > and then the panic would be somebody else's problem?
> >
> > I note that handle_stripe_expansion has:
> >
> > 	async_tx_ack(tx);
> > 	dma_wait_for_async_tx(tx);
> >
> > while async_tx_quiesce() has:
> >
> > 	if (dma_wait_for_async_tx(*tx) == DMA_ERROR)
> > 		panic("DMA_ERROR waiting for transaction\n");
> > 	async_tx_ack(*tx);
> >
> >
> > i.e. the same two functions called in the reverse order. Is the order
> > important? Is handle_stripe_expansion wrong? Should the patch I apply
> > actually be:
> >
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index ac09fa4..e51d903 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -3266,10 +3266,7 @@ static void handle_stripe_expansion(struct r5conf *conf, struct stripe_head *sh)
> >
> >  		}
> >  	/* done submitting copies, wait for them to complete */
> > -	if (tx) {
> > -		async_tx_ack(tx);
> > -		dma_wait_for_async_tx(tx);
> > -	}
> > +	async_tx_quiesce(&tx);
> >  }
> >
>
> Yes, this one. It handles the wait like the other cases that need a
> synchronous wait, and it does not care if tx is NULL.

Thanks. The following is now in my for-next branch.

NeilBrown

From e25a8de38d6584ffd042dbef3a5a8eb518b8813b Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 20 Nov 2012 14:11:15 +1100
Subject: [PATCH] md/raid5: use async_tx_quiesce() instead of open-coding it.

handle_stripe_expansion contains:

	if (tx) {
		async_tx_ack(tx);
		dma_wait_for_async_tx(tx);
	}

which is very similar to the body of async_tx_quiesce(),
except that the latter handles an error from dma_wait_for_async_tx()
(admittedly by panicking, but that decision belongs in the dma
code, not the md code).

So just use async_tx_quiesce().

Acked-by: Dan Williams <djbw@fb.com>
Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ac09fa4..e51d903 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3266,10 +3266,7 @@ static void handle_stripe_expansion(struct r5conf *conf, struct stripe_head *sh)
 
 		}
 	/* done submitting copies, wait for them to complete */
-	if (tx) {
-		async_tx_ack(tx);
-		dma_wait_for_async_tx(tx);
-	}
+	async_tx_quiesce(&tx);
 }
 
 /*
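
For anyone who wants to poke at the difference outside the kernel, here is a rough userspace sketch of the pattern discussed in this thread: wait first, treat an error as fatal, then ack, and tolerate a NULL descriptor. Everything named mock_* is a hypothetical stand-in invented for illustration, not the dmaengine or async_tx API. A counter replaces panic() so the behaviour can be observed, and clearing the caller's pointer at the end is an assumption of this sketch rather than something quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; NOT the real dmaengine API. */
enum mock_status { MOCK_DMA_COMPLETE, MOCK_DMA_ERROR };

struct mock_tx {
	enum mock_status result;	/* what waiting on this descriptor reports */
	int waited;			/* set once the descriptor has been waited on */
	int acked;			/* set once the descriptor has been acked */
};

/* Counts would-be panics so the sketch can keep running. */
int mock_panics;

/* Stand-in for dma_wait_for_async_tx(): report the descriptor's outcome. */
enum mock_status mock_wait(struct mock_tx *tx)
{
	tx->waited = 1;
	return tx->result;
}

/* Stand-in for async_tx_ack(): mark the descriptor as acknowledged. */
void mock_ack(struct mock_tx *tx)
{
	tx->acked = 1;
}

/*
 * Same shape as the async_tx_quiesce() lines quoted in the thread:
 * wait, escalate on error, then ack. A NULL descriptor is a no-op.
 */
void mock_quiesce(struct mock_tx **tx)
{
	assert(tx != NULL);		/* caller must pass a valid pointer-to-pointer */

	if (*tx) {
		if (mock_wait(*tx) == MOCK_DMA_ERROR)
			mock_panics++;	/* the kernel helper panics here instead */
		mock_ack(*tx);
		*tx = NULL;		/* assumption: drop the caller's reference */
	}
}
```

The point of routing the wait through one helper is that the error policy lives in a single place, so a caller such as handle_stripe_expansion() needs neither the NULL check nor its own opinion about what a failed wait means.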