* Bug in processing dependencies by async_tx_submit() ? @ 2007-10-31 16:21 Yuri Tikhonov 2007-11-01 3:01 ` Dan Williams 0 siblings, 1 reply; 6+ messages in thread From: Yuri Tikhonov @ 2007-10-31 16:21 UTC (permalink / raw) To: Williams, Dan J Cc: Dan Williams, Neil Brown, Wolfgang Denk, Detlev Zundel, linux-raid Hello Dan, I've run into a problem with the h/w accelerated RAID-5 driver (on the ppc440spe-based board). After some investigations I've come to conclusion that the issue is with the async_tx_submit() implementation in ASYNC_TX. Let's we have the following 3 dependent operations: OP1, OP2, OP3. The OP1 operation has to be run on one, DMA0, channel, OP2 and OP3 operations have to be run on another, DMA1, channel. OP2 has to be run after completion of OP1, OP3 - after completion of OP2. I. e. : OP1 (DMA0) <--- OP2 (DMA1) <--- OP3 (DMA1). With the current implementation of async_tx_submit() we have the following picture when we sequentially call async_tx_submit() to submit these 3 operations: - the OP2 operation will *not be* submitted to the h/w channel, but, because of the (OP1->chan != OP2->chan) condition, will be put to depend_list of OP1; and it's RIGHT; - the OP3 operation will *be* submitted to the h/w channel, because OP2 & OP3 do not conform the (OP2->chan != OP3->chan) condition. Thus OP3 will be put to the h/w channel *before* its parent, OP2; and it's WRONG. It looks more logical for me to check in async_tx_submit() - does the dep_tx operation (on which the current tx operation is dependent on) depend on some other operation itself. If dep_tx depends on some other operation then, regardless of the channel on which tx has to be run, we should not put tx to the h/w but bind it to depend_list of dep_tx instead. Why there are no such algorithms in async_tx_submit() ? As far as the RAID-5 h/w accelerated driver is concerned this means that with the current implementation of async_tx_submit(), for example, the (PREXOR + BIODRAIN) combination works incorrectly for the cases when BIODRAIN is split into a sequence of MEMCPY operations and there are two different channels for XOR and COPY operations: only one, the first, MEMCPY will wait until completion of PREXOR, but all the others MEMCPYs of this BIODRAIN will be put into the h/w channel right away. Is it correct ? I think - not. Regards, Yuri -- Yuri Tikhonov, Senior Software Engineer Emcraft Systems, www.emcraft.com ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Bug in processing dependencies by async_tx_submit() ? 2007-10-31 16:21 Bug in processing dependencies by async_tx_submit() ? Yuri Tikhonov @ 2007-11-01 3:01 ` Dan Williams 2007-11-01 10:00 ` Yuri Tikhonov 0 siblings, 1 reply; 6+ messages in thread From: Dan Williams @ 2007-11-01 3:01 UTC (permalink / raw) To: Yuri Tikhonov Cc: Dan Williams, Neil Brown, Wolfgang Denk, Detlev Zundel, linux-raid On Wed, 2007-10-31 at 09:21 -0700, Yuri Tikhonov wrote: > > Hello Dan, > > I've run into a problem with the h/w accelerated RAID-5 driver (on the > ppc440spe-based board). After some investigations I've come to conclusion > that the issue is with the async_tx_submit() implementation in ASYNC_TX. > Unfortunately this is correct, async_tx_submit() will let the third operation pass the second in the scenario you describe. I propose the fix (untested) below. I'll test this out tomorrow when I am back in the office. --- async_tx: fix successive dependent operation submission From: Dan Williams <dan.j.williams@intel.com> async_tx_submit() tried to use the hardware descriptor chain to maintain transaction ordering. However before falling back to hardware-channel dependency ordering async_tx_submit() must first check if the entire chain is waiting on another channel. OP1 (DMA0) <--- OP2 (DMA1) <--- OP3 (DMA1) OP3 must be submitted as an OP2 dependency if it is submitted before OP1 completes. Otherwise if OP1 is complete, OP3 can use the natural sequence of DMA1's hardware chain to satisfy that it runs after OP2. The fix is to check if the ->parent field of the dependency is non-NULL. This also requires that the parent field be cleared at dependency submission time. Found-by: Yuri Tikhonov <yur@emcraft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- crypto/async_tx/async_tx.c | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/crypto/async_tx/async_tx.c b/crypto/async_tx/async_tx.c index bc18cbb..eb1afb9 100644 --- a/crypto/async_tx/async_tx.c +++ b/crypto/async_tx/async_tx.c @@ -125,6 +125,9 @@ async_tx_run_dependencies(struct dma_async_tx_descriptor *tx) list_del(&dep_tx->depend_node); tx->tx_submit(dep_tx); + /* we no longer have a parent */ + tx->parent = NULL; + /* we need to poke the engine as client code does not * know about dependency submission events */ @@ -409,8 +412,9 @@ async_tx_submit(struct dma_chan *chan, struct dma_async_tx_descriptor *tx, /* set this new tx to run after depend_tx if: * 1/ a dependency exists (depend_tx is !NULL) * 2/ the tx can not be submitted to the current channel + * 3/ the depend_tx has a parent */ - if (depend_tx && depend_tx->chan != chan) { + if (depend_tx && (depend_tx->chan != chan || depend_tx->parent)) { /* if ack is already set then we cannot be sure * we are referring to the correct operation */ ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Bug in processing dependencies by async_tx_submit() ? 2007-11-01 3:01 ` Dan Williams @ 2007-11-01 10:00 ` Yuri Tikhonov 2007-11-02 0:36 ` Dan Williams 0 siblings, 1 reply; 6+ messages in thread From: Yuri Tikhonov @ 2007-11-01 10:00 UTC (permalink / raw) To: Dan Williams Cc: Dan Williams, Neil Brown, Wolfgang Denk, Detlev Zundel, linux-raid Hi Dan, Honestly I tried to fix this quickly using the approach similar to proposed by you, with one addition though (in fact, deletion of BUG_ON(chan == tx->chan) in async_tx_run_dependencies()). And this led to "Kernel stack overflow". This happened because of the recurseve calling async_tx_submit() from async_trigger_callback() and vice verse. So, then I made the interrupt scheduling in async_tx_submit() only for the cases when it is really needed: i.e. when dependent operations are to be run on different channels. The resulted kernel locked-up during processing of the mkfs command on the top of the RAID-array. The place where it is spinning is the dma_sync_wait() function. This is happened because of the specific implementation of dma_wait_for_async_tx(). The "iter", we finally waiting for there, corresponds to the last allocated but not-yet-submitted descriptor. But if the "iter" we are waiting for is dependent from another descriptor which has cookie > 0, but is not yet submitted to the h/w channel because of the fact that threshold is not achieved to this moment, then we may wait in dma_wait_for_async_tx() infinitely. I think that it makes more sense to get the first descriptor which was submitted to the channel but probably is not put into the h/w chain, i.e. with cookie > 0 and do dma_sync_wait() of this descriptor. When I modified the dma_wait_for_async_tx() in such way, then the kernel locking had disappeared. But nevertheless the mkfs processes hangs-up after some time. So, it looks like something is still missing in support of the chaining dependencies feature... Below is the final patch I used during my tests: diff --git a/crypto/async_tx/async_tx.c b/crypto/async_tx/async_tx.c index bc18cbb..b7d3b2b 100644 --- a/crypto/async_tx/async_tx.c +++ b/crypto/async_tx/async_tx.c @@ -92,7 +92,7 @@ dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx) /* find the root of the unsubmitted dependency chain */ while (iter->cookie == -EBUSY) { parent = iter->parent; - if (parent && parent->cookie == -EBUSY) + if (parent) iter = iter->parent; else break; @@ -120,11 +120,12 @@ async_tx_run_dependencies(struct dma_async_tx_descriptor *tx) depend_node) { chan = dep_tx->chan; dev = chan->device; - /* we can't depend on ourselves */ - BUG_ON(chan == tx->chan); list_del(&dep_tx->depend_node); tx->tx_submit(dep_tx); + /* we no longer have a parent */ + tx->parent = NULL; + /* we need to poke the engine as client code does not * know about dependency submission events */ @@ -409,8 +410,9 @@ async_tx_submit(struct dma_chan *chan, struct dma_async_tx_descriptor *tx, /* set this new tx to run after depend_tx if: * 1/ a dependency exists (depend_tx is !NULL) * 2/ the tx can not be submitted to the current channel + * 3/ the depend_tx has a parent */ - if (depend_tx && depend_tx->chan != chan) { + if (depend_tx && (depend_tx->chan != chan || depend_tx->parent)) { /* if ack is already set then we cannot be sure * we are referring to the correct operation */ @@ -427,7 +429,8 @@ async_tx_submit(struct dma_chan *chan, struct dma_async_tx_descriptor *tx, spin_unlock_bh(&depend_tx->lock); /* schedule an interrupt to trigger the channel switch */ - async_trigger_callback(ASYNC_TX_ACK, depend_tx, NULL, NULL); + if (depend_tx->chan != chan) + async_trigger_callback(ASYNC_TX_ACK, depend_tx, NULL, NULL); } else { tx->parent = NULL; tx->tx_submit(tx); Regards, Yuri On Thursday 01 November 2007 06:01, Dan Williams wrote: > On Wed, 2007-10-31 at 09:21 -0700, Yuri Tikhonov wrote: > > > > Hello Dan, > > > > I've run into a problem with the h/w accelerated RAID-5 driver (on the > > ppc440spe-based board). After some investigations I've come to conclusion > > that the issue is with the async_tx_submit() implementation in ASYNC_TX. > > > Unfortunately this is correct, async_tx_submit() will let the third > operation pass the second in the scenario you describe. I propose the > fix (untested) below. I'll test this out tomorrow when I am back in the > office. > > --- > async_tx: fix successive dependent operation submission > > From: Dan Williams <dan.j.williams@intel.com> > > async_tx_submit() tried to use the hardware descriptor chain to maintain > transaction ordering. However before falling back to hardware-channel > dependency ordering async_tx_submit() must first check if the entire chain > is waiting on another channel. > > OP1 (DMA0) <--- OP2 (DMA1) <--- OP3 (DMA1) > > OP3 must be submitted as an OP2 dependency if it is submitted before OP1 > completes. Otherwise if OP1 is complete, OP3 can use the natural sequence > of DMA1's hardware chain to satisfy that it runs after OP2. > > The fix is to check if the ->parent field of the dependency is non-NULL. > This also requires that the parent field be cleared at dependency > submission time. > > Found-by: Yuri Tikhonov <yur@emcraft.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > > crypto/async_tx/async_tx.c | 6 +++++- > 1 files changed, 5 insertions(+), 1 deletions(-) > > diff --git a/crypto/async_tx/async_tx.c b/crypto/async_tx/async_tx.c > index bc18cbb..eb1afb9 100644 > --- a/crypto/async_tx/async_tx.c > +++ b/crypto/async_tx/async_tx.c > @@ -125,6 +125,9 @@ async_tx_run_dependencies(struct dma_async_tx_descriptor *tx) > list_del(&dep_tx->depend_node); > tx->tx_submit(dep_tx); > > + /* we no longer have a parent */ > + tx->parent = NULL; > + > /* we need to poke the engine as client code does not > * know about dependency submission events > */ > @@ -409,8 +412,9 @@ async_tx_submit(struct dma_chan *chan, struct dma_async_tx_descriptor *tx, > /* set this new tx to run after depend_tx if: > * 1/ a dependency exists (depend_tx is !NULL) > * 2/ the tx can not be submitted to the current channel > + * 3/ the depend_tx has a parent > */ > - if (depend_tx && depend_tx->chan != chan) { > + if (depend_tx && (depend_tx->chan != chan || depend_tx->parent)) { > /* if ack is already set then we cannot be sure > * we are referring to the correct operation > */ > > -- Yuri Tikhonov, Senior Software Engineer Emcraft Systems, www.emcraft.com ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Bug in processing dependencies by async_tx_submit() ? 2007-11-01 10:00 ` Yuri Tikhonov @ 2007-11-02 0:36 ` Dan Williams 2007-11-02 8:13 ` Yuri Tikhonov 2007-11-12 8:46 ` Re[2]: " Yuri Tikhonov 0 siblings, 2 replies; 6+ messages in thread From: Dan Williams @ 2007-11-02 0:36 UTC (permalink / raw) To: Yuri Tikhonov; +Cc: Neil Brown, Wolfgang Denk, Detlev Zundel, linux-raid On 11/1/07, Yuri Tikhonov <yur@emcraft.com> wrote: > > Hi Dan, > > Honestly I tried to fix this quickly using the approach similar to proposed > by you, with one addition though (in fact, deletion of BUG_ON(chan == > tx->chan) in async_tx_run_dependencies()). And this led to "Kernel stack > overflow". This happened because of the recurseve calling async_tx_submit() > from async_trigger_callback() and vice verse. > I had a feeling the fix could not be that easy... > So, then I made the interrupt scheduling in async_tx_submit() only for the > cases when it is really needed: i.e. when dependent operations are to be run > on different channels. > > The resulted kernel locked-up during processing of the mkfs command on the > top of the RAID-array. The place where it is spinning is the dma_sync_wait() > function. > > This is happened because of the specific implementation of > dma_wait_for_async_tx(). So I take it you are not implementing interrupt based callbacks in your driver? > The "iter", we finally waiting for there, corresponds to the last allocated > but not-yet-submitted descriptor. But if the "iter" we are waiting for is > dependent from another descriptor which has cookie > 0, but is not yet > submitted to the h/w channel because of the fact that threshold is not > achieved to this moment, then we may wait in dma_wait_for_async_tx() > infinitely. I think that it makes more sense to get the first descriptor > which was submitted to the channel but probably is not put into the h/w > chain, i.e. with cookie > 0 and do dma_sync_wait() of this descriptor. > > When I modified the dma_wait_for_async_tx() in such way, then the kernel > locking had disappeared. But nevertheless the mkfs processes hangs-up after > some time. So, it looks like something is still missing in support of the > chaining dependencies feature... > I am preparing a new patch that replaces ASYNC_TX_DEP_ACK with ASYNC_TX_CHAIN_ACK. The plan is to make the entire chain of dependencies available up until the last transaction is submitted. This allows the entire dependency chain to be walked at async_tx_submit time so that we can properly handle these multiple dependency cases. I'll send it out when it passes my internal tests... -- Dan ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Bug in processing dependencies by async_tx_submit() ? 2007-11-02 0:36 ` Dan Williams @ 2007-11-02 8:13 ` Yuri Tikhonov 2007-11-12 8:46 ` Re[2]: " Yuri Tikhonov 1 sibling, 0 replies; 6+ messages in thread From: Yuri Tikhonov @ 2007-11-02 8:13 UTC (permalink / raw) To: Dan Williams; +Cc: Neil Brown, Wolfgang Denk, Detlev Zundel, linux-raid Hi Dan, On Friday 02 November 2007 03:36, Dan Williams wrote: > > This is happened because of the specific implementation of > > dma_wait_for_async_tx(). > > So I take it you are not implementing interrupt based callbacks in your driver? Why not ? I have interrupt based callbacks in my driver. An INTERRUPT descriptor, implemented for both (COPY and XOR) channels, does the callback upon its completion. Here is an example where your implementation of dma_wait_for_async_tx() will not work as expected. Let's we have OP1 <--depends on-- OP2 <--depends on-- OP3, where OP1: cookie = -EBUSY, channel = DMA0; <- not submitted OP2: cookie = -EBUSY, channel = DMA0; <- not submitted OP3: cookie = 101, channel = DMA1; <- submitted, but not linked to h/w where cookie == 101 is some valid, positive cookie; and this fact means that OP3 *was submitted* to the DMA1 channel but *perhaps was not linked* to the h/w chain, for example, because the threshold for DMA1 was not achieved yet. With your implementation of dma_wait_for_async_tx() we do dma_sync_wait(OP2). And I propose to do dma_sync_wait(OP3), because in your case we may never wait for OP2 completion since dma_sync_wait() flushes to h/w the chains of DMA0, but OP3 in DMA1 remains unlinked to h/w and it blocks all the chain of dependencies. > > The "iter", we finally waiting for there, corresponds to the last allocated > > but not-yet-submitted descriptor. But if the "iter" we are waiting for is > > dependent from another descriptor which has cookie > 0, but is not yet > > submitted to the h/w channel because of the fact that threshold is not > > achieved to this moment, then we may wait in dma_wait_for_async_tx() > > infinitely. I think that it makes more sense to get the first descriptor > > which was submitted to the channel but probably is not put into the h/w > > chain, i.e. with cookie > 0 and do dma_sync_wait() of this descriptor. > > > > When I modified the dma_wait_for_async_tx() in such way, then the kernel > > locking had disappeared. But nevertheless the mkfs processes hangs-up after > > some time. So, it looks like something is still missing in support of the > > chaining dependencies feature... > > > > I am preparing a new patch that replaces ASYNC_TX_DEP_ACK with > ASYNC_TX_CHAIN_ACK. The plan is to make the entire chain of > dependencies available up until the last transaction is submitted. > This allows the entire dependency chain to be walked at > async_tx_submit time so that we can properly handle these multiple > dependency cases. I'll send it out when it passes my internal > tests... Fine. I guess this replacement assumes some modifications to the RAID-5 driver as well. Right? -- Yuri Tikhonov, Senior Software Engineer Emcraft Systems, www.emcraft.com ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re[2]: Bug in processing dependencies by async_tx_submit() ? 2007-11-02 0:36 ` Dan Williams 2007-11-02 8:13 ` Yuri Tikhonov @ 2007-11-12 8:46 ` Yuri Tikhonov 1 sibling, 0 replies; 6+ messages in thread From: Yuri Tikhonov @ 2007-11-12 8:46 UTC (permalink / raw) To: Dan Williams; +Cc: Neil Brown, Wolfgang Denk, Detlev Zundel, linux-raid Hi Dan, On 02.11.2007, 3:36:50 you wrote: > I am preparing a new patch that replaces ASYNC_TX_DEP_ACK with > ASYNC_TX_CHAIN_ACK. The plan is to make the entire chain of > dependencies available up until the last transaction is submitted. > This allows the entire dependency chain to be walked at > async_tx_submit time so that we can properly handle these multiple > dependency cases. I'll send it out when it passes my internal > tests... Meanwhile I've implemented my fix to this issue. With this the tests, which previously failed (mkfs got stuck), now pass successfully for me. Would you please take a look? Thanks in advance. diff --git a/crypto/async_tx/async_tx.c b/crypto/async_tx/async_tx.c index bc18cbb..6d77ae6 100644 --- a/crypto/async_tx/async_tx.c +++ b/crypto/async_tx/async_tx.c @@ -92,7 +92,7 @@ dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx) /* find the root of the unsubmitted dependency chain */ while (iter->cookie == -EBUSY) { parent = iter->parent; - if (parent && parent->cookie == -EBUSY) + if (parent) iter = iter->parent; else break; @@ -120,10 +120,11 @@ async_tx_run_dependencies(struct dma_async_tx_descriptor *tx) depend_node) { chan = dep_tx->chan; dev = chan->device; - /* we can't depend on ourselves */ - BUG_ON(chan == tx->chan); list_del(&dep_tx->depend_node); - tx->tx_submit(dep_tx); + dep_tx->tx_submit(dep_tx); + + /* we no longer have a parent */ + dep_tx->parent = NULL; /* we need to poke the engine as client code does not * know about dependency submission events @@ -409,25 +410,41 @@ async_tx_submit(struct dma_chan *chan, struct dma_async_tx_descriptor *tx, /* set this new tx to run after depend_tx if: * 1/ a dependency exists (depend_tx is !NULL) * 2/ the tx can not be submitted to the current channel + * 3/ the depend_tx has a parent */ - if (depend_tx && depend_tx->chan != chan) { + if (depend_tx && (depend_tx->chan != chan || depend_tx->parent)) { /* if ack is already set then we cannot be sure * we are referring to the correct operation */ BUG_ON(depend_tx->ack); - tx->parent = depend_tx; spin_lock_bh(&depend_tx->lock); + tx->parent = depend_tx; list_add_tail(&tx->depend_node, &depend_tx->depend_list); if (depend_tx->cookie == 0) { - struct dma_chan *dep_chan = depend_tx->chan; - struct dma_device *dep_dev = dep_chan->device; - dep_dev->device_dependency_added(dep_chan); - } - spin_unlock_bh(&depend_tx->lock); + /* depend_tx has been completed, run our dep + * manually + */ + async_tx_run_dependencies(depend_tx); + spin_unlock_bh(&depend_tx->lock); + } else { + /* depend_tx still in fly */ + spin_unlock_bh(&depend_tx->lock); - /* schedule an interrupt to trigger the channel switch */ - async_trigger_callback(ASYNC_TX_ACK, depend_tx, NULL, NULL); + /* schedule an interrupt to trigger the channel + * switch or dependencies submittion + */ + if (!(flags & ASYNC_TX_INT) && (depend_tx->chan != chan || + !depend_tx->callback)) + async_trigger_callback(ASYNC_TX_ACK | ASYNC_TX_INT, + depend_tx, NULL, NULL); + + /* flush the parent if it's not submitted yet */ + spin_lock_bh(&depend_tx->lock); + depend_tx->chan->device->device_issue_pending( + depend_tx->chan); + spin_unlock_bh(&depend_tx->lock); + } } else { tx->parent = NULL; tx->tx_submit(tx); diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h index aea0402..ee09315 100644 --- a/include/linux/async_tx.h +++ b/include/linux/async_tx.h @@ -67,6 +67,7 @@ enum async_tx_flags { ASYNC_TX_KMAP_SRC = (1 << 5), ASYNC_TX_KMAP_DST = (1 << 6), ASYNC_TX_ASYNC_ONLY = (1 << 7), + ASYNC_TX_INT = (1 << 8), }; #ifdef CONFIG_DMA_ENGINE ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-11-12 8:46 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-10-31 16:21 Bug in processing dependencies by async_tx_submit() ? Yuri Tikhonov 2007-11-01 3:01 ` Dan Williams 2007-11-01 10:00 ` Yuri Tikhonov 2007-11-02 0:36 ` Dan Williams 2007-11-02 8:13 ` Yuri Tikhonov 2007-11-12 8:46 ` Re[2]: " Yuri Tikhonov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).