linux-omap.vger.kernel.org archive mirror
* [PATCH 0/3] dma: cppi41: more suspend/resume patches
@ 2013-10-01 13:31 Daniel Mack
       [not found] ` <1380634271-27588-1-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Daniel Mack @ 2013-10-01 13:31 UTC (permalink / raw)
  To: linux-usb
  Cc: linux-omap, neumann, bigeasy, vinod.koul, dan.j.williams, balbi,
	Daniel Mack

While my first series makes the cppi41 driver survive suspend/resume
cycles as long as users are fully removed and added back after resume,
here are some more patches which make it all work completely.

Patch #1 restores more registers at resume time.

Patch #2 is a cosmetic cleanup that emerged while digging through the
driver and gaining a basic idea of how it's implemented. Nothing fancy.

Patch #3, however, gives me headaches. I can't fully explain what's
going on, but I can tell for sure that it fixes a problem that I stared
at for many hours.

The problem is that on resume, the musb core will detect that some of
the suspended USB devices' endpoints are stalled. Which is something
that is unrelated to the dma driver, it just seems to be an expected
condition. That, however, makes the musb core call
cppi41_dma_channel_abort() -> cppi41_tear_down_chan(), which is
an otherwise untravelled code path. When that function is called for
a channel which has all of td_queued, td_seen and td_desc_seen set
to FALSE, I'm always getting a warning like this:

[   17.105981] ------------[ cut here ]------------
[   17.110861] WARNING: CPU: 0 PID: 122 at drivers/dma/cppi41.c:644 cppi41_dma_control+0x378/0x3f8 [cppi41]()
[   17.120990] Modules linked in: musb_dsps musb_hdrc cppi41 snd_soc_cs4271 snd_soc_ak4104 snd_soc_davinci_mcasp musb_am335x
[   17.132583] CPU: 0 PID: 122 Comm: usb-storage Not tainted 3.12.0-rc3-00073-gb73d497-dirty #975
[   17.141670] [<c00135b8>] (unwind_backtrace+0x0/0xf4) from [<c0011418>] (show_stack+0x10/0x14)
[   17.150636] [<c0011418>] (show_stack+0x10/0x14) from [<c003597c>] (warn_slowpath_common+0x6c/0x84)
[   17.160052] [<c003597c>] (warn_slowpath_common+0x6c/0x84) from [<c0035a30>] (warn_slowpath_null+0x1c/0x24)
[   17.170198] [<c0035a30>] (warn_slowpath_null+0x1c/0x24) from [<bf015824>] (cppi41_dma_control+0x378/0x3f8 [cppi41])
[   17.181370] [<bf015824>] (cppi41_dma_control+0x378/0x3f8 [cppi41]) from [<bf023974>] (cppi41_dma_channel_abort+0xb0/0x124 [musb_hd)
[   17.194111] [<bf023974>] (cppi41_dma_channel_abort+0xb0/0x124 [musb_hdrc]) from [<bf02031c>] (musb_host_rx+0x2b0/0x404 [musb_hdrc])
[   17.206565] [<bf02031c>] (musb_host_rx+0x2b0/0x404 [musb_hdrc]) from [<bf01ca70>] (musb_interrupt+0x70/0x95c [musb_hdrc])
[   17.218102] [<bf01ca70>] (musb_interrupt+0x70/0x95c [musb_hdrc]) from [<bf02f640>] (dsps_interrupt+0x174/0x254 [musb_dsps])
[   17.229817] [<bf02f640>] (dsps_interrupt+0x174/0x254 [musb_dsps]) from [<c00686d0>] (handle_irq_event_percpu+0x38/0x194)
[   17.241238] [<c00686d0>] (handle_irq_event_percpu+0x38/0x194) from [<c0068868>] (handle_irq_event+0x3c/0x5c)
[   17.251565] [<c0068868>] (handle_irq_event+0x3c/0x5c) from [<c006aa58>] (handle_level_irq+0x90/0xf4)
[   17.261163] [<c006aa58>] (handle_level_irq+0x90/0xf4) from [<c0067f30>] (generic_handle_irq+0x2c/0x3c)
[   17.270942] [<c0067f30>] (generic_handle_irq+0x2c/0x3c) from [<c000eae4>] (handle_IRQ+0x38/0x84)
[   17.280174] [<c000eae4>] (handle_IRQ+0x38/0x84) from [<c00085b8>] (omap3_intc_handle_irq+0x68/0x74)
[   17.289678] [<c00085b8>] (omap3_intc_handle_irq+0x68/0x74) from [<c0011f04>] (__irq_svc+0x44/0x78)
[   17.299085] Exception stack(0xcedf1d18 to 0xcedf1d60)
[   17.304391] 1d00:                                                       00000001 c083c10c
[   17.312981] 1d20: 00000000 cec4cb80 60000013 cec68010 cee2e640 ced12c00 00000000 60000013
[   17.321572] 1d40: cee955cc 00000080 c08640ac cedf1d60 c007af4c c0511ab8 20000013 ffffffff
[   17.330177] [<c0011f04>] (__irq_svc+0x44/0x78) from [<c0511ab8>] (_raw_spin_unlock_irqrestore+0x64/0x68)
[   17.340156] [<c0511ab8>] (_raw_spin_unlock_irqrestore+0x64/0x68) from [<bf01ee78>] (musb_urb_enqueue+0x70/0x520 [musb_hdrc])
[   17.351974] [<bf01ee78>] (musb_urb_enqueue+0x70/0x520 [musb_hdrc]) from [<c0344248>] (usb_hcd_submit_urb+0xa0/0x26c)
[   17.363044] [<c0344248>] (usb_hcd_submit_urb+0xa0/0x26c) from [<c0352724>] (usb_stor_msg_common+0x84/0x134)
[   17.373283] [<c0352724>] (usb_stor_msg_common+0x84/0x134) from [<c0352b38>] (usb_stor_bulk_transfer_buf+0x48/0x7c)
[   17.384160] [<c0352b38>] (usb_stor_bulk_transfer_buf+0x48/0x7c) from [<c0352dfc>] (usb_stor_Bulk_transport+0x144/0x2fc)
[   17.395491] [<c0352dfc>] (usb_stor_Bulk_transport+0x144/0x2fc) from [<c0353524>] (usb_stor_invoke_transport+0x20/0x48c)
[   17.406817] [<c0353524>] (usb_stor_invoke_transport+0x20/0x48c) from [<c0354960>] (usb_stor_control_thread+0x164/0x228)
[   17.418158] [<c0354960>] (usb_stor_control_thread+0x164/0x228) from [<c0050e60>] (kthread+0xb4/0xb8)
[   17.427759] [<c0050e60>] (kthread+0xb4/0xb8) from [<c000e2c8>] (ret_from_fork+0x14/0x2c)
[   17.436250] ---[ end trace 0606f8051ee8bb0d ]---

Note that the line numbers don't match the current code in mainline due
to some debugging code, but it should be clear where the warning comes
from.

With patch #3 applied, I made this problem go away, and I can suspend
and resume with all musb-related drivers active just fine. The only
issue I have is that I don't fully understand the reason, as it seems to
me that my patch just changes the timing, and we're actually seeing a
race condition here.

Sebastian, can you give a comment on this? I'll post the musb patches
that are necessary as well now, and I'd appreciate more testers here.


Many thanks,
Daniel


Daniel Mack (3):
  dma: cppi41: restore more registers
  dma: cppi41: use cppi41_pop_desc() where possible
  dma: cppi41: move -EAGAIN in tear_down

 drivers/dma/cppi41.c | 43 +++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 14 deletions(-)

-- 
1.8.3.1



* [PATCH 1/3] dma: cppi41: restore more registers
       [not found] ` <1380634271-27588-1-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-01 13:31   ` Daniel Mack
  2013-10-01 13:31   ` [PATCH 2/3] dma: cppi41: use cppi41_pop_desc() where possible Daniel Mack
  2013-10-01 13:31   ` [PATCH 3/3] dma: cppi41: move -EAGAIN in tear_down Daniel Mack
  2 siblings, 0 replies; 15+ messages in thread
From: Daniel Mack @ 2013-10-01 13:31 UTC (permalink / raw)
  To: linux-usb-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-omap-u79uwXL29TY76Z2rM5mHXA, neumann-SRDuVqtxQLSzQB+pC5nmwQ,
	bigeasy-hfZtesqFncYOwBW4kG4KsQ, vinod.koul-ral2JQCrhuEAvxtiuMwx3w,
	dan.j.williams-ral2JQCrhuEAvxtiuMwx3w, balbi-l0cyMroinI0,
	Daniel Mack

With active users over suspend/resume cycles, it turns out that
more registers, in particular DMA_TDFDQ and RXHPCRA0, have to be
restored on resume.

Signed-off-by: Daniel Mack <zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 drivers/dma/cppi41.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
index 89decc9..9a298b1 100644
--- a/drivers/dma/cppi41.c
+++ b/drivers/dma/cppi41.c
@@ -141,6 +141,9 @@ struct cppi41_dd {
 	const struct chan_queues *queues_rx;
 	const struct chan_queues *queues_tx;
 	struct chan_queues td_queue;
+
+	/* context for suspend/resume */
+	unsigned int dma_tdfdq;
 };
 
 #define FIST_COMPLETION_QUEUE	93
@@ -1045,6 +1048,7 @@ static int cppi41_suspend(struct device *dev)
 {
 	struct cppi41_dd *cdd = dev_get_drvdata(dev);
 
+	cdd->dma_tdfdq = cppi_readl(cdd->ctrl_mem + DMA_TDFDQ);
 	cppi_writel(0, cdd->usbss_mem + USBSS_IRQ_CLEARR);
 	disable_sched(cdd);
 
@@ -1054,12 +1058,23 @@ static int cppi41_suspend(struct device *dev)
 static int cppi41_resume(struct device *dev)
 {
 	struct cppi41_dd *cdd = dev_get_drvdata(dev);
+	struct cppi41_channel *c;
 	int i;
 
 	for (i = 0; i < DESCS_AREAS; i++)
 		cppi_writel(cdd->descs_phys, cdd->qmgr_mem + QMGR_MEMBASE(i));
 
+	list_for_each_entry(c, &cdd->ddev.channels, chan.device_node)
+		if (!c->is_tx)
+			cppi_writel(c->q_num, c->gcr_reg + RXHPCRA0);
+
 	init_sched(cdd);
+
+	cppi_writel(cdd->dma_tdfdq, cdd->ctrl_mem + DMA_TDFDQ);
+	cppi_writel(cdd->scratch_phys, cdd->qmgr_mem + QMGR_LRAM0_BASE);
+	cppi_writel(QMGR_SCRATCH_SIZE, cdd->qmgr_mem + QMGR_LRAM_SIZE);
+	cppi_writel(0, cdd->qmgr_mem + QMGR_LRAM1_BASE);
+
 	cppi_writel(USBSS_IRQ_PD_COMP, cdd->usbss_mem + USBSS_IRQ_ENABLER);
 
 	return 0;
-- 
1.8.3.1
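
The save/restore pattern used by this patch can be illustrated outside the
kernel. The following is a minimal userspace sketch, not driver code:
struct fake_dev, the FAKE_* indices and the fake_*() helpers are all
hypothetical, and the sketch assumes that register contents are lost across
suspend, which the patch description implies but does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical indices into a fake register file; not the driver's offsets. */
enum { FAKE_DMA_TDFDQ, FAKE_RXHPCRA0, FAKE_NUM_REGS };

struct fake_dev {
	uint32_t regs[FAKE_NUM_REGS]; /* stands in for memory-mapped registers */
	uint32_t saved_tdfdq;         /* context saved across suspend */
	uint32_t rx_q_num;            /* software state that survives suspend */
};

/* Save the value that only exists in the (fake) hardware register. */
static void fake_suspend(struct fake_dev *d)
{
	d->saved_tdfdq = d->regs[FAKE_DMA_TDFDQ];
}

/* Assumption: a suspend cycle clears every register. */
static void fake_power_loss(struct fake_dev *d)
{
	for (int i = 0; i < FAKE_NUM_REGS; i++)
		d->regs[i] = 0;
}

/* Rewrite registers from software state and from the saved context. */
static void fake_resume(struct fake_dev *d)
{
	d->regs[FAKE_RXHPCRA0] = d->rx_q_num;
	d->regs[FAKE_DMA_TDFDQ] = d->saved_tdfdq;
}
```

Values that can be recomputed from software state (here rx_q_num) need no
extra storage; only values that live solely in hardware need a saved copy,
which is why the patch adds a single dma_tdfdq field to struct cppi41_dd.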

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 2/3] dma: cppi41: use cppi41_pop_desc() where possible
       [not found] ` <1380634271-27588-1-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2013-10-01 13:31   ` [PATCH 1/3] dma: cppi41: restore more registers Daniel Mack
@ 2013-10-01 13:31   ` Daniel Mack
  2013-10-01 13:31   ` [PATCH 3/3] dma: cppi41: move -EAGAIN in tear_down Daniel Mack
  2 siblings, 0 replies; 15+ messages in thread
From: Daniel Mack @ 2013-10-01 13:31 UTC (permalink / raw)
  To: linux-usb-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-omap-u79uwXL29TY76Z2rM5mHXA, neumann-SRDuVqtxQLSzQB+pC5nmwQ,
	bigeasy-hfZtesqFncYOwBW4kG4KsQ, vinod.koul-ral2JQCrhuEAvxtiuMwx3w,
	dan.j.williams-ral2JQCrhuEAvxtiuMwx3w, balbi-l0cyMroinI0,
	Daniel Mack

Use cppi41_pop_desc() when appropriate instead of open-coding the same
functionality again. That makes the code more readable. The function has
to be moved some lines up for this change.

Signed-off-by: Daniel Mack <zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 drivers/dma/cppi41.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
index 9a298b1..7747bf7 100644
--- a/drivers/dma/cppi41.c
+++ b/drivers/dma/cppi41.c
@@ -266,6 +266,15 @@ static u32 pd_trans_len(u32 val)
 	return val & ((1 << (DESC_LENGTH_BITS_NUM + 1)) - 1);
 }
 
+static u32 cppi41_pop_desc(struct cppi41_dd *cdd, unsigned queue_num)
+{
+	u32 desc;
+
+	desc = cppi_readl(cdd->qmgr_mem + QMGR_QUEUE_D(queue_num));
+	desc &= ~0x1f;
+	return desc;
+}
+
 static irqreturn_t cppi41_irq(int irq, void *data)
 {
 	struct cppi41_dd *cdd = data;
@@ -303,8 +312,7 @@ static irqreturn_t cppi41_irq(int irq, void *data)
 			q_num = __fls(val);
 			val &= ~(1 << q_num);
 			q_num += 32 * i;
-			desc = cppi_readl(cdd->qmgr_mem + QMGR_QUEUE_D(q_num));
-			desc &= ~0x1f;
+			desc = cppi41_pop_desc(cdd, q_num);
 			c = desc_to_chan(cdd, desc);
 			if (WARN_ON(!c)) {
 				pr_err("%s() q %d desc %08x\n", __func__,
@@ -520,15 +528,6 @@ static void cppi41_compute_td_desc(struct cppi41_desc *d)
 	d->pd0 = DESC_TYPE_TEARD << DESC_TYPE;
 }
 
-static u32 cppi41_pop_desc(struct cppi41_dd *cdd, unsigned queue_num)
-{
-	u32 desc;
-
-	desc = cppi_readl(cdd->qmgr_mem + QMGR_QUEUE_D(queue_num));
-	desc &= ~0x1f;
-	return desc;
-}
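
The masking step inside cppi41_pop_desc() can be tried in isolation. This
is a minimal userspace sketch: mask_desc() is a hypothetical helper that
takes the raw value as an argument instead of reading it from a queue
register, and it only shows that `& ~0x1f` clears the low five bits, so
the result is always 32-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear the low five bits of a raw queue value. */
static uint32_t mask_desc(uint32_t raw)
{
	return raw & ~(uint32_t)0x1f;
}
```

For example, a raw value of 0x1234567f yields 0x12345660, and any value
below 0x20 yields 0, which callers in the driver treat as "nothing popped".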


* [PATCH 3/3] dma: cppi41: move -EAGAIN in tear_down
       [not found] ` <1380634271-27588-1-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2013-10-01 13:31   ` [PATCH 1/3] dma: cppi41: restore more registers Daniel Mack
  2013-10-01 13:31   ` [PATCH 2/3] dma: cppi41: use cppi41_pop_desc() where possible Daniel Mack
@ 2013-10-01 13:31   ` Daniel Mack
       [not found]     ` <1380634271-27588-4-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 15+ messages in thread
From: Daniel Mack @ 2013-10-01 13:31 UTC (permalink / raw)
  To: linux-usb-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-omap-u79uwXL29TY76Z2rM5mHXA, neumann-SRDuVqtxQLSzQB+pC5nmwQ,
	bigeasy-hfZtesqFncYOwBW4kG4KsQ, vinod.koul-ral2JQCrhuEAvxtiuMwx3w,
	dan.j.williams-ral2JQCrhuEAvxtiuMwx3w, balbi-l0cyMroinI0,
	Daniel Mack

In cppi41_tear_down_chan(), bail out earlier in case td_seen is unset
instead of popping another descriptor when td_desc_seen is also unset.

My system ran into the WARN() condition multiple times when
cppi41_tear_down_chan() was called for channels that had all of
td_queued, td_seen and td_desc_seen set to false.

Signed-off-by: Daniel Mack <zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 drivers/dma/cppi41.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
index 7747bf7..6decf34 100644
--- a/drivers/dma/cppi41.c
+++ b/drivers/dma/cppi41.c
@@ -586,6 +586,9 @@ static int cppi41_tear_down_chan(struct cppi41_channel *c)
 			}
 			c->td_seen = 1;
 		}
+
+		if (c->td_retry)
+			return -EAGAIN;
 	}
 	if (!c->td_desc_seen) {
 		desc_phys = cppi41_pop_desc(cdd, c->q_comp_num);
@@ -606,8 +609,6 @@ static int cppi41_tear_down_chan(struct cppi41_channel *c)
 	 * descriptor before the TD we fetch it from enqueue, it has to be
 	 * there waiting for us.
 	 */
-	if (!c->td_seen && c->td_retry)
-		return -EAGAIN;
 
 	WARN_ON(!c->td_retry);
 	if (!c->td_desc_seen) {
-- 
1.8.3.1
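
The effect of moving the -EAGAIN check can be modelled in userspace. This
is a rough sketch under stated assumptions, not the driver function:
struct td_state, step_old() and step_new() are hypothetical, the hardware
is reduced to a td_available flag, the transfer-descriptor pop is only
counted rather than performed, and the position of the td_retry decrement
is an assumption because it is not visible in the hunks above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct td_state {
	bool td_seen;      /* teardown descriptor has been popped */
	bool td_desc_seen; /* transfer descriptor has been popped */
	int td_retry;      /* remaining retries */
	int transfer_pops; /* how often the transfer-descriptor pop ran */
};

/* Ordering before the patch: both pops run, then the -EAGAIN check. */
static int step_old(struct td_state *s, bool td_available)
{
	if (!s->td_seen && td_available)
		s->td_seen = true;
	if (!s->td_desc_seen)
		s->transfer_pops++;
	s->td_retry--; /* assumed position, not shown in the diff */
	if (!s->td_seen && s->td_retry)
		return -EAGAIN;
	return 0;
}

/* Ordering after the patch: bail out before the second pop. */
static int step_new(struct td_state *s, bool td_available)
{
	if (!s->td_seen) {
		if (td_available)
			s->td_seen = true;
		if (s->td_retry)
			return -EAGAIN;
	}
	if (!s->td_desc_seen)
		s->transfer_pops++;
	s->td_retry--; /* assumed position, not shown in the diff */
	return 0;
}
```

In this model, a call made while the teardown descriptor is not yet
available pops the transfer queue once with the old ordering and not at
all with the new one. With the new ordering, the call that first sees the
teardown descriptor also returns -EAGAIN, so the transfer-descriptor pop
is deferred to the following call.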


* Re: [PATCH 3/3] dma: cppi41: move -EAGAIN in tear_down
       [not found]     ` <1380634271-27588-4-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-02  8:29       ` Sebastian Andrzej Siewior
  2013-10-02  9:19         ` Daniel Mack
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-02  8:29 UTC (permalink / raw)
  To: Daniel Mack
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-omap-u79uwXL29TY76Z2rM5mHXA, neumann-SRDuVqtxQLSzQB+pC5nmwQ,
	vinod.koul-ral2JQCrhuEAvxtiuMwx3w,
	dan.j.williams-ral2JQCrhuEAvxtiuMwx3w, balbi-l0cyMroinI0

* Daniel Mack | 2013-10-01 15:31:11 [+0200]:

>In cppi41_tear_down_chan(), bail out earlier in case td_seen is unset
>instead of popping another descriptor when td_desc_seen is also unset.
>
>My system ran into the WARN() condition multiple times when
>cppi41_tear_down_chan() was called for channels that had all of
>td_queued, td_seen and td_desc_seen set to false.

Which one?

>Signed-off-by: Daniel Mack <zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>---
> drivers/dma/cppi41.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
>index 7747bf7..6decf34 100644
>--- a/drivers/dma/cppi41.c
>+++ b/drivers/dma/cppi41.c
>@@ -586,6 +586,9 @@ static int cppi41_tear_down_chan(struct cppi41_channel *c)
> 			}
> 			c->td_seen = 1;
> 		}
>+
>+		if (c->td_retry)
>+			return -EAGAIN;

So you return right away since the retry counter should be > 0 here. And
then you want to get the TDDOWN bit set and retry. Hmmm.
Let me reply to your 0/3 on this.

> 	}
> 	if (!c->td_desc_seen) {
> 		desc_phys = cppi41_pop_desc(cdd, c->q_comp_num);
>@@ -606,8 +609,6 @@ static int cppi41_tear_down_chan(struct cppi41_channel *c)
> 	 * descriptor before the TD we fetch it from enqueue, it has to be
> 	 * there waiting for us.
> 	 */
>-	if (!c->td_seen && c->td_retry)
>-		return -EAGAIN;
> 
> 	WARN_ON(!c->td_retry);
> 	if (!c->td_desc_seen) {

Sebastian

* Re: [PATCH 3/3] dma: cppi41: move -EAGAIN in tear_down
  2013-10-02  8:29       ` Sebastian Andrzej Siewior
@ 2013-10-02  9:19         ` Daniel Mack
  2013-10-02 10:25           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Mack @ 2013-10-02  9:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-usb, linux-omap, neumann, vinod.koul, dan.j.williams, balbi

Hi Sebastian,

On 02.10.2013 10:29, Sebastian Andrzej Siewior wrote:
> * Daniel Mack | 2013-10-01 15:31:11 [+0200]:

>> diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
>> index 7747bf7..6decf34 100644
>> --- a/drivers/dma/cppi41.c
>> +++ b/drivers/dma/cppi41.c
>> @@ -586,6 +586,9 @@ static int cppi41_tear_down_chan(struct cppi41_channel *c)
>> 			}
>> 			c->td_seen = 1;
>> 		}
>> +
>> +		if (c->td_retry)
>> +			return -EAGAIN;
> 
> So you return right away since the retry counter should be > 0 here. And
> then you want to get the TDDOWN bit set and retry. Hmmm.
> Let me reply to your 0/3 on this.

Thanks a lot for having a look! As I'm going to be off for a couple of
days now, and will only be able to read my mails sporadically, maybe you
can also try the musb suspend functions on your hardware. I'll give you
a quick wrap-up of what my test setup looks like.

On an AM33xx board, I have a host-only (type A) connector with a USB
memory stick plugged in. The relevant config settings are:

CONFIG_USB_MUSB_HDRC=m
# CONFIG_USB_MUSB_HOST is not set
# CONFIG_USB_MUSB_GADGET is not set
CONFIG_USB_MUSB_DUAL_ROLE=y
# CONFIG_USB_MUSB_TUSB6010 is not set
# CONFIG_USB_MUSB_OMAP2PLUS is not set
# CONFIG_USB_MUSB_AM35X is not set
CONFIG_USB_MUSB_DSPS=m
# CONFIG_USB_MUSB_UX500 is not set
CONFIG_USB_MUSB_AM335X_CHILD=m
# CONFIG_MUSB_PIO_ONLY is not set
CONFIG_USB_TI_CPPI41_DMA=y
CONFIG_TI_CPPI41=y

Once the system is booted up and the USB media is detected, I send the
system to sleep mode with "echo mem >/sys/power/state". After wakeup, I
access the media by mounting and unmounting it once, then send the
system back to sleep.

Repeating the above cycle multiple times will sooner or later make the
warning kick in without the discussed patch. Sometimes it happened on
first try, sometimes it took me up to ~20 cycles to make it happen.

I'd be curious whether you see the same behavior on your board as well,
and whether the fix works for you, too.

For reference, I just pushed my current working tree here:

  https://github.com/zonque/linux/tree/am33xx-3.12


Thanks,
Daniel



* Re: [PATCH 0/3] dma: cppi41: more suspend/resume patches
  2013-10-01 13:31 [PATCH 0/3] dma: cppi41: more suspend/resume patches Daniel Mack
       [not found] ` <1380634271-27588-1-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-02 10:20 ` Sebastian Andrzej Siewior
       [not found]   ` <20131002102033.GB16680-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
  2013-10-09  6:41 ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-02 10:20 UTC (permalink / raw)
  To: Daniel Mack
  Cc: linux-usb, linux-omap, neumann, vinod.koul, dan.j.williams, balbi

* Daniel Mack | 2013-10-01 15:31:08 [+0200]:

>Patch #3, however, gives me headaches. I can't fully explain what's
>going on, but I can tell for sure that it fixes a problem that I stared
>at for many hours.
>
>The problem is that on resume, the musb core will detect that some of
>the suspended USB devices' endpoints are stalled. Which is something
>that is unrelated to the dma driver, it just seems to be an expected
>condition. That, however, makes the musb core call
>cppi41_dma_channel_abort() -> cppi41_tear_down_chan(), which is
>an otherwise untravelled code path. When that function is called for
>a channel which has all of td_queued, td_seen and td_desc_seen set
>to FALSE, I'm always getting a warning like this:
>
>[   17.105981] ------------[ cut here ]------------
>[   17.110861] WARNING: CPU: 0 PID: 122 at drivers/dma/cppi41.c:644 cppi41_dma_control+0x378/0x3f8 [cppi41]()

This is 
    WARN_ON(!cdd->chan_busy[desc_num]);

at the end of cppi41_stop_chan() right? So you get the warning because
you tried to stop a channel which was not busy. But then you should not
be called at all because cppi41_dma_channel_abort() shouldn't call the
dma driver on idle channels. So it should complete at some point.

>Note that the line numbers don't match the current code in mainline due
>to some debugging code, but it should be clear where the warning comes
>from.
>
>With patch #3 applied, I made this problem go away, and I can suspend
>and resume with all musb-related drivers active just fine. The only
>issue I have is that I don't fully understand the reason, as it seems to
>me that my patch just changes the timing, and we're actually seeing a
>race condition here.
>
>Sebastian, can you give a comment on this? I'll post the musb patches
>that are necessary as well now, and I'd appreciate more testers here.

How does your suspend & resume thingy work? Is it completely shut down,
i.e. powered off? According to your earlier patches I would assume so.
In that case the request is not enqueued and there is nothing to be
removed from the engine, right?
With the change you somehow get an interrupt that cleans up that slot.
If you trigger TD bits for a random channel you get at least the
teardown descriptor. But then you don't complain about the WARN_ON()
about missing / wrong desc_phys.
In general this works like this:
- descriptor is busy / in progress.
  The TEAR-DOWN bits have to be set a few times. The hw returns the
  teardown descriptor and the descriptor that has been enqueued
- descriptor is queued but not busy / in use
  Setting the TEAR-DOWN bit once seems to be enough. The hw returns
  _only_ the teardown descriptor. The transfer descriptor remains pushed
  onto the queue like it has been never consumed. A pop cleans it up,
  the complete queue is empty. (Warning: reading the queue counter leads
  to a pop! So checking if the queue counter increments after pushing
  something to it is a bad idea).
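
The warning about reads that pop can be pictured with a small userspace
model. This is only a sketch: struct fake_queue, fake_push() and
fake_read() are hypothetical, and the model assumes that an empty queue
reads back as 0, which matches how the driver tests the popped value.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_QUEUE_DEPTH 4

struct fake_queue {
	uint32_t slots[FAKE_QUEUE_DEPTH];
	unsigned int head;
	unsigned int count;
};

/* Append a descriptor address; returns -1 if the fake queue is full. */
static int fake_push(struct fake_queue *q, uint32_t desc)
{
	if (q->count == FAKE_QUEUE_DEPTH)
		return -1;
	q->slots[(q->head + q->count) % FAKE_QUEUE_DEPTH] = desc;
	q->count++;
	return 0;
}

/* Every read consumes one entry; an empty queue reads back as 0. */
static uint32_t fake_read(struct fake_queue *q)
{
	uint32_t desc;

	if (!q->count)
		return 0;
	desc = q->slots[q->head];
	q->head = (q->head + 1) % FAKE_QUEUE_DEPTH;
	q->count--;
	return desc;
}
```

In this model a read done "just to check" has the same effect as a real
pop: after one push, the first fake_read() returns the descriptor and the
second returns 0, so the entry is gone for whoever reads next.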

The whole thing has been tested by manipulating the USB storage driver
to enqueue more / less data than required by the protocol leading to a
stall followed by an abort of the transfer. Let me re-do your suspend
with the patches you made so far to check what is going on and if the
"normal" transfer cancel is still working.

>Many thanks,
>Daniel

Sebastian


* Re: [PATCH 3/3] dma: cppi41: move -EAGAIN in tear_down
  2013-10-02  9:19         ` Daniel Mack
@ 2013-10-02 10:25           ` Sebastian Andrzej Siewior
  2013-10-02 11:38             ` Daniel Mack
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-02 10:25 UTC (permalink / raw)
  To: Daniel Mack
  Cc: linux-usb, linux-omap, neumann, vinod.koul, dan.j.williams, balbi

On 10/02/2013 11:19 AM, Daniel Mack wrote:
> Hi Sebastian,

Hi Daniel,

> On 02.10.2013 10:29, Sebastian Andrzej Siewior wrote:
>> * Daniel Mack | 2013-10-01 15:31:11 [+0200]:
> 
> Thanks a lot for having a look! As I'm going to be off for a couple of
> days now, and will only be able to read my mails sporadically, maybe you
> can also try the musb suspend functions on your hardware. I'll give you
> a quick wrap-up of what my test setup looks like.
> 
> On an AM33xx board, I have a host-only (type A) connector with a USB
> memory stick plugged in. The relevant config settings are:
> 
> CONFIG_USB_MUSB_HDRC=m
> # CONFIG_USB_MUSB_HOST is not set
> # CONFIG_USB_MUSB_GADGET is not set
> CONFIG_USB_MUSB_DUAL_ROLE=y
> # CONFIG_USB_MUSB_TUSB6010 is not set
> # CONFIG_USB_MUSB_OMAP2PLUS is not set
> # CONFIG_USB_MUSB_AM35X is not set
> CONFIG_USB_MUSB_DSPS=m
> # CONFIG_USB_MUSB_UX500 is not set
> CONFIG_USB_MUSB_AM335X_CHILD=m
> # CONFIG_MUSB_PIO_ONLY is not set
> CONFIG_USB_TI_CPPI41_DMA=y
> CONFIG_TI_CPPI41=y
> 
> Once the system is booted up and the USB media is detected, I send the
> system to sleep mode with "echo mem >/sys/power/state". After wakeup, I
> access the media by mounting and unmounting it once, then send the
> system back to sleep.

Okay. Going to sleep is probably easy, I need to figure out how to
wakeup…

> Repeating the above cycle multiple times will sooner or later make the
> warning kick in without the discussed patch. Sometimes it happened on
> first try, sometimes it took me up to ~20 cycles to make it happen.

Ah. Okay.

> I'd be curious whether you see the same behavior on your board as well,
> and whether the fix works for you, too.
> 
> For reference, I just pushed my current working tree here:
> 
>   https://github.com/zonque/linux/tree/am33xx-3.12

Thanks.

> 
> 
> Thanks,
> Daniel
> 
Sebastian

* Re: [PATCH 0/3] dma: cppi41: more suspend/resume patches
       [not found]   ` <20131002102033.GB16680-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
@ 2013-10-02 11:07     ` Daniel Mack
  2013-10-02 12:57       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Mack @ 2013-10-02 11:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-omap-u79uwXL29TY76Z2rM5mHXA, neumann-SRDuVqtxQLSzQB+pC5nmwQ,
	vinod.koul-ral2JQCrhuEAvxtiuMwx3w,
	dan.j.williams-ral2JQCrhuEAvxtiuMwx3w, balbi-l0cyMroinI0

On 02.10.2013 12:20, Sebastian Andrzej Siewior wrote:
> * Daniel Mack | 2013-10-01 15:31:08 [+0200]:
> 
>> Patch #3, however, gives me headaches. I can't fully explain what's
>> going on, but I can tell for sure that it fixes a problem that I stared
>> at for many hours.
>>
>> The problem is that on resume, the musb core will detect that some of
>> the suspended USB devices' endpoints are stalled. Which is something
>> that is unrelated to the dma driver, it just seems to be an expected
>> condition. That, however, makes the musb core call
>> cppi41_dma_channel_abort() -> cppi41_tear_down_chan(), which is
>> an otherwise untravelled code path. When that function is called for
>> a channel which has all of td_queued, td_seen and td_desc_seen set
>> to FALSE, I'm always getting a warning like this:
>>
>> [   17.105981] ------------[ cut here ]------------
>> [   17.110861] WARNING: CPU: 0 PID: 122 at drivers/dma/cppi41.c:644 cppi41_dma_control+0x378/0x3f8 [cppi41]()
> 
> This is 
>     WARN_ON(!cdd->chan_busy[desc_num]);
> 
> at the end of cppi41_stop_chan() right?

No, as stated, the line numbers in the kernel message are somewhat off
due to added debugging code. What kicks in here is this one:

        if (!c->td_desc_seen) {
                desc_phys = cppi41_pop_desc(cdd, c->q_comp_num);
                if (desc_phys) {
                        __iormb();
                        WARN_ON(c->desc_phys != desc_phys);
                        c->td_desc_seen = 1;
                }
        }

> So you get the warning because
> you tried to stop a channel which was not busy. But then you should not
> be called at all because cppi41_dma_channel_abort() shouldn't call the
> dma driver on idle channels.

However, I see nothing that forbids you from calling
dmaengine_terminate_all() on idle channels. If that's not handled
properly by the cppi driver, I'd say it needs fixing.

> How does your suspend & resume thingy work? Is it completely shut down,
> i.e. powered off? According to your earlier patches I would assume so.
> In that case the request is not enqueued and there is nothing to be
> removed from the engine, right?

No, my debugging showed that the channel has actually been prepared and
submitted before. It's just being torn down shortly after that. That's
what makes me believe there is a race condition here.

> With the change you somehow get an interrupt that cleans up that slot.

Timing, I presume.

> The whole thing has been tested by manipulating the USB storage driver
> to enqueue more / less data than required by the protocol leading to a
> stall followed by an abort of the transfer. Let me re-do your suspend
> with the patches you made so far to check what is going on and if the
> "normal" transfer cancel is still working.

Ok, that sounds good.


Thanks,
Daniel


* Re: [PATCH 3/3] dma: cppi41: move -EAGAIN in tear_down
  2013-10-02 10:25           ` Sebastian Andrzej Siewior
@ 2013-10-02 11:38             ` Daniel Mack
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Mack @ 2013-10-02 11:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-usb, linux-omap, neumann, vinod.koul, dan.j.williams, balbi

On 02.10.2013 12:25, Sebastian Andrzej Siewior wrote:
> On 10/02/2013 11:19 AM, Daniel Mack wrote:

>> Once the system is booted up and the USB media is detected, I send the
>> system to sleep mode with "echo mem >/sys/power/state". After wakeup, I
>> access the media by mounting and unmounting it once, then send the
>> system back to sleep.
> 
> Okay. Going to sleep is probably easy, I need to figure out how to
> wakeup…

Unless you pass no_console_suspend in your cmdline, you can just wake up
the system via UART0. IOW, just press enter on the serial console.


Daniel

* Re: [PATCH 0/3] dma: cppi41: more suspend/resume patches
  2013-10-02 11:07     ` Daniel Mack
@ 2013-10-02 12:57       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-02 12:57 UTC (permalink / raw)
  To: Daniel Mack
  Cc: linux-usb, linux-omap, neumann, vinod.koul, dan.j.williams, balbi

On 10/02/2013 01:07 PM, Daniel Mack wrote:
> No, as stated, the line numbers in the kernel message are somewhat off
> due to added debugging code. What kicks in here is this one:
> 
>         if (!c->td_desc_seen) {
>                 desc_phys = cppi41_pop_desc(cdd, c->q_comp_num);
>                 if (desc_phys) {
>                         __iormb();
>                         WARN_ON(c->desc_phys != desc_phys);
>                         c->td_desc_seen = 1;
>                 }
>         }

Ach okay. So something completed but it wasn't the expected descriptor.

>> So you get the warning because
>> you tried to stop a channel which was not busy. But then you should not
>> be called at all because cppi41_dma_channel_abort() shouldn't call the
>> dma driver on idle channels.
> 
> However, I see nothing that forbids you from calling
> dmaengine_terminate_all() on idle channels. If that's not handled
> properly by the cppi driver, I'd say it needs fixing.

No argument there.

>> How does your suspend & resume thingy work? Is it completely shut down,
>> i.e. powered off? According to your earlier patches I would assume so.
>> In that case the request is not enqueued and there is nothing to be
>> removed from the engine, right?
> 
> No, my debugging showed that the channel has actually been prepared and
> submitted before. It's just being torn down shortly after that. That's
> what makes me believe there is a race condition here.

I see.

> Thanks,
> Daniel

Sebastian


* Re: [PATCH 0/3] dma: cppi41: more suspend/resume patches
  2013-10-01 13:31 [PATCH 0/3] dma: cppi41: more suspend/resume patches Daniel Mack
       [not found] ` <1380634271-27588-1-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2013-10-02 10:20 ` [PATCH 0/3] dma: cppi41: more suspend/resume patches Sebastian Andrzej Siewior
@ 2013-10-09  6:41 ` Sebastian Andrzej Siewior
  2013-10-09  7:23   ` Daniel Mack
  2 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-09  6:41 UTC (permalink / raw)
  To: Daniel Mack
  Cc: linux-usb, linux-omap, neumann, vinod.koul, dan.j.williams, balbi

* Daniel Mack | 2013-10-01 15:31:08 [+0200]:

>Patch #1 restores more registers on resume time.
>
>Patch #2 is a cosmetic cleanup that emerged while digging through the
>driver and gaining a basic idea of how it's implemented. Nothing fancy.

I'm fine with those two.

>
>Patch #3, however, gives me headaches. I can't fully explain what's
>going on, but I can tell for sure that it fixes a problem that I stared
>at for many hours.

I'm still trying to verify if it breaks something or not. So I haven't
forgotten about this.

Sebastian


* Re: [PATCH 0/3] dma: cppi41: more suspend/resume patches
  2013-10-09  6:41 ` Sebastian Andrzej Siewior
@ 2013-10-09  7:23   ` Daniel Mack
       [not found]     ` <5255047A.5010609-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Mack @ 2013-10-09  7:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-usb, linux-omap, neumann, vinod.koul, dan.j.williams, balbi

On 09.10.2013 08:41, Sebastian Andrzej Siewior wrote:
> * Daniel Mack | 2013-10-01 15:31:08 [+0200]:
> 
>> Patch #1 restores more registers on resume time.
>>
>> Patch #2 is a cosmetic cleanup that emerged while digging through the
>> driver and gaining a basic idea of how it's implemented. Nothing fancy.
> 
> I'm fine with those two.
> 
>>
>> Patch #3, however, gives me headaches. I can't fully explain what's
>> going on, but I can tell for sure that it fixes a problem that I stared
>> at for many hours.
> 
> I'm still trying to verify if it breaks something or not. So I haven't
> forgotten about this.

Ok, thank you very much for the update :) I can of course test
alternative patches if you have any.

Could you actually reproduce the issue I described by sending your board
to suspend?


Daniel


* Re: [PATCH 0/3] dma: cppi41: more suspend/resume patches
       [not found]     ` <5255047A.5010609-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-09  7:28       ` Sebastian Andrzej Siewior
  2013-10-09  7:31         ` Daniel Mack
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-09  7:28 UTC (permalink / raw)
  To: Daniel Mack
  Cc: linux-usb, linux-omap, neumann, vinod.koul, dan.j.williams, balbi

On 10/09/2013 09:23 AM, Daniel Mack wrote:
> Ok, thank you very much for the update :) I can of course test
> alternative patches if you have any.
> 
> Could you actually reproduce the issue I described by sending your board
> to suspend?

No, I don't have "mem", just "freeze". I'm trying to test whether this is
a regression compared to my earlier testing. If not, then it looks good, I
would say.

> 
> 
> Daniel
> 
Sebastian


* Re: [PATCH 0/3] dma: cppi41: more suspend/resume patches
  2013-10-09  7:28       ` Sebastian Andrzej Siewior
@ 2013-10-09  7:31         ` Daniel Mack
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Mack @ 2013-10-09  7:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-usb, linux-omap, neumann, vinod.koul, dan.j.williams, balbi

On 09.10.2013 09:28, Sebastian Andrzej Siewior wrote:
> On 10/09/2013 09:23 AM, Daniel Mack wrote:
>> Ok, thank you very much for the update :) I can of course test
>> alternative patches if you have any.
>>
>> Could you actually reproduce the issue I described by sending your board
>> to suspend?
> 
> No, I don't have "mem", just "freeze".

Sounds like you need to update your cm3 firmware. I built mine from this
tree:

      git://arago-project.org/git/projects/am33x-cm3.git
      Branch "next3"
      AM335xPSP_04.06.00.08-141-g1628306

I can also send you my binary in PM if you want me to.

> I'm trying to test whether this is a
> regression compared to my earlier testing. If not, then it looks good, I
> would say.

Ok.


Thanks,
Daniel



end of thread, other threads:[~2013-10-09  7:31 UTC | newest]

Thread overview: 15+ messages
2013-10-01 13:31 [PATCH 0/3] dma: cppi41: more suspend/resume patches Daniel Mack
     [not found] ` <1380634271-27588-1-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-10-01 13:31   ` [PATCH 1/3] dma: cppi41: restore more registers Daniel Mack
2013-10-01 13:31   ` [PATCH 2/3] dma: cppi41: use cppi41_pop_desc() where possible Daniel Mack
2013-10-01 13:31   ` [PATCH 3/3] dma: cppi41: move -EAGAIN in tear_down Daniel Mack
     [not found]     ` <1380634271-27588-4-git-send-email-zonque-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-10-02  8:29       ` Sebastian Andrzej Siewior
2013-10-02  9:19         ` Daniel Mack
2013-10-02 10:25           ` Sebastian Andrzej Siewior
2013-10-02 11:38             ` Daniel Mack
2013-10-02 10:20 ` [PATCH 0/3] dma: cppi41: more suspend/resume patches Sebastian Andrzej Siewior
     [not found]   ` <20131002102033.GB16680-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
2013-10-02 11:07     ` Daniel Mack
2013-10-02 12:57       ` Sebastian Andrzej Siewior
2013-10-09  6:41 ` Sebastian Andrzej Siewior
2013-10-09  7:23   ` Daniel Mack
     [not found]     ` <5255047A.5010609-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-10-09  7:28       ` Sebastian Andrzej Siewior
2013-10-09  7:31         ` Daniel Mack
