* Runtime PM workqueue killing system performance with USB
@ 2014-05-22 10:27 Will Deacon
2014-05-22 15:02 ` Alan Stern
0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2014-05-22 10:27 UTC (permalink / raw)
To: stern, sarah.a.sharp; +Cc: linux-usb, linux-kernel, khilman
Hi all,
Although I don't think this is a new issue, booting mainline on my vexpress
a9x4 (Quad ARMv7 Cortex-A9 board) with USB and PM_RUNTIME results in each
CPU being constantly 20-50% loaded. A bit of investigation shows that this
is due to the runtime-pm callbacks trying to autosuspend USB, despite this
being unsupported by the host-controller (isp1760, which doesn't have a
->bus_suspend callback). I've included a backtrace at the bottom of this mail.
Anyway, since ->bus_suspend is not implemented, hcd_bus_suspend returns
-ENOENT, which propagates back up to usb_runtime_suspend via usb_suspend_both.
At this point, we override the return value with -EBUSY:
drivers/usb/core/driver.c:usb_runtime_suspend:
/* The PM core reacts badly unless the return code is 0,
* -EAGAIN, or -EBUSY, so always return -EBUSY on an error.
*/
if (status != 0)
return -EBUSY;
This then tells the runtime PM code to try again:
drivers/base/power/runtime.c
if (retval == -EAGAIN || retval == -EBUSY) {
dev->power.runtime_error = 0;
/*
* If the callback routine failed an autosuspend, and
* if the last_busy time has been updated so that there
* is a new autosuspend expiration time, automatically
* reschedule another autosuspend.
*/
if ((rpmflags & RPM_AUTO) &&
pm_runtime_autosuspend_expiration(dev) != 0)
goto repeat;
Consequently, I see a kworker thread on each CPU consuming a significant
amount of the system resources. Worse, if I enable something like kmemleak
(which adds more work to the failed suspend operation), I end up failing
to boot entirely (NFS bombs out).
Reverting db7c7c0aeef5 ("usb: Always return 0 or -EBUSY to the runtime
PM core.") fixes this for me, but the commit log suggests that will break
lsusb. That patch has also been in for three and a half years...
Any ideas on how to fix this properly? In what ways does the PM core react
badly to -ENOENT?
Cheers,
Will
--->8
[ 161.385424] CPU: 0 PID: 51 Comm: kworker/0:1 Not tainted 3.15.0-rc5 #2
[ 161.405009] Workqueue: pm pm_runtime_work
[ 161.417019] task: ed0bec00 ti: ed470000 task.ti: ed470000
[ 161.433195] PC is at kmemleak_free+0x8/0x4c
[ 161.445737] LR is at kfree+0xbc/0x154
[ 161.456699] pc : [<c05be7e0>] lr : [<c00ecc14>] psr: 400f0013
[ 161.456699] sp : ed471c58 ip : 0000001c fp : ed39adc0
[ 161.491098] r10: 00000000 r9 : ed59a300 r8 : c09df324
[ 161.506743] r7 : c03e7960 r6 : ee59b340 r5 : ed59a300 r4 : ed001f00
[ 161.526294] r3 : 0000f0e0 r2 : edff0000 r1 : ed59a300 r0 : ed59a300
[ 161.545848] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 161.567743] Control: 10c5387d Table: 8d5e004a DAC: 00000015
[ 161.584955] CPU: 0 PID: 51 Comm: kworker/0:1 Not tainted 3.15.0-rc5 #2
[ 161.604512] Workqueue: pm pm_runtime_work
[ 161.616572] [<c0015e90>] (unwind_backtrace) from [<c0011b14>] (show_stack+0x10/0x14)
[ 161.639794] [<c0011b14>] (show_stack) from [<c05c1444>] (dump_stack+0x88/0x98)
[ 161.661449] [<c05c1444>] (dump_stack) from [<c02b89a8>] (sysrq_handle_showallcpus+0x5c/0x64)
[ 161.686744] [<c02b89a8>] (sysrq_handle_showallcpus) from [<c02b8fe4>] (__handle_sysrq+0x128/0x17c)
[ 161.713600] [<c02b8fe4>] (__handle_sysrq) from [<c02d2124>] (pl011_fifo_to_tty+0x174/0x1c4)
[ 161.738631] [<c02d2124>] (pl011_fifo_to_tty) from [<c02d32f4>] (pl011_int+0x298/0x5a8)
[ 161.762362] [<c02d32f4>] (pl011_int) from [<c0086ac8>] (handle_irq_event_percpu+0x78/0x134)
[ 161.787393] [<c0086ac8>] (handle_irq_event_percpu) from [<c0086bc4>] (handle_irq_event+0x40/0x60)
[ 161.813991] [<c0086bc4>] (handle_irq_event) from [<c0089c30>] (handle_fasteoi_irq+0xa8/0x1a8)
[ 161.839562] [<c0089c30>] (handle_fasteoi_irq) from [<c00861c4>] (generic_handle_irq+0x2c/0x3c)
[ 161.865377] [<c00861c4>] (generic_handle_irq) from [<c000ef40>] (handle_IRQ+0x40/0x90)
[ 161.889105] [<c000ef40>] (handle_IRQ) from [<c00087e8>] (gic_handle_irq+0x2c/0x5c)
[ 161.911788] [<c00087e8>] (gic_handle_irq) from [<c0012680>] (__irq_svc+0x40/0x50)
[ 161.934204] Exception stack(0xed471c10 to 0xed471c58)
[ 161.949333] 1c00: ed59a300 ed59a300 edff0000 0000f0e0
[ 161.973840] 1c20: ed001f00 ed59a300 ee59b340 c03e7960 c09df324 ed59a300 00000000 ed39adc0
[ 161.998345] 1c40: 0000001c ed471c58 c00ecc14 c05be7e0 400f0013 ffffffff
[ 162.018168] [<c0012680>] (__irq_svc) from [<c05be7e0>] (kmemleak_free+0x8/0x4c)
[ 162.040082] [<c05be7e0>] (kmemleak_free) from [<c00ecc14>] (kfree+0xbc/0x154)
[ 162.061490] [<c00ecc14>] (kfree) from [<c03e7960>] (usb_hcd_submit_urb+0x2d4/0x804)
[ 162.084447] [<c03e7960>] (usb_hcd_submit_urb) from [<c03e94b4>] (usb_start_wait_urb+0x4c/0xbc)
[ 162.110264] [<c03e94b4>] (usb_start_wait_urb) from [<c03e95c4>] (usb_control_msg+0xa0/0xd0)
[ 162.135298] [<c03e95c4>] (usb_control_msg) from [<c03dfb10>] (hub_port_status+0x74/0xfc)
[ 162.159548] [<c03dfb10>] (hub_port_status) from [<c03e139c>] (hub_activate+0x164/0x500)
[ 162.183537] [<c03e139c>] (hub_activate) from [<c03e179c>] (hub_resume+0x14/0x1c)
[ 162.205709] [<c03e179c>] (hub_resume) from [<c03ec474>] (usb_resume_interface.isra.6+0xe8/0x118)
[ 162.232041] [<c03ec474>] (usb_resume_interface.isra.6) from [<c03ec560>] (usb_suspend_both+0xbc/0x19c)
[ 162.259938] [<c03ec560>] (usb_suspend_both) from [<c03ed418>] (usb_runtime_suspend+0x28/0x58)
[ 162.285497] [<c03ed418>] (usb_runtime_suspend) from [<c032888c>] (__rpm_callback+0x38/0x84)
[ 162.310530] [<c032888c>] (__rpm_callback) from [<c03288f8>] (rpm_callback+0x20/0x74)
[ 162.333739] [<c03288f8>] (rpm_callback) from [<c0328d54>] (rpm_suspend+0xd0/0x4e8)
[ 162.356428] [<c0328d54>] (rpm_suspend) from [<c032a048>] (__pm_runtime_suspend+0x70/0x98)
[ 162.380939] [<c032a048>] (__pm_runtime_suspend) from [<c03ed480>] (usb_runtime_idle+0x24/0x2c)
[ 162.406750] [<c03ed480>] (usb_runtime_idle) from [<c032888c>] (__rpm_callback+0x38/0x84)
[ 162.431001] [<c032888c>] (__rpm_callback) from [<c0329320>] (rpm_idle+0xf8/0x19c)
[ 162.453428] [<c0329320>] (rpm_idle) from [<c032a16c>] (pm_runtime_work+0x90/0xa0)
[ 162.475860] [<c032a16c>] (pm_runtime_work) from [<c005d674>] (process_one_work+0x108/0x374)
[ 162.500891] [<c005d674>] (process_one_work) from [<c005e384>] (worker_thread+0x130/0x3e4)
[ 162.525412] [<c005e384>] (worker_thread) from [<c0063a30>] (kthread+0xd8/0xf0)
[ 162.547063] [<c0063a30>] (kthread) from [<c000e6f8>] (ret_from_fork+0x14/0x3c)
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: Runtime PM workqueue killing system performance with USB
2014-05-22 10:27 Runtime PM workqueue killing system performance with USB Will Deacon
@ 2014-05-22 15:02 ` Alan Stern
2014-05-22 17:39 ` Will Deacon
0 siblings, 1 reply; 5+ messages in thread
From: Alan Stern @ 2014-05-22 15:02 UTC (permalink / raw)
To: Will Deacon; +Cc: sarah.a.sharp, linux-usb, linux-kernel, khilman
On Thu, 22 May 2014, Will Deacon wrote:
> Hi all,
>
> Although I don't think this is a new issue, booting mainline on my vexpress
> a9x4 (Quad ARMv7 Cortex-A9 board) with USB and PM_RUNTIME results in each
> CPU being constantly 20-50% loaded. A bit of investigation shows that this
> is due to the runtime-pm callbacks trying to autosuspend USB, despite this
5B> being unsupported by the host-controller (isp1760, which doesn't
have a
> ->bus_suspend callback). I've included a backtrace at the bottom of this mail.
>
> Anyway, since ->bus_suspend is not implemented, hcd_bus_suspend returns
> -ENOENT, which propagates back up to usb_runtime_suspend via usb_suspend_both.
> At this point, we override the return value with -EBUSY:
>
> drivers/usb/core/driver.c:usb_runtime_suspend:
>
> /* The PM core reacts badly unless the return code is 0,
> * -EAGAIN, or -EBUSY, so always return -EBUSY on an error.
> */
> if (status != 0)
> return -EBUSY;
>
> This then tells the runtime PM code to try again:
>
> drivers/base/power/runtime.c
>
> if (retval == -EAGAIN || retval == -EBUSY) {
> dev->power.runtime_error = 0;
>
> /*
> * If the callback routine failed an autosuspend, and
> * if the last_busy time has been updated so that there
> * is a new autosuspend expiration time, automatically
> * reschedule another autosuspend.
> */
> if ((rpmflags & RPM_AUTO) &&
> pm_runtime_autosuspend_expiration(dev) != 0)
> goto repeat;
>
> Consequently, I see a kworker thread on each CPU consuming a significant
> amount of the system resources. Worse, if I enable something like kmemleak
> (which adds more work to the failed suspend operation), I end up failing
> to boot entirely (NFS bombs out).
>
> Reverting db7c7c0aeef5 ("usb: Always return 0 or -EBUSY to the runtime
> PM core.") fixes this for me, but the commit log suggests that will break
> lsusb. That patch has also been in for three and a half years...
>
> Any ideas on how to fix this properly? In what ways does the PM core react
> badly to -ENOENT?
Okay, this is a bad problem.
The PM core takes any error response other than -EBUSY or -EAGAIN as an
indication that the device is in a runtime-PM error state. While that
is appropriate for a USB device, perhaps it's not so appropriate for a
USB host controller.
Anyway, there are two possible ways of handling this. One is to avoid
changing the error code to -EBUSY when the device in question is a root
hub. Just let it go into a runtime-PM error state; it won't matter
since the controller doesn't support runtime PM anyway. You can test
this by changing the "if (status != 0)" line in usb_runtime_suspend to
if (status != 0 && udev->parent)
The other approach is to disable runtime PM for the root hub when the
host controller driver doesn't have a bus_suspend or bus_resume method.
This seems like a cleaner approach; the patch below implements it.
Alan Stern
Index: usb-3.15/drivers/usb/core/hub.c
===================================================================
--- usb-3.15.orig/drivers/usb/core/hub.c
+++ usb-3.15/drivers/usb/core/hub.c
@@ -1703,8 +1703,19 @@ static int hub_probe(struct usb_interfac
*/
pm_runtime_set_autosuspend_delay(&hdev->dev, 0);
- /* Hubs have proper suspend/resume support. */
- usb_enable_autosuspend(hdev);
+ /*
+ * Hubs have proper suspend/resume support, except for root hubs
+ * where the controller driver doesn't have bus_suspend and
+ * bus_resume methods.
+ */
+ if (hdev->parent) { /* normal device */
+ usb_enable_autosuspend(hdev);
+ } else { /* root hub */
+ const struct hc_driver *drv = bus_to_hcd(hdev->bus)->driver;
+
+ if (drv->bus_suspend && drv->bus_resume)
+ usb_enable_autosuspend(hdev);
+ }
if (hdev->level == MAX_TOPO_LEVEL) {
dev_err(&intf->dev,
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: Runtime PM workqueue killing system performance with USB
2014-05-22 15:02 ` Alan Stern
@ 2014-05-22 17:39 ` Will Deacon
2014-05-22 18:14 ` Alan Stern
0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2014-05-22 17:39 UTC (permalink / raw)
To: Alan Stern
Cc: sarah.a.sharp@linux.intel.com, linux-usb@vger.kernel.org,
linux-kernel@vger.kernel.org, khilman@linaro.org
Hi Alan,
On Thu, May 22, 2014 at 04:02:06PM +0100, Alan Stern wrote:
> On Thu, 22 May 2014, Will Deacon wrote:
> > Consequently, I see a kworker thread on each CPU consuming a significant
> > amount of the system resources. Worse, if I enable something like kmemleak
> > (which adds more work to the failed suspend operation), I end up failing
> > to boot entirely (NFS bombs out).
> >
> > Reverting db7c7c0aeef5 ("usb: Always return 0 or -EBUSY to the runtime
> > PM core.") fixes this for me, but the commit log suggests that will break
> > lsusb. That patch has also been in for three and a half years...
> >
> > Any ideas on how to fix this properly? In what ways does the PM core react
> > badly to -ENOENT?
>
> Okay, this is a bad problem.
>
> The PM core takes any error response other than -EBUSY or -EAGAIN as an
> indication that the device is in a runtime-PM error state. While that
> is appropriate for a USB device, perhaps it's not so appropriate for a
> USB host controller.
>
> Anyway, there are two possible ways of handling this. One is to avoid
> changing the error code to -EBUSY when the device in question is a root
> hub. Just let it go into a runtime-PM error state; it won't matter
> since the controller doesn't support runtime PM anyway. You can test
> this by changing the "if (status != 0)" line in usb_runtime_suspend to
>
> if (status != 0 && udev->parent)
I'd tried something like this already, but I prefer your patch below. Plus,
this hack results in a failure being logged to dmesg on the initial suspend
attempt.
> The other approach is to disable runtime PM for the root hub when the
> host controller driver doesn't have a bus_suspend or bus_resume method.
> This seems like a cleaner approach; the patch below implements it.
Thanks for this! I can confirm that your patch below fixes the issue for me,
so:
Reported-by: Will Deacon <will.deacon@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Cheers,
Will
> Index: usb-3.15/drivers/usb/core/hub.c
> ===================================================================
> --- usb-3.15.orig/drivers/usb/core/hub.c
> +++ usb-3.15/drivers/usb/core/hub.c
> @@ -1703,8 +1703,19 @@ static int hub_probe(struct usb_interfac
> */
> pm_runtime_set_autosuspend_delay(&hdev->dev, 0);
>
> - /* Hubs have proper suspend/resume support. */
> - usb_enable_autosuspend(hdev);
> + /*
> + * Hubs have proper suspend/resume support, except for root hubs
> + * where the controller driver doesn't have bus_suspend and
> + * bus_resume methods.
> + */
> + if (hdev->parent) { /* normal device */
> + usb_enable_autosuspend(hdev);
> + } else { /* root hub */
> + const struct hc_driver *drv = bus_to_hcd(hdev->bus)->driver;
> +
> + if (drv->bus_suspend && drv->bus_resume)
> + usb_enable_autosuspend(hdev);
> + }
>
> if (hdev->level == MAX_TOPO_LEVEL) {
> dev_err(&intf->dev,
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: Runtime PM workqueue killing system performance with USB
2014-05-22 17:39 ` Will Deacon
@ 2014-05-22 18:14 ` Alan Stern
2014-05-23 14:32 ` Will Deacon
0 siblings, 1 reply; 5+ messages in thread
From: Alan Stern @ 2014-05-22 18:14 UTC (permalink / raw)
To: Will Deacon
Cc: sarah.a.sharp@linux.intel.com, linux-usb@vger.kernel.org,
linux-kernel@vger.kernel.org, khilman@linaro.org
On Thu, 22 May 2014, Will Deacon wrote:
> > Anyway, there are two possible ways of handling this. One is to avoid
> > changing the error code to -EBUSY when the device in question is a root
> > hub. Just let it go into a runtime-PM error state; it won't matter
> > since the controller doesn't support runtime PM anyway. You can test
> > this by changing the "if (status != 0)" line in usb_runtime_suspend to
> >
> > if (status != 0 && udev->parent)
>
> I'd tried something like this already, but I prefer your patch below. Plus,
> this hack results in a failure being logged to dmesg on the initial suspend
> attempt.
>
> > The other approach is to disable runtime PM for the root hub when the
> > host controller driver doesn't have a bus_suspend or bus_resume method.
> > This seems like a cleaner approach; the patch below implements it.
>
> Thanks for this! I can confirm that your patch below fixes the issue for me,
> so:
>
> Reported-by: Will Deacon <will.deacon@arm.com>
> Tested-by: Will Deacon <will.deacon@arm.com>
You know, I think it might be best to make both changes. Even though
runtime PM will be disabled by default, the user can always override
this setting. If that happens, the suspend should fail with the proper
error code instead of going into a loop.
Do you mind if I add the change to usb_runtime_suspend() in the patch?
Alan Stern
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Runtime PM workqueue killing system performance with USB
2014-05-22 18:14 ` Alan Stern
@ 2014-05-23 14:32 ` Will Deacon
0 siblings, 0 replies; 5+ messages in thread
From: Will Deacon @ 2014-05-23 14:32 UTC (permalink / raw)
To: Alan Stern
Cc: sarah.a.sharp@linux.intel.com, linux-usb@vger.kernel.org,
linux-kernel@vger.kernel.org, khilman@linaro.org
On Thu, May 22, 2014 at 07:14:40PM +0100, Alan Stern wrote:
> On Thu, 22 May 2014, Will Deacon wrote:
>
> > > Anyway, there are two possible ways of handling this. One is to avoid
> > > changing the error code to -EBUSY when the device in question is a root
> > > hub. Just let it go into a runtime-PM error state; it won't matter
> > > since the controller doesn't support runtime PM anyway. You can test
> > > this by changing the "if (status != 0)" line in usb_runtime_suspend to
> > >
> > > if (status != 0 && udev->parent)
> >
> > I'd tried something like this already, but I prefer your patch below. Plus,
> > this hack results in a failure being logged to dmesg on the initial suspend
> > attempt.
> >
> > > The other approach is to disable runtime PM for the root hub when the
> > > host controller driver doesn't have a bus_suspend or bus_resume method.
> > > This seems like a cleaner approach; the patch below implements it.
> >
> > Thanks for this! I can confirm that your patch below fixes the issue for me,
> > so:
> >
> > Reported-by: Will Deacon <will.deacon@arm.com>
> > Tested-by: Will Deacon <will.deacon@arm.com>
>
> You know, I think it might be best to make both changes. Even though
> runtime PM will be disabled by default, the user can always override
> this setting. If that happens, the suspend should fail with the proper
> error code instead of going into a loop.
>
> Do you mind if I add the change to usb_runtime_suspend() in the patch?
That sounds sensible -- if runtime PM is being forced on, then errors are
worth reporting.
Will
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-05-23 14:32 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-22 10:27 Runtime PM workqueue killing system performance with USB Will Deacon
2014-05-22 15:02 ` Alan Stern
2014-05-22 17:39 ` Will Deacon
2014-05-22 18:14 ` Alan Stern
2014-05-23 14:32 ` Will Deacon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox