Runtime PM workqueue killing system performance with USB

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Runtime PM workqueue killing system performance with USB
@ 2014-05-22 10:27 Will Deacon
  2014-05-22 15:02 ` Alan Stern
  0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2014-05-22 10:27 UTC (permalink / raw)
  To: stern, sarah.a.sharp; +Cc: linux-usb, linux-kernel, khilman

Hi all,

Although I don't think this is a new issue, booting mainline on my vexpress
a9x4 (Quad ARMv7 Cortex-A9 board) with USB and PM_RUNTIME results in each
CPU being constantly 20-50% loaded. A bit of investigation shows that this
is due to the runtime-pm callbacks trying to autosuspend USB, despite this
being unsupported by the host-controller (isp1760, which doesn't have a
->bus_suspend callback). I've included a backtrace at the bottom of this mail.

Anyway, since ->bus_suspend is not implemented, hcd_bus_suspend returns
-ENOENT, which propagates back up to usb_runtime_suspend via usb_suspend_both.
At this point, we override the return value with -EBUSY:

drivers/usb/core/driver.c:usb_runtime_suspend:

	/* The PM core reacts badly unless the return code is 0,
	 * -EAGAIN, or -EBUSY, so always return -EBUSY on an error.
	 */
	if (status != 0)
		return -EBUSY;

This then tells the runtime PM code to try again:

drivers/base/power/runtime.c

	if (retval == -EAGAIN || retval == -EBUSY) {
		dev->power.runtime_error = 0;

		/*
		 * If the callback routine failed an autosuspend, and
		 * if the last_busy time has been updated so that there
		 * is a new autosuspend expiration time, automatically
		 * reschedule another autosuspend.
		 */
		if ((rpmflags & RPM_AUTO) &&
		    pm_runtime_autosuspend_expiration(dev) != 0)
			goto repeat;

Consequently, I see a kworker thread on each CPU consuming a significant
amount of the system resources. Worse, if I enable something like kmemleak
(which adds more work to the failed suspend operation), I end up failing
to boot entirely (NFS bombs out).

Reverting db7c7c0aeef5 ("usb: Always return 0 or -EBUSY to the runtime
PM core.") fixes this for me, but the commit log suggests that will break
lsusb. That patch has also been in for three and a half years...

Any ideas on how to fix this properly? In what ways does the PM core react
badly to -ENOENT?

Cheers,

Will

--->8

[  161.385424] CPU: 0 PID: 51 Comm: kworker/0:1 Not tainted 3.15.0-rc5 #2
[  161.405009] Workqueue: pm pm_runtime_work
[  161.417019] task: ed0bec00 ti: ed470000 task.ti: ed470000
[  161.433195] PC is at kmemleak_free+0x8/0x4c
[  161.445737] LR is at kfree+0xbc/0x154
[  161.456699] pc : [<c05be7e0>]    lr : [<c00ecc14>]    psr: 400f0013
[  161.456699] sp : ed471c58  ip : 0000001c  fp : ed39adc0
[  161.491098] r10: 00000000  r9 : ed59a300  r8 : c09df324
[  161.506743] r7 : c03e7960  r6 : ee59b340  r5 : ed59a300  r4 : ed001f00
[  161.526294] r3 : 0000f0e0  r2 : edff0000  r1 : ed59a300  r0 : ed59a300
[  161.545848] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[  161.567743] Control: 10c5387d  Table: 8d5e004a  DAC: 00000015
[  161.584955] CPU: 0 PID: 51 Comm: kworker/0:1 Not tainted 3.15.0-rc5 #2
[  161.604512] Workqueue: pm pm_runtime_work
[  161.616572] [<c0015e90>] (unwind_backtrace) from [<c0011b14>] (show_stack+0x10/0x14)
[  161.639794] [<c0011b14>] (show_stack) from [<c05c1444>] (dump_stack+0x88/0x98)
[  161.661449] [<c05c1444>] (dump_stack) from [<c02b89a8>] (sysrq_handle_showallcpus+0x5c/0x64)
[  161.686744] [<c02b89a8>] (sysrq_handle_showallcpus) from [<c02b8fe4>] (__handle_sysrq+0x128/0x17c)
[  161.713600] [<c02b8fe4>] (__handle_sysrq) from [<c02d2124>] (pl011_fifo_to_tty+0x174/0x1c4)
[  161.738631] [<c02d2124>] (pl011_fifo_to_tty) from [<c02d32f4>] (pl011_int+0x298/0x5a8)
[  161.762362] [<c02d32f4>] (pl011_int) from [<c0086ac8>] (handle_irq_event_percpu+0x78/0x134)
[  161.787393] [<c0086ac8>] (handle_irq_event_percpu) from [<c0086bc4>] (handle_irq_event+0x40/0x60)
[  161.813991] [<c0086bc4>] (handle_irq_event) from [<c0089c30>] (handle_fasteoi_irq+0xa8/0x1a8)
[  161.839562] [<c0089c30>] (handle_fasteoi_irq) from [<c00861c4>] (generic_handle_irq+0x2c/0x3c)
[  161.865377] [<c00861c4>] (generic_handle_irq) from [<c000ef40>] (handle_IRQ+0x40/0x90)
[  161.889105] [<c000ef40>] (handle_IRQ) from [<c00087e8>] (gic_handle_irq+0x2c/0x5c)
[  161.911788] [<c00087e8>] (gic_handle_irq) from [<c0012680>] (__irq_svc+0x40/0x50)
[  161.934204] Exception stack(0xed471c10 to 0xed471c58)
[  161.949333] 1c00:                                     ed59a300 ed59a300 edff0000 0000f0e0
[  161.973840] 1c20: ed001f00 ed59a300 ee59b340 c03e7960 c09df324 ed59a300 00000000 ed39adc0
[  161.998345] 1c40: 0000001c ed471c58 c00ecc14 c05be7e0 400f0013 ffffffff
[  162.018168] [<c0012680>] (__irq_svc) from [<c05be7e0>] (kmemleak_free+0x8/0x4c)
[  162.040082] [<c05be7e0>] (kmemleak_free) from [<c00ecc14>] (kfree+0xbc/0x154)
[  162.061490] [<c00ecc14>] (kfree) from [<c03e7960>] (usb_hcd_submit_urb+0x2d4/0x804)
[  162.084447] [<c03e7960>] (usb_hcd_submit_urb) from [<c03e94b4>] (usb_start_wait_urb+0x4c/0xbc)
[  162.110264] [<c03e94b4>] (usb_start_wait_urb) from [<c03e95c4>] (usb_control_msg+0xa0/0xd0)
[  162.135298] [<c03e95c4>] (usb_control_msg) from [<c03dfb10>] (hub_port_status+0x74/0xfc)
[  162.159548] [<c03dfb10>] (hub_port_status) from [<c03e139c>] (hub_activate+0x164/0x500)
[  162.183537] [<c03e139c>] (hub_activate) from [<c03e179c>] (hub_resume+0x14/0x1c)
[  162.205709] [<c03e179c>] (hub_resume) from [<c03ec474>] (usb_resume_interface.isra.6+0xe8/0x118)
[  162.232041] [<c03ec474>] (usb_resume_interface.isra.6) from [<c03ec560>] (usb_suspend_both+0xbc/0x19c)
[  162.259938] [<c03ec560>] (usb_suspend_both) from [<c03ed418>] (usb_runtime_suspend+0x28/0x58)
[  162.285497] [<c03ed418>] (usb_runtime_suspend) from [<c032888c>] (__rpm_callback+0x38/0x84)
[  162.310530] [<c032888c>] (__rpm_callback) from [<c03288f8>] (rpm_callback+0x20/0x74)
[  162.333739] [<c03288f8>] (rpm_callback) from [<c0328d54>] (rpm_suspend+0xd0/0x4e8)
[  162.356428] [<c0328d54>] (rpm_suspend) from [<c032a048>] (__pm_runtime_suspend+0x70/0x98)
[  162.380939] [<c032a048>] (__pm_runtime_suspend) from [<c03ed480>] (usb_runtime_idle+0x24/0x2c)
[  162.406750] [<c03ed480>] (usb_runtime_idle) from [<c032888c>] (__rpm_callback+0x38/0x84)
[  162.431001] [<c032888c>] (__rpm_callback) from [<c0329320>] (rpm_idle+0xf8/0x19c)
[  162.453428] [<c0329320>] (rpm_idle) from [<c032a16c>] (pm_runtime_work+0x90/0xa0)
[  162.475860] [<c032a16c>] (pm_runtime_work) from [<c005d674>] (process_one_work+0x108/0x374)
[  162.500891] [<c005d674>] (process_one_work) from [<c005e384>] (worker_thread+0x130/0x3e4)
[  162.525412] [<c005e384>] (worker_thread) from [<c0063a30>] (kthread+0xd8/0xf0)
[  162.547063] [<c0063a30>] (kthread) from [<c000e6f8>] (ret_from_fork+0x14/0x3c)


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Runtime PM workqueue killing system performance with USB
  2014-05-22 10:27 Runtime PM workqueue killing system performance with USB Will Deacon
@ 2014-05-22 15:02 ` Alan Stern
  2014-05-22 17:39   ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Stern @ 2014-05-22 15:02 UTC (permalink / raw)
  To: Will Deacon; +Cc: sarah.a.sharp, linux-usb, linux-kernel, khilman

On Thu, 22 May 2014, Will Deacon wrote:

> Hi all,
> 
> Although I don't think this is a new issue, booting mainline on my vexpress
> a9x4 (Quad ARMv7 Cortex-A9 board) with USB and PM_RUNTIME results in each
> CPU being constantly 20-50% loaded. A bit of investigation shows that this
> is due to the runtime-pm callbacks trying to autosuspend USB, despite this
5B> being unsupported by the host-controller (isp1760, which doesn't 
have a
> ->bus_suspend callback). I've included a backtrace at the bottom of this mail.
> 
> Anyway, since ->bus_suspend is not implemented, hcd_bus_suspend returns
> -ENOENT, which propagates back up to usb_runtime_suspend via usb_suspend_both.
> At this point, we override the return value with -EBUSY:
> 
> drivers/usb/core/driver.c:usb_runtime_suspend:
> 
> 	/* The PM core reacts badly unless the return code is 0,
> 	 * -EAGAIN, or -EBUSY, so always return -EBUSY on an error.
> 	 */
> 	if (status != 0)
> 		return -EBUSY;
> 
> This then tells the runtime PM code to try again:
> 
> drivers/base/power/runtime.c
> 
> 	if (retval == -EAGAIN || retval == -EBUSY) {
> 		dev->power.runtime_error = 0;
> 
> 		/*
> 		 * If the callback routine failed an autosuspend, and
> 		 * if the last_busy time has been updated so that there
> 		 * is a new autosuspend expiration time, automatically
> 		 * reschedule another autosuspend.
> 		 */
> 		if ((rpmflags & RPM_AUTO) &&
> 		    pm_runtime_autosuspend_expiration(dev) != 0)
> 			goto repeat;
> 
> Consequently, I see a kworker thread on each CPU consuming a significant
> amount of the system resources. Worse, if I enable something like kmemleak
> (which adds more work to the failed suspend operation), I end up failing
> to boot entirely (NFS bombs out).
> 
> Reverting db7c7c0aeef5 ("usb: Always return 0 or -EBUSY to the runtime
> PM core.") fixes this for me, but the commit log suggests that will break
> lsusb. That patch has also been in for three and a half years...
> 
> Any ideas on how to fix this properly? In what ways does the PM core react
> badly to -ENOENT?

Okay, this is a bad problem.

The PM core takes any error response other than -EBUSY or -EAGAIN as an 
indication that the device is in a runtime-PM error state.  While that 
is appropriate for a USB device, perhaps it's not so appropriate for a 
USB host controller.

Anyway, there are two possible ways of handling this.  One is to avoid 
changing the error code to -EBUSY when the device in question is a root 
hub.  Just let it go into a runtime-PM error state; it won't matter 
since the controller doesn't support runtime PM anyway.  You can test 
this by changing the "if (status != 0)" line in usb_runtime_suspend to

	if (status != 0 && udev->parent)

The other approach is to disable runtime PM for the root hub when the 
host controller driver doesn't have a bus_suspend or bus_resume method.
This seems like a cleaner approach; the patch below implements it.

Alan Stern



Index: usb-3.15/drivers/usb/core/hub.c
===================================================================
--- usb-3.15.orig/drivers/usb/core/hub.c
+++ usb-3.15/drivers/usb/core/hub.c
@@ -1703,8 +1703,19 @@ static int hub_probe(struct usb_interfac
 	 */
 	pm_runtime_set_autosuspend_delay(&hdev->dev, 0);
 
-	/* Hubs have proper suspend/resume support. */
-	usb_enable_autosuspend(hdev);
+	/*
+	 * Hubs have proper suspend/resume support, except for root hubs
+	 * where the controller driver doesn't have bus_suspend and
+	 * bus_resume methods.
+	 */
+	if (hdev->parent) {		/* normal device */
+		usb_enable_autosuspend(hdev);
+	} else {			/* root hub */
+		const struct hc_driver *drv = bus_to_hcd(hdev->bus)->driver;
+
+		if (drv->bus_suspend && drv->bus_resume)
+			usb_enable_autosuspend(hdev);
+	}
 
 	if (hdev->level == MAX_TOPO_LEVEL) {
 		dev_err(&intf->dev,


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Runtime PM workqueue killing system performance with USB
  2014-05-22 15:02 ` Alan Stern
@ 2014-05-22 17:39   ` Will Deacon
  2014-05-22 18:14     ` Alan Stern
  0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2014-05-22 17:39 UTC (permalink / raw)
  To: Alan Stern
  Cc: sarah.a.sharp@linux.intel.com, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org, khilman@linaro.org

Hi Alan,

On Thu, May 22, 2014 at 04:02:06PM +0100, Alan Stern wrote:
> On Thu, 22 May 2014, Will Deacon wrote:
> > Consequently, I see a kworker thread on each CPU consuming a significant
> > amount of the system resources. Worse, if I enable something like kmemleak
> > (which adds more work to the failed suspend operation), I end up failing
> > to boot entirely (NFS bombs out).
> > 
> > Reverting db7c7c0aeef5 ("usb: Always return 0 or -EBUSY to the runtime
> > PM core.") fixes this for me, but the commit log suggests that will break
> > lsusb. That patch has also been in for three and a half years...
> > 
> > Any ideas on how to fix this properly? In what ways does the PM core react
> > badly to -ENOENT?
> 
> Okay, this is a bad problem.
> 
> The PM core takes any error response other than -EBUSY or -EAGAIN as an 
> indication that the device is in a runtime-PM error state.  While that 
> is appropriate for a USB device, perhaps it's not so appropriate for a 
> USB host controller.
> 
> Anyway, there are two possible ways of handling this.  One is to avoid 
> changing the error code to -EBUSY when the device in question is a root 
> hub.  Just let it go into a runtime-PM error state; it won't matter 
> since the controller doesn't support runtime PM anyway.  You can test 
> this by changing the "if (status != 0)" line in usb_runtime_suspend to
> 
> 	if (status != 0 && udev->parent)

I'd tried something like this already, but I prefer your patch below. Plus,
this hack results in a failure being logged to dmesg on the initial suspend
attempt.

> The other approach is to disable runtime PM for the root hub when the 
> host controller driver doesn't have a bus_suspend or bus_resume method.
> This seems like a cleaner approach; the patch below implements it.

Thanks for this! I can confirm that your patch below fixes the issue for me,
so:

  Reported-by: Will Deacon <will.deacon@arm.com>
  Tested-by: Will Deacon <will.deacon@arm.com>

Cheers,

Will

> Index: usb-3.15/drivers/usb/core/hub.c
> ===================================================================
> --- usb-3.15.orig/drivers/usb/core/hub.c
> +++ usb-3.15/drivers/usb/core/hub.c
> @@ -1703,8 +1703,19 @@ static int hub_probe(struct usb_interfac
>  	 */
>  	pm_runtime_set_autosuspend_delay(&hdev->dev, 0);
>  
> -	/* Hubs have proper suspend/resume support. */
> -	usb_enable_autosuspend(hdev);
> +	/*
> +	 * Hubs have proper suspend/resume support, except for root hubs
> +	 * where the controller driver doesn't have bus_suspend and
> +	 * bus_resume methods.
> +	 */
> +	if (hdev->parent) {		/* normal device */
> +		usb_enable_autosuspend(hdev);
> +	} else {			/* root hub */
> +		const struct hc_driver *drv = bus_to_hcd(hdev->bus)->driver;
> +
> +		if (drv->bus_suspend && drv->bus_resume)
> +			usb_enable_autosuspend(hdev);
> +	}
>  
>  	if (hdev->level == MAX_TOPO_LEVEL) {
>  		dev_err(&intf->dev,
> 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Runtime PM workqueue killing system performance with USB
  2014-05-22 17:39   ` Will Deacon
@ 2014-05-22 18:14     ` Alan Stern
  2014-05-23 14:32       ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Stern @ 2014-05-22 18:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: sarah.a.sharp@linux.intel.com, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org, khilman@linaro.org

On Thu, 22 May 2014, Will Deacon wrote:

> > Anyway, there are two possible ways of handling this.  One is to avoid 
> > changing the error code to -EBUSY when the device in question is a root 
> > hub.  Just let it go into a runtime-PM error state; it won't matter 
> > since the controller doesn't support runtime PM anyway.  You can test 
> > this by changing the "if (status != 0)" line in usb_runtime_suspend to
> > 
> > 	if (status != 0 && udev->parent)
> 
> I'd tried something like this already, but I prefer your patch below. Plus,
> this hack results in a failure being logged to dmesg on the initial suspend
> attempt.
> 
> > The other approach is to disable runtime PM for the root hub when the 
> > host controller driver doesn't have a bus_suspend or bus_resume method.
> > This seems like a cleaner approach; the patch below implements it.
> 
> Thanks for this! I can confirm that your patch below fixes the issue for me,
> so:
> 
>   Reported-by: Will Deacon <will.deacon@arm.com>
>   Tested-by: Will Deacon <will.deacon@arm.com>

You know, I think it might be best to make both changes.  Even though 
runtime PM will be disabled by default, the user can always override 
this setting.  If that happens, the suspend should fail with the proper 
error code instead of going into a loop.

Do you mind if I add the change to usb_runtime_suspend() in the patch?

Alan Stern


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Runtime PM workqueue killing system performance with USB
  2014-05-22 18:14     ` Alan Stern
@ 2014-05-23 14:32       ` Will Deacon
  0 siblings, 0 replies; 5+ messages in thread
From: Will Deacon @ 2014-05-23 14:32 UTC (permalink / raw)
  To: Alan Stern
  Cc: sarah.a.sharp@linux.intel.com, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org, khilman@linaro.org

On Thu, May 22, 2014 at 07:14:40PM +0100, Alan Stern wrote:
> On Thu, 22 May 2014, Will Deacon wrote:
> 
> > > Anyway, there are two possible ways of handling this.  One is to avoid 
> > > changing the error code to -EBUSY when the device in question is a root 
> > > hub.  Just let it go into a runtime-PM error state; it won't matter 
> > > since the controller doesn't support runtime PM anyway.  You can test 
> > > this by changing the "if (status != 0)" line in usb_runtime_suspend to
> > > 
> > > 	if (status != 0 && udev->parent)
> > 
> > I'd tried something like this already, but I prefer your patch below. Plus,
> > this hack results in a failure being logged to dmesg on the initial suspend
> > attempt.
> > 
> > > The other approach is to disable runtime PM for the root hub when the 
> > > host controller driver doesn't have a bus_suspend or bus_resume method.
> > > This seems like a cleaner approach; the patch below implements it.
> > 
> > Thanks for this! I can confirm that your patch below fixes the issue for me,
> > so:
> > 
> >   Reported-by: Will Deacon <will.deacon@arm.com>
> >   Tested-by: Will Deacon <will.deacon@arm.com>
> 
> You know, I think it might be best to make both changes.  Even though 
> runtime PM will be disabled by default, the user can always override 
> this setting.  If that happens, the suspend should fail with the proper 
> error code instead of going into a loop.
> 
> Do you mind if I add the change to usb_runtime_suspend() in the patch?

That sounds sensible -- if runtime PM is being forced on, then errors are
worth reporting.

Will

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-05-23 14:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-22 10:27 Runtime PM workqueue killing system performance with USB Will Deacon
2014-05-22 15:02 ` Alan Stern
2014-05-22 17:39   ` Will Deacon
2014-05-22 18:14     ` Alan Stern
2014-05-23 14:32       ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox