linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthias Kaehlcke <mka@chromium.org>
To: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Fabrice Gasnier <fabrice.gasnier@foss.st.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, regressions@lists.linux.dev,
	Florian Fainelli <f.fainelli@gmail.com>
Subject: Re: Regression: onboard-usb-hub breaks USB on RPi 3
Date: Thu, 22 Dec 2022 00:55:53 +0000	[thread overview]
Message-ID: <Y6OrGbqaMy2iVDWB@google.com> (raw)
In-Reply-To: <8503cdb5-6089-b9e4-25ff-f3ac294b7a79@i2se.com>

Hi Stefan,

On Wed, Dec 21, 2022 at 10:31:09PM +0100, Stefan Wahren wrote:
> Hi Matthias,
> 
> Am 21.12.22 um 20:02 schrieb Matthias Kaehlcke:
> > Hi Stefan,
> > 
> > On Wed, Dec 21, 2022 at 07:00:41PM +0100, Stefan Wahren wrote:
> > > I will try to play with lock debugging.
> > Thanks, hopefully that can provide some hint.
> > 
> DETECT_HUNG_TASK reveals this in error case:
> 
> [  243.676253] INFO: task kworker/2:1:44 blocked for more than 122 seconds.
> [  243.676284]       Not tainted 6.1.0-00007-g22fada783b9f #32
> [  243.676294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  243.676303] task:kworker/2:1     state:D stack:0     pid:44 ppid:2     
> flags:0x00000000
> [  243.676329] Workqueue: events onboard_hub_attach_usb_driver
> [onboard_usb_hub]
> [  243.676388]  __schedule from schedule+0x58/0xf8
> [  243.676419]  schedule from schedule_preempt_disabled+0x1c/0x2c
> [  243.676445]  schedule_preempt_disabled from
> __mutex_lock.constprop.0+0x29c/0x948
> [  243.676474]  __mutex_lock.constprop.0 from __driver_attach+0x74/0x188
> [  243.676503]  __driver_attach from bus_for_each_dev+0x70/0xb0
> [  243.676532]  bus_for_each_dev from onboard_hub_attach_usb_driver+0xc/0x28
> [onboard_usb_hub]
> [  243.676587]  onboard_hub_attach_usb_driver [onboard_usb_hub] from
> process_one_work+0x1f8/0x520
> [  243.676637]  process_one_work from worker_thread+0x40/0x55c
> [  243.676663]  worker_thread from kthread+0xf0/0x110
> [  243.676685]  kthread from ret_from_fork+0x14/0x2c
> [  243.676705] Exception stack(0xf091dfb0 to 0xf091dff8)
> [  243.676720] dfa0:                                     00000000 00000000
> 00000000 00000000
> [  243.676737] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000
> [  243.676752] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [  243.676788] INFO: task systemd-udevd:148 blocked for more than 122
> seconds.
> [  243.676800]       Not tainted 6.1.0-00007-g22fada783b9f #32
> [  243.676809] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  243.676817] task:systemd-udevd   state:D stack:0     pid:148 ppid:144   
> flags:0x00000081
> [  243.676839]  __schedule from schedule+0x58/0xf8
> [  243.676864]  schedule from schedule_timeout+0xb4/0x15c
> [  243.676893]  schedule_timeout from __wait_for_common+0xc4/0x228
> [  243.676922]  __wait_for_common from __flush_work+0x1a8/0x360
> [  243.676949]  __flush_work from __cancel_work_timer+0x10c/0x1e4
> [  243.676975]  __cancel_work_timer from onboard_hub_remove+0x28/0xbc
> [onboard_usb_hub]
> [  243.677021]  onboard_hub_remove [onboard_usb_hub] from
> platform_remove+0x20/0x4c
> [  243.677067]  platform_remove from
> device_release_driver_internal+0x194/0x21c
> [  243.677092]  device_release_driver_internal from
> bus_remove_device+0xcc/0xf8
> [  243.677124]  bus_remove_device from device_del+0x16c/0x468
> [  243.677159]  device_del from platform_device_del.part.0+0x10/0x74
> [  243.677187]  platform_device_del.part.0 from
> platform_device_unregister+0x18/0x24
> [  243.677216]  platform_device_unregister from
> of_platform_device_destroy+0x98/0xa8
> [  243.677249]  of_platform_device_destroy from
> onboard_hub_destroy_pdevs+0x48/0x6c
> [  243.677287]  onboard_hub_destroy_pdevs from hub_disconnect+0x104/0x174
> [  243.677321]  hub_disconnect from usb_unbind_interface+0x78/0x26c
> [  243.677356]  usb_unbind_interface from
> device_release_driver_internal+0x194/0x21c
> [  243.677388]  device_release_driver_internal from
> bus_remove_device+0xcc/0xf8
> [  243.677419]  bus_remove_device from device_del+0x16c/0x468
> [  243.677452]  device_del from usb_disable_device+0xcc/0x178
> [  243.677486]  usb_disable_device from usb_set_configuration+0x4ec/0x8d0
> [  243.677523]  usb_set_configuration from usb_unbind_device+0x24/0x7c
> [  243.677560]  usb_unbind_device from
> device_release_driver_internal+0x194/0x21c
> [  243.677590]  device_release_driver_internal from device_reprobe+0x18/0x90
> [  243.677620]  device_reprobe from __usb_bus_reprobe_drivers+0x40/0x6c
> [  243.677648]  __usb_bus_reprobe_drivers from bus_for_each_dev+0x70/0xb0
> [  243.677676]  bus_for_each_dev from usb_register_device_driver+0x9c/0xd0
> [  243.677713]  usb_register_device_driver from onboard_hub_init+0x30/0x1000
> [onboard_usb_hub]
> [  243.677765]  onboard_hub_init [onboard_usb_hub] from
> do_one_initcall+0x40/0x204
> [  243.677811]  do_one_initcall from do_init_module+0x44/0x1d4
> [  243.677840]  do_init_module from sys_finit_module+0xbc/0xf8
> [  243.677865]  sys_finit_module from __sys_trace_return+0x0/0x10
> [  243.677887] Exception stack(0xf4659fa8 to 0xf4659ff0)
> [  243.677904] 9fa0:                   bf369800 0051dba8 00000006 b6e438e0
> 00000000 b6e443f4
> [  243.677921] 9fc0: bf369800 0051dba8 00000000 0000017b 00531658 0051a1dc
> 00526398 00000000
> [  243.677935] 9fe0: befbb160 befbb150 b6e3a9d8 b6f2aae0

Thanks, that's useful!

The flow is something like this:

- USB root hub is instantiated
- core hub driver calls onboard_hub_create_pdevs(), which creates the
  platform device for the 1st level hub
  - the platform device is created even though the onboard hub driver
    hasn't been loaded yet, because onboard_hub_create/destroy_pdevs()
    is linked into the USB core
- 1st level Microchip hubs is probed by the core hub driver
- core hub driver calls onboard_hub_create_pdevs(), which creates
  the platform device for the 2nd level hub

- onboard_hub platform driver is registered
- platform device for 1st level hub is probed
  - schedules 'attach' work
- platform device for 2nd level hub is probed
  - schedules 'attach' work

- onboard_hub USB driver is registered
- device (and parent) lock of Microchip hub is held while the device is
  re-probing

- 'attach' work (running in another thread) calls driver_attach(), which
  blocks on one of the hub device locks

- onboard_hub_destroy_pdevs() is called by the core hub driver when one
  of the Microchip hubs is detached
- destroying the pdevs invokes onboard_hub_remove(), which waits for the
  'attach' work to complete
  - waits forever, since the 'attach' work can't acquire the device lock

For the Rpi 3 B Plus and boards with similar configurations it should be
enough with not creating the onboard_hub platform devices, which anyway
is the right thing do. I'll send patches for this.

The above race condition could also impact boards which actually should
use the onboard_hub driver, so not creating the pdevs for some boards
won't be enough.

Out of my head I can't think of a clean solution. The onboard hub driver
doesn't control the locks involved. Detaching the driver is necessary
to make sure the onboard_hub USB devices don't hold stale pointers to
their platform device. Re-attaching is needed because of the detach.

One option could be to change the 'attach' work from being a member of
struct onboard_hub to a static variable owned by the driver. With that
onboard_hub_remove() wouldn't have to wait for the work to finish. An
inconvenient is that only one instance of the work can run at any time,
which could result in a race condition when multiple onboard hubs are
probed closely together. It could happen that the USB device of the 2nd
(or 3rd, ...) hub isn't re-attached if it wasn't on the system wide USB
'bus' yet when the 'attach' work of the 1st hub runs. Probably a rare
condition within the (as of now) rare scenario of multiple onboard hubs,
but it could happen. A mitigation could be to enter a sleepy loop if
schedule_work() returns false (work is already running) and schedule it
again until it is actually scheduled on behalf of the platform device
in question. I might go for that if I don't get a better idea.

Happy holidays!

m.

  reply	other threads:[~2022-12-22  0:56 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-18 13:35 Regression: onboard-usb-hub breaks USB on RPi 3 Stefan Wahren
2022-12-19 10:41 ` Regression: onboard-usb-hub breaks USB on RPi 3 #forregzbot Thorsten Leemhuis
2022-12-19 17:44 ` Regression: onboard-usb-hub breaks USB on RPi 3 Matthias Kaehlcke
2022-12-19 22:32   ` Stefan Wahren
2022-12-20  0:30     ` Matthias Kaehlcke
2022-12-20 16:19       ` Stefan Wahren
2022-12-20 22:50         ` Matthias Kaehlcke
2022-12-21 12:29           ` Stefan Wahren
2022-12-21 16:50             ` Matthias Kaehlcke
2022-12-21 18:00               ` Stefan Wahren
2022-12-21 19:02                 ` Matthias Kaehlcke
2022-12-21 21:31                   ` Stefan Wahren
2022-12-22  0:55                     ` Matthias Kaehlcke [this message]
2022-12-22 11:19                       ` Stefan Wahren
2022-12-27 13:15                     ` Stefan Wahren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y6OrGbqaMy2iVDWB@google.com \
    --to=mka@chromium.org \
    --cc=f.fainelli@gmail.com \
    --cc=fabrice.gasnier@foss.st.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    --cc=stefan.wahren@i2se.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).