From: Matthias Kaehlcke <mka@chromium.org>
To: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Fabrice Gasnier <fabrice.gasnier@foss.st.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
linux-usb@vger.kernel.org, regressions@lists.linux.dev,
Florian Fainelli <f.fainelli@gmail.com>
Subject: Re: Regression: onboard-usb-hub breaks USB on RPi 3
Date: Thu, 22 Dec 2022 00:55:53 +0000 [thread overview]
Message-ID: <Y6OrGbqaMy2iVDWB@google.com> (raw)
In-Reply-To: <8503cdb5-6089-b9e4-25ff-f3ac294b7a79@i2se.com>
Hi Stefan,
On Wed, Dec 21, 2022 at 10:31:09PM +0100, Stefan Wahren wrote:
> Hi Matthias,
>
> Am 21.12.22 um 20:02 schrieb Matthias Kaehlcke:
> > Hi Stefan,
> >
> > On Wed, Dec 21, 2022 at 07:00:41PM +0100, Stefan Wahren wrote:
> > > I will try to play with lock debugging.
> > Thanks, hopefully that can provide some hint.
> >
> DETECT_HUNG_TASK reveals this in error case:
>
> [ 243.676253] INFO: task kworker/2:1:44 blocked for more than 122 seconds.
> [ 243.676284] Not tainted 6.1.0-00007-g22fada783b9f #32
> [ 243.676294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 243.676303] task:kworker/2:1 state:D stack:0 pid:44 ppid:2
> flags:0x00000000
> [ 243.676329] Workqueue: events onboard_hub_attach_usb_driver
> [onboard_usb_hub]
> [ 243.676388] __schedule from schedule+0x58/0xf8
> [ 243.676419] schedule from schedule_preempt_disabled+0x1c/0x2c
> [ 243.676445] schedule_preempt_disabled from
> __mutex_lock.constprop.0+0x29c/0x948
> [ 243.676474] __mutex_lock.constprop.0 from __driver_attach+0x74/0x188
> [ 243.676503] __driver_attach from bus_for_each_dev+0x70/0xb0
> [ 243.676532] bus_for_each_dev from onboard_hub_attach_usb_driver+0xc/0x28
> [onboard_usb_hub]
> [ 243.676587] onboard_hub_attach_usb_driver [onboard_usb_hub] from
> process_one_work+0x1f8/0x520
> [ 243.676637] process_one_work from worker_thread+0x40/0x55c
> [ 243.676663] worker_thread from kthread+0xf0/0x110
> [ 243.676685] kthread from ret_from_fork+0x14/0x2c
> [ 243.676705] Exception stack(0xf091dfb0 to 0xf091dff8)
> [ 243.676720] dfa0: 00000000 00000000
> 00000000 00000000
> [ 243.676737] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000
> [ 243.676752] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 243.676788] INFO: task systemd-udevd:148 blocked for more than 122
> seconds.
> [ 243.676800] Not tainted 6.1.0-00007-g22fada783b9f #32
> [ 243.676809] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 243.676817] task:systemd-udevd state:D stack:0 pid:148 ppid:144
> flags:0x00000081
> [ 243.676839] __schedule from schedule+0x58/0xf8
> [ 243.676864] schedule from schedule_timeout+0xb4/0x15c
> [ 243.676893] schedule_timeout from __wait_for_common+0xc4/0x228
> [ 243.676922] __wait_for_common from __flush_work+0x1a8/0x360
> [ 243.676949] __flush_work from __cancel_work_timer+0x10c/0x1e4
> [ 243.676975] __cancel_work_timer from onboard_hub_remove+0x28/0xbc
> [onboard_usb_hub]
> [ 243.677021] onboard_hub_remove [onboard_usb_hub] from
> platform_remove+0x20/0x4c
> [ 243.677067] platform_remove from
> device_release_driver_internal+0x194/0x21c
> [ 243.677092] device_release_driver_internal from
> bus_remove_device+0xcc/0xf8
> [ 243.677124] bus_remove_device from device_del+0x16c/0x468
> [ 243.677159] device_del from platform_device_del.part.0+0x10/0x74
> [ 243.677187] platform_device_del.part.0 from
> platform_device_unregister+0x18/0x24
> [ 243.677216] platform_device_unregister from
> of_platform_device_destroy+0x98/0xa8
> [ 243.677249] of_platform_device_destroy from
> onboard_hub_destroy_pdevs+0x48/0x6c
> [ 243.677287] onboard_hub_destroy_pdevs from hub_disconnect+0x104/0x174
> [ 243.677321] hub_disconnect from usb_unbind_interface+0x78/0x26c
> [ 243.677356] usb_unbind_interface from
> device_release_driver_internal+0x194/0x21c
> [ 243.677388] device_release_driver_internal from
> bus_remove_device+0xcc/0xf8
> [ 243.677419] bus_remove_device from device_del+0x16c/0x468
> [ 243.677452] device_del from usb_disable_device+0xcc/0x178
> [ 243.677486] usb_disable_device from usb_set_configuration+0x4ec/0x8d0
> [ 243.677523] usb_set_configuration from usb_unbind_device+0x24/0x7c
> [ 243.677560] usb_unbind_device from
> device_release_driver_internal+0x194/0x21c
> [ 243.677590] device_release_driver_internal from device_reprobe+0x18/0x90
> [ 243.677620] device_reprobe from __usb_bus_reprobe_drivers+0x40/0x6c
> [ 243.677648] __usb_bus_reprobe_drivers from bus_for_each_dev+0x70/0xb0
> [ 243.677676] bus_for_each_dev from usb_register_device_driver+0x9c/0xd0
> [ 243.677713] usb_register_device_driver from onboard_hub_init+0x30/0x1000
> [onboard_usb_hub]
> [ 243.677765] onboard_hub_init [onboard_usb_hub] from
> do_one_initcall+0x40/0x204
> [ 243.677811] do_one_initcall from do_init_module+0x44/0x1d4
> [ 243.677840] do_init_module from sys_finit_module+0xbc/0xf8
> [ 243.677865] sys_finit_module from __sys_trace_return+0x0/0x10
> [ 243.677887] Exception stack(0xf4659fa8 to 0xf4659ff0)
> [ 243.677904] 9fa0: bf369800 0051dba8 00000006 b6e438e0
> 00000000 b6e443f4
> [ 243.677921] 9fc0: bf369800 0051dba8 00000000 0000017b 00531658 0051a1dc
> 00526398 00000000
> [ 243.677935] 9fe0: befbb160 befbb150 b6e3a9d8 b6f2aae0
Thanks, that's useful!
The flow is something like this:
- USB root hub is instantiated
- core hub driver calls onboard_hub_create_pdevs(), which creates the
platform device for the 1st level hub
- the platform device is created even though the onboard hub driver
hasn't been loaded yet, because onboard_hub_create/destroy_pdevs()
is linked into the USB core
- 1st level Microchip hubs is probed by the core hub driver
- core hub driver calls onboard_hub_create_pdevs(), which creates
the platform device for the 2nd level hub
- onboard_hub platform driver is registered
- platform device for 1st level hub is probed
- schedules 'attach' work
- platform device for 2nd level hub is probed
- schedules 'attach' work
- onboard_hub USB driver is registered
- device (and parent) lock of Microchip hub is held while the device is
re-probing
- 'attach' work (running in another thread) calls driver_attach(), which
blocks on one of the hub device locks
- onboard_hub_destroy_pdevs() is called by the core hub driver when one
of the Microchip hubs is detached
- destroying the pdevs invokes onboard_hub_remove(), which waits for the
'attach' work to complete
- waits forever, since the 'attach' work can't acquire the device lock
For the Rpi 3 B Plus and boards with similar configurations it should be
enough with not creating the onboard_hub platform devices, which anyway
is the right thing do. I'll send patches for this.
The above race condition could also impact boards which actually should
use the onboard_hub driver, so not creating the pdevs for some boards
won't be enough.
Out of my head I can't think of a clean solution. The onboard hub driver
doesn't control the locks involved. Detaching the driver is necessary
to make sure the onboard_hub USB devices don't hold stale pointers to
their platform device. Re-attaching is needed because of the detach.
One option could be to change the 'attach' work from being a member of
struct onboard_hub to a static variable owned by the driver. With that
onboard_hub_remove() wouldn't have to wait for the work to finish. An
inconvenient is that only one instance of the work can run at any time,
which could result in a race condition when multiple onboard hubs are
probed closely together. It could happen that the USB device of the 2nd
(or 3rd, ...) hub isn't re-attached if it wasn't on the system wide USB
'bus' yet when the 'attach' work of the 1st hub runs. Probably a rare
condition within the (as of now) rare scenario of multiple onboard hubs,
but it could happen. A mitigation could be to enter a sleepy loop if
schedule_work() returns false (work is already running) and schedule it
again until it is actually scheduled on behalf of the platform device
in question. I might go for that if I don't get a better idea.
Happy holidays!
m.
next prev parent reply other threads:[~2022-12-22 0:56 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-18 13:35 Regression: onboard-usb-hub breaks USB on RPi 3 Stefan Wahren
2022-12-19 10:41 ` Regression: onboard-usb-hub breaks USB on RPi 3 #forregzbot Thorsten Leemhuis
2022-12-19 17:44 ` Regression: onboard-usb-hub breaks USB on RPi 3 Matthias Kaehlcke
2022-12-19 22:32 ` Stefan Wahren
2022-12-20 0:30 ` Matthias Kaehlcke
2022-12-20 16:19 ` Stefan Wahren
2022-12-20 22:50 ` Matthias Kaehlcke
2022-12-21 12:29 ` Stefan Wahren
2022-12-21 16:50 ` Matthias Kaehlcke
2022-12-21 18:00 ` Stefan Wahren
2022-12-21 19:02 ` Matthias Kaehlcke
2022-12-21 21:31 ` Stefan Wahren
2022-12-22 0:55 ` Matthias Kaehlcke [this message]
2022-12-22 11:19 ` Stefan Wahren
2022-12-27 13:15 ` Stefan Wahren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y6OrGbqaMy2iVDWB@google.com \
--to=mka@chromium.org \
--cc=f.fainelli@gmail.com \
--cc=fabrice.gasnier@foss.st.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-usb@vger.kernel.org \
--cc=regressions@lists.linux.dev \
--cc=stefan.wahren@i2se.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).