* Re: USB: add check to detect host controller hardware removal [not found] <PH0PR11MB5191464B2F01511D2ADECB3BF1D2A@PH0PR11MB5191.namprd11.prod.outlook.com> @ 2023-10-13 17:17 ` Alan Stern 2023-10-15 13:37 ` Li, Meng 2023-10-16 16:56 ` Steven Rostedt 0 siblings, 2 replies; 25+ messages in thread From: Alan Stern @ 2023-10-13 17:17 UTC (permalink / raw) To: Li, Meng; +Cc: Steven Rostedt, Ingo Molnar, USB mailing list Messages like this should always be sent to the mailing list as well as to me. And in this case, since it involves an RT kernel, it should be CC-ed to some of the people involved in developing the RT patches. On Fri, Oct 13, 2023 at 04:25:43AM +0000, Li, Meng wrote: > Hi Alan Stern, > > Sorry for disrupting you very abruptly. I encounter a calltrace when trying to remove a PCIe-to-USB card device with below command. Only occurred in rt kernel. > > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove > > Call trace: > dump_backtrace.part.0+0xc8/0xd4 > show_stack+0x20/0x30 > dump_stack_lvl+0x6c/0x88 > dump_stack+0x18/0x34 > __might_resched+0x160/0x1c0 > rt_spin_lock+0x38/0xb0 > xhci_irq+0x44/0x16d0 > usb_hcd_irq+0x38/0x5c > usb_hcd_pci_remove+0x84/0x14c > xhci_pci_remove+0x78/0xc0 > pci_device_remove+0x44/0xcc > device_remove+0x54/0x8c > device_release_driver_internal+0x1ec/0x260 > device_release_driver+0x20/0x30 > pci_stop_bus_device+0x8c/0xcc > pci_stop_and_remove_bus_device_locked+0x28/0x44 > remove_store+0xa0/0xb0 > dev_attr_store+0x20/0x34 > sysfs_kf_write+0x4c/0x5c > kernfs_fop_write_iter+0x128/0x1f0 > vfs_write+0x1c0/0x2f0 > ksys_write+0x78/0x110 > __arm64_sys_write+0x24/0x30 > invoke_syscall+0x5c/0x130 > el0_svc_common.constprop.0+0x4c/0xf4 > do_el0_svc+0x34/0xc0 > el0_svc+0x2c/0x84 > el0t_64_sync_handler+0xf4/0x120 > el0t_64_sync+0x18c/0x190 > > this trace is caused by below patch that is used to fix a usb hang issue. > > commit c548795abe0d3520b74e18f23ca0a0d72deddab9 > Author: Alan Stern stern@rowland.harvard.edu<mailto:stern@rowland.harvard.edu> > Date: Wed Jun 9 17:34:27 2010 -0400 > > USB: add check to detect host controller hardware removal > > > I know it is too too long ago, but could you please try to recall how to reproduce the issue that you fixed, how can I produce the usb hang issue. I think the issue could be reproduced just by hot-removing a USB host controller. Maybe the controller was on a PCMCIA card; I don't remember. And I don't remember what kind of USB host controller it was. You might be able to find the original bug report in the linux-usb or linux-usb-devel mailing list archives for 2010. > I want to try whether I can get another method to fix the USB hang issue without disable irq, so that avoid the calltrace. For new people, the contents of that commit are: This patch (as1391) fixes a problem that can occur when USB host controller hardware is hot-unplugged. If no interrupts are generated by the unplug then the HCD may not realize that the controller is gone, and the subsequent unbind may hang waiting for interrupts that never arrive. The solution (for PCI-based controllers) is to call the HCD's interrupt handler at the start of usb_hcd_pci_remove(). If the hardware is gone, the handler will realize this when it tries to read the controller's status register. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c index 1cf2d1e79a5c..7e2d5271b0c9 100644 --- a/drivers/usb/core/hcd-pci.c +++ b/drivers/usb/core/hcd-pci.c @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev) if (!hcd) return; + /* Fake an interrupt request in order to give the driver a chance + * to test whether the controller hardware has been removed (e.g., + * cardbus physical eject). + */ + local_irq_disable(); + usb_hcd_irq(0, hcd); + local_irq_enable(); + usb_remove_hcd(hcd); if (hcd->driver->flags & HCD_MEMORY) { iounmap(hcd->regs); The local_irq_disable() is there so that the irq handler will be invoked in the state that it expects (i.e., with interrupts disabled). Apparently Meng's RT kernel doesn't like it when the handler then calls spin_lock(); I don't know why. Alan Stern ^ permalink raw reply related [flat|nested] 25+ messages in thread
* RE: USB: add check to detect host controller hardware removal 2023-10-13 17:17 ` USB: add check to detect host controller hardware removal Alan Stern @ 2023-10-15 13:37 ` Li, Meng 2023-10-16 16:56 ` Steven Rostedt 1 sibling, 0 replies; 25+ messages in thread From: Li, Meng @ 2023-10-15 13:37 UTC (permalink / raw) To: Alan Stern; +Cc: Steven Rostedt, Ingo Molnar, USB mailing list > -----Original Message----- > From: Alan Stern <stern@rowland.harvard.edu> > Sent: Saturday, October 14, 2023 1:18 AM > To: Li, Meng <Meng.Li@windriver.com> > Cc: Steven Rostedt <rostedt@goodmis.org>; Ingo Molnar > <mingo@redhat.com>; USB mailing list <linux-usb@vger.kernel.org> > Subject: Re: USB: add check to detect host controller hardware removal > > CAUTION: This email comes from a non Wind River email account! > Do not click links or open attachments unless you recognize the sender and > know the content is safe. > > Messages like this should always be sent to the mailing list as well as to me. > And in this case, since it involves an RT kernel, it should be CC-ed to some of > the people involved in developing the RT patches. > > On Fri, Oct 13, 2023 at 04:25:43AM +0000, Li, Meng wrote: > > Hi Alan Stern, > > > > Sorry for disrupting you very abruptly. I encounter a calltrace when trying to > remove a PCIe-to-USB card device with below command. Only occurred in rt > kernel. > > > > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove > > > > Call trace: > > dump_backtrace.part.0+0xc8/0xd4 > > show_stack+0x20/0x30 > > dump_stack_lvl+0x6c/0x88 > > dump_stack+0x18/0x34 > > __might_resched+0x160/0x1c0 > > rt_spin_lock+0x38/0xb0 > > xhci_irq+0x44/0x16d0 > > usb_hcd_irq+0x38/0x5c > > usb_hcd_pci_remove+0x84/0x14c > > xhci_pci_remove+0x78/0xc0 > > pci_device_remove+0x44/0xcc > > device_remove+0x54/0x8c > > device_release_driver_internal+0x1ec/0x260 > > device_release_driver+0x20/0x30 > > pci_stop_bus_device+0x8c/0xcc > > pci_stop_and_remove_bus_device_locked+0x28/0x44 > > remove_store+0xa0/0xb0 > > dev_attr_store+0x20/0x34 > > sysfs_kf_write+0x4c/0x5c > > kernfs_fop_write_iter+0x128/0x1f0 > > vfs_write+0x1c0/0x2f0 > > ksys_write+0x78/0x110 > > __arm64_sys_write+0x24/0x30 > > invoke_syscall+0x5c/0x130 > > el0_svc_common.constprop.0+0x4c/0xf4 > > do_el0_svc+0x34/0xc0 > > el0_svc+0x2c/0x84 > > el0t_64_sync_handler+0xf4/0x120 > > el0t_64_sync+0x18c/0x190 > > > > this trace is caused by below patch that is used to fix a usb hang issue. > > > > commit c548795abe0d3520b74e18f23ca0a0d72deddab9 > > Author: Alan Stern > stern@rowland.harvard.edu<mailto:stern@rowland.harvard.edu> > > Date: Wed Jun 9 17:34:27 2010 -0400 > > > > USB: add check to detect host controller hardware removal > > > > > > I know it is too too long ago, but could you please try to recall how to > reproduce the issue that you fixed, how can I produce the usb hang issue. > > I think the issue could be reproduced just by hot-removing a USB host > controller. Maybe the controller was on a PCMCIA card; I don't remember. > And I don't remember what kind of USB host controller it was. > > You might be able to find the original bug report in the linux-usb or linux-usb- > devel mailing list archives for 2010. > Thanks for offering this clue about how to reproduce this issue, I will try it on my side. Thanks, Limeng > > I want to try whether I can get another method to fix the USB hang issue > without disable irq, so that avoid the calltrace. > > For new people, the contents of that commit are: > > This patch (as1391) fixes a problem that can occur when USB host > controller hardware is hot-unplugged. If no interrupts are generated > by the unplug then the HCD may not realize that the controller is > gone, and the subsequent unbind may hang waiting for interrupts that > never arrive. > > The solution (for PCI-based controllers) is to call the HCD's > interrupt handler at the start of usb_hcd_pci_remove(). If the > hardware is gone, the handler will realize this when it tries to read > the controller's status register. > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> > Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c index > 1cf2d1e79a5c..7e2d5271b0c9 100644 > --- a/drivers/usb/core/hcd-pci.c > +++ b/drivers/usb/core/hcd-pci.c > @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev) > if (!hcd) > return; > > + /* Fake an interrupt request in order to give the driver a chance > + * to test whether the controller hardware has been removed (e.g., > + * cardbus physical eject). > + */ > + local_irq_disable(); > + usb_hcd_irq(0, hcd); > + local_irq_enable(); > + > usb_remove_hcd(hcd); > if (hcd->driver->flags & HCD_MEMORY) { > iounmap(hcd->regs); > > The local_irq_disable() is there so that the irq handler will be invoked in the > state that it expects (i.e., with interrupts disabled). > Apparently Meng's RT kernel doesn't like it when the handler then calls > spin_lock(); I don't know why. > > Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-13 17:17 ` USB: add check to detect host controller hardware removal Alan Stern 2023-10-15 13:37 ` Li, Meng @ 2023-10-16 16:56 ` Steven Rostedt 2023-10-16 19:23 ` Alan Stern 2023-10-19 12:49 ` Sebastian Andrzej Siewior 1 sibling, 2 replies; 25+ messages in thread From: Steven Rostedt @ 2023-10-16 16:56 UTC (permalink / raw) To: Alan Stern Cc: Li, Meng, Ingo Molnar, USB mailing list, Sebastian Andrzej Siewior, linux-rt-users On Fri, 13 Oct 2023 13:17:52 -0400 Alan Stern <stern@rowland.harvard.edu> wrote: > --- a/drivers/usb/core/hcd-pci.c > +++ b/drivers/usb/core/hcd-pci.c > @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev) > if (!hcd) > return; > > + /* Fake an interrupt request in order to give the driver a chance > + * to test whether the controller hardware has been removed (e.g., > + * cardbus physical eject). > + */ > + local_irq_disable(); > + usb_hcd_irq(0, hcd); > + local_irq_enable(); > + > usb_remove_hcd(hcd); > if (hcd->driver->flags & HCD_MEMORY) { > iounmap(hcd->regs); > > The local_irq_disable() is there so that the irq handler will be invoked > in the state that it expects (i.e., with interrupts disabled). > Apparently Meng's RT kernel doesn't like it when the handler then > calls spin_lock(); I don't know why. Because in RT, spin_lock()s are actually mutexes. In RT, interrupt handlers do not even run with interrupts disabled (they run as threads), so the assumption that they run with interrupts disabled on RT is incorrect. One hack would simply be: if (!IS_ENABLED(CONFIG_PREEMPT_RT)) local_irq_disable(); usb_hcd_irq(0, hcd); if (!IS_ENABLED(CONFIG_PREEMPT_RT)) local_irq_enable(); But that's rather ugly. We use to have that as a wrapper of just: local_irq_disable_nort(); but I don't know if we removed that or not. Sebastian? -- Steve ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-16 16:56 ` Steven Rostedt @ 2023-10-16 19:23 ` Alan Stern 2023-10-17 2:23 ` Li, Meng 2023-10-19 12:49 ` Sebastian Andrzej Siewior 1 sibling, 1 reply; 25+ messages in thread From: Alan Stern @ 2023-10-16 19:23 UTC (permalink / raw) To: Steven Rostedt Cc: Li, Meng, Ingo Molnar, USB mailing list, Sebastian Andrzej Siewior, linux-rt-users On Mon, Oct 16, 2023 at 12:56:24PM -0400, Steven Rostedt wrote: > On Fri, 13 Oct 2023 13:17:52 -0400 > Alan Stern <stern@rowland.harvard.edu> wrote: > > > --- a/drivers/usb/core/hcd-pci.c > > +++ b/drivers/usb/core/hcd-pci.c > > @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev) > > if (!hcd) > > return; > > > > + /* Fake an interrupt request in order to give the driver a chance > > + * to test whether the controller hardware has been removed (e.g., > > + * cardbus physical eject). > > + */ > > + local_irq_disable(); > > + usb_hcd_irq(0, hcd); > > + local_irq_enable(); > > + > > usb_remove_hcd(hcd); > > if (hcd->driver->flags & HCD_MEMORY) { > > iounmap(hcd->regs); > > > > The local_irq_disable() is there so that the irq handler will be invoked > > in the state that it expects (i.e., with interrupts disabled). > > Apparently Meng's RT kernel doesn't like it when the handler then > > calls spin_lock(); I don't know why. > > Because in RT, spin_lock()s are actually mutexes. > > In RT, interrupt handlers do not even run with interrupts disabled (they > run as threads), so the assumption that they run with interrupts disabled > on RT is incorrect. One hack would simply be: > > if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > local_irq_disable(); > usb_hcd_irq(0, hcd); > if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > local_irq_enable(); > > But that's rather ugly. We use to have that as a wrapper of just: > > local_irq_disable_nort(); > > but I don't know if we removed that or not. > > Sebastian? Thanks for the information. I guess a simple approach would be to add the wrapper back in, since it's not present in the current kernel. Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: USB: add check to detect host controller hardware removal 2023-10-16 19:23 ` Alan Stern @ 2023-10-17 2:23 ` Li, Meng 2023-10-17 14:06 ` Alan Stern 0 siblings, 1 reply; 25+ messages in thread From: Li, Meng @ 2023-10-17 2:23 UTC (permalink / raw) To: Alan Stern, Steven Rostedt Cc: Ingo Molnar, USB mailing list, Sebastian Andrzej Siewior, linux-rt-users > -----Original Message----- > From: Alan Stern <stern@rowland.harvard.edu> > Sent: Tuesday, October 17, 2023 3:24 AM > To: Steven Rostedt <rostedt@goodmis.org> > Cc: Li, Meng <Meng.Li@windriver.com>; Ingo Molnar <mingo@redhat.com>; > USB mailing list <linux-usb@vger.kernel.org>; Sebastian Andrzej Siewior > <bigeasy@linutronix.de>; linux-rt-users <linux-rt-users@vger.kernel.org> > Subject: Re: USB: add check to detect host controller hardware removal > > CAUTION: This email comes from a non Wind River email account! > Do not click links or open attachments unless you recognize the sender and > know the content is safe. > > On Mon, Oct 16, 2023 at 12:56:24PM -0400, Steven Rostedt wrote: > > On Fri, 13 Oct 2023 13:17:52 -0400 > > Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > --- a/drivers/usb/core/hcd-pci.c > > > +++ b/drivers/usb/core/hcd-pci.c > > > @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev) > > > if (!hcd) > > > return; > > > > > > + /* Fake an interrupt request in order to give the driver a chance > > > + * to test whether the controller hardware has been removed (e.g., > > > + * cardbus physical eject). > > > + */ > > > + local_irq_disable(); > > > + usb_hcd_irq(0, hcd); > > > + local_irq_enable(); > > > + > > > usb_remove_hcd(hcd); > > > if (hcd->driver->flags & HCD_MEMORY) { > > > iounmap(hcd->regs); > > > > > > The local_irq_disable() is there so that the irq handler will be > > > invoked in the state that it expects (i.e., with interrupts disabled). > > > Apparently Meng's RT kernel doesn't like it when the handler then > > > calls spin_lock(); I don't know why. > > > > Because in RT, spin_lock()s are actually mutexes. > > > > In RT, interrupt handlers do not even run with interrupts disabled > > (they run as threads), so the assumption that they run with interrupts > > disabled on RT is incorrect. One hack would simply be: > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > > local_irq_disable(); > > usb_hcd_irq(0, hcd); > > if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > > local_irq_enable(); > > > > But that's rather ugly. We use to have that as a wrapper of just: > > > > local_irq_disable_nort(); > > > > but I don't know if we removed that or not. > > > > Sebastian? > > Thanks for the information. I guess a simple approach would be to add the > wrapper back in, since it's not present in the current kernel. > I did some debugs on my side. Firstly, the local_irq_disable_nort() function had been removed from latest RT kernel. Second, because of creating xhci-pci.c, the commit c548795abe0d("USB: add check to detect host controller hardware removal") is no longer useful. Because the function usb_remove_hcd() is invoked from xhci_pci_remove() of file xhci-pci.c in advance. I am trying to fix this issue with getting register status directly without local_irq_disable(). Thanks, Limeng > Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-17 2:23 ` Li, Meng @ 2023-10-17 14:06 ` Alan Stern 2023-10-18 5:00 ` Li, Meng 0 siblings, 1 reply; 25+ messages in thread From: Alan Stern @ 2023-10-17 14:06 UTC (permalink / raw) To: Li, Meng Cc: Steven Rostedt, Ingo Molnar, USB mailing list, Sebastian Andrzej Siewior, linux-rt-users On Tue, Oct 17, 2023 at 02:23:05AM +0000, Li, Meng wrote: > I did some debugs on my side. > Firstly, the local_irq_disable_nort() function had been removed from latest RT kernel. What's in the RT kernel doesn't matter here, because the code you're patching belongs to the vanilla kernel. > Second, because of creating xhci-pci.c, the commit c548795abe0d("USB: add check to detect host controller hardware removal") is no longer useful. > Because the function usb_remove_hcd() is invoked from xhci_pci_remove() of file xhci-pci.c in advance. What about for non-xHCI controllers? > I am trying to fix this issue with getting register status directly without local_irq_disable(). Were you able to locate the original bug report? Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: USB: add check to detect host controller hardware removal 2023-10-17 14:06 ` Alan Stern @ 2023-10-18 5:00 ` Li, Meng 2023-10-18 15:20 ` Alan Stern 0 siblings, 1 reply; 25+ messages in thread From: Li, Meng @ 2023-10-18 5:00 UTC (permalink / raw) To: Alan Stern Cc: Steven Rostedt, Ingo Molnar, USB mailing list, Sebastian Andrzej Siewior, linux-rt-users > -----Original Message----- > From: Alan Stern <stern@rowland.harvard.edu> > Sent: Tuesday, October 17, 2023 10:06 PM > To: Li, Meng <Meng.Li@windriver.com> > Cc: Steven Rostedt <rostedt@goodmis.org>; Ingo Molnar > <mingo@redhat.com>; USB mailing list <linux-usb@vger.kernel.org>; > Sebastian Andrzej Siewior <bigeasy@linutronix.de>; linux-rt-users <linux-rt- > users@vger.kernel.org> > Subject: Re: USB: add check to detect host controller hardware removal > > CAUTION: This email comes from a non Wind River email account! > Do not click links or open attachments unless you recognize the sender and > know the content is safe. > > On Tue, Oct 17, 2023 at 02:23:05AM +0000, Li, Meng wrote: > > I did some debugs on my side. > > Firstly, the local_irq_disable_nort() function had been removed from latest > RT kernel. > > What's in the RT kernel doesn't matter here, because the code you're patching > belongs to the vanilla kernel. > > > Second, because of creating xhci-pci.c, the commit c548795abe0d("USB: > add check to detect host controller hardware removal") is no longer useful. > > Because the function usb_remove_hcd() is invoked from xhci_pci_remove() > of file xhci-pci.c in advance. > > What about for non-xHCI controllers? > I will try non-xHCI controllers in later if I can find out one on my side. > > I am trying to fix this issue with getting register status directly without > local_irq_disable(). > > Were you able to locate the original bug report? > This is original bug report https://bugzilla.redhat.com/show_bug.cgi?id=579093 my latest debug information as below: when I removed the PCIe-USB card, there is below exception calltrace when operating host controller register. Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT_RT SMP Modules linked in: CPU: 3 PID: 329 Comm: usb-storage Tainted: G W 6.1.53-rt10-yocto-preempt-rt #1 Hardware name: LS1043A RDB Board (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : xhci_ring_ep_doorbell+0x78/0x120 lr : xhci_queue_bulk_tx+0x3b0/0x8a4 sp : ffff80000b0e3960 x29: ffff80000b0e3960 x28: ffff000004ce2290 x27: ffff000008e71100 x26: ffff000005718a80 x25: 0000000000000421 x24: 000000000000001f x23: ffff000008e71108 x22: 0000000000000000 x21: ffff8000099e5100 x20: 0000000000000002 x19: 0000000000000004 x18: 0000000000000000 x17: 0000000000000008 x16: ffff00007b5cfb00 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002 x11: 0000000000000001 x10: 0000000000000a40 x9 : ffff8000089b0b50 x8 : ffff0000057c9a00 x7 : 000000000000001f x6 : ffff0000056c8000 x5 : ffff800009714ca0 x4 : 0000000000000004 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffff8000099e5108 x0 : ffff000004ce2290 Call trace: xhci_ring_ep_doorbell+0x78/0x120 xhci_queue_bulk_tx+0x3b0/0x8a4 xhci_urb_enqueue+0x274/0x510 usb_hcd_submit_urb+0xc0/0x8b0 usb_submit_urb+0x29c/0x5c0 usb_stor_msg_common+0x9c/0x190 usb_stor_bulk_transfer_buf+0x58/0x110 usb_stor_Bulk_transport+0xdc/0x380 usb_stor_invoke_transport+0x40/0x530 usb_stor_transparent_scsi_command+0x18/0x24 usb_stor_control_thread+0x20c/0x2a0 kthread+0x12c/0x130 ret_from_fork+0x10/0x20 Code: 540001cc 8b140aa1 d5033ebf b9000033 (b9400021) ---[ end trace 0000000000000000 ]--- Because of the exception, the xhci->lock in xhci_urb_enqueue is released normally. In this situation, if remove the pcie device with below command # echo 1 > /sys/bus/pci/devices/<PCIe ID>/remove The code will hang at the xhci->lock in xhci_urb_dequeue() function. Even if I refer to commit c548795abe0d, move usb_hcd_irq(0, hcd) to function xhci_pci_remove(), there is also an exception calltrace("Internal error: synchronous external abort") when executing readl(&xhci->op_regs->status); although the code doesn't hang, the exception causes that code is broken from xhci_pci_remove(), the flowing code is not executed. Because usb_hcd_irq(0, hcd) causes exception, I think it should be removed. In additional, removing PCIe card suddenly is not a reasonable operation, it may destroy the hardware. so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver. Thanks, Limeng > Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-18 5:00 ` Li, Meng @ 2023-10-18 15:20 ` Alan Stern 2023-10-18 15:34 ` Alan Stern 2023-10-19 12:38 ` Sebastian Andrzej Siewior 0 siblings, 2 replies; 25+ messages in thread From: Alan Stern @ 2023-10-18 15:20 UTC (permalink / raw) To: Li, Meng Cc: Steven Rostedt, Ingo Molnar, USB mailing list, Sebastian Andrzej Siewior, linux-rt-users On Wed, Oct 18, 2023 at 05:00:58AM +0000, Li, Meng wrote: > > Were you able to locate the original bug report? > > > This is original bug report > https://bugzilla.redhat.com/show_bug.cgi?id=579093 The Red Hat Bugzilla says: You are not authorized to access bug #579093. So I can't tell exactly what happened back then. :-( But I do vaguely remember the discussion with Stratus Technologies. They had special hardware in their systems, which allowed them to do hot-swapping of PCI components. > my latest debug information as below: > when I removed the PCIe-USB card, there is below exception calltrace when operating host controller register. > Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT_RT SMP > Modules linked in: > CPU: 3 PID: 329 Comm: usb-storage Tainted: G W 6.1.53-rt10-yocto-preempt-rt #1 > Hardware name: LS1043A RDB Board (DT) > pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : xhci_ring_ep_doorbell+0x78/0x120 > lr : xhci_queue_bulk_tx+0x3b0/0x8a4 > sp : ffff80000b0e3960 > x29: ffff80000b0e3960 x28: ffff000004ce2290 x27: ffff000008e71100 > x26: ffff000005718a80 x25: 0000000000000421 x24: 000000000000001f > x23: ffff000008e71108 x22: 0000000000000000 x21: ffff8000099e5100 > x20: 0000000000000002 x19: 0000000000000004 x18: 0000000000000000 > x17: 0000000000000008 x16: ffff00007b5cfb00 x15: 0000000000000000 > x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002 > x11: 0000000000000001 x10: 0000000000000a40 x9 : ffff8000089b0b50 > x8 : ffff0000057c9a00 x7 : 000000000000001f x6 : ffff0000056c8000 > x5 : ffff800009714ca0 x4 : 0000000000000004 x3 : 0000000000000000 > x2 : 0000000000000000 x1 : ffff8000099e5108 x0 : ffff000004ce2290 > Call trace: > xhci_ring_ep_doorbell+0x78/0x120 > xhci_queue_bulk_tx+0x3b0/0x8a4 > xhci_urb_enqueue+0x274/0x510 > usb_hcd_submit_urb+0xc0/0x8b0 > usb_submit_urb+0x29c/0x5c0 > usb_stor_msg_common+0x9c/0x190 > usb_stor_bulk_transfer_buf+0x58/0x110 > usb_stor_Bulk_transport+0xdc/0x380 > usb_stor_invoke_transport+0x40/0x530 > usb_stor_transparent_scsi_command+0x18/0x24 > usb_stor_control_thread+0x20c/0x2a0 > kthread+0x12c/0x130 > ret_from_fork+0x10/0x20 > Code: 540001cc 8b140aa1 d5033ebf b9000033 (b9400021) > ---[ end trace 0000000000000000 ]--- > Because of the exception, the xhci->lock in xhci_urb_enqueue is released normally. > In this situation, if remove the pcie device with below command > # echo 1 > /sys/bus/pci/devices/<PCIe ID>/remove > The code will hang at the xhci->lock in xhci_urb_dequeue() function. > Even if I refer to commit c548795abe0d, move usb_hcd_irq(0, hcd) to function xhci_pci_remove(), > there is also an exception calltrace("Internal error: synchronous external abort") when executing readl(&xhci->op_regs->status); > although the code doesn't hang, the exception causes that code is broken from xhci_pci_remove(), the flowing code is not executed. > Because usb_hcd_irq(0, hcd) causes exception, I think it should be removed. > In additional, removing PCIe card suddenly is not a reasonable operation, it may destroy the hardware. If you hadn't removed the card suddenly, the exception would not have occurred. So the logical conclusion isn't that we should get rid of the usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't remove PCIe cards while the system is running. Not unless your computer uses the special hardware from Stratus Technologies. > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver. What about cardbus or PCMCIA cards? Removing one of those cards suddenly, while the system is running, is a perfectly reasonable thing to do and it will not cause any hardware damage. So I think we should keep the usb_hcd_irq(0, hcd) call. Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-18 15:20 ` Alan Stern @ 2023-10-18 15:34 ` Alan Stern 2023-10-19 12:38 ` Sebastian Andrzej Siewior 1 sibling, 0 replies; 25+ messages in thread From: Alan Stern @ 2023-10-18 15:34 UTC (permalink / raw) To: Li, Meng Cc: Steven Rostedt, Ingo Molnar, USB mailing list, Sebastian Andrzej Siewior, linux-rt-users On Wed, Oct 18, 2023 at 11:20:46AM -0400, Alan Stern wrote: > On Wed, Oct 18, 2023 at 05:00:58AM +0000, Li, Meng wrote: > > > Were you able to locate the original bug report? > > > > > This is original bug report > > https://bugzilla.redhat.com/show_bug.cgi?id=579093 > > The Red Hat Bugzilla says: > > You are not authorized to access bug #579093. > > So I can't tell exactly what happened back then. :-( > > But I do vaguely remember the discussion with Stratus Technologies. > They had special hardware in their systems, which allowed them to do > hot-swapping of PCI components. Incidentally, for anyone who's interested, some early discussion about these problems can be found on an open mailing list here: https://marc.info/?l=linux-usb&m=127559364101206&w=2 Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-18 15:20 ` Alan Stern 2023-10-18 15:34 ` Alan Stern @ 2023-10-19 12:38 ` Sebastian Andrzej Siewior 2023-10-19 15:09 ` Alan Stern 1 sibling, 1 reply; 25+ messages in thread From: Sebastian Andrzej Siewior @ 2023-10-19 12:38 UTC (permalink / raw) To: Alan Stern Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On 2023-10-18 11:20:46 [-0400], Alan Stern wrote: > > If you hadn't removed the card suddenly, the exception would not have > occurred. So the logical conclusion isn't that we should get rid of the > usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't > remove PCIe cards while the system is running. Not unless your computer > uses the special hardware from Stratus Technologies. So the card was removed and the kernel complained that it can't access the memory behind the PCI-bar? How odd… > > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver. > > What about cardbus or PCMCIA cards? Removing one of those cards > suddenly, while the system is running, is a perfectly reasonable thing > to do and it will not cause any hardware damage. So I think we should > keep the usb_hcd_irq(0, hcd) call. Don't you invoke pci_driver::remove in such case to properly let the physical device go? This can also be tested via unbind from sysfs. > Alan Stern Sebastian ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-19 12:38 ` Sebastian Andrzej Siewior @ 2023-10-19 15:09 ` Alan Stern 2023-10-19 15:27 ` Alan Stern 0 siblings, 1 reply; 25+ messages in thread From: Alan Stern @ 2023-10-19 15:09 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On Thu, Oct 19, 2023 at 02:38:23PM +0200, Sebastian Andrzej Siewior wrote: > On 2023-10-18 11:20:46 [-0400], Alan Stern wrote: > > > > If you hadn't removed the card suddenly, the exception would not have > > occurred. So the logical conclusion isn't that we should get rid of the > > usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't > > remove PCIe cards while the system is running. Not unless your computer > > uses the special hardware from Stratus Technologies. > > So the card was removed and the kernel complained that it can't access > the memory behind the PCI-bar? > > How odd… > > > > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver. > > > > What about cardbus or PCMCIA cards? Removing one of those cards > > suddenly, while the system is running, is a perfectly reasonable thing > > to do and it will not cause any hardware damage. So I think we should > > keep the usb_hcd_irq(0, hcd) call. > > Don't you invoke pci_driver::remove in such case to properly let the > physical device go? This can also be tested via unbind from sysfs. I don't know any more. The code in question was added in 2010 in order to handle a specially designed high-availability hot-swap-capable system, and it may not be relevant now. The original problem: When a particular PCI device was disabled (to simulate a failure), pci_driver::remove did not get called. Maybe they wanted to have it fail over to a backup device and appear to the kernel as though nothing had happened? It's hard to tell exactly what was going on thirteen years ago. So while this function call may not be needed any more, I prefer to keep it around if that's not too much trouble. Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-19 15:09 ` Alan Stern @ 2023-10-19 15:27 ` Alan Stern 2023-10-20 9:52 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 25+ messages in thread From: Alan Stern @ 2023-10-19 15:27 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On Thu, Oct 19, 2023 at 11:09:57AM -0400, Alan Stern wrote: > On Thu, Oct 19, 2023 at 02:38:23PM +0200, Sebastian Andrzej Siewior wrote: > > On 2023-10-18 11:20:46 [-0400], Alan Stern wrote: > > > > > > If you hadn't removed the card suddenly, the exception would not have > > > occurred. So the logical conclusion isn't that we should get rid of the > > > usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't > > > remove PCIe cards while the system is running. Not unless your computer > > > uses the special hardware from Stratus Technologies. > > > > So the card was removed and the kernel complained that it can't access > > the memory behind the PCI-bar? > > > > How odd… > > > > > > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver. > > > > > > What about cardbus or PCMCIA cards? Removing one of those cards > > > suddenly, while the system is running, is a perfectly reasonable thing > > > to do and it will not cause any hardware damage. So I think we should > > > keep the usb_hcd_irq(0, hcd) call. > > > > Don't you invoke pci_driver::remove in such case to properly let the > > physical device go? This can also be tested via unbind from sysfs. > > I don't know any more. The code in question was added in 2010 in order > to handle a specially designed high-availability hot-swap-capable > system, and it may not be relevant now. > > The original problem: When a particular PCI device was disabled (to > simulate a failure), pci_driver::remove did not get called. Oops... Correction: pci_driver::remove _did_ get called, and the code in question is part of the remove handler. Perhaps putting it there was a mistake. At the time, I probably thought the problem was a general one which might affect all the PCI USB controller drivers, but looking back now, it might have been better to put in the individual device driver. Perhaps that what we should do. Alan Stern > Maybe > they wanted to have it fail over to a backup device and appear to the > kernel as though nothing had happened? It's hard to tell exactly what > was going on thirteen years ago. > > So while this function call may not be needed any more, I prefer to keep > it around if that's not too much trouble. > > Alan Stern > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-19 15:27 ` Alan Stern @ 2023-10-20 9:52 ` Sebastian Andrzej Siewior 2023-10-20 15:19 ` Alan Stern 0 siblings, 1 reply; 25+ messages in thread From: Sebastian Andrzej Siewior @ 2023-10-20 9:52 UTC (permalink / raw) To: Alan Stern Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On 2023-10-19 11:27:54 [-0400], Alan Stern wrote: > > Perhaps that what we should do. Perfect. > Alan Stern Sebastian ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-20 9:52 ` Sebastian Andrzej Siewior @ 2023-10-20 15:19 ` Alan Stern 2023-11-03 15:46 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 25+ messages in thread From: Alan Stern @ 2023-10-20 15:19 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On Fri, Oct 20, 2023 at 11:52:38AM +0200, Sebastian Andrzej Siewior wrote: > On 2023-10-19 11:27:54 [-0400], Alan Stern wrote: > > > > Perhaps that what we should do. > > Perfect. Hmmm... This turns out not to be as easy as one might think. Sebastian, if you can instead suggest a way to call drivers' interrupt handlers (i.e., simulate an interrupt) without causing problems for RT kernels, I think that would be a better approach. The fundamental problem here is that the uhci-hcd driver was not written with unexpected hardware removal in mind. It doesn't have timeouts to handle situations where the device doesn't generate an IRQ to indicate completion of an I/O operation. And since it's been ten years since I've done any significant work on the driver, I'd really like to avoid the need for such a far-reaching change (not least because I don't have any way to test it). I suppose an alternative approach would be to add a new callback pointer to the hc_driver structure -- something that could tell the driver to check for hardware removal. I'll do that if there's no other choice. But simulating an interrupt seems easier, provided it can be done at all. Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-20 15:19 ` Alan Stern @ 2023-11-03 15:46 ` Sebastian Andrzej Siewior 2023-11-03 20:42 ` Alan Stern 0 siblings, 1 reply; 25+ messages in thread From: Sebastian Andrzej Siewior @ 2023-11-03 15:46 UTC (permalink / raw) To: Alan Stern Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On 2023-10-20 11:19:49 [-0400], Alan Stern wrote: > Hmmm... This turns out not to be as easy as one might think. > > Sebastian, if you can instead suggest a way to call drivers' interrupt > handlers (i.e., simulate an interrupt) without causing problems for RT > kernels, I think that would be a better approach. So there is generic_handle_irq_safe(). It should get all the details right like incrementing the counter in /proc/interrupts, doing nothing if the interrupt has been masked or waking the interrupt thread if the interrupt has happen to be threaded. It triggers the interrupt so for a shared handler it will invoke _all_ registered interrupt handler and for threaded interrupts it will return before the thread had a chance to run (free_irq() will handle it properly and wait for the interrupt thread/handler to complete). > The fundamental problem here is that the uhci-hcd driver was not written > with unexpected hardware removal in mind. It doesn't have timeouts to > handle situations where the device doesn't generate an IRQ to indicate > completion of an I/O operation. And since it's been ten years since > I've done any significant work on the driver, I'd really like to avoid > the need for such a far-reaching change (not least because I don't have > any way to test it). I see. Don't over complicate or "correct" things here. What should work is that the removal callback can be called at any time and things continue work. That means it will purge all queues, cancel all requests, timers, whatever and free all resources associated with the driver/ device. If it comes to PCI-hotplug you have to have a so called PCI-hotplug slot. This "slot" will let the OS know if the hardware has been removed or added. If you don't have such a thing you have to maintain the state yourself by using the "remove" and "rescan" sysfs files of the PCI slot. I'm not aware of any requirement for a PCI-driver to check if its device has been removed. > I suppose an alternative approach would be to add a new callback pointer > to the hc_driver structure -- something that could tell the driver to > check for hardware removal. I'll do that if there's no other choice. > But simulating an interrupt seems easier, provided it can be done at > all. > > Alan Stern Sebastian ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-11-03 15:46 ` Sebastian Andrzej Siewior @ 2023-11-03 20:42 ` Alan Stern 2023-11-06 3:02 ` Li, Meng 2023-11-06 8:27 ` Sebastian Andrzej Siewior 0 siblings, 2 replies; 25+ messages in thread From: Alan Stern @ 2023-11-03 20:42 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On Fri, Nov 03, 2023 at 04:46:24PM +0100, Sebastian Andrzej Siewior wrote: > On 2023-10-20 11:19:49 [-0400], Alan Stern wrote: > > Hmmm... This turns out not to be as easy as one might think. > > > > Sebastian, if you can instead suggest a way to call drivers' interrupt > > handlers (i.e., simulate an interrupt) without causing problems for RT > > kernels, I think that would be a better approach. > > So there is generic_handle_irq_safe(). It should get all the details > right like incrementing the counter in /proc/interrupts, doing nothing > if the interrupt has been masked or waking the interrupt thread if the > interrupt has happen to be threaded. > It triggers the interrupt so for a shared handler it will invoke _all_ > registered interrupt handler and for threaded interrupts it will return > before the thread had a chance to run (free_irq() will handle it > properly and wait for the interrupt thread/handler to complete). Good. Meng Li, can you test a patch that replaces the local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to generic_handle_irq_safe()? > > The fundamental problem here is that the uhci-hcd driver was not written > > with unexpected hardware removal in mind. It doesn't have timeouts to > > handle situations where the device doesn't generate an IRQ to indicate > > completion of an I/O operation. And since it's been ten years since > > I've done any significant work on the driver, I'd really like to avoid > > the need for such a far-reaching change (not least because I don't have > > any way to test it). > > I see. Don't over complicate or "correct" things here. What should work > is that the removal callback can be called at any time and things > continue work. That means it will purge all queues, cancel all requests, > timers, whatever and free all resources associated with the driver/ > device. The driver _does_ work under those circumstances -- provided the hardware is still present and accessible. > If it comes to PCI-hotplug you have to have a so called PCI-hotplug > slot. This "slot" will let the OS know if the hardware has been removed > or added. If you don't have such a thing you have to maintain the state > yourself by using the "remove" and "rescan" sysfs files of the PCI slot. > > I'm not aware of any requirement for a PCI-driver to check if its device > has been removed. That's the problem: The driver doesn't really support PCI-hotplug. The code that Meng Li wants to change was sort of a half-baked way to add such support. Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: USB: add check to detect host controller hardware removal 2023-11-03 20:42 ` Alan Stern @ 2023-11-06 3:02 ` Li, Meng 2023-11-06 8:28 ` Sebastian Andrzej Siewior 2023-11-06 8:27 ` Sebastian Andrzej Siewior 1 sibling, 1 reply; 25+ messages in thread From: Li, Meng @ 2023-11-06 3:02 UTC (permalink / raw) To: Alan Stern, Sebastian Andrzej Siewior Cc: Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users > -----Original Message----- > From: Alan Stern <stern@rowland.harvard.edu> > Sent: Saturday, November 4, 2023 4:42 AM > To: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Cc: Li, Meng <Meng.Li@windriver.com>; Steven Rostedt > <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list > <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org> > Subject: Re: USB: add check to detect host controller hardware removal > > CAUTION: This email comes from a non Wind River email account! > Do not click links or open attachments unless you recognize the sender and > know the content is safe. > > On Fri, Nov 03, 2023 at 04:46:24PM +0100, Sebastian Andrzej Siewior wrote: > > On 2023-10-20 11:19:49 [-0400], Alan Stern wrote: > > > Hmmm... This turns out not to be as easy as one might think. > > > > > > Sebastian, if you can instead suggest a way to call drivers' > > > interrupt handlers (i.e., simulate an interrupt) without causing > > > problems for RT kernels, I think that would be a better approach. > > > > So there is generic_handle_irq_safe(). It should get all the details > > right like incrementing the counter in /proc/interrupts, doing nothing > > if the interrupt has been masked or waking the interrupt thread if the > > interrupt has happen to be threaded. > > It triggers the interrupt so for a shared handler it will invoke _all_ > > registered interrupt handler and for threaded interrupts it will > > return before the thread had a chance to run (free_irq() will handle > > it properly and wait for the interrupt thread/handler to complete). > > Good. Meng Li, can you test a patch that replaces the > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to > generic_handle_irq_safe()? It needs an irq number as parameter, what I should pass to this function, 0 or dev->irq or other value? Thanks, Limeng > > > > The fundamental problem here is that the uhci-hcd driver was not > > > written with unexpected hardware removal in mind. It doesn't have > > > timeouts to handle situations where the device doesn't generate an > > > IRQ to indicate completion of an I/O operation. And since it's been > > > ten years since I've done any significant work on the driver, I'd > > > really like to avoid the need for such a far-reaching change (not > > > least because I don't have any way to test it). > > > > I see. Don't over complicate or "correct" things here. What should > > work is that the removal callback can be called at any time and things > > continue work. That means it will purge all queues, cancel all > > requests, timers, whatever and free all resources associated with the > > driver/ device. > > The driver _does_ work under those circumstances -- provided the hardware is > still present and accessible. > > > If it comes to PCI-hotplug you have to have a so called PCI-hotplug > > slot. This "slot" will let the OS know if the hardware has been > > removed or added. If you don't have such a thing you have to maintain > > the state yourself by using the "remove" and "rescan" sysfs files of the PCI > slot. > > > > I'm not aware of any requirement for a PCI-driver to check if its > > device has been removed. > > That's the problem: The driver doesn't really support PCI-hotplug. > The code that Meng Li wants to change was sort of a half-baked way to add > such support. > > Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: USB: add check to detect host controller hardware removal 2023-11-06 3:02 ` Li, Meng @ 2023-11-06 8:28 ` Sebastian Andrzej Siewior 2023-11-06 8:54 ` Li, Meng 2023-11-06 15:25 ` Alan Stern 0 siblings, 2 replies; 25+ messages in thread From: Sebastian Andrzej Siewior @ 2023-11-06 8:28 UTC (permalink / raw) To: Li, Meng Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On 2023-11-06 03:02:42 [+0000], Li, Meng wrote: > > Good. Meng Li, can you test a patch that replaces the > > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to > > generic_handle_irq_safe()? > > It needs an irq number as parameter, what I should pass to this > function, 0 or dev->irq or other value? dev->irq is what it asks for. I would really appreciate if you would instead use the sysfs interface to remove the device prior physically removing it. This should solve your trouble. > Thanks, > Limeng Sebastian ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: RE: USB: add check to detect host controller hardware removal 2023-11-06 8:28 ` Sebastian Andrzej Siewior @ 2023-11-06 8:54 ` Li, Meng 2023-11-06 9:09 ` Sebastian Andrzej Siewior 2023-11-06 15:25 ` Alan Stern 1 sibling, 1 reply; 25+ messages in thread From: Li, Meng @ 2023-11-06 8:54 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users > -----Original Message----- > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Sent: Monday, November 6, 2023 4:28 PM > To: Li, Meng <Meng.Li@windriver.com> > Cc: Alan Stern <stern@rowland.harvard.edu>; Steven Rostedt > <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list > <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org> > Subject: Re: RE: USB: add check to detect host controller hardware removal > > CAUTION: This email comes from a non Wind River email account! > Do not click links or open attachments unless you recognize the sender and > know the content is safe. > > On 2023-11-06 03:02:42 [+0000], Li, Meng wrote: > > > Good. Meng Li, can you test a patch that replaces the > > > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with > > > a single call to generic_handle_irq_safe()? > > > > It needs an irq number as parameter, what I should pass to this > > function, 0 or dev->irq or other value? > > dev->irq is what it asks for. I would really appreciate if you would > instead use the sysfs interface to remove the device prior physically removing > it. This should solve your trouble. > This is not my original issue that I encountered. I agree that we should remove the device from sys interface firstly, and then do hot-plug action. My original issue was the calltrace on RT kernel if I remove the device from sys interface. # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove xhci_hcd 0001:01:00.0: remove, state 1 usb usb2: USB disconnect, device number 1 usb 2-4: USB disconnect, device number 2 xhci_hcd 0001:01:00.0: USB bus 2 deregistered BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh root@nxp-ls1043:preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 ~# CPU: 0 PID: 765 Comm: sh Tainted: G W 6.1.57-rt10-yocto-preempt-rt #1 Hardware name: LS1043A RDB Board (DT) Call trace: dump_backtrace.part.0+0xc8/0xd4 show_stack+0x20/0x30 dump_stack_lvl+0x6c/0x88 dump_stack+0x18/0x34 __might_resched+0x160/0x1c0 rt_spin_lock+0x38/0xb0 xhci_irq+0x44/0x16d0 usb_hcd_irq+0x38/0x5c usb_hcd_pci_remove+0x84/0x14c xhci_pci_remove+0x78/0xc0 pci_device_remove+0x44/0xcc device_remove+0x54/0x8c device_release_driver_internal+0x1ec/0x260 device_release_driver+0x20/0x30 pci_stop_bus_device+0x8c/0xcc pci_stop_and_remove_bus_device_locked+0x28/0x44 remove_store+0xa0/0xb0 dev_attr_store+0x20/0x34 sysfs_kf_write+0x4c/0x5c kernfs_fop_write_iter+0x128/0x1f0 vfs_write+0x1c0/0x2f0 ksys_write+0x78/0x110 __arm64_sys_write+0x24/0x30 invoke_syscall+0x5c/0x130 el0_svc_common.constprop.0+0x4c/0xf4 do_el0_svc+0x34/0xc0 el0_svc+0x2c/0x84 el0t_64_sync_handler+0xf4/0x120 el0t_64_sync+0x18c/0x190 and then I checked the commit log to get which commit introduced this issue. I found out the commit is commit c548795abe0d3520b74e18f23ca0a0d72deddab9 Author: Alan Stern <stern@rowland.harvard.edu> Date: Wed Jun 9 17:34:27 2010 -0400 USB: add check to detect host controller hardware removal And then, Alan Stern told me the background of this issue. so, I started to do hotplug operation on my board to see what symptom on my nxp-ls1043/6 board. And then there were lots of discussion followed. Thanks, Limeng > > Thanks, > > Limeng > > Sebastian ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: RE: USB: add check to detect host controller hardware removal 2023-11-06 8:54 ` Li, Meng @ 2023-11-06 9:09 ` Sebastian Andrzej Siewior 2023-11-07 3:08 ` Li, Meng 0 siblings, 1 reply; 25+ messages in thread From: Sebastian Andrzej Siewior @ 2023-11-06 9:09 UTC (permalink / raw) To: Li, Meng Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On 2023-11-06 08:54:50 [+0000], Li, Meng wrote: > This is not my original issue that I encountered. > I agree that we should remove the device from sys interface firstly, > and then do hot-plug action. > My original issue was the calltrace on RT kernel if I remove the > device from sys interface. > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove > xhci_hcd 0001:01:00.0: remove, state 1 > usb usb2: USB disconnect, device number 1 > usb 2-4: USB disconnect, device number 2 > xhci_hcd 0001:01:00.0: USB bus 2 deregistered > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 > in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh Right. > and then I checked the commit log to get which commit introduced this issue. > I found out the commit is > commit c548795abe0d3520b74e18f23ca0a0d72deddab9 > Author: Alan Stern <stern@rowland.harvard.edu> > Date: Wed Jun 9 17:34:27 2010 -0400 > > USB: add check to detect host controller hardware removal > > And then, Alan Stern told me the background of this issue. so, I > started to do hotplug operation on my board to see what symptom on my > nxp-ls1043/6 board. > And then there were lots of discussion followed. Okay. I somehow mapped it that you try to add this to xhci. The suggested replacement should cover it. Better if we could get rid of it ;) > Thanks, > Limeng Sebastian ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: RE: RE: USB: add check to detect host controller hardware removal 2023-11-06 9:09 ` Sebastian Andrzej Siewior @ 2023-11-07 3:08 ` Li, Meng 2023-11-07 6:17 ` Greg KH 0 siblings, 1 reply; 25+ messages in thread From: Li, Meng @ 2023-11-07 3:08 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users > -----Original Message----- > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Sent: Monday, November 6, 2023 5:09 PM > To: Li, Meng <Meng.Li@windriver.com> > Cc: Alan Stern <stern@rowland.harvard.edu>; Steven Rostedt > <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list > <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org> > Subject: Re: RE: RE: USB: add check to detect host controller hardware > removal > > CAUTION: This email comes from a non Wind River email account! > Do not click links or open attachments unless you recognize the sender and > know the content is safe. > > On 2023-11-06 08:54:50 [+0000], Li, Meng wrote: > > > This is not my original issue that I encountered. > > I agree that we should remove the device from sys interface firstly, > > and then do hot-plug action. > > My original issue was the calltrace on RT kernel if I remove the > > device from sys interface. > > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove > > xhci_hcd 0001:01:00.0: remove, state 1 usb usb2: USB disconnect, > > device number 1 usb 2-4: USB disconnect, device number 2 xhci_hcd > > 0001:01:00.0: USB bus 2 deregistered > > BUG: sleeping function called from invalid context at > > kernel/locking/spinlock_rt.c:46 > > in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh > > Right. > > > and then I checked the commit log to get which commit introduced this > issue. > > I found out the commit is > > commit c548795abe0d3520b74e18f23ca0a0d72deddab9 > > Author: Alan Stern <stern@rowland.harvard.edu> > > Date: Wed Jun 9 17:34:27 2010 -0400 > > > > USB: add check to detect host controller hardware removal > > > > And then, Alan Stern told me the background of this issue. so, I > > started to do hotplug operation on my board to see what symptom on my > > nxp-ls1043/6 board. > > And then there were lots of discussion followed. > > Okay. I somehow mapped it that you try to add this to xhci. > The suggested replacement should cover it. Better if we could get rid of it ;) > generic_handle_irq_safe(dev->irq) works find on my board, there is no calltrace anymore. Will you create a patch in later? Thanks, Limeng > > Thanks, > > Limeng > > Sebastian ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: RE: USB: add check to detect host controller hardware removal 2023-11-07 3:08 ` Li, Meng @ 2023-11-07 6:17 ` Greg KH 0 siblings, 0 replies; 25+ messages in thread From: Greg KH @ 2023-11-07 6:17 UTC (permalink / raw) To: Li, Meng Cc: Sebastian Andrzej Siewior, Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On Tue, Nov 07, 2023 at 03:08:13AM +0000, Li, Meng wrote: > > > > -----Original Message----- > > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > Sent: Monday, November 6, 2023 5:09 PM > > To: Li, Meng <Meng.Li@windriver.com> > > Cc: Alan Stern <stern@rowland.harvard.edu>; Steven Rostedt > > <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list > > <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org> > > Subject: Re: RE: RE: USB: add check to detect host controller hardware > > removal > > > > CAUTION: This email comes from a non Wind River email account! > > Do not click links or open attachments unless you recognize the sender and > > know the content is safe. > > > > On 2023-11-06 08:54:50 [+0000], Li, Meng wrote: > > > > > This is not my original issue that I encountered. > > > I agree that we should remove the device from sys interface firstly, > > > and then do hot-plug action. > > > My original issue was the calltrace on RT kernel if I remove the > > > device from sys interface. > > > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove > > > xhci_hcd 0001:01:00.0: remove, state 1 usb usb2: USB disconnect, > > > device number 1 usb 2-4: USB disconnect, device number 2 xhci_hcd > > > 0001:01:00.0: USB bus 2 deregistered > > > BUG: sleeping function called from invalid context at > > > kernel/locking/spinlock_rt.c:46 > > > in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh > > > > Right. > > > > > and then I checked the commit log to get which commit introduced this > > issue. > > > I found out the commit is > > > commit c548795abe0d3520b74e18f23ca0a0d72deddab9 > > > Author: Alan Stern <stern@rowland.harvard.edu> > > > Date: Wed Jun 9 17:34:27 2010 -0400 > > > > > > USB: add check to detect host controller hardware removal > > > > > > And then, Alan Stern told me the background of this issue. so, I > > > started to do hotplug operation on my board to see what symptom on my > > > nxp-ls1043/6 board. > > > And then there were lots of discussion followed. > > > > Okay. I somehow mapped it that you try to add this to xhci. > > The suggested replacement should cover it. Better if we could get rid of it ;) > > > > generic_handle_irq_safe(dev->irq) works find on my board, there is no calltrace anymore. > Will you create a patch in later? As you have tested this, please create it and submit it so that we know exactly what changes are required and you get the proper credit for doing this work. thanks, greg k-h ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: USB: add check to detect host controller hardware removal 2023-11-06 8:28 ` Sebastian Andrzej Siewior 2023-11-06 8:54 ` Li, Meng @ 2023-11-06 15:25 ` Alan Stern 1 sibling, 0 replies; 25+ messages in thread From: Alan Stern @ 2023-11-06 15:25 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On Mon, Nov 06, 2023 at 09:28:29AM +0100, Sebastian Andrzej Siewior wrote: > On 2023-11-06 03:02:42 [+0000], Li, Meng wrote: > > > Good. Meng Li, can you test a patch that replaces the > > > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to > > > generic_handle_irq_safe()? > > > > It needs an irq number as parameter, what I should pass to this > > function, 0 or dev->irq or other value? > > dev->irq is what it asks for. In fact it should be hcd->irq -- see usb_hcd_request_irqs(). Maybe this is the same value as dev->irq; I don't know. > I would really appreciate if you would > instead use the sysfs interface to remove the device prior physically > removing it. This should solve your trouble. The whole reason for having this code in the first place -- the specific use case -- was to handle device removal caused by sudden hardware failure. Obviously there's no way to use the sysfs interface before this occurs. Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-11-03 20:42 ` Alan Stern 2023-11-06 3:02 ` Li, Meng @ 2023-11-06 8:27 ` Sebastian Andrzej Siewior 1 sibling, 0 replies; 25+ messages in thread From: Sebastian Andrzej Siewior @ 2023-11-06 8:27 UTC (permalink / raw) To: Alan Stern Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users On 2023-11-03 16:42:18 [-0400], Alan Stern wrote: > > I see. Don't over complicate or "correct" things here. What should work > > is that the removal callback can be called at any time and things > > continue work. That means it will purge all queues, cancel all requests, > > timers, whatever and free all resources associated with the driver/ > > device. > > The driver _does_ work under those circumstances -- provided the > hardware is still present and accessible. In that case I don't see a problem. > > If it comes to PCI-hotplug you have to have a so called PCI-hotplug > > slot. This "slot" will let the OS know if the hardware has been removed > > or added. If you don't have such a thing you have to maintain the state > > yourself by using the "remove" and "rescan" sysfs files of the PCI slot. > > > > I'm not aware of any requirement for a PCI-driver to check if its device > > has been removed. > > That's the problem: The driver doesn't really support PCI-hotplug. > The code that Meng Li wants to change was sort of a half-baked way to > add such support. This sounds half-baked. Just remove the device from sysfs and then physically plug the card. It is going to end in mess if every driver gets some "hotplug" support. > Alan Stern Sebastian ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal 2023-10-16 16:56 ` Steven Rostedt 2023-10-16 19:23 ` Alan Stern @ 2023-10-19 12:49 ` Sebastian Andrzej Siewior 1 sibling, 0 replies; 25+ messages in thread From: Sebastian Andrzej Siewior @ 2023-10-19 12:49 UTC (permalink / raw) To: Steven Rostedt Cc: Alan Stern, Li, Meng, Ingo Molnar, USB mailing list, linux-rt-users On 2023-10-16 12:56:24 [-0400], Steven Rostedt wrote: > On Fri, 13 Oct 2023 13:17:52 -0400 > Alan Stern <stern@rowland.harvard.edu> wrote: > > Because in RT, spin_lock()s are actually mutexes. > > In RT, interrupt handlers do not even run with interrupts disabled (they > run as threads), so the assumption that they run with interrupts disabled > on RT is incorrect. One hack would simply be: > > if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > local_irq_disable(); > usb_hcd_irq(0, hcd); > if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > local_irq_enable(); > > But that's rather ugly. We use to have that as a wrapper of just: > > local_irq_disable_nort(); > > but I don't know if we removed that or not. > > Sebastian? Got removed once it had zero users. It was hated because it was used usually used as a quick solution to hide problems underneath. > -- Steve Sebastian ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2023-11-07 6:17 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <PH0PR11MB5191464B2F01511D2ADECB3BF1D2A@PH0PR11MB5191.namprd11.prod.outlook.com>
2023-10-13 17:17 ` USB: add check to detect host controller hardware removal Alan Stern
2023-10-15 13:37 ` Li, Meng
2023-10-16 16:56 ` Steven Rostedt
2023-10-16 19:23 ` Alan Stern
2023-10-17 2:23 ` Li, Meng
2023-10-17 14:06 ` Alan Stern
2023-10-18 5:00 ` Li, Meng
2023-10-18 15:20 ` Alan Stern
2023-10-18 15:34 ` Alan Stern
2023-10-19 12:38 ` Sebastian Andrzej Siewior
2023-10-19 15:09 ` Alan Stern
2023-10-19 15:27 ` Alan Stern
2023-10-20 9:52 ` Sebastian Andrzej Siewior
2023-10-20 15:19 ` Alan Stern
2023-11-03 15:46 ` Sebastian Andrzej Siewior
2023-11-03 20:42 ` Alan Stern
2023-11-06 3:02 ` Li, Meng
2023-11-06 8:28 ` Sebastian Andrzej Siewior
2023-11-06 8:54 ` Li, Meng
2023-11-06 9:09 ` Sebastian Andrzej Siewior
2023-11-07 3:08 ` Li, Meng
2023-11-07 6:17 ` Greg KH
2023-11-06 15:25 ` Alan Stern
2023-11-06 8:27 ` Sebastian Andrzej Siewior
2023-10-19 12:49 ` Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox