* [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null
@ 2024-12-17 7:58 胡连勤
2024-12-24 4:53 ` Prashanth K
2025-01-16 13:11 ` Jon Hunter
0 siblings, 2 replies; 17+ messages in thread
From: 胡连勤 @ 2024-12-17 7:58 UTC (permalink / raw)
To: gregkh@linuxfoundation.org, Prashanth K, mwalle@kernel.org,
quic_jjohnson@quicinc.com, David Brownell
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
胡连勤, opensource.kernel
From: Lianqin Hu <hulianqin@vivo.com>
Considering that in some extreme cases, when performing the
unbinding operation, gserial_disconnect has cleared gser->ioport,
which triggers gadget reconfiguration, and then calls gs_read_complete,
resulting in access to a null pointer. Therefore, ep is disabled before
gserial_disconnect sets port to null to prevent this from happening.
Call trace:
gs_read_complete+0x58/0x240
usb_gadget_giveback_request+0x40/0x160
dwc3_remove_requests+0x170/0x484
dwc3_ep0_out_start+0xb0/0x1d4
__dwc3_gadget_start+0x25c/0x720
kretprobe_trampoline.cfi_jt+0x0/0x8
kretprobe_trampoline.cfi_jt+0x0/0x8
udc_bind_to_driver+0x1d8/0x300
usb_gadget_probe_driver+0xa8/0x1dc
gadget_dev_desc_UDC_store+0x13c/0x188
configfs_write_iter+0x160/0x1f4
vfs_write+0x2d0/0x40c
ksys_write+0x7c/0xf0
__arm64_sys_write+0x20/0x30
invoke_syscall+0x60/0x150
el0_svc_common+0x8c/0xf8
do_el0_svc+0x28/0xa0
el0_svc+0x24/0x84
Fixes: c1dca562be8a ("usb gadget: split out serial core")
Cc: stable@vger.kernel.org
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
---
Changes in v3:
- Add --- line above the version tag information
- Remove extra blank lines in commit messages
- Version tag information from v2 to changes in v2
- Link to v2: https://lore.kernel.org/all/TYUPR06MB6217DAA095A9863D4B58D57CD23B2@TYUPR06MB6217.apcprd06.prod.outlook.com/
Changes in v2:
- Remove some address information from patch descriptions
- Link to v1: https://lore.kernel.org/all/TYUPR06MB621763AB815989161F4033AFD2762@TYUPR06MB6217.apcprd06.prod.outlook.com/
- Link to suggestions: https://lore.kernel.org/all/TYUPR06MB6217DE28012FFEC5E808DD64D2962@TYUPR06MB6217.apcprd06.prod.outlook.com/
drivers/usb/gadget/function/u_serial.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index 53d9fc41acc5..bc143a86c2dd 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser)
/* REVISIT as above: how best to track this? */
port->port_line_coding = gser->port_line_coding;
+ /* disable endpoints, aborting down any active I/O */
+ usb_ep_disable(gser->out);
+ usb_ep_disable(gser->in);
+
port->port_usb = NULL;
gser->ioport = NULL;
if (port->port.count > 0) {
@@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser)
spin_unlock(&port->port_lock);
spin_unlock_irqrestore(&serial_port_lock, flags);
- /* disable endpoints, aborting down any active I/O */
- usb_ep_disable(gser->out);
- usb_ep_disable(gser->in);
-
/* finally, free any unused/unusable I/O buffers */
spin_lock_irqsave(&port->port_lock, flags);
if (port->port.count == 0)
--
2.39.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2024-12-17 7:58 [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 胡连勤 @ 2024-12-24 4:53 ` Prashanth K 2024-12-25 7:03 ` 答复: " 胡连勤 2025-01-16 13:11 ` Jon Hunter 1 sibling, 1 reply; 17+ messages in thread From: Prashanth K @ 2024-12-24 4:53 UTC (permalink / raw) To: 胡连勤, gregkh@linuxfoundation.org, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel On 17-12-24 01:28 pm, 胡连勤 wrote: > From: Lianqin Hu <hulianqin@vivo.com> > > Considering that in some extreme cases, when performing the > unbinding operation, gserial_disconnect has cleared gser->ioport, > which triggers gadget reconfiguration, and then calls gs_read_complete, > resulting in access to a null pointer. Therefore, ep is disabled before > gserial_disconnect sets port to null to prevent this from happening. gs_read_complete() gets port from ep->driver_data, and it shouldn't get affected if gser->ioport is null as long as port[port_num].port is valid. Am I missing something here? > > Call trace: > gs_read_complete+0x58/0x240 > usb_gadget_giveback_request+0x40/0x160 > dwc3_remove_requests+0x170/0x484 > dwc3_ep0_out_start+0xb0/0x1d4 > __dwc3_gadget_start+0x25c/0x720 > kretprobe_trampoline.cfi_jt+0x0/0x8 > kretprobe_trampoline.cfi_jt+0x0/0x8 > udc_bind_to_driver+0x1d8/0x300 > usb_gadget_probe_driver+0xa8/0x1dc > gadget_dev_desc_UDC_store+0x13c/0x188 > configfs_write_iter+0x160/0x1f4 > vfs_write+0x2d0/0x40c > ksys_write+0x7c/0xf0 > __arm64_sys_write+0x20/0x30 > invoke_syscall+0x60/0x150 > el0_svc_common+0x8c/0xf8 > do_el0_svc+0x28/0xa0 > el0_svc+0x24/0x84 > Regards, Prashanth K ^ permalink raw reply [flat|nested] 17+ messages in thread
* 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2024-12-24 4:53 ` Prashanth K @ 2024-12-25 7:03 ` 胡连勤 2024-12-26 4:59 ` Prashanth K 0 siblings, 1 reply; 17+ messages in thread From: 胡连勤 @ 2024-12-25 7:03 UTC (permalink / raw) To: Prashanth K, gregkh@linuxfoundation.org, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel Hello Prashanth K: > > Considering that in some extreme cases, when performing the unbinding > > operation, gserial_disconnect has cleared gser->ioport, which triggers > > gadget reconfiguration, and then calls gs_read_complete, resulting in > > access to a null pointer. Therefore, ep is disabled before > > gserial_disconnect sets port to null to prevent this from happening. > > gs_read_complete() gets port from ep->driver_data, and it shouldn't get > affected if gser->ioport is null as long as port[port_num].port is valid. Am I > missing something here? This description is not very appropriate, thank you very much for your suggestion, I will modify it. > > > > Call trace: > > gs_read_complete+0x58/0x240 > > usb_gadget_giveback_request+0x40/0x160 > > dwc3_remove_requests+0x170/0x484 > > dwc3_ep0_out_start+0xb0/0x1d4 > > __dwc3_gadget_start+0x25c/0x720 > > kretprobe_trampoline.cfi_jt+0x0/0x8 > > kretprobe_trampoline.cfi_jt+0x0/0x8 > > udc_bind_to_driver+0x1d8/0x300 > > usb_gadget_probe_driver+0xa8/0x1dc > > gadget_dev_desc_UDC_store+0x13c/0x188 > > configfs_write_iter+0x160/0x1f4 > > vfs_write+0x2d0/0x40c > > ksys_write+0x7c/0xf0 > > __arm64_sys_write+0x20/0x30 > > invoke_syscall+0x60/0x150 > > el0_svc_common+0x8c/0xf8 > > do_el0_svc+0x28/0xa0 > > el0_svc+0x24/0x84 > > Thanks ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2024-12-25 7:03 ` 答复: " 胡连勤 @ 2024-12-26 4:59 ` Prashanth K 0 siblings, 0 replies; 17+ messages in thread From: Prashanth K @ 2024-12-26 4:59 UTC (permalink / raw) To: 胡连勤, gregkh@linuxfoundation.org, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel On 25-12-24 12:33 pm, 胡连勤 wrote: > Hello Prashanth K: > >>> Considering that in some extreme cases, when performing the unbinding >>> operation, gserial_disconnect has cleared gser->ioport, which triggers >>> gadget reconfiguration, and then calls gs_read_complete, resulting in >>> access to a null pointer. Therefore, ep is disabled before >>> gserial_disconnect sets port to null to prevent this from happening. >> >> gs_read_complete() gets port from ep->driver_data, and it shouldn't get >> affected if gser->ioport is null as long as port[port_num].port is valid. Am I >> missing something here? > > This description is not very appropriate, thank you very much for your suggestion, I will modify it. Seems like this patch has already accepted applied to Gregs tree. So I think its fine for now, just update new description here. > >>> >>> Call trace: >>> gs_read_complete+0x58/0x240 >>> usb_gadget_giveback_request+0x40/0x160 >>> dwc3_remove_requests+0x170/0x484 >>> dwc3_ep0_out_start+0xb0/0x1d4 >>> __dwc3_gadget_start+0x25c/0x720 >>> kretprobe_trampoline.cfi_jt+0x0/0x8 >>> kretprobe_trampoline.cfi_jt+0x0/0x8 >>> udc_bind_to_driver+0x1d8/0x300 >>> usb_gadget_probe_driver+0xa8/0x1dc >>> gadget_dev_desc_UDC_store+0x13c/0x188 >>> configfs_write_iter+0x160/0x1f4 >>> vfs_write+0x2d0/0x40c >>> ksys_write+0x7c/0xf0 >>> __arm64_sys_write+0x20/0x30 >>> invoke_syscall+0x60/0x150 >>> el0_svc_common+0x8c/0xf8 >>> do_el0_svc+0x28/0xa0 >>> el0_svc+0x24/0x84 >>> > > Thanks Regards, Prashanth K ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2024-12-17 7:58 [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 胡连勤 2024-12-24 4:53 ` Prashanth K @ 2025-01-16 13:11 ` Jon Hunter 2025-01-16 13:28 ` gregkh 1 sibling, 1 reply; 17+ messages in thread From: Jon Hunter @ 2025-01-16 13:11 UTC (permalink / raw) To: 胡连勤, gregkh@linuxfoundation.org, Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis Hi Greg, Lianqin, On 17/12/2024 07:58, 胡连勤 wrote: > From: Lianqin Hu <hulianqin@vivo.com> > > Considering that in some extreme cases, when performing the > unbinding operation, gserial_disconnect has cleared gser->ioport, > which triggers gadget reconfiguration, and then calls gs_read_complete, > resulting in access to a null pointer. Therefore, ep is disabled before > gserial_disconnect sets port to null to prevent this from happening. > > Call trace: > gs_read_complete+0x58/0x240 > usb_gadget_giveback_request+0x40/0x160 > dwc3_remove_requests+0x170/0x484 > dwc3_ep0_out_start+0xb0/0x1d4 > __dwc3_gadget_start+0x25c/0x720 > kretprobe_trampoline.cfi_jt+0x0/0x8 > kretprobe_trampoline.cfi_jt+0x0/0x8 > udc_bind_to_driver+0x1d8/0x300 > usb_gadget_probe_driver+0xa8/0x1dc > gadget_dev_desc_UDC_store+0x13c/0x188 > configfs_write_iter+0x160/0x1f4 > vfs_write+0x2d0/0x40c > ksys_write+0x7c/0xf0 > __arm64_sys_write+0x20/0x30 > invoke_syscall+0x60/0x150 > el0_svc_common+0x8c/0xf8 > do_el0_svc+0x28/0xa0 > el0_svc+0x24/0x84 > > Fixes: c1dca562be8a ("usb gadget: split out serial core") > Cc: stable@vger.kernel.org > Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Signed-off-by: Lianqin Hu <hulianqin@vivo.com> > --- > > Changes in v3: > - Add --- line above the version tag information > - Remove extra blank lines in commit messages > - Version tag information from v2 to changes in v2 > - Link to v2: https://lore.kernel.org/all/TYUPR06MB6217DAA095A9863D4B58D57CD23B2@TYUPR06MB6217.apcprd06.prod.outlook.com/ > > Changes in v2: > - Remove some address information from patch descriptions > - Link to v1: https://lore.kernel.org/all/TYUPR06MB621763AB815989161F4033AFD2762@TYUPR06MB6217.apcprd06.prod.outlook.com/ > - Link to suggestions: https://lore.kernel.org/all/TYUPR06MB6217DE28012FFEC5E808DD64D2962@TYUPR06MB6217.apcprd06.prod.outlook.com/ > > drivers/usb/gadget/function/u_serial.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c > index 53d9fc41acc5..bc143a86c2dd 100644 > --- a/drivers/usb/gadget/function/u_serial.c > +++ b/drivers/usb/gadget/function/u_serial.c > @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser) > /* REVISIT as above: how best to track this? */ > port->port_line_coding = gser->port_line_coding; > > + /* disable endpoints, aborting down any active I/O */ > + usb_ep_disable(gser->out); > + usb_ep_disable(gser->in); > + > port->port_usb = NULL; > gser->ioport = NULL; > if (port->port.count > 0) { > @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser) > spin_unlock(&port->port_lock); > spin_unlock_irqrestore(&serial_port_lock, flags); > > - /* disable endpoints, aborting down any active I/O */ > - usb_ep_disable(gser->out); > - usb_ep_disable(gser->in); > - > /* finally, free any unused/unusable I/O buffers */ > spin_lock_irqsave(&port->port_lock, flags); > if (port->port.count == 0) We have observed a reboot regression on Tegra234 (I have not tried other boards) and bisect is pointing to this commit. Reverting this on top of mainline is fixing the problem. With this change, when the board reboots we see ... [ 59.918177] tegra-xudc 3550000.usb: ep 3 disabled [ 59.923097] tegra-xudc 3550000.usb: ep 2 disabled [ 59.927955] tegra-xudc 3550000.usb: ep 5 disabled [ 80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU [ 80.917354] rcu: 6-....: (5248 ticks this GP) idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623 [ 80.927146] rcu: (t=5253 jiffies g=3781 q=1490 ncpus=12) [ 80.932704] Sending NMI from CPU 6 to CPUs 2: [ 90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not tainted 6.13.0-rc7-00043-g619f0b6fad52 #1 [ 90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024 [ 90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 90.981562] pc : smp_call_function_single+0xdc/0x1a0 [ 90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0 [ 90.981578] sp : ffff800082eb3cd0 [ 90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: ffff0000802933c0 [ 90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: ffff800080132018 [ 90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: ffff80008280d970 [ 90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: 0000000000000000 [ 90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: 0000000000000000 [ 90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : 0000000000000001 [ 90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : ffff800080132018 [ 90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : 0000000000000001 [ 90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : 0000000000000004 [ 90.981599] Call trace: [ 90.981601] smp_call_function_single+0xdc/0x1a0 (P) [ 90.981605] __sync_rcu_exp_select_node_cpus+0x228/0x3c0 [ 90.981607] sync_rcu_exp_select_cpus+0x13c/0x2a0 [ 90.981609] wait_rcu_exp_gp+0x18/0x30 [ 90.981611] kthread_worker_fn+0xd0/0x188 [ 90.981614] kthread+0x118/0x11c [ 90.981619] ret_from_fork+0x10/0x20 [ 101.416347] sched: DL replenish lagged too much Thanks Jon -- nvpublic ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-01-16 13:11 ` Jon Hunter @ 2025-01-16 13:28 ` gregkh 2025-01-16 15:01 ` Jon Hunter 0 siblings, 1 reply; 17+ messages in thread From: gregkh @ 2025-01-16 13:28 UTC (permalink / raw) To: Jon Hunter Cc: 胡连勤, Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis On Thu, Jan 16, 2025 at 01:11:36PM +0000, Jon Hunter wrote: > Hi Greg, Lianqin, > > On 17/12/2024 07:58, 胡连勤 wrote: > > From: Lianqin Hu <hulianqin@vivo.com> > > > > Considering that in some extreme cases, when performing the > > unbinding operation, gserial_disconnect has cleared gser->ioport, > > which triggers gadget reconfiguration, and then calls gs_read_complete, > > resulting in access to a null pointer. Therefore, ep is disabled before > > gserial_disconnect sets port to null to prevent this from happening. > > > > Call trace: > > gs_read_complete+0x58/0x240 > > usb_gadget_giveback_request+0x40/0x160 > > dwc3_remove_requests+0x170/0x484 > > dwc3_ep0_out_start+0xb0/0x1d4 > > __dwc3_gadget_start+0x25c/0x720 > > kretprobe_trampoline.cfi_jt+0x0/0x8 > > kretprobe_trampoline.cfi_jt+0x0/0x8 > > udc_bind_to_driver+0x1d8/0x300 > > usb_gadget_probe_driver+0xa8/0x1dc > > gadget_dev_desc_UDC_store+0x13c/0x188 > > configfs_write_iter+0x160/0x1f4 > > vfs_write+0x2d0/0x40c > > ksys_write+0x7c/0xf0 > > __arm64_sys_write+0x20/0x30 > > invoke_syscall+0x60/0x150 > > el0_svc_common+0x8c/0xf8 > > do_el0_svc+0x28/0xa0 > > el0_svc+0x24/0x84 > > > > Fixes: c1dca562be8a ("usb gadget: split out serial core") > > Cc: stable@vger.kernel.org > > Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > Signed-off-by: Lianqin Hu <hulianqin@vivo.com> > > --- > > > > Changes in v3: > > - Add --- line above the version tag information > > - Remove extra blank lines in commit messages > > - Version tag information from v2 to changes in v2 > > - Link to v2: https://lore.kernel.org/all/TYUPR06MB6217DAA095A9863D4B58D57CD23B2@TYUPR06MB6217.apcprd06.prod.outlook.com/ > > > > Changes in v2: > > - Remove some address information from patch descriptions > > - Link to v1: https://lore.kernel.org/all/TYUPR06MB621763AB815989161F4033AFD2762@TYUPR06MB6217.apcprd06.prod.outlook.com/ > > - Link to suggestions: https://lore.kernel.org/all/TYUPR06MB6217DE28012FFEC5E808DD64D2962@TYUPR06MB6217.apcprd06.prod.outlook.com/ > > > > drivers/usb/gadget/function/u_serial.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c > > index 53d9fc41acc5..bc143a86c2dd 100644 > > --- a/drivers/usb/gadget/function/u_serial.c > > +++ b/drivers/usb/gadget/function/u_serial.c > > @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser) > > /* REVISIT as above: how best to track this? */ > > port->port_line_coding = gser->port_line_coding; > > + /* disable endpoints, aborting down any active I/O */ > > + usb_ep_disable(gser->out); > > + usb_ep_disable(gser->in); > > + > > port->port_usb = NULL; > > gser->ioport = NULL; > > if (port->port.count > 0) { > > @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser) > > spin_unlock(&port->port_lock); > > spin_unlock_irqrestore(&serial_port_lock, flags); > > - /* disable endpoints, aborting down any active I/O */ > > - usb_ep_disable(gser->out); > > - usb_ep_disable(gser->in); > > - > > /* finally, free any unused/unusable I/O buffers */ > > spin_lock_irqsave(&port->port_lock, flags); > > if (port->port.count == 0) > > > We have observed a reboot regression on Tegra234 (I have not tried other > boards) and bisect is pointing to this commit. Reverting this on top of > mainline is fixing the problem. > > With this change, when the board reboots we see ... > > [ 59.918177] tegra-xudc 3550000.usb: ep 3 disabled > [ 59.923097] tegra-xudc 3550000.usb: ep 2 disabled > [ 59.927955] tegra-xudc 3550000.usb: ep 5 disabled > [ 80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU > [ 80.917354] rcu: 6-....: (5248 ticks this GP) idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623 > [ 80.927146] rcu: (t=5253 jiffies g=3781 q=1490 ncpus=12) > [ 80.932704] Sending NMI from CPU 6 to CPUs 2: > [ 90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not tainted 6.13.0-rc7-00043-g619f0b6fad52 #1 > [ 90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024 > [ 90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 90.981562] pc : smp_call_function_single+0xdc/0x1a0 > [ 90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > [ 90.981578] sp : ffff800082eb3cd0 > [ 90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: ffff0000802933c0 > [ 90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: ffff800080132018 > [ 90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: ffff80008280d970 > [ 90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: 0000000000000000 > [ 90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 > [ 90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: 0000000000000000 > [ 90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : 0000000000000001 > [ 90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : ffff800080132018 > [ 90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : 0000000000000001 > [ 90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : 0000000000000004 > [ 90.981599] Call trace: > [ 90.981601] smp_call_function_single+0xdc/0x1a0 (P) > [ 90.981605] __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > [ 90.981607] sync_rcu_exp_select_cpus+0x13c/0x2a0 > [ 90.981609] wait_rcu_exp_gp+0x18/0x30 > [ 90.981611] kthread_worker_fn+0xd0/0x188 > [ 90.981614] kthread+0x118/0x11c > [ 90.981619] ret_from_fork+0x10/0x20 > [ 101.416347] sched: DL replenish lagged too much > Odd, you have a usb-serial gadget device in this system that is disconnecting somehow? That oops doesn't point to anything in the usb gadget codebase, "all" we have done is move the call to shutdown the endpoints to earlier in the disconnect function. I'm glad to revert this, but it feels really odd that this is causing you an rcu stall issue. confused, greg k-h ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-01-16 13:28 ` gregkh @ 2025-01-16 15:01 ` Jon Hunter 2025-01-17 5:04 ` 答复: " 胡连勤 2025-01-21 12:19 ` Jon Hunter 0 siblings, 2 replies; 17+ messages in thread From: Jon Hunter @ 2025-01-16 15:01 UTC (permalink / raw) To: gregkh@linuxfoundation.org Cc: 胡连勤, Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis On 16/01/2025 13:28, gregkh@linuxfoundation.org wrote: > On Thu, Jan 16, 2025 at 01:11:36PM +0000, Jon Hunter wrote: >> Hi Greg, Lianqin, >> >> On 17/12/2024 07:58, 胡连勤 wrote: >>> From: Lianqin Hu <hulianqin@vivo.com> >>> >>> Considering that in some extreme cases, when performing the >>> unbinding operation, gserial_disconnect has cleared gser->ioport, >>> which triggers gadget reconfiguration, and then calls gs_read_complete, >>> resulting in access to a null pointer. Therefore, ep is disabled before >>> gserial_disconnect sets port to null to prevent this from happening. >>> >>> Call trace: >>> gs_read_complete+0x58/0x240 >>> usb_gadget_giveback_request+0x40/0x160 >>> dwc3_remove_requests+0x170/0x484 >>> dwc3_ep0_out_start+0xb0/0x1d4 >>> __dwc3_gadget_start+0x25c/0x720 >>> kretprobe_trampoline.cfi_jt+0x0/0x8 >>> kretprobe_trampoline.cfi_jt+0x0/0x8 >>> udc_bind_to_driver+0x1d8/0x300 >>> usb_gadget_probe_driver+0xa8/0x1dc >>> gadget_dev_desc_UDC_store+0x13c/0x188 >>> configfs_write_iter+0x160/0x1f4 >>> vfs_write+0x2d0/0x40c >>> ksys_write+0x7c/0xf0 >>> __arm64_sys_write+0x20/0x30 >>> invoke_syscall+0x60/0x150 >>> el0_svc_common+0x8c/0xf8 >>> do_el0_svc+0x28/0xa0 >>> el0_svc+0x24/0x84 >>> >>> Fixes: c1dca562be8a ("usb gadget: split out serial core") >>> Cc: stable@vger.kernel.org >>> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> >>> Signed-off-by: Lianqin Hu <hulianqin@vivo.com> >>> --- >>> >>> Changes in v3: >>> - Add --- line above the version tag information >>> - Remove extra blank lines in commit messages >>> - Version tag information from v2 to changes in v2 >>> - Link to v2: https://lore.kernel.org/all/TYUPR06MB6217DAA095A9863D4B58D57CD23B2@TYUPR06MB6217.apcprd06.prod.outlook.com/ >>> >>> Changes in v2: >>> - Remove some address information from patch descriptions >>> - Link to v1: https://lore.kernel.org/all/TYUPR06MB621763AB815989161F4033AFD2762@TYUPR06MB6217.apcprd06.prod.outlook.com/ >>> - Link to suggestions: https://lore.kernel.org/all/TYUPR06MB6217DE28012FFEC5E808DD64D2962@TYUPR06MB6217.apcprd06.prod.outlook.com/ >>> >>> drivers/usb/gadget/function/u_serial.c | 8 ++++---- >>> 1 file changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c >>> index 53d9fc41acc5..bc143a86c2dd 100644 >>> --- a/drivers/usb/gadget/function/u_serial.c >>> +++ b/drivers/usb/gadget/function/u_serial.c >>> @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser) >>> /* REVISIT as above: how best to track this? */ >>> port->port_line_coding = gser->port_line_coding; >>> + /* disable endpoints, aborting down any active I/O */ >>> + usb_ep_disable(gser->out); >>> + usb_ep_disable(gser->in); >>> + >>> port->port_usb = NULL; >>> gser->ioport = NULL; >>> if (port->port.count > 0) { >>> @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser) >>> spin_unlock(&port->port_lock); >>> spin_unlock_irqrestore(&serial_port_lock, flags); >>> - /* disable endpoints, aborting down any active I/O */ >>> - usb_ep_disable(gser->out); >>> - usb_ep_disable(gser->in); >>> - >>> /* finally, free any unused/unusable I/O buffers */ >>> spin_lock_irqsave(&port->port_lock, flags); >>> if (port->port.count == 0) >> >> >> We have observed a reboot regression on Tegra234 (I have not tried other >> boards) and bisect is pointing to this commit. Reverting this on top of >> mainline is fixing the problem. >> >> With this change, when the board reboots we see ... >> >> [ 59.918177] tegra-xudc 3550000.usb: ep 3 disabled >> [ 59.923097] tegra-xudc 3550000.usb: ep 2 disabled >> [ 59.927955] tegra-xudc 3550000.usb: ep 5 disabled >> [ 80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU >> [ 80.917354] rcu: 6-....: (5248 ticks this GP) idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623 >> [ 80.927146] rcu: (t=5253 jiffies g=3781 q=1490 ncpus=12) >> [ 80.932704] Sending NMI from CPU 6 to CPUs 2: >> [ 90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not tainted 6.13.0-rc7-00043-g619f0b6fad52 #1 >> [ 90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024 >> [ 90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> [ 90.981562] pc : smp_call_function_single+0xdc/0x1a0 >> [ 90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0 >> [ 90.981578] sp : ffff800082eb3cd0 >> [ 90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: ffff0000802933c0 >> [ 90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: ffff800080132018 >> [ 90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: ffff80008280d970 >> [ 90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: 0000000000000000 >> [ 90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 >> [ 90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: 0000000000000000 >> [ 90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : 0000000000000001 >> [ 90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : ffff800080132018 >> [ 90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : 0000000000000001 >> [ 90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : 0000000000000004 >> [ 90.981599] Call trace: >> [ 90.981601] smp_call_function_single+0xdc/0x1a0 (P) >> [ 90.981605] __sync_rcu_exp_select_node_cpus+0x228/0x3c0 >> [ 90.981607] sync_rcu_exp_select_cpus+0x13c/0x2a0 >> [ 90.981609] wait_rcu_exp_gp+0x18/0x30 >> [ 90.981611] kthread_worker_fn+0xd0/0x188 >> [ 90.981614] kthread+0x118/0x11c >> [ 90.981619] ret_from_fork+0x10/0x20 >> [ 101.416347] sched: DL replenish lagged too much >> > > Odd, you have a usb-serial gadget device in this system that is > disconnecting somehow? That oops doesn't point to anything in the usb > gadget codebase, "all" we have done is move the call to shutdown the > endpoints to earlier in the disconnect function. Yes the board starts usb-serial and usb-ethernet gadget and on reboot when tearing it down I am seeing the above. As soon as it disables the tegra-xudc endpoints (as seen above) the board appears to stall. > I'm glad to revert this, but it feels really odd that this is causing > you an rcu stall issue. Thanks. I can't say I understand it either, but I am certain it is caused by this change. Happy to run any tests to narrow this down a bit. Jon -- nvpublic ^ permalink raw reply [flat|nested] 17+ messages in thread
* 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-01-16 15:01 ` Jon Hunter @ 2025-01-17 5:04 ` 胡连勤 2025-01-17 8:24 ` gregkh 2025-01-17 9:48 ` Jon Hunter 2025-01-21 12:19 ` Jon Hunter 1 sibling, 2 replies; 17+ messages in thread From: 胡连勤 @ 2025-01-17 5:04 UTC (permalink / raw) To: Jon Hunter, gregkh@linuxfoundation.org Cc: Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis Hi Jon: > >>> Considering that in some extreme cases, when performing the > >>> unbinding operation, gserial_disconnect has cleared gser->ioport, > >>> which triggers gadget reconfiguration, and then calls > >>> gs_read_complete, resulting in access to a null pointer. Therefore, > >>> ep is disabled before gserial_disconnect sets port to null to prevent this > from happening. > >>> > >>> Call trace: > >>> gs_read_complete+0x58/0x240 > >>> usb_gadget_giveback_request+0x40/0x160 > >>> dwc3_remove_requests+0x170/0x484 > >>> dwc3_ep0_out_start+0xb0/0x1d4 > >>> __dwc3_gadget_start+0x25c/0x720 > >>> kretprobe_trampoline.cfi_jt+0x0/0x8 > >>> kretprobe_trampoline.cfi_jt+0x0/0x8 > >>> udc_bind_to_driver+0x1d8/0x300 > >>> usb_gadget_probe_driver+0xa8/0x1dc > >>> gadget_dev_desc_UDC_store+0x13c/0x188 > >>> configfs_write_iter+0x160/0x1f4 > >>> vfs_write+0x2d0/0x40c > >>> ksys_write+0x7c/0xf0 > >>> __arm64_sys_write+0x20/0x30 > >>> invoke_syscall+0x60/0x150 > >>> el0_svc_common+0x8c/0xf8 > >>> do_el0_svc+0x28/0xa0 > >>> el0_svc+0x24/0x84 > >>> > >>> Fixes: c1dca562be8a ("usb gadget: split out serial core") > >>> Cc: stable@vger.kernel.org > >>> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > >>> Signed-off-by: Lianqin Hu <hulianqin@vivo.com> > >>> --- > >>> > >>> Changes in v3: > >>> - Add --- line above the version tag information > >>> - Remove extra blank lines in commit messages > >>> - Version tag information from v2 to changes in v2 > >>> - Link to v2: > >>> > https://lo/ > >>> > re.kernel.org%2Fall%2FTYUPR06MB6217DAA095A9863D4B58D57CD23B2%40T > YUPR > >>> > 06MB6217.apcprd06.prod.outlook.com%2F&data=05%7C02%7Chulianqin%40 > viv > >>> > o.com%7Cc88bd260faf9470d244f08dd363eb8df%7C923e42dc48d54cbeb5821 > a797 > >>> > a6412ed%7C0%7C0%7C638726365197191059%7CUnknown%7CTWFpbGZsb3d > 8eyJFbXB > >>> > 0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFp > bCI > >>> > sIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=G4l0lGF093J44BuTeGpEYpYvC > MGiK3d > >>> TnR%2Fd2Zn8Q1U%3D&reserved=0 > >>> > >>> Changes in v2: > >>> - Remove some address information from patch descriptions > >>> - Link to v1: > https://lore.k/ > ernel.org%2Fall%2FTYUPR06MB621763AB815989161F4033AFD2762%40TYUPR > 06MB6217.apcprd06.prod.outlook.com%2F&data=05%7C02%7Chulianqin%40 > vivo.com%7Cc88bd260faf9470d244f08dd363eb8df%7C923e42dc48d54cbeb58 > 21a797a6412ed%7C0%7C0%7C638726365197206594%7CUnknown%7CTWFpb > GZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4z > MiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=6rGddQJ > L9K%2Bqyr98IXYkf6zM8AzxN6%2FaJZztGybx%2FUw%3D&reserved=0 > >>> - Link to suggestions: > >>> > https://lo/ > >>> > re.kernel.org%2Fall%2FTYUPR06MB6217DE28012FFEC5E808DD64D2962%40TY > UPR > >>> > 06MB6217.apcprd06.prod.outlook.com%2F&data=05%7C02%7Chulianqin%40 > viv > >>> > o.com%7Cc88bd260faf9470d244f08dd363eb8df%7C923e42dc48d54cbeb5821 > a797 > >>> > a6412ed%7C0%7C0%7C638726365197216620%7CUnknown%7CTWFpbGZsb3d > 8eyJFbXB > >>> > 0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFp > bCI > >>> > sIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=7EvOpENpZd8s6S8PR1D%2B2 > HnQmBLPa > >>> UNotpV5UGSDiDE%3D&reserved=0 > >>> > >>> drivers/usb/gadget/function/u_serial.c | 8 ++++---- > >>> 1 file changed, 4 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/drivers/usb/gadget/function/u_serial.c > >>> b/drivers/usb/gadget/function/u_serial.c > >>> index 53d9fc41acc5..bc143a86c2dd 100644 > >>> --- a/drivers/usb/gadget/function/u_serial.c > >>> +++ b/drivers/usb/gadget/function/u_serial.c > >>> @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser) > >>> /* REVISIT as above: how best to track this? */ > >>> port->port_line_coding = gser->port_line_coding; > >>> + /* disable endpoints, aborting down any active I/O */ > >>> + usb_ep_disable(gser->out); > >>> + usb_ep_disable(gser->in); > >>> + > >>> port->port_usb = NULL; > >>> gser->ioport = NULL; > >>> if (port->port.count > 0) { > >>> @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser) > >>> spin_unlock(&port->port_lock); > >>> spin_unlock_irqrestore(&serial_port_lock, flags); > >>> - /* disable endpoints, aborting down any active I/O */ > >>> - usb_ep_disable(gser->out); > >>> - usb_ep_disable(gser->in); > >>> - > >>> /* finally, free any unused/unusable I/O buffers */ > >>> spin_lock_irqsave(&port->port_lock, flags); > >>> if (port->port.count == 0) > >> > >> > >> We have observed a reboot regression on Tegra234 (I have not tried > >> other > >> boards) and bisect is pointing to this commit. Reverting this on top > >> of mainline is fixing the problem. > >> > >> With this change, when the board reboots we see ... > >> > >> [ 59.918177] tegra-xudc 3550000.usb: ep 3 disabled > >> [ 59.923097] tegra-xudc 3550000.usb: ep 2 disabled > >> [ 59.927955] tegra-xudc 3550000.usb: ep 5 disabled > >> [ 80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU > >> [ 80.917354] rcu: 6-....: (5248 ticks this GP) > idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623 > >> [ 80.927146] rcu: (t=5253 jiffies g=3781 q=1490 ncpus=12) > >> [ 80.932704] Sending NMI from CPU 6 to CPUs 2: > >> [ 90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not tainted > 6.13.0-rc7-00043-g619f0b6fad52 #1 > >> [ 90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer > Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024 > >> [ 90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS > BTYPE=--) > >> [ 90.981562] pc : smp_call_function_single+0xdc/0x1a0 > >> [ 90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > >> [ 90.981578] sp : ffff800082eb3cd0 > >> [ 90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: > ffff0000802933c0 > >> [ 90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: > ffff800080132018 > >> [ 90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: > ffff80008280d970 > >> [ 90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: > 0000000000000000 > >> [ 90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: > 0000000000000000 > >> [ 90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: > 0000000000000000 > >> [ 90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : > 0000000000000001 > >> [ 90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : > ffff800080132018 > >> [ 90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : > 0000000000000001 > >> [ 90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : > 0000000000000004 > >> [ 90.981599] Call trace: > >> [ 90.981601] smp_call_function_single+0xdc/0x1a0 (P) > >> [ 90.981605] __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > >> [ 90.981607] sync_rcu_exp_select_cpus+0x13c/0x2a0 > >> [ 90.981609] wait_rcu_exp_gp+0x18/0x30 > >> [ 90.981611] kthread_worker_fn+0xd0/0x188 > >> [ 90.981614] kthread+0x118/0x11c > >> [ 90.981619] ret_from_fork+0x10/0x20 > >> [ 101.416347] sched: DL replenish lagged too much > >> > > > > Odd, you have a usb-serial gadget device in this system that is > > disconnecting somehow? That oops doesn't point to anything in the usb > > gadget codebase, "all" we have done is move the call to shutdown the > > endpoints to earlier in the disconnect function. > > Yes the board starts usb-serial and usb-ethernet gadget and on reboot when > tearing it down I am seeing the above. As soon as it disables the tegra-xudc > endpoints (as seen above) the board appears to stall. > > > I'm glad to revert this, but it feels really odd that this is causing > > you an rcu stall issue. > > Thanks. I can't say I understand it either, but I am certain it is caused by this > change. > > Happy to run any tests to narrow this down a bit. I'm really sorry. If we are sure that the problem is caused by this patch, we will roll it back first. In addition, what are the differences between the serial use of Tegra234 board and mobile phones? When we consider other solutions, we need to take this usage scenario of Tegra234 board into consideration. Thanks ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-01-17 5:04 ` 答复: " 胡连勤 @ 2025-01-17 8:24 ` gregkh 2025-01-17 9:48 ` Jon Hunter 1 sibling, 0 replies; 17+ messages in thread From: gregkh @ 2025-01-17 8:24 UTC (permalink / raw) To: 胡连勤 Cc: Jon Hunter, Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis On Fri, Jan 17, 2025 at 05:04:30AM +0000, 胡连勤 wrote: > Hi Jon: > > > >>> Considering that in some extreme cases, when performing the > > >>> unbinding operation, gserial_disconnect has cleared gser->ioport, > > >>> which triggers gadget reconfiguration, and then calls > > >>> gs_read_complete, resulting in access to a null pointer. Therefore, > > >>> ep is disabled before gserial_disconnect sets port to null to prevent this > > from happening. > > >>> > > >>> Call trace: > > >>> gs_read_complete+0x58/0x240 > > >>> usb_gadget_giveback_request+0x40/0x160 > > >>> dwc3_remove_requests+0x170/0x484 > > >>> dwc3_ep0_out_start+0xb0/0x1d4 > > >>> __dwc3_gadget_start+0x25c/0x720 > > >>> kretprobe_trampoline.cfi_jt+0x0/0x8 > > >>> kretprobe_trampoline.cfi_jt+0x0/0x8 > > >>> udc_bind_to_driver+0x1d8/0x300 > > >>> usb_gadget_probe_driver+0xa8/0x1dc > > >>> gadget_dev_desc_UDC_store+0x13c/0x188 > > >>> configfs_write_iter+0x160/0x1f4 > > >>> vfs_write+0x2d0/0x40c > > >>> ksys_write+0x7c/0xf0 > > >>> __arm64_sys_write+0x20/0x30 > > >>> invoke_syscall+0x60/0x150 > > >>> el0_svc_common+0x8c/0xf8 > > >>> do_el0_svc+0x28/0xa0 > > >>> el0_svc+0x24/0x84 > > >>> > > >>> Fixes: c1dca562be8a ("usb gadget: split out serial core") > > >>> Cc: stable@vger.kernel.org > > >>> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > >>> Signed-off-by: Lianqin Hu <hulianqin@vivo.com> > > >>> --- > > >>> > > >>> Changes in v3: > > >>> - Add --- line above the version tag information > > >>> - Remove extra blank lines in commit messages > > >>> - Version tag information from v2 to changes in v2 > > >>> - Link to v2: > > >>> > > https://lo/ > > >>> > > re.kernel.org%2Fall%2FTYUPR06MB6217DAA095A9863D4B58D57CD23B2%40T > > YUPR > > >>> > > 06MB6217.apcprd06.prod.outlook.com%2F&data=05%7C02%7Chulianqin%40 > > viv > > >>> > > o.com%7Cc88bd260faf9470d244f08dd363eb8df%7C923e42dc48d54cbeb5821 > > a797 > > >>> > > a6412ed%7C0%7C0%7C638726365197191059%7CUnknown%7CTWFpbGZsb3d > > 8eyJFbXB > > >>> > > 0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFp > > bCI > > >>> > > sIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=G4l0lGF093J44BuTeGpEYpYvC > > MGiK3d > > >>> TnR%2Fd2Zn8Q1U%3D&reserved=0 > > >>> > > >>> Changes in v2: > > >>> - Remove some address information from patch descriptions > > >>> - Link to v1: > > https://lore.k/ > > ernel.org%2Fall%2FTYUPR06MB621763AB815989161F4033AFD2762%40TYUPR > > 06MB6217.apcprd06.prod.outlook.com%2F&data=05%7C02%7Chulianqin%40 > > vivo.com%7Cc88bd260faf9470d244f08dd363eb8df%7C923e42dc48d54cbeb58 > > 21a797a6412ed%7C0%7C0%7C638726365197206594%7CUnknown%7CTWFpb > > GZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4z > > MiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=6rGddQJ > > L9K%2Bqyr98IXYkf6zM8AzxN6%2FaJZztGybx%2FUw%3D&reserved=0 > > >>> - Link to suggestions: > > >>> > > https://lo/ > > >>> > > re.kernel.org%2Fall%2FTYUPR06MB6217DE28012FFEC5E808DD64D2962%40TY > > UPR > > >>> > > 06MB6217.apcprd06.prod.outlook.com%2F&data=05%7C02%7Chulianqin%40 > > viv > > >>> > > o.com%7Cc88bd260faf9470d244f08dd363eb8df%7C923e42dc48d54cbeb5821 > > a797 > > >>> > > a6412ed%7C0%7C0%7C638726365197216620%7CUnknown%7CTWFpbGZsb3d > > 8eyJFbXB > > >>> > > 0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFp > > bCI > > >>> > > sIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=7EvOpENpZd8s6S8PR1D%2B2 > > HnQmBLPa > > >>> UNotpV5UGSDiDE%3D&reserved=0 > > >>> > > >>> drivers/usb/gadget/function/u_serial.c | 8 ++++---- > > >>> 1 file changed, 4 insertions(+), 4 deletions(-) > > >>> > > >>> diff --git a/drivers/usb/gadget/function/u_serial.c > > >>> b/drivers/usb/gadget/function/u_serial.c > > >>> index 53d9fc41acc5..bc143a86c2dd 100644 > > >>> --- a/drivers/usb/gadget/function/u_serial.c > > >>> +++ b/drivers/usb/gadget/function/u_serial.c > > >>> @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser) > > >>> /* REVISIT as above: how best to track this? */ > > >>> port->port_line_coding = gser->port_line_coding; > > >>> + /* disable endpoints, aborting down any active I/O */ > > >>> + usb_ep_disable(gser->out); > > >>> + usb_ep_disable(gser->in); > > >>> + > > >>> port->port_usb = NULL; > > >>> gser->ioport = NULL; > > >>> if (port->port.count > 0) { > > >>> @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser) > > >>> spin_unlock(&port->port_lock); > > >>> spin_unlock_irqrestore(&serial_port_lock, flags); > > >>> - /* disable endpoints, aborting down any active I/O */ > > >>> - usb_ep_disable(gser->out); > > >>> - usb_ep_disable(gser->in); > > >>> - > > >>> /* finally, free any unused/unusable I/O buffers */ > > >>> spin_lock_irqsave(&port->port_lock, flags); > > >>> if (port->port.count == 0) > > >> > > >> > > >> We have observed a reboot regression on Tegra234 (I have not tried > > >> other > > >> boards) and bisect is pointing to this commit. Reverting this on top > > >> of mainline is fixing the problem. > > >> > > >> With this change, when the board reboots we see ... > > >> > > >> [ 59.918177] tegra-xudc 3550000.usb: ep 3 disabled > > >> [ 59.923097] tegra-xudc 3550000.usb: ep 2 disabled > > >> [ 59.927955] tegra-xudc 3550000.usb: ep 5 disabled > > >> [ 80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU > > >> [ 80.917354] rcu: 6-....: (5248 ticks this GP) > > idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623 > > >> [ 80.927146] rcu: (t=5253 jiffies g=3781 q=1490 ncpus=12) > > >> [ 80.932704] Sending NMI from CPU 6 to CPUs 2: > > >> [ 90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not tainted > > 6.13.0-rc7-00043-g619f0b6fad52 #1 > > >> [ 90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer > > Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024 > > >> [ 90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS > > BTYPE=--) > > >> [ 90.981562] pc : smp_call_function_single+0xdc/0x1a0 > > >> [ 90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > > >> [ 90.981578] sp : ffff800082eb3cd0 > > >> [ 90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: > > ffff0000802933c0 > > >> [ 90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: > > ffff800080132018 > > >> [ 90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: > > ffff80008280d970 > > >> [ 90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: > > 0000000000000000 > > >> [ 90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: > > 0000000000000000 > > >> [ 90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: > > 0000000000000000 > > >> [ 90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : > > 0000000000000001 > > >> [ 90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : > > ffff800080132018 > > >> [ 90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : > > 0000000000000001 > > >> [ 90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : > > 0000000000000004 > > >> [ 90.981599] Call trace: > > >> [ 90.981601] smp_call_function_single+0xdc/0x1a0 (P) > > >> [ 90.981605] __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > > >> [ 90.981607] sync_rcu_exp_select_cpus+0x13c/0x2a0 > > >> [ 90.981609] wait_rcu_exp_gp+0x18/0x30 > > >> [ 90.981611] kthread_worker_fn+0xd0/0x188 > > >> [ 90.981614] kthread+0x118/0x11c > > >> [ 90.981619] ret_from_fork+0x10/0x20 > > >> [ 101.416347] sched: DL replenish lagged too much > > >> > > > > > > Odd, you have a usb-serial gadget device in this system that is > > > disconnecting somehow? That oops doesn't point to anything in the usb > > > gadget codebase, "all" we have done is move the call to shutdown the > > > endpoints to earlier in the disconnect function. > > > > Yes the board starts usb-serial and usb-ethernet gadget and on reboot when > > tearing it down I am seeing the above. As soon as it disables the tegra-xudc > > endpoints (as seen above) the board appears to stall. > > > > > I'm glad to revert this, but it feels really odd that this is causing > > > you an rcu stall issue. > > > > Thanks. I can't say I understand it either, but I am certain it is caused by this > > change. > > > > Happy to run any tests to narrow this down a bit. > > I'm really sorry. If we are sure that the problem is caused by this patch, we will roll it back first. In addition, > what are the differences between the serial use of Tegra234 board and mobile phones? > When we consider other solutions, we need to take this usage scenario of Tegra234 board into consideration. Revert sent here now: https://lore.kernel.org/r/2025011711-yippee-fever-a737@gregkh ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-01-17 5:04 ` 答复: " 胡连勤 2025-01-17 8:24 ` gregkh @ 2025-01-17 9:48 ` Jon Hunter 1 sibling, 0 replies; 17+ messages in thread From: Jon Hunter @ 2025-01-17 9:48 UTC (permalink / raw) To: 胡连勤, gregkh@linuxfoundation.org Cc: Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis Hi Lianqin, On 17/01/2025 05:04, 胡连勤 wrote: > Hi Jon: > >>>>> Considering that in some extreme cases, when performing the >>>>> unbinding operation, gserial_disconnect has cleared gser->ioport, >>>>> which triggers gadget reconfiguration, and then calls >>>>> gs_read_complete, resulting in access to a null pointer. Therefore, >>>>> ep is disabled before gserial_disconnect sets port to null to prevent this >> from happening. >>>>> >>>>> Call trace: >>>>> gs_read_complete+0x58/0x240 >>>>> usb_gadget_giveback_request+0x40/0x160 >>>>> dwc3_remove_requests+0x170/0x484 >>>>> dwc3_ep0_out_start+0xb0/0x1d4 >>>>> __dwc3_gadget_start+0x25c/0x720 >>>>> kretprobe_trampoline.cfi_jt+0x0/0x8 >>>>> kretprobe_trampoline.cfi_jt+0x0/0x8 >>>>> udc_bind_to_driver+0x1d8/0x300 >>>>> usb_gadget_probe_driver+0xa8/0x1dc >>>>> gadget_dev_desc_UDC_store+0x13c/0x188 >>>>> configfs_write_iter+0x160/0x1f4 >>>>> vfs_write+0x2d0/0x40c >>>>> ksys_write+0x7c/0xf0 >>>>> __arm64_sys_write+0x20/0x30 >>>>> invoke_syscall+0x60/0x150 >>>>> el0_svc_common+0x8c/0xf8 >>>>> do_el0_svc+0x28/0xa0 >>>>> el0_svc+0x24/0x84 >>>>> >>>>> Fixes: c1dca562be8a ("usb gadget: split out serial core") >>>>> Cc: stable@vger.kernel.org >>>>> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> >>>>> Signed-off-by: Lianqin Hu <hulianqin@vivo.com> >>>>> --- >>>>> >>>>> Changes in v3: >>>>> - Add --- line above the version tag information >>>>> - Remove extra blank lines in commit messages >>>>> - Version tag information from v2 to changes in v2 >>>>> - Link to v2: >>>>> >> https://lo/ >>>>> >> re.kernel.org%2Fall%2FTYUPR06MB6217DAA095A9863D4B58D57CD23B2%40T >> YUPR >>>>> >> 06MB6217.apcprd06.prod.outlook.com%2F&data=05%7C02%7Chulianqin%40 >> viv >>>>> >> o.com%7Cc88bd260faf9470d244f08dd363eb8df%7C923e42dc48d54cbeb5821 >> a797 >>>>> >> a6412ed%7C0%7C0%7C638726365197191059%7CUnknown%7CTWFpbGZsb3d >> 8eyJFbXB >>>>> >> 0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFp >> bCI >>>>> >> sIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=G4l0lGF093J44BuTeGpEYpYvC >> MGiK3d >>>>> TnR%2Fd2Zn8Q1U%3D&reserved=0 >>>>> >>>>> Changes in v2: >>>>> - Remove some address information from patch descriptions >>>>> - Link to v1: >> https://lore.k/ >> ernel.org%2Fall%2FTYUPR06MB621763AB815989161F4033AFD2762%40TYUPR >> 06MB6217.apcprd06.prod.outlook.com%2F&data=05%7C02%7Chulianqin%40 >> vivo.com%7Cc88bd260faf9470d244f08dd363eb8df%7C923e42dc48d54cbeb58 >> 21a797a6412ed%7C0%7C0%7C638726365197206594%7CUnknown%7CTWFpb >> GZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4z >> MiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=6rGddQJ >> L9K%2Bqyr98IXYkf6zM8AzxN6%2FaJZztGybx%2FUw%3D&reserved=0 >>>>> - Link to suggestions: >>>>> >> https://lo/ >>>>> >> re.kernel.org%2Fall%2FTYUPR06MB6217DE28012FFEC5E808DD64D2962%40TY >> UPR >>>>> >> 06MB6217.apcprd06.prod.outlook.com%2F&data=05%7C02%7Chulianqin%40 >> viv >>>>> >> o.com%7Cc88bd260faf9470d244f08dd363eb8df%7C923e42dc48d54cbeb5821 >> a797 >>>>> >> a6412ed%7C0%7C0%7C638726365197216620%7CUnknown%7CTWFpbGZsb3d >> 8eyJFbXB >>>>> >> 0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFp >> bCI >>>>> >> sIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=7EvOpENpZd8s6S8PR1D%2B2 >> HnQmBLPa >>>>> UNotpV5UGSDiDE%3D&reserved=0 >>>>> >>>>> drivers/usb/gadget/function/u_serial.c | 8 ++++---- >>>>> 1 file changed, 4 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/drivers/usb/gadget/function/u_serial.c >>>>> b/drivers/usb/gadget/function/u_serial.c >>>>> index 53d9fc41acc5..bc143a86c2dd 100644 >>>>> --- a/drivers/usb/gadget/function/u_serial.c >>>>> +++ b/drivers/usb/gadget/function/u_serial.c >>>>> @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser) >>>>> /* REVISIT as above: how best to track this? */ >>>>> port->port_line_coding = gser->port_line_coding; >>>>> + /* disable endpoints, aborting down any active I/O */ >>>>> + usb_ep_disable(gser->out); >>>>> + usb_ep_disable(gser->in); >>>>> + >>>>> port->port_usb = NULL; >>>>> gser->ioport = NULL; >>>>> if (port->port.count > 0) { >>>>> @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser) >>>>> spin_unlock(&port->port_lock); >>>>> spin_unlock_irqrestore(&serial_port_lock, flags); >>>>> - /* disable endpoints, aborting down any active I/O */ >>>>> - usb_ep_disable(gser->out); >>>>> - usb_ep_disable(gser->in); >>>>> - >>>>> /* finally, free any unused/unusable I/O buffers */ >>>>> spin_lock_irqsave(&port->port_lock, flags); >>>>> if (port->port.count == 0) >>>> >>>> >>>> We have observed a reboot regression on Tegra234 (I have not tried >>>> other >>>> boards) and bisect is pointing to this commit. Reverting this on top >>>> of mainline is fixing the problem. >>>> >>>> With this change, when the board reboots we see ... >>>> >>>> [ 59.918177] tegra-xudc 3550000.usb: ep 3 disabled >>>> [ 59.923097] tegra-xudc 3550000.usb: ep 2 disabled >>>> [ 59.927955] tegra-xudc 3550000.usb: ep 5 disabled >>>> [ 80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU >>>> [ 80.917354] rcu: 6-....: (5248 ticks this GP) >> idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623 >>>> [ 80.927146] rcu: (t=5253 jiffies g=3781 q=1490 ncpus=12) >>>> [ 80.932704] Sending NMI from CPU 6 to CPUs 2: >>>> [ 90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not tainted >> 6.13.0-rc7-00043-g619f0b6fad52 #1 >>>> [ 90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer >> Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024 >>>> [ 90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS >> BTYPE=--) >>>> [ 90.981562] pc : smp_call_function_single+0xdc/0x1a0 >>>> [ 90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0 >>>> [ 90.981578] sp : ffff800082eb3cd0 >>>> [ 90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: >> ffff0000802933c0 >>>> [ 90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: >> ffff800080132018 >>>> [ 90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: >> ffff80008280d970 >>>> [ 90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: >> 0000000000000000 >>>> [ 90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: >> 0000000000000000 >>>> [ 90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: >> 0000000000000000 >>>> [ 90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : >> 0000000000000001 >>>> [ 90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : >> ffff800080132018 >>>> [ 90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : >> 0000000000000001 >>>> [ 90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : >> 0000000000000004 >>>> [ 90.981599] Call trace: >>>> [ 90.981601] smp_call_function_single+0xdc/0x1a0 (P) >>>> [ 90.981605] __sync_rcu_exp_select_node_cpus+0x228/0x3c0 >>>> [ 90.981607] sync_rcu_exp_select_cpus+0x13c/0x2a0 >>>> [ 90.981609] wait_rcu_exp_gp+0x18/0x30 >>>> [ 90.981611] kthread_worker_fn+0xd0/0x188 >>>> [ 90.981614] kthread+0x118/0x11c >>>> [ 90.981619] ret_from_fork+0x10/0x20 >>>> [ 101.416347] sched: DL replenish lagged too much >>>> >>> >>> Odd, you have a usb-serial gadget device in this system that is >>> disconnecting somehow? That oops doesn't point to anything in the usb >>> gadget codebase, "all" we have done is move the call to shutdown the >>> endpoints to earlier in the disconnect function. >> >> Yes the board starts usb-serial and usb-ethernet gadget and on reboot when >> tearing it down I am seeing the above. As soon as it disables the tegra-xudc >> endpoints (as seen above) the board appears to stall. >> >>> I'm glad to revert this, but it feels really odd that this is causing >>> you an rcu stall issue. >> >> Thanks. I can't say I understand it either, but I am certain it is caused by this >> change. >> >> Happy to run any tests to narrow this down a bit. > > I'm really sorry. If we are sure that the problem is caused by this patch, we will roll it back first. In addition, > what are the differences between the serial use of Tegra234 board and mobile phones? > When we consider other solutions, we need to take this usage scenario of Tegra234 board into consideration. Nothing specific that I am aware of. We just expose a serial and ethernet interface over USB for interfacing with the platform. I will see if I can narrow down the scenario a bit more. Thanks Jon -- nvpublic ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-01-16 15:01 ` Jon Hunter 2025-01-17 5:04 ` 答复: " 胡连勤 @ 2025-01-21 12:19 ` Jon Hunter 2025-01-23 14:56 ` Dan Carpenter 1 sibling, 1 reply; 17+ messages in thread From: Jon Hunter @ 2025-01-21 12:19 UTC (permalink / raw) To: gregkh@linuxfoundation.org Cc: 胡连勤, Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis On 16/01/2025 15:01, Jon Hunter wrote: > > On 16/01/2025 13:28, gregkh@linuxfoundation.org wrote: >> On Thu, Jan 16, 2025 at 01:11:36PM +0000, Jon Hunter wrote: >>> Hi Greg, Lianqin, >>> >>> On 17/12/2024 07:58, 胡连勤 wrote: >>>> From: Lianqin Hu <hulianqin@vivo.com> >>>> >>>> Considering that in some extreme cases, when performing the >>>> unbinding operation, gserial_disconnect has cleared gser->ioport, >>>> which triggers gadget reconfiguration, and then calls gs_read_complete, >>>> resulting in access to a null pointer. Therefore, ep is disabled before >>>> gserial_disconnect sets port to null to prevent this from happening. >>>> >>>> Call trace: >>>> gs_read_complete+0x58/0x240 >>>> usb_gadget_giveback_request+0x40/0x160 >>>> dwc3_remove_requests+0x170/0x484 >>>> dwc3_ep0_out_start+0xb0/0x1d4 >>>> __dwc3_gadget_start+0x25c/0x720 >>>> kretprobe_trampoline.cfi_jt+0x0/0x8 >>>> kretprobe_trampoline.cfi_jt+0x0/0x8 >>>> udc_bind_to_driver+0x1d8/0x300 >>>> usb_gadget_probe_driver+0xa8/0x1dc >>>> gadget_dev_desc_UDC_store+0x13c/0x188 >>>> configfs_write_iter+0x160/0x1f4 >>>> vfs_write+0x2d0/0x40c >>>> ksys_write+0x7c/0xf0 >>>> __arm64_sys_write+0x20/0x30 >>>> invoke_syscall+0x60/0x150 >>>> el0_svc_common+0x8c/0xf8 >>>> do_el0_svc+0x28/0xa0 >>>> el0_svc+0x24/0x84 >>>> >>>> Fixes: c1dca562be8a ("usb gadget: split out serial core") >>>> Cc: stable@vger.kernel.org >>>> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> >>>> Signed-off-by: Lianqin Hu <hulianqin@vivo.com> >>>> --- >>>> >>>> Changes in v3: >>>> - Add --- line above the version tag information >>>> - Remove extra blank lines in commit messages >>>> - Version tag information from v2 to changes in v2 >>>> - Link to v2: https://lore.kernel.org/all/ >>>> TYUPR06MB6217DAA095A9863D4B58D57CD23B2@TYUPR06MB6217.apcprd06.prod.outlook.com/ >>>> >>>> Changes in v2: >>>> - Remove some address information from patch descriptions >>>> - Link to v1: https://lore.kernel.org/all/ >>>> TYUPR06MB621763AB815989161F4033AFD2762@TYUPR06MB6217.apcprd06.prod.outlook.com/ >>>> - Link to suggestions: https://lore.kernel.org/all/ >>>> TYUPR06MB6217DE28012FFEC5E808DD64D2962@TYUPR06MB6217.apcprd06.prod.outlook.com/ >>>> >>>> drivers/usb/gadget/function/u_serial.c | 8 ++++---- >>>> 1 file changed, 4 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/ >>>> gadget/function/u_serial.c >>>> index 53d9fc41acc5..bc143a86c2dd 100644 >>>> --- a/drivers/usb/gadget/function/u_serial.c >>>> +++ b/drivers/usb/gadget/function/u_serial.c >>>> @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser) >>>> /* REVISIT as above: how best to track this? */ >>>> port->port_line_coding = gser->port_line_coding; >>>> + /* disable endpoints, aborting down any active I/O */ >>>> + usb_ep_disable(gser->out); >>>> + usb_ep_disable(gser->in); >>>> + >>>> port->port_usb = NULL; >>>> gser->ioport = NULL; >>>> if (port->port.count > 0) { >>>> @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser) >>>> spin_unlock(&port->port_lock); >>>> spin_unlock_irqrestore(&serial_port_lock, flags); >>>> - /* disable endpoints, aborting down any active I/O */ >>>> - usb_ep_disable(gser->out); >>>> - usb_ep_disable(gser->in); >>>> - >>>> /* finally, free any unused/unusable I/O buffers */ >>>> spin_lock_irqsave(&port->port_lock, flags); >>>> if (port->port.count == 0) >>> >>> >>> We have observed a reboot regression on Tegra234 (I have not tried other >>> boards) and bisect is pointing to this commit. Reverting this on top of >>> mainline is fixing the problem. >>> >>> With this change, when the board reboots we see ... >>> >>> [ 59.918177] tegra-xudc 3550000.usb: ep 3 disabled >>> [ 59.923097] tegra-xudc 3550000.usb: ep 2 disabled >>> [ 59.927955] tegra-xudc 3550000.usb: ep 5 disabled >>> [ 80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU >>> [ 80.917354] rcu: 6-....: (5248 ticks this GP) >>> idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623 >>> [ 80.927146] rcu: (t=5253 jiffies g=3781 q=1490 ncpus=12) >>> [ 80.932704] Sending NMI from CPU 6 to CPUs 2: >>> [ 90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not >>> tainted 6.13.0-rc7-00043-g619f0b6fad52 #1 >>> [ 90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer >>> Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024 >>> [ 90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS >>> BTYPE=--) >>> [ 90.981562] pc : smp_call_function_single+0xdc/0x1a0 >>> [ 90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0 >>> [ 90.981578] sp : ffff800082eb3cd0 >>> [ 90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: >>> ffff0000802933c0 >>> [ 90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: >>> ffff800080132018 >>> [ 90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: >>> ffff80008280d970 >>> [ 90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: >>> 0000000000000000 >>> [ 90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: >>> 0000000000000000 >>> [ 90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: >>> 0000000000000000 >>> [ 90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : >>> 0000000000000001 >>> [ 90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : >>> ffff800080132018 >>> [ 90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : >>> 0000000000000001 >>> [ 90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : >>> 0000000000000004 >>> [ 90.981599] Call trace: >>> [ 90.981601] smp_call_function_single+0xdc/0x1a0 (P) >>> [ 90.981605] __sync_rcu_exp_select_node_cpus+0x228/0x3c0 >>> [ 90.981607] sync_rcu_exp_select_cpus+0x13c/0x2a0 >>> [ 90.981609] wait_rcu_exp_gp+0x18/0x30 >>> [ 90.981611] kthread_worker_fn+0xd0/0x188 >>> [ 90.981614] kthread+0x118/0x11c >>> [ 90.981619] ret_from_fork+0x10/0x20 >>> [ 101.416347] sched: DL replenish lagged too much >>> >> >> Odd, you have a usb-serial gadget device in this system that is >> disconnecting somehow? That oops doesn't point to anything in the usb >> gadget codebase, "all" we have done is move the call to shutdown the >> endpoints to earlier in the disconnect function. > > Yes the board starts usb-serial and usb-ethernet gadget and on reboot > when tearing it down I am seeing the above. As soon as it disables the > tegra-xudc endpoints (as seen above) the board appears to stall. > >> I'm glad to revert this, but it feels really odd that this is causing >> you an rcu stall issue. > > Thanks. I can't say I understand it either, but I am certain it is > caused by this change. > > Happy to run any tests to narrow this down a bit. I did a bit more looking at this and I see that we setup a USB gadget device via the configfs as described in this doc [0]. The RCU stall occurs when we attempt to disable the gadget on shutdown by ... $ echo "" > /path/to/UDC Jon [0] https://docs.kernel.org/usb/gadget_configfs.html -- nvpublic ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-01-21 12:19 ` Jon Hunter @ 2025-01-23 14:56 ` Dan Carpenter 2025-02-05 6:38 ` 答复: " 胡连勤 0 siblings, 1 reply; 17+ messages in thread From: Dan Carpenter @ 2025-01-23 14:56 UTC (permalink / raw) To: Jon Hunter Cc: gregkh@linuxfoundation.org, 胡连勤, Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis, Harshit Mogalapalli On Tue, Jan 21, 2025 at 12:19:37PM +0000, Jon Hunter wrote: > > > > [ 59.918177] tegra-xudc 3550000.usb: ep 3 disabled > > > > [ 59.923097] tegra-xudc 3550000.usb: ep 2 disabled > > > > [ 59.927955] tegra-xudc 3550000.usb: ep 5 disabled > > > > [ 80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU > > > > [ 80.917354] rcu: 6-....: (5248 ticks this GP) > > > > idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623 > > > > [ 80.927146] rcu: (t=5253 jiffies g=3781 q=1490 ncpus=12) > > > > [ 80.932704] Sending NMI from CPU 6 to CPUs 2: > > > > [ 90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not > > > > tainted 6.13.0-rc7-00043-g619f0b6fad52 #1 > > > > [ 90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin > > > > Developer Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de > > > > 12/16/2024 > > > > [ 90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT > > > > -SSBS BTYPE=--) > > > > [ 90.981562] pc : smp_call_function_single+0xdc/0x1a0 > > > > [ 90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > > > > [ 90.981578] sp : ffff800082eb3cd0 > > > > [ 90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: > > > > ffff0000802933c0 > > > > [ 90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: > > > > ffff800080132018 > > > > [ 90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: > > > > ffff80008280d970 > > > > [ 90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: > > > > 0000000000000000 > > > > [ 90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: > > > > 0000000000000000 > > > > [ 90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: > > > > 0000000000000000 > > > > [ 90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : > > > > 0000000000000001 > > > > [ 90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : > > > > ffff800080132018 > > > > [ 90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : > > > > 0000000000000001 > > > > [ 90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : > > > > 0000000000000004 > > > > [ 90.981599] Call trace: > > > > [ 90.981601] smp_call_function_single+0xdc/0x1a0 (P) > > > > [ 90.981605] __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > > > > [ 90.981607] sync_rcu_exp_select_cpus+0x13c/0x2a0 > > > > [ 90.981609] wait_rcu_exp_gp+0x18/0x30 > > > > [ 90.981611] kthread_worker_fn+0xd0/0x188 > > > > [ 90.981614] kthread+0x118/0x11c > > > > [ 90.981619] ret_from_fork+0x10/0x20 > > > > [ 101.416347] sched: DL replenish lagged too much > > > > > > > > > > Odd, you have a usb-serial gadget device in this system that is > > > disconnecting somehow? That oops doesn't point to anything in the usb > > > gadget codebase, "all" we have done is move the call to shutdown the > > > endpoints to earlier in the disconnect function. > > > > Yes the board starts usb-serial and usb-ethernet gadget and on reboot > > when tearing it down I am seeing the above. As soon as it disables the > > tegra-xudc endpoints (as seen above) the board appears to stall. > > > > > I'm glad to revert this, but it feels really odd that this is causing > > > you an rcu stall issue. > > > > Thanks. I can't say I understand it either, but I am certain it is > > caused by this change. > > > > Happy to run any tests to narrow this down a bit. > > > I did a bit more looking at this and I see that we setup a USB gadget device > via the configfs as described in this doc [0]. The RCU stall occurs when we > attempt to disable the gadget on shutdown by ... > > $ echo "" > /path/to/UDC > > Jon > > [0] https://docs.kernel.org/usb/gadget_configfs.html It's an RCU stall so it's probably a locking bug. The original code dropped the port->port_lock, called usb_ep_disable() then took the lock again. There must have been a reason to drop the lock although I haven't found it yet. The new code moved the usb_ep_disable() under the lock. 1431 port->suspended = false; 1432 spin_unlock(&port->port_lock); 1433 spin_unlock_irqrestore(&serial_port_lock, flags); 1434 1435 /* disable endpoints, aborting down any active I/O */ 1436 usb_ep_disable(gser->out); 1437 usb_ep_disable(gser->in); 1438 1439 /* finally, free any unused/unusable I/O buffers */ 1440 spin_lock_irqsave(&port->port_lock, flags); 1441 if (port->port.count == 0) 1442 kfifo_free(&port->port_write_buf); It might help to try testing again with lockdep enabled. regards, dan carpenter ^ permalink raw reply [flat|nested] 17+ messages in thread
* 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-01-23 14:56 ` Dan Carpenter @ 2025-02-05 6:38 ` 胡连勤 2025-02-05 19:29 ` Pelle Windestam 0 siblings, 1 reply; 17+ messages in thread From: 胡连勤 @ 2025-02-05 6:38 UTC (permalink / raw) To: gregkh@linuxfoundation.org, Dan Carpenter, Jon Hunter Cc: Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis, Harshit Mogalapalli Hi Greg, Dan,Jon, other kernel experts: > On Tue, Jan 21, 2025 at 12:19:37PM +0000, Jon Hunter wrote: > > > > > [ 59.918177] tegra-xudc 3550000.usb: ep 3 disabled [ > > > > > 59.923097] tegra-xudc 3550000.usb: ep 2 disabled [ 59.927955] > > > > > tegra-xudc 3550000.usb: ep 5 disabled [ 80.911432] rcu: INFO: > > > > > rcu_preempt self-detected stall on CPU [ 80.917354] rcu: > > > > > 6-....: (5248 ticks this GP) > > > > > idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623 [ > > > > > 80.927146] rcu: (t=5253 jiffies g=3781 q=1490 ncpus=12) [ > > > > > 80.932704] Sending NMI from CPU 6 to CPUs 2: > > > > > [ 90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not > > > > > tainted 6.13.0-rc7-00043-g619f0b6fad52 #1 [ 90.981558] > > > > > Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer > > > > > Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de > > > > > 12/16/2024 > > > > > [ 90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT > > > > > -SSBS BTYPE=--) [ 90.981562] pc : > > > > > smp_call_function_single+0xdc/0x1a0 > > > > > [ 90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > > > > > [ 90.981578] sp : ffff800082eb3cd0 [ 90.981579] x29: > > > > > ffff800082eb3cd0 x28: 0000000000000010 x27: > > > > > ffff0000802933c0 > > > > > [ 90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: > > > > > ffff800080132018 > > > > > [ 90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: > > > > > ffff80008280d970 > > > > > [ 90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: > > > > > 0000000000000000 > > > > > [ 90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: > > > > > 0000000000000000 > > > > > [ 90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: > > > > > 0000000000000000 > > > > > [ 90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : > > > > > 0000000000000001 > > > > > [ 90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : > > > > > ffff800080132018 > > > > > [ 90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : > > > > > 0000000000000001 > > > > > [ 90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : > > > > > 0000000000000004 > > > > > [ 90.981599] Call trace: > > > > > [ 90.981601] smp_call_function_single+0xdc/0x1a0 (P) [ > > > > > 90.981605] __sync_rcu_exp_select_node_cpus+0x228/0x3c0 > > > > > [ 90.981607] sync_rcu_exp_select_cpus+0x13c/0x2a0 > > > > > [ 90.981609] wait_rcu_exp_gp+0x18/0x30 [ 90.981611] > > > > > kthread_worker_fn+0xd0/0x188 [ 90.981614] kthread+0x118/0x11c > > > > > [ 90.981619] ret_from_fork+0x10/0x20 [ 101.416347] sched: DL > > > > > replenish lagged too much > > > > > > > > > > > > > Odd, you have a usb-serial gadget device in this system that is > > > > disconnecting somehow? That oops doesn't point to anything in the > > > > usb gadget codebase, "all" we have done is move the call to > > > > shutdown the endpoints to earlier in the disconnect function. > > > > > > Yes the board starts usb-serial and usb-ethernet gadget and on > > > reboot when tearing it down I am seeing the above. As soon as it > > > disables the tegra-xudc endpoints (as seen above) the board appears to > stall. > > > > > > > I'm glad to revert this, but it feels really odd that this is > > > > causing you an rcu stall issue. > > > > > > Thanks. I can't say I understand it either, but I am certain it is > > > caused by this change. > > > > > > Happy to run any tests to narrow this down a bit. > > > > > > I did a bit more looking at this and I see that we setup a USB gadget > > device via the configfs as described in this doc [0]. The RCU stall > > occurs when we attempt to disable the gadget on shutdown by ... > > > > $ echo "" > /path/to/UDC > > > > Jon > > > > [0] https://docs.kernel.org/usb/gadget_configfs.html > > It's an RCU stall so it's probably a locking bug. The original code dropped the > port->port_lock, called usb_ep_disable() then took the lock again. There > must have been a reason to drop the lock although I haven't found it yet. > The new code moved the usb_ep_disable() under the lock. > > 1431 port->suspended = false; > 1432 spin_unlock(&port->port_lock); > 1433 spin_unlock_irqrestore(&serial_port_lock, flags); > 1434 > 1435 /* disable endpoints, aborting down any active I/O */ > 1436 usb_ep_disable(gser->out); > 1437 usb_ep_disable(gser->in); > 1438 > 1439 /* finally, free any unused/unusable I/O buffers */ > 1440 spin_lock_irqsave(&port->port_lock, flags); > 1441 if (port->port.count == 0) > 1442 kfifo_free(&port->port_write_buf); > > > It might help to try testing again with lockdep enabled. > To solve this problem, I plan to refer to the methods in gserial_suspend and gserial_resume functions and use spin lock to solve the competition between gs_read_complete and gserial_disconnect. The patch is as follows: Considering that in some extreme cases, when the unbind operation is being executed, gserial_disconnect has already cleared gser->ioport, triggering a gadget reconfiguration at this time and gs_read_complete gets called afterwards, which results in accessing null pointer, add a null pointer check to prevent this situation. Added a static spinlock to prevent gser->ioport from becoming null after the newly added check. Call trace: gs_read_complete+0x58/0x240 usb_gadget_giveback_request+0x40/0x160 dwc3_remove_requests+0x170/0x484 dwc3_ep0_out_start+0xb0/0x1d4 __dwc3_gadget_start+0x25c/0x720 kretprobe_trampoline.cfi_jt+0x0/0x8 kretprobe_trampoline.cfi_jt+0x0/0x8 udc_bind_to_driver+0x1d8/0x300 usb_gadget_probe_driver+0xa8/0x1dc gadget_dev_desc_UDC_store+0x13c/0x188 configfs_write_iter+0x160/0x1f4 vfs_write+0x2d0/0x40c ksys_write+0x7c/0xf0 __arm64_sys_write+0x20/0x30 invoke_syscall+0x60/0x150 el0_svc_common+0x8c/0xf8 do_el0_svc+0x28/0xa0 el0_svc+0x24/0x84 Fixes: c1dca562be8a ("usb gadget: split out serial core") Cc: stable@vger.kernel.org Suggested-by: Prashanth K <quic_prashk@quicinc.com> Signed-off-by: Lianqin Hu <hulianqin@vivo.com> drivers/usb/gadget/function/u_serial.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c index 0a8c05b2746b..6f2c9715e8df 100644 --- a/drivers/usb/gadget/function/u_serial.c +++ b/drivers/usb/gadget/function/u_serial.c @@ -456,13 +456,24 @@ static void gs_rx_push(struct work_struct *work) static void gs_read_complete(struct usb_ep *ep, struct usb_request *req) { - struct gs_port *port = ep->driver_data; + struct gs_port *port; + unsigned long flags; + + spin_lock_irqsave(&serial_port_lock, flags); + port = ep->driver_data; + + if (!port) { + spin_unlock_irqrestore(&serial_port_lock, flags); + return; + } - /* Queue all received data until the tty layer is ready for it. */ spin_lock(&port->port_lock); + spin_unlock(&serial_port_lock); + + /* Queue all received data until the tty layer is ready for it. */ list_add_tail(&req->list, &port->read_queue); schedule_delayed_work(&port->push, 0); - spin_unlock(&port->port_lock); + spin_unlock_irqrestore(&port->port_lock, flags); } static void gs_write_complete(struct usb_ep *ep, struct usb_request *req) -- 2.39.0 Please ask Linux kernel experts to help evaluate whether it is feasible. Thanks ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-02-05 6:38 ` 答复: " 胡连勤 @ 2025-02-05 19:29 ` Pelle Windestam 2025-02-06 2:36 ` 答复: " 胡连勤 2025-02-06 4:24 ` Prashanth K 0 siblings, 2 replies; 17+ messages in thread From: Pelle Windestam @ 2025-02-05 19:29 UTC (permalink / raw) To: 胡连勤, gregkh@linuxfoundation.org, Dan Carpenter, Jon Hunter Cc: Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis, Harshit Mogalapalli >>>>> Odd, you have a usb-serial gadget device in this system that is >>>>> disconnecting somehow? That oops doesn't point to anything in the >>>>> usb gadget codebase, "all" we have done is move the call to >>>>> shutdown the endpoints to earlier in the disconnect function. >>>> >>>> Yes the board starts usb-serial and usb-ethernet gadget and on >>>> reboot when tearing it down I am seeing the above. As soon as it >>>> disables the tegra-xudc endpoints (as seen above) the board appears to >> stall. >>>> >>>>> I'm glad to revert this, but it feels really odd that this is >>>>> causing you an rcu stall issue. >>>> >>>> Thanks. I can't say I understand it either, but I am certain it is >>>> caused by this change. >>>> I do not have much to add in terms of solutions but want to chime in that the same issue happened to me the other day when I upgraded my kernel. It manifests itself with the rcu stall whenever I try to reboot my device with the USB-cable connected (it is a usb-serial gadget device). Moving the usb_ep_disable() calls to outside the lock (where they were before the patch) solves it. ^ permalink raw reply [flat|nested] 17+ messages in thread
* 答复: 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-02-05 19:29 ` Pelle Windestam @ 2025-02-06 2:36 ` 胡连勤 2025-02-06 4:24 ` Prashanth K 1 sibling, 0 replies; 17+ messages in thread From: 胡连勤 @ 2025-02-06 2:36 UTC (permalink / raw) To: Pelle Windestam, gregkh@linuxfoundation.org, Dan Carpenter, Jon Hunter Cc: Prashanth K, mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis, Harshit Mogalapalli Hi Pelle Windestam: > >>>>> Odd, you have a usb-serial gadget device in this system that is > >>>>> disconnecting somehow? That oops doesn't point to anything in the > >>>>> usb gadget codebase, "all" we have done is move the call to > >>>>> shutdown the endpoints to earlier in the disconnect function. > >>>> > >>>> Yes the board starts usb-serial and usb-ethernet gadget and on > >>>> reboot when tearing it down I am seeing the above. As soon as it > >>>> disables the tegra-xudc endpoints (as seen above) the board appears > >>>> to > >> stall. > >>>> > >>>>> I'm glad to revert this, but it feels really odd that this is > >>>>> causing you an rcu stall issue. > >>>> > >>>> Thanks. I can't say I understand it either, but I am certain it is > >>>> caused by this change. > >>>> > > I do not have much to add in terms of solutions but want to chime in that the > same issue happened to me the other day when I upgraded my kernel. It > manifests itself with the rcu stall whenever I try to reboot my device with the > USB-cable connected (it is a usb-serial gadget device). Moving the > usb_ep_disable() calls to outside the lock (where they were before the > patch) solves it. I apologize again. I didn't consider it comprehensively. The modification plan has introduced serious negative problems for you. The main line of the patch has been rolled back. https://lore.kernel.org/all/2025011711-yippee-fever-a737@gregkh/ Thanks ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-02-05 19:29 ` Pelle Windestam 2025-02-06 2:36 ` 答复: " 胡连勤 @ 2025-02-06 4:24 ` Prashanth K 2025-02-06 7:47 ` Pelle Windestam 1 sibling, 1 reply; 17+ messages in thread From: Prashanth K @ 2025-02-06 4:24 UTC (permalink / raw) To: Pelle Windestam, 胡连勤, gregkh@linuxfoundation.org, Dan Carpenter, Jon Hunter Cc: mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis, Harshit Mogalapalli On 06-02-25 12:59 am, Pelle Windestam wrote: >>>>>> Odd, you have a usb-serial gadget device in this system that is >>>>>> disconnecting somehow? That oops doesn't point to anything in the >>>>>> usb gadget codebase, "all" we have done is move the call to >>>>>> shutdown the endpoints to earlier in the disconnect function. >>>>> >>>>> Yes the board starts usb-serial and usb-ethernet gadget and on >>>>> reboot when tearing it down I am seeing the above. As soon as it >>>>> disables the tegra-xudc endpoints (as seen above) the board appears to >>> stall. >>>>> >>>>>> I'm glad to revert this, but it feels really odd that this is >>>>>> causing you an rcu stall issue. >>>>> >>>>> Thanks. I can't say I understand it either, but I am certain it is >>>>> caused by this change. >>>>> > > I do not have much to add in terms of solutions but want to chime in > that the same issue happened to me the other day when I upgraded my > kernel. It manifests itself with the rcu stall whenever I try to reboot > my device with the USB-cable connected (it is a usb-serial gadget > device). Moving the usb_ep_disable() calls to outside the lock (where > they were before the patch) solves it. Are you also using tegra-xudc ? ep_disable routine may be called in an atomic (interrupt) context. Regards, Prashanth K ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 答复: [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 2025-02-06 4:24 ` Prashanth K @ 2025-02-06 7:47 ` Pelle Windestam 0 siblings, 0 replies; 17+ messages in thread From: Pelle Windestam @ 2025-02-06 7:47 UTC (permalink / raw) To: Prashanth K, 胡连勤, gregkh@linuxfoundation.org, Dan Carpenter, Jon Hunter Cc: mwalle@kernel.org, quic_jjohnson@quicinc.com, David Brownell, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, opensource.kernel, linux-tegra@vger.kernel.org, Brad Griffis, Harshit Mogalapalli > Are you also using tegra-xudc ? ep_disable routine may be called in an > atomic (interrupt) context. Nope, I have a board based on a NXP iMX7D CPU, I'm not 100% sure what the tegra-xudc equivalent would be for that. //Pelle ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-02-06 7:47 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-12-17 7:58 [PATCH v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null 胡连勤 2024-12-24 4:53 ` Prashanth K 2024-12-25 7:03 ` 答复: " 胡连勤 2024-12-26 4:59 ` Prashanth K 2025-01-16 13:11 ` Jon Hunter 2025-01-16 13:28 ` gregkh 2025-01-16 15:01 ` Jon Hunter 2025-01-17 5:04 ` 答复: " 胡连勤 2025-01-17 8:24 ` gregkh 2025-01-17 9:48 ` Jon Hunter 2025-01-21 12:19 ` Jon Hunter 2025-01-23 14:56 ` Dan Carpenter 2025-02-05 6:38 ` 答复: " 胡连勤 2025-02-05 19:29 ` Pelle Windestam 2025-02-06 2:36 ` 答复: " 胡连勤 2025-02-06 4:24 ` Prashanth K 2025-02-06 7:47 ` Pelle Windestam
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox