* Question of possible concurrent xhci debugfs file
@ 2023-03-28 6:30 YoungJun.Park
2023-03-28 10:42 ` Greg KH
2023-03-30 12:34 ` Mathias Nyman
0 siblings, 2 replies; 3+ messages in thread
From: YoungJun.Park @ 2023-03-28 6:30 UTC (permalink / raw)
To: linux-usb; +Cc: youngjun.park, her0gyu
I got a panic dump which happend on kernel(
I check the both of CentOS and mainline kernel
And I assume it could be happend on mainline kernel.
(like I said assume)
I think xhci-debugfs file creation check the creation of
NULL parent dentry. Is it right? (drivers/usb/host/xhci-debugfs.c)
The contents below is the what I analyze.
1. A lot of usb error
...
[1994159.974407] usb 1-2: device descriptor read/64, error -71
[1994160.187902] usb 1-2: new low-speed USB device number 36 using xhci_hcd
[1994160.302634] usb 1-2: device descriptor read/64, error -71
[1994160.562027] usb 1-2: device descriptor read/64, error -71
[1994160.663789] usb usb1-port2: unable to enumerate USB device
[1994162.256029] usb 1-2: new low-speed USB device number 37 using xhci_hcd
[1994162.370764] usb 1-2: device descriptor read/64, error -71
[1994162.585258] usb 1-2: device descriptor read/64, error -71
[1994162.797751] usb 1-2: new low-speed USB device number 38 using xhci_hcd
[1994162.912484] usb 1-2: device descriptor read/64, error -71
[1994163.141944] usb 1-2: device descriptor read/64, error -71
[1994163.243712] usb usb1-port2: attempt power cycle.
...
2. After that panic happend and I got a dump.
The cause of panic reading "/sys/kernel/debug/trbs" which is abnormal.
(which have bad xhci_ring private data)
the file must be on the "/sys/kernel/debug/usb/~~~".
sh> bt
PID: 91416 TASK: ffff8d68fcc54200 CPU: 1 COMMAND: "fbmons"
#0 [ffff8d679a6cbad0] machine_kexec at ffffffff9e2662c4
#1 [ffff8d679a6cbb30] __crash_kexec at ffffffff9e322a32
#2 [ffff8d679a6cbc00] crash_kexec at ffffffff9e322b20
#3 [ffff8d679a6cbc18] oops_end at ffffffff9e98d798
#4 [ffff8d679a6cbc40] no_context at ffffffff9e275d14
#5 [ffff8d679a6cbc90] __bad_area_nosemaphore at ffffffff9e275fe2
#6 [ffff8d679a6cbce0] bad_area_nosemaphore at ffffffff9e276104
#7 [ffff8d679a6cbcf0] __do_page_fault at ffffffff9e990750
#8 [ffff8d679a6cbd60] do_page_fault at ffffffff9e990975
#9 [ffff8d679a6cbd90] page_fault at ffffffff9e98c778
[exception RIP: xhci_ring_trb_show+29]
RIP: ffffffff9e76005d RSP: ffff8d679a6cbe40 RFLAGS: 00010246
RAX: ffff8d497b0fe018 RBX: 0000000000000000 RCX: 0000001309207f9b
RDX: fffffffffffffff4 RSI: 0000000000000001 RDI: ffff8d6939b8fd40
RBP: ffff8d679a6cbe60 R8: ffff8d49ffa5f1a0 R9: ffff8d3b3fc07300
R10: ffff8d3b3fc07300 R11: ffffffff9e3de9fd R12: 0000000000000000
R13: 0000000000000000 R14: ffff8d6939b8fd40 R15: ffff8d6939b8fd40
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff8d679a6cbe68] seq_read at ffffffff9e476d10
#11 [ffff8d679a6cbed8] vfs_read at ffffffff9e44e3ff
#12 [ffff8d679a6cbf08] sys_read at ffffffff9e44f27f
#13 [ffff8d679a6cbf50] system_call_fastpath at ffffffff9e995f92
RIP: 00007f5f9749f6fd RSP: 00007f5f84c73700 RFLAGS: 00000293
RAX: 0000000000000000 RBX: 00007f5f7066a390 RCX: ffffffffffffffff
RDX: 0000000000000400 RSI: 00007f5f706d2270 RDI: 0000000000000016
RBP: 00007f5f706d2270 R8: 00007f5f953e73a0 R9: 00007f5f953e7388
R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000400
R13: 00007f5f84c7354c R14: 0000000022100004 R15: 00007f5f866c5368
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
3. I found a another abnormal xhci debugfs file ep_context in dump.
Check xhci_slot_priv is alive and find the root is NULL.
crash> files -d 0xffff8d68fe8f0000
DENTRY INODE SUPERBLK TYPE PATH
ffff8d68fe8f0000 ffff8d489e979e10 ffff8d3b19169800 REG /sys/kernel/debug/ep-context
crash > struct xchi_slot_priv 0xffff8d4812492c0
struct xhci_slot_priv {
...
root = 0x0,
...
dev = 0xffff8d497b0fe000
}
4. Looking into the mainline kernel code, I finally concluded that
/sys/fs/debug/"interface file" could be made.
drivers/usb/host/xhci-debugfs.c
xhci_debugfs_create_slot function
...
priv->root = debugfs_create_dir(priv->name, xhci->debugfs_slots); // root can be NULL
...
xhci_debugfs_create_ring_dir(xhci, &dev->eps[0].ring,
"ep00", priv->root); // make a file
...
fs/debugfs/inode
start_creating function
...
if (!parent)
parent = debugfs_mount->mnt_root; // make a file on /sys/kernel/debug
...
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Question of possible concurrent xhci debugfs file
2023-03-28 6:30 Question of possible concurrent xhci debugfs file YoungJun.Park
@ 2023-03-28 10:42 ` Greg KH
2023-03-30 12:34 ` Mathias Nyman
1 sibling, 0 replies; 3+ messages in thread
From: Greg KH @ 2023-03-28 10:42 UTC (permalink / raw)
To: YoungJun.Park; +Cc: linux-usb, youngjun.park, her0gyu
On Mon, Mar 27, 2023 at 11:30:20PM -0700, YoungJun.Park wrote:
> I got a panic dump which happend on kernel(
> I check the both of CentOS and mainline kernel
> And I assume it could be happend on mainline kernel.
> (like I said assume)
Note, CentOS uses a very old and obsolete kernel, nothing we can do
about that mess, please get support for that from the company that
provided it.
> I think xhci-debugfs file creation check the creation of
> NULL parent dentry. Is it right? (drivers/usb/host/xhci-debugfs.c)
Why is that needed?
> The contents below is the what I analyze.
>
> 1. A lot of usb error
> ...
> [1994159.974407] usb 1-2: device descriptor read/64, error -71
> [1994160.187902] usb 1-2: new low-speed USB device number 36 using xhci_hcd
> [1994160.302634] usb 1-2: device descriptor read/64, error -71
> [1994160.562027] usb 1-2: device descriptor read/64, error -71
> [1994160.663789] usb usb1-port2: unable to enumerate USB device
> [1994162.256029] usb 1-2: new low-speed USB device number 37 using xhci_hcd
> [1994162.370764] usb 1-2: device descriptor read/64, error -71
> [1994162.585258] usb 1-2: device descriptor read/64, error -71
> [1994162.797751] usb 1-2: new low-speed USB device number 38 using xhci_hcd
> [1994162.912484] usb 1-2: device descriptor read/64, error -71
> [1994163.141944] usb 1-2: device descriptor read/64, error -71
> [1994163.243712] usb usb1-port2: attempt power cycle.
> ...
That has nothing to do with debugfs.
> 2. After that panic happend and I got a dump.
> The cause of panic reading "/sys/kernel/debug/trbs" which is abnormal.
> (which have bad xhci_ring private data)
> the file must be on the "/sys/kernel/debug/usb/~~~".
>
> sh> bt
> PID: 91416 TASK: ffff8d68fcc54200 CPU: 1 COMMAND: "fbmons"
> #0 [ffff8d679a6cbad0] machine_kexec at ffffffff9e2662c4
> #1 [ffff8d679a6cbb30] __crash_kexec at ffffffff9e322a32
> #2 [ffff8d679a6cbc00] crash_kexec at ffffffff9e322b20
> #3 [ffff8d679a6cbc18] oops_end at ffffffff9e98d798
> #4 [ffff8d679a6cbc40] no_context at ffffffff9e275d14
> #5 [ffff8d679a6cbc90] __bad_area_nosemaphore at ffffffff9e275fe2
> #6 [ffff8d679a6cbce0] bad_area_nosemaphore at ffffffff9e276104
> #7 [ffff8d679a6cbcf0] __do_page_fault at ffffffff9e990750
> #8 [ffff8d679a6cbd60] do_page_fault at ffffffff9e990975
> #9 [ffff8d679a6cbd90] page_fault at ffffffff9e98c778
> [exception RIP: xhci_ring_trb_show+29]
> RIP: ffffffff9e76005d RSP: ffff8d679a6cbe40 RFLAGS: 00010246
> RAX: ffff8d497b0fe018 RBX: 0000000000000000 RCX: 0000001309207f9b
> RDX: fffffffffffffff4 RSI: 0000000000000001 RDI: ffff8d6939b8fd40
> RBP: ffff8d679a6cbe60 R8: ffff8d49ffa5f1a0 R9: ffff8d3b3fc07300
> R10: ffff8d3b3fc07300 R11: ffffffff9e3de9fd R12: 0000000000000000
> R13: 0000000000000000 R14: ffff8d6939b8fd40 R15: ffff8d6939b8fd40
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #10 [ffff8d679a6cbe68] seq_read at ffffffff9e476d10
> #11 [ffff8d679a6cbed8] vfs_read at ffffffff9e44e3ff
> #12 [ffff8d679a6cbf08] sys_read at ffffffff9e44f27f
> #13 [ffff8d679a6cbf50] system_call_fastpath at ffffffff9e995f92
> RIP: 00007f5f9749f6fd RSP: 00007f5f84c73700 RFLAGS: 00000293
> RAX: 0000000000000000 RBX: 00007f5f7066a390 RCX: ffffffffffffffff
> RDX: 0000000000000400 RSI: 00007f5f706d2270 RDI: 0000000000000016
> RBP: 00007f5f706d2270 R8: 00007f5f953e73a0 R9: 00007f5f953e7388
> R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000400
> R13: 00007f5f84c7354c R14: 0000000022100004 R15: 00007f5f866c5368
> ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
So the xhci driver is crashing here?
> 3. I found a another abnormal xhci debugfs file ep_context in dump.
> Check xhci_slot_priv is alive and find the root is NULL.
>
> crash> files -d 0xffff8d68fe8f0000
> DENTRY INODE SUPERBLK TYPE PATH
> ffff8d68fe8f0000 ffff8d489e979e10 ffff8d3b19169800 REG /sys/kernel/debug/ep-context
>
> crash > struct xchi_slot_priv 0xffff8d4812492c0
> struct xhci_slot_priv {
> ...
> root = 0x0,
> ...
> dev = 0xffff8d497b0fe000
> }
>
> 4. Looking into the mainline kernel code, I finally concluded that
> /sys/fs/debug/"interface file" could be made.
>
> drivers/usb/host/xhci-debugfs.c
> xhci_debugfs_create_slot function
> ...
> priv->root = debugfs_create_dir(priv->name, xhci->debugfs_slots); // root can be NULL
How is root NULL here? What caused that to happen? Shouldn't you fix
that issue here, that's not a general debugfs problem.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Question of possible concurrent xhci debugfs file
2023-03-28 6:30 Question of possible concurrent xhci debugfs file YoungJun.Park
2023-03-28 10:42 ` Greg KH
@ 2023-03-30 12:34 ` Mathias Nyman
1 sibling, 0 replies; 3+ messages in thread
From: Mathias Nyman @ 2023-03-30 12:34 UTC (permalink / raw)
To: YoungJun.Park, linux-usb; +Cc: youngjun.park, her0gyu, Greg KH
On 28.3.2023 9.30, YoungJun.Park wrote:
> I got a panic dump which happend on kernel(
> I check the both of CentOS and mainline kernel
> And I assume it could be happend on mainline kernel.
> (like I said assume)
>
> I think xhci-debugfs file creation check the creation of
> NULL parent dentry. Is it right? (drivers/usb/host/xhci-debugfs.c)
>
> The contents below is the what I analyze.
>
> 1. A lot of usb error
> ...
> [1994159.974407] usb 1-2: device descriptor read/64, error -71
> [1994160.187902] usb 1-2: new low-speed USB device number 36 using xhci_hcd
> [1994160.302634] usb 1-2: device descriptor read/64, error -71
> [1994160.562027] usb 1-2: device descriptor read/64, error -71
> [1994160.663789] usb usb1-port2: unable to enumerate USB device
> [1994162.256029] usb 1-2: new low-speed USB device number 37 using xhci_hcd
> [1994162.370764] usb 1-2: device descriptor read/64, error -71
> [1994162.585258] usb 1-2: device descriptor read/64, error -71
> [1994162.797751] usb 1-2: new low-speed USB device number 38 using xhci_hcd
> [1994162.912484] usb 1-2: device descriptor read/64, error -71
> [1994163.141944] usb 1-2: device descriptor read/64, error -71
> [1994163.243712] usb usb1-port2: attempt power cycle.
> ...
>
> 2. After that panic happend and I got a dump.
> The cause of panic reading "/sys/kernel/debug/trbs" which is abnormal.
> (which have bad xhci_ring private data)
> the file must be on the "/sys/kernel/debug/usb/~~~".
>
> sh> bt
> PID: 91416 TASK: ffff8d68fcc54200 CPU: 1 COMMAND: "fbmons"
> #0 [ffff8d679a6cbad0] machine_kexec at ffffffff9e2662c4
> #1 [ffff8d679a6cbb30] __crash_kexec at ffffffff9e322a32
> #2 [ffff8d679a6cbc00] crash_kexec at ffffffff9e322b20
> #3 [ffff8d679a6cbc18] oops_end at ffffffff9e98d798
> #4 [ffff8d679a6cbc40] no_context at ffffffff9e275d14
> #5 [ffff8d679a6cbc90] __bad_area_nosemaphore at ffffffff9e275fe2
> #6 [ffff8d679a6cbce0] bad_area_nosemaphore at ffffffff9e276104
> #7 [ffff8d679a6cbcf0] __do_page_fault at ffffffff9e990750
> #8 [ffff8d679a6cbd60] do_page_fault at ffffffff9e990975
> #9 [ffff8d679a6cbd90] page_fault at ffffffff9e98c778
> [exception RIP: xhci_ring_trb_show+29]
> RIP: ffffffff9e76005d RSP: ffff8d679a6cbe40 RFLAGS: 00010246
> RAX: ffff8d497b0fe018 RBX: 0000000000000000 RCX: 0000001309207f9b
> RDX: fffffffffffffff4 RSI: 0000000000000001 RDI: ffff8d6939b8fd40
> RBP: ffff8d679a6cbe60 R8: ffff8d49ffa5f1a0 R9: ffff8d3b3fc07300
> R10: ffff8d3b3fc07300 R11: ffffffff9e3de9fd R12: 0000000000000000
> R13: 0000000000000000 R14: ffff8d6939b8fd40 R15: ffff8d6939b8fd40
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #10 [ffff8d679a6cbe68] seq_read at ffffffff9e476d10
> #11 [ffff8d679a6cbed8] vfs_read at ffffffff9e44e3ff
> #12 [ffff8d679a6cbf08] sys_read at ffffffff9e44f27f
> #13 [ffff8d679a6cbf50] system_call_fastpath at ffffffff9e995f92
> RIP: 00007f5f9749f6fd RSP: 00007f5f84c73700 RFLAGS: 00000293
> RAX: 0000000000000000 RBX: 00007f5f7066a390 RCX: ffffffffffffffff
> RDX: 0000000000000400 RSI: 00007f5f706d2270 RDI: 0000000000000016
> RBP: 00007f5f706d2270 R8: 00007f5f953e73a0 R9: 00007f5f953e7388
> R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000400
> R13: 00007f5f84c7354c R14: 0000000022100004 R15: 00007f5f866c5368
> ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
>
> 3. I found a another abnormal xhci debugfs file ep_context in dump.
> Check xhci_slot_priv is alive and find the root is NULL.
>
> crash> files -d 0xffff8d68fe8f0000
> DENTRY INODE SUPERBLK TYPE PATH
> ffff8d68fe8f0000 ffff8d489e979e10 ffff8d3b19169800 REG /sys/kernel/debug/ep-context
>
> crash > struct xchi_slot_priv 0xffff8d4812492c0
> struct xhci_slot_priv {
> ...
> root = 0x0,
> ...
> dev = 0xffff8d497b0fe000
> }
>
> 4. Looking into the mainline kernel code, I finally concluded that
> /sys/fs/debug/"interface file" could be made.
>
> drivers/usb/host/xhci-debugfs.c
> xhci_debugfs_create_slot function
> ...
> priv->root = debugfs_create_dir(priv->name, xhci->debugfs_slots); // root can be NULL
If you can reproduce this could you check if debugfs_create_dir() actually returned
NULL here, or if there was some race zeroing priv->root a bit later.
I tried to reproduce this myself by intentionally failing the device descriptor read
during enumeration. This gave similar "device descriptor read/64, error -71" and
"attempt power cycle" log entries, but didn't trigger the issue.
For me the slot related debugfs files were successfully created after usb_alloc_dev() was called,
and removed after hub_free_dev().
Thanks
Mathias
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-03-30 12:34 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-03-28 6:30 Question of possible concurrent xhci debugfs file YoungJun.Park
2023-03-28 10:42 ` Greg KH
2023-03-30 12:34 ` Mathias Nyman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).