* kprobe not always triggering
@ 2025-01-15 8:26 Daniel Wagner
2025-01-15 15:17 ` Steven Rostedt
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Wagner @ 2025-01-15 8:26 UTC (permalink / raw)
To: linux-trace-kernel
Hi,
it's likely a layer 8 problem but I can't figure out what I am doing
wrong.
I am debugging the nvme state machine with a nvme-tcp setup. I attached
a kprobe to nvme_change_ctrl_state to trace all the state changes, but
not all of them are logged. It's consistent behavior and I tried
different things with the same result.
The state transition I'd like to see is
LIVE (1) -> RESETTING (2) -> CONNECTING (3) -> LIVE (1)
Though the 1 -> 2 transition is always missing, though a printk says it
is happening.
cd /sys/kernel/debug/tracing
echo "r:nvme_change_ctrl_state nvme_change_ctrl_state new_state=\$arg2:x32" > kprobe_events
echo 1 > events/kprobes/enable
echo 1 > tracing_on
cat trace_pipe
nvme-570 [004] ..... 97.559620: nvme_change_ctrl_state: (nvme_tcp_setup_ctrl+0x324/0x720 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x1
<...>-65 [000] ..... 143.540287: nvme_change_ctrl_state: (nvme_reset_ctrl_work+0x76/0xc0 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x3
kworker/u34:3-76 [001] ..... 170.236138: nvme_change_ctrl_state: (nvme_tcp_setup_ctrl+0x324/0x720 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x1
[ 97.548827] nvme_change_ctrl_state:566 3 -> 1
[ 97.550547] nvme nvme1: new ctrl: NQN "nqn.io-1", addr 192.168.154.25:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36
[ 133.273552] nvme nvme1: I/O tag 1 (e001) type 4 opcode 0x18 (Keep Alive) QID 0 timeout
[ 133.274359] nvme nvme1: long keepalive RTT (5056 ms)
[ 133.274684] nvme nvme1: failed nvme_keep_alive_end_io error=10
[ 143.514090] nvme_change_ctrl_state:566 1 -> 2
[ 143.529185] nvme_change_ctrl_state:566 2 -> 3
[ 146.584904] nvme nvme1: failed to connect socket: -110
[ 146.585553] nvme nvme1: Reconnecting in 10 seconds...
[ 159.897184] nvme nvme1: failed to connect socket: -110
[ 159.897633] nvme nvme1: Failed reconnect attempt 2/60
[ 159.898006] nvme nvme1: Reconnecting in 10 seconds...
[ 170.153915] nvme nvme1: creating 8 I/O queues.
The system is idle, so I hope it's not just a dropped event. Any ideas
what I am doing wrong?
Linux 6.13.0-rc3+, clang compiler.
Thanks,
Daniel
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: kprobe not always triggering 2025-01-15 8:26 kprobe not always triggering Daniel Wagner @ 2025-01-15 15:17 ` Steven Rostedt 2025-01-15 16:29 ` Daniel Wagner 0 siblings, 1 reply; 8+ messages in thread From: Steven Rostedt @ 2025-01-15 15:17 UTC (permalink / raw) To: Daniel Wagner; +Cc: linux-trace-kernel, Masami Hiramatsu [ Adding Masami ] On Wed, 15 Jan 2025 09:26:36 +0100 Daniel Wagner <dwagner@suse.de> wrote: > Hi, > > it's likely a layer 8 problem but I can't figure out what I am doing > wrong. > > I am debugging the nvme state machine with a nvme-tcp setup. I attached > a kprobe to nvme_change_ctrl_state to trace all the state changes, but > not all of them are logged. It's consistent behavior and I tried > different things with the same result. Is it consistently the same locations being dropped? There's been cases of a compiler inlining some functions for the code called in the same file. If that happened, then those will likely be dropped. That can be tested by adding: bool noinline nvme_change_ctrl_state(struct nvme_ctrl *ctrl, enum nvme_ctrl_state new_state) You would think it shouldn't inline it, but there's nothing in the spec that says it can't do so. > > The state transition I'd like to see is > > LIVE (1) -> RESETTING (2) -> CONNECTING (3) -> LIVE (1) > > Though the 1 -> 2 transition is always missing, though a printk says it > is happening. > > cd /sys/kernel/debug/tracing > echo "r:nvme_change_ctrl_state nvme_change_ctrl_state new_state=\$arg2:x32" > kprobe_events > echo 1 > events/kprobes/enable > echo 1 > tracing_on > cat trace_pipe > > nvme-570 [004] ..... 97.559620: nvme_change_ctrl_state: (nvme_tcp_setup_ctrl+0x324/0x720 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x1 > <...>-65 [000] ..... 143.540287: nvme_change_ctrl_state: (nvme_reset_ctrl_work+0x76/0xc0 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x3 > kworker/u34:3-76 [001] ..... 170.236138: nvme_change_ctrl_state: (nvme_tcp_setup_ctrl+0x324/0x720 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x1 > > > [ 97.548827] nvme_change_ctrl_state:566 3 -> 1 > [ 97.550547] nvme nvme1: new ctrl: NQN "nqn.io-1", addr 192.168.154.25:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36 > [ 133.273552] nvme nvme1: I/O tag 1 (e001) type 4 opcode 0x18 (Keep Alive) QID 0 timeout > [ 133.274359] nvme nvme1: long keepalive RTT (5056 ms) > [ 133.274684] nvme nvme1: failed nvme_keep_alive_end_io error=10 > [ 143.514090] nvme_change_ctrl_state:566 1 -> 2 > [ 143.529185] nvme_change_ctrl_state:566 2 -> 3 > [ 146.584904] nvme nvme1: failed to connect socket: -110 > [ 146.585553] nvme nvme1: Reconnecting in 10 seconds... > [ 159.897184] nvme nvme1: failed to connect socket: -110 > [ 159.897633] nvme nvme1: Failed reconnect attempt 2/60 > [ 159.898006] nvme nvme1: Reconnecting in 10 seconds... > [ 170.153915] nvme nvme1: creating 8 I/O queues. > > The system is idle, so I hope it's not just a dropped event. Any ideas > what I am doing wrong? I don't think it's a dropped event. It's not busy enough. But the compiler making a copy of the function (by inlining or what not) can cause something like this to happen. -- Steve > > Linux 6.13.0-rc3+, clang compiler. > > Thanks, > Daniel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering 2025-01-15 15:17 ` Steven Rostedt @ 2025-01-15 16:29 ` Daniel Wagner 2025-01-15 16:37 ` Steven Rostedt 2025-01-15 23:48 ` Masami Hiramatsu 0 siblings, 2 replies; 8+ messages in thread From: Daniel Wagner @ 2025-01-15 16:29 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-trace-kernel, Masami Hiramatsu On Wed, Jan 15, 2025 at 10:17:15AM -0500, Steven Rostedt wrote: > [ Adding Masami ] > Is it consistently the same locations being dropped? Yes, it is very consistent. > There's been cases of a compiler inlining some functions for the code > called in the same file. If that happened, then those will likely be > dropped. That can be tested by adding: > > bool noinline nvme_change_ctrl_state(struct nvme_ctrl *ctrl, > enum nvme_ctrl_state new_state) > > You would think it shouldn't inline it, but there's nothing in the spec > that says it can't do so. This did the trick, now I see the missing state transitions. > > The system is idle, so I hope it's not just a dropped event. Any ideas > > what I am doing wrong? > > I don't think it's a dropped event. It's not busy enough. But the compiler > making a copy of the function (by inlining or what not) can cause something > like this to happen. I didn't expect that the compiler does inlining and not inlining at the same time. Again what learned :) Thanks, Daniel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering 2025-01-15 16:29 ` Daniel Wagner @ 2025-01-15 16:37 ` Steven Rostedt 2025-01-15 16:50 ` Daniel Wagner 2025-01-15 23:48 ` Masami Hiramatsu 1 sibling, 1 reply; 8+ messages in thread From: Steven Rostedt @ 2025-01-15 16:37 UTC (permalink / raw) To: Daniel Wagner; +Cc: linux-trace-kernel, Masami Hiramatsu On Wed, 15 Jan 2025 17:29:34 +0100 Daniel Wagner <dwagner@suse.de> wrote: > > There's been cases of a compiler inlining some functions for the code > > called in the same file. If that happened, then those will likely be > > dropped. That can be tested by adding: > > > > bool noinline nvme_change_ctrl_state(struct nvme_ctrl *ctrl, > > enum nvme_ctrl_state new_state) > > > > You would think it shouldn't inline it, but there's nothing in the spec > > that says it can't do so. > > This did the trick, now I see the missing state transitions. Grumble. I expect this issue to come up more often. We need a way to detect this. Come to think of it, I believe this is more of a clang "feature". As I do recall some of the other reports were from the clang compiler as well. -- Steve ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering 2025-01-15 16:37 ` Steven Rostedt @ 2025-01-15 16:50 ` Daniel Wagner 0 siblings, 0 replies; 8+ messages in thread From: Daniel Wagner @ 2025-01-15 16:50 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-trace-kernel, Masami Hiramatsu On Wed, Jan 15, 2025 at 11:37:33AM -0500, Steven Rostedt wrote: > Grumble. I expect this issue to come up more often. We need a way to detect > this. Come to think of it, I believe this is more of a clang "feature". As > I do recall some of the other reports were from the clang compiler as well. clang doesn't seem to offer a more fine grained way to control the inline decisions. There is no '--inline-all-or-nothing' option. clang version 19.1.5 (Fedora 19.1.5-1.fc41) Target: x86_64-redhat-linux-gnu ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering 2025-01-15 16:29 ` Daniel Wagner 2025-01-15 16:37 ` Steven Rostedt @ 2025-01-15 23:48 ` Masami Hiramatsu 2025-01-16 12:04 ` Daniel Wagner 1 sibling, 1 reply; 8+ messages in thread From: Masami Hiramatsu @ 2025-01-15 23:48 UTC (permalink / raw) To: Daniel Wagner; +Cc: Steven Rostedt, linux-trace-kernel, Masami Hiramatsu Hi Daniel, On Wed, 15 Jan 2025 17:29:34 +0100 Daniel Wagner <dwagner@suse.de> wrote: > On Wed, Jan 15, 2025 at 10:17:15AM -0500, Steven Rostedt wrote: > > [ Adding Masami ] > > Is it consistently the same locations being dropped? > > Yes, it is very consistent. > > > There's been cases of a compiler inlining some functions for the code > > called in the same file. If that happened, then those will likely be > > dropped. That can be tested by adding: > > > > bool noinline nvme_change_ctrl_state(struct nvme_ctrl *ctrl, > > enum nvme_ctrl_state new_state) > > > > You would think it shouldn't inline it, but there's nothing in the spec > > that says it can't do so. > > This did the trick, now I see the missing state transitions. Have you tried to use `perf probe`? Even if it is inlined, you can find them by perf probe -a nvme_change_ctrl_state new_state But perhaps, some of them are inlined with constant numbers (maybe you can find them as nvme_change_ctrl_state.constprop.XXX in kallsyms) If my guess is correct, it is not inlined but the constant propagation optimization[1] has been done by the compiler. [1] https://en.wikipedia.org/wiki/Constant_folding#Constant_propagation IIRC, commit 66f69b219716 ("perf probe: Support DW_AT_const_value constant value") should support this case. Thank you, > > > > The system is idle, so I hope it's not just a dropped event. Any ideas > > > what I am doing wrong? > > > > I don't think it's a dropped event. It's not busy enough. But the compiler > > making a copy of the function (by inlining or what not) can cause something > > like this to happen. > > I didn't expect that the compiler does inlining and not inlining at the > same time. Again what learned :) > > Thanks, > Daniel -- Masami Hiramatsu (Google) <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering 2025-01-15 23:48 ` Masami Hiramatsu @ 2025-01-16 12:04 ` Daniel Wagner 2025-01-17 8:52 ` Masami Hiramatsu 0 siblings, 1 reply; 8+ messages in thread From: Daniel Wagner @ 2025-01-16 12:04 UTC (permalink / raw) To: Masami Hiramatsu; +Cc: Steven Rostedt, linux-trace-kernel On Thu, Jan 16, 2025 at 08:48:18AM +0900, Masami Hiramatsu wrote: > Have you tried to use `perf probe`? Even if it is inlined, you can find them > by > > perf probe -a nvme_change_ctrl_state new_state I had to build my own perf, as the version from the distro didn't like my kernel and just crashed. Thus I added a bunch libraries to my build system. The resulting perf installed the probe and the missing events show up: <idle>-0 [000] ..s1. 245.406139: nvme_complete_rq: nvme1: qid=0, cmdid=36865, res=0x0, retries=0, flags=0x0, status=0x371 <idle>-0 [000] .Ns1. 245.407192: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x2 kworker/u33:0-65 [000] .N... 245.407258: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x2 kworker/u33:1-78 [000] ..... 245.421290: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x3 I double checked my previous setup with kprobes and now the missing events show up as well. It looks like my setup changes makes clang to not inline this function anymore, or there is no additional debug info available? > But perhaps, some of them are inlined with constant numbers (maybe you can > find them as nvme_change_ctrl_state.constprop.XXX in kallsyms) # cat /proc/kallsyms | grep nvme_change_ctrl_state ffffffffc03dc04d r __kstrtab_nvme_change_ctrl_state [nvme_core] ffffffffc03dc064 r __kstrtabns_nvme_change_ctrl_state [nvme_core] ffffffffc03ca130 r __ksymtab_nvme_change_ctrl_state [nvme_core] ffffffffc0405de0 t __pfx_nvme_change_ctrl_state [nvme_core] ffffffffc0405df0 t nvme_change_ctrl_state [nvme_core] This is with the new kernel, so after installing the libraries for the perf build. > If my guess is correct, it is not inlined but the constant propagation > optimization[1] has been done by the compiler. > > [1] https://en.wikipedia.org/wiki/Constant_folding#Constant_propagation > > IIRC, commit 66f69b219716 ("perf probe: Support DW_AT_const_value constant value") > should support this case. Okay, this commit is already in (v5.4), so it was always in. Anyway, I can't reproduce anymore. Daniel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering 2025-01-16 12:04 ` Daniel Wagner @ 2025-01-17 8:52 ` Masami Hiramatsu 0 siblings, 0 replies; 8+ messages in thread From: Masami Hiramatsu @ 2025-01-17 8:52 UTC (permalink / raw) To: Daniel Wagner; +Cc: Steven Rostedt, linux-trace-kernel On Thu, 16 Jan 2025 13:04:18 +0100 Daniel Wagner <dwagner@suse.de> wrote: > On Thu, Jan 16, 2025 at 08:48:18AM +0900, Masami Hiramatsu wrote: > > Have you tried to use `perf probe`? Even if it is inlined, you can find them > > by > > > > perf probe -a nvme_change_ctrl_state new_state > > I had to build my own perf, as the version from the distro didn't like > my kernel and just crashed. Thus I added a bunch libraries to my build > system. The resulting perf installed the probe and the missing events > show up: > > <idle>-0 [000] ..s1. 245.406139: nvme_complete_rq: nvme1: qid=0, cmdid=36865, res=0x0, retries=0, flags=0x0, status=0x371 > <idle>-0 [000] .Ns1. 245.407192: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x2 > kworker/u33:0-65 [000] .N... 245.407258: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x2 > kworker/u33:1-78 [000] ..... 245.421290: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x3 > Nice. > I double checked my previous setup with kprobes and now the missing > events show up as well. It looks like my setup changes makes clang to > not inline this function anymore, or there is no additional debug info > available? Hmm, I don't think (don't hope) the clang change the optimization silently if debuginfo is enabled. This is a kind of heisenbug :( > > > But perhaps, some of them are inlined with constant numbers (maybe you can > > find them as nvme_change_ctrl_state.constprop.XXX in kallsyms) > > # cat /proc/kallsyms | grep nvme_change_ctrl_state > ffffffffc03dc04d r __kstrtab_nvme_change_ctrl_state [nvme_core] > ffffffffc03dc064 r __kstrtabns_nvme_change_ctrl_state [nvme_core] > ffffffffc03ca130 r __ksymtab_nvme_change_ctrl_state [nvme_core] > ffffffffc0405de0 t __pfx_nvme_change_ctrl_state [nvme_core] > ffffffffc0405df0 t nvme_change_ctrl_state [nvme_core] > > This is with the new kernel, so after installing the libraries for the > perf build. IIRC, constprop.XXX is gcc's suffix, clang has different one. > > > If my guess is correct, it is not inlined but the constant propagation > > optimization[1] has been done by the compiler. > > > > [1] https://en.wikipedia.org/wiki/Constant_folding#Constant_propagation > > > > IIRC, commit 66f69b219716 ("perf probe: Support DW_AT_const_value constant value") > > should support this case. > > Okay, this commit is already in (v5.4), so it was always in. > > Anyway, I can't reproduce anymore. Feel free to notice if you see similar issue. Thank you, > > Daniel -- Masami Hiramatsu (Google) <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-01-17 8:52 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-15 8:26 kprobe not always triggering Daniel Wagner 2025-01-15 15:17 ` Steven Rostedt 2025-01-15 16:29 ` Daniel Wagner 2025-01-15 16:37 ` Steven Rostedt 2025-01-15 16:50 ` Daniel Wagner 2025-01-15 23:48 ` Masami Hiramatsu 2025-01-16 12:04 ` Daniel Wagner 2025-01-17 8:52 ` Masami Hiramatsu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox