* kprobe not always triggering
@ 2025-01-15 8:26 Daniel Wagner
2025-01-15 15:17 ` Steven Rostedt
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Wagner @ 2025-01-15 8:26 UTC (permalink / raw)
To: linux-trace-kernel
Hi,
it's likely a layer 8 problem but I can't figure out what I am doing
wrong.
I am debugging the nvme state machine with a nvme-tcp setup. I attached
a kprobe to nvme_change_ctrl_state to trace all the state changes, but
not all of them are logged. It's consistent behavior and I tried
different things with the same result.
The state transition I'd like to see is
LIVE (1) -> RESETTING (2) -> CONNECTING (3) -> LIVE (1)
Though the 1 -> 2 transition is always missing, though a printk says it
is happening.
cd /sys/kernel/debug/tracing
echo "r:nvme_change_ctrl_state nvme_change_ctrl_state new_state=\$arg2:x32" > kprobe_events
echo 1 > events/kprobes/enable
echo 1 > tracing_on
cat trace_pipe
nvme-570 [004] ..... 97.559620: nvme_change_ctrl_state: (nvme_tcp_setup_ctrl+0x324/0x720 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x1
<...>-65 [000] ..... 143.540287: nvme_change_ctrl_state: (nvme_reset_ctrl_work+0x76/0xc0 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x3
kworker/u34:3-76 [001] ..... 170.236138: nvme_change_ctrl_state: (nvme_tcp_setup_ctrl+0x324/0x720 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x1
[ 97.548827] nvme_change_ctrl_state:566 3 -> 1
[ 97.550547] nvme nvme1: new ctrl: NQN "nqn.io-1", addr 192.168.154.25:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36
[ 133.273552] nvme nvme1: I/O tag 1 (e001) type 4 opcode 0x18 (Keep Alive) QID 0 timeout
[ 133.274359] nvme nvme1: long keepalive RTT (5056 ms)
[ 133.274684] nvme nvme1: failed nvme_keep_alive_end_io error=10
[ 143.514090] nvme_change_ctrl_state:566 1 -> 2
[ 143.529185] nvme_change_ctrl_state:566 2 -> 3
[ 146.584904] nvme nvme1: failed to connect socket: -110
[ 146.585553] nvme nvme1: Reconnecting in 10 seconds...
[ 159.897184] nvme nvme1: failed to connect socket: -110
[ 159.897633] nvme nvme1: Failed reconnect attempt 2/60
[ 159.898006] nvme nvme1: Reconnecting in 10 seconds...
[ 170.153915] nvme nvme1: creating 8 I/O queues.
The system is idle, so I hope it's not just a dropped event. Any ideas
what I am doing wrong?
Linux 6.13.0-rc3+, clang compiler.
Thanks,
Daniel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering
2025-01-15 8:26 kprobe not always triggering Daniel Wagner
@ 2025-01-15 15:17 ` Steven Rostedt
2025-01-15 16:29 ` Daniel Wagner
0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2025-01-15 15:17 UTC (permalink / raw)
To: Daniel Wagner; +Cc: linux-trace-kernel, Masami Hiramatsu
[ Adding Masami ]
On Wed, 15 Jan 2025 09:26:36 +0100
Daniel Wagner <dwagner@suse.de> wrote:
> Hi,
>
> it's likely a layer 8 problem but I can't figure out what I am doing
> wrong.
>
> I am debugging the nvme state machine with a nvme-tcp setup. I attached
> a kprobe to nvme_change_ctrl_state to trace all the state changes, but
> not all of them are logged. It's consistent behavior and I tried
> different things with the same result.
Is it consistently the same locations being dropped?
There's been cases of a compiler inlining some functions for the code
called in the same file. If that happened, then those will likely be
dropped. That can be tested by adding:
bool noinline nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
enum nvme_ctrl_state new_state)
You would think it shouldn't inline it, but there's nothing in the spec
that says it can't do so.
>
> The state transition I'd like to see is
>
> LIVE (1) -> RESETTING (2) -> CONNECTING (3) -> LIVE (1)
>
> Though the 1 -> 2 transition is always missing, though a printk says it
> is happening.
>
> cd /sys/kernel/debug/tracing
> echo "r:nvme_change_ctrl_state nvme_change_ctrl_state new_state=\$arg2:x32" > kprobe_events
> echo 1 > events/kprobes/enable
> echo 1 > tracing_on
> cat trace_pipe
>
> nvme-570 [004] ..... 97.559620: nvme_change_ctrl_state: (nvme_tcp_setup_ctrl+0x324/0x720 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x1
> <...>-65 [000] ..... 143.540287: nvme_change_ctrl_state: (nvme_reset_ctrl_work+0x76/0xc0 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x3
> kworker/u34:3-76 [001] ..... 170.236138: nvme_change_ctrl_state: (nvme_tcp_setup_ctrl+0x324/0x720 [nvme_tcp] <- nvme_change_ctrl_state) new_state=0x1
>
>
> [ 97.548827] nvme_change_ctrl_state:566 3 -> 1
> [ 97.550547] nvme nvme1: new ctrl: NQN "nqn.io-1", addr 192.168.154.25:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36
> [ 133.273552] nvme nvme1: I/O tag 1 (e001) type 4 opcode 0x18 (Keep Alive) QID 0 timeout
> [ 133.274359] nvme nvme1: long keepalive RTT (5056 ms)
> [ 133.274684] nvme nvme1: failed nvme_keep_alive_end_io error=10
> [ 143.514090] nvme_change_ctrl_state:566 1 -> 2
> [ 143.529185] nvme_change_ctrl_state:566 2 -> 3
> [ 146.584904] nvme nvme1: failed to connect socket: -110
> [ 146.585553] nvme nvme1: Reconnecting in 10 seconds...
> [ 159.897184] nvme nvme1: failed to connect socket: -110
> [ 159.897633] nvme nvme1: Failed reconnect attempt 2/60
> [ 159.898006] nvme nvme1: Reconnecting in 10 seconds...
> [ 170.153915] nvme nvme1: creating 8 I/O queues.
>
> The system is idle, so I hope it's not just a dropped event. Any ideas
> what I am doing wrong?
I don't think it's a dropped event. It's not busy enough. But the compiler
making a copy of the function (by inlining or what not) can cause something
like this to happen.
-- Steve
>
> Linux 6.13.0-rc3+, clang compiler.
>
> Thanks,
> Daniel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering
2025-01-15 15:17 ` Steven Rostedt
@ 2025-01-15 16:29 ` Daniel Wagner
2025-01-15 16:37 ` Steven Rostedt
2025-01-15 23:48 ` Masami Hiramatsu
0 siblings, 2 replies; 8+ messages in thread
From: Daniel Wagner @ 2025-01-15 16:29 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-trace-kernel, Masami Hiramatsu
On Wed, Jan 15, 2025 at 10:17:15AM -0500, Steven Rostedt wrote:
> [ Adding Masami ]
> Is it consistently the same locations being dropped?
Yes, it is very consistent.
> There's been cases of a compiler inlining some functions for the code
> called in the same file. If that happened, then those will likely be
> dropped. That can be tested by adding:
>
> bool noinline nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
> enum nvme_ctrl_state new_state)
>
> You would think it shouldn't inline it, but there's nothing in the spec
> that says it can't do so.
This did the trick, now I see the missing state transitions.
> > The system is idle, so I hope it's not just a dropped event. Any ideas
> > what I am doing wrong?
>
> I don't think it's a dropped event. It's not busy enough. But the compiler
> making a copy of the function (by inlining or what not) can cause something
> like this to happen.
I didn't expect that the compiler does inlining and not inlining at the
same time. Again what learned :)
Thanks,
Daniel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering
2025-01-15 16:29 ` Daniel Wagner
@ 2025-01-15 16:37 ` Steven Rostedt
2025-01-15 16:50 ` Daniel Wagner
2025-01-15 23:48 ` Masami Hiramatsu
1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2025-01-15 16:37 UTC (permalink / raw)
To: Daniel Wagner; +Cc: linux-trace-kernel, Masami Hiramatsu
On Wed, 15 Jan 2025 17:29:34 +0100
Daniel Wagner <dwagner@suse.de> wrote:
> > There's been cases of a compiler inlining some functions for the code
> > called in the same file. If that happened, then those will likely be
> > dropped. That can be tested by adding:
> >
> > bool noinline nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
> > enum nvme_ctrl_state new_state)
> >
> > You would think it shouldn't inline it, but there's nothing in the spec
> > that says it can't do so.
>
> This did the trick, now I see the missing state transitions.
Grumble. I expect this issue to come up more often. We need a way to detect
this. Come to think of it, I believe this is more of a clang "feature". As
I do recall some of the other reports were from the clang compiler as well.
-- Steve
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering
2025-01-15 16:37 ` Steven Rostedt
@ 2025-01-15 16:50 ` Daniel Wagner
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Wagner @ 2025-01-15 16:50 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-trace-kernel, Masami Hiramatsu
On Wed, Jan 15, 2025 at 11:37:33AM -0500, Steven Rostedt wrote:
> Grumble. I expect this issue to come up more often. We need a way to detect
> this. Come to think of it, I believe this is more of a clang "feature". As
> I do recall some of the other reports were from the clang compiler as well.
clang doesn't seem to offer a more fine grained way to control the
inline decisions. There is no '--inline-all-or-nothing' option.
clang version 19.1.5 (Fedora 19.1.5-1.fc41)
Target: x86_64-redhat-linux-gnu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering
2025-01-15 16:29 ` Daniel Wagner
2025-01-15 16:37 ` Steven Rostedt
@ 2025-01-15 23:48 ` Masami Hiramatsu
2025-01-16 12:04 ` Daniel Wagner
1 sibling, 1 reply; 8+ messages in thread
From: Masami Hiramatsu @ 2025-01-15 23:48 UTC (permalink / raw)
To: Daniel Wagner; +Cc: Steven Rostedt, linux-trace-kernel, Masami Hiramatsu
Hi Daniel,
On Wed, 15 Jan 2025 17:29:34 +0100
Daniel Wagner <dwagner@suse.de> wrote:
> On Wed, Jan 15, 2025 at 10:17:15AM -0500, Steven Rostedt wrote:
> > [ Adding Masami ]
> > Is it consistently the same locations being dropped?
>
> Yes, it is very consistent.
>
> > There's been cases of a compiler inlining some functions for the code
> > called in the same file. If that happened, then those will likely be
> > dropped. That can be tested by adding:
> >
> > bool noinline nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
> > enum nvme_ctrl_state new_state)
> >
> > You would think it shouldn't inline it, but there's nothing in the spec
> > that says it can't do so.
>
> This did the trick, now I see the missing state transitions.
Have you tried to use `perf probe`? Even if it is inlined, you can find them
by
perf probe -a nvme_change_ctrl_state new_state
But perhaps, some of them are inlined with constant numbers (maybe you can
find them as nvme_change_ctrl_state.constprop.XXX in kallsyms)
If my guess is correct, it is not inlined but the constant propagation
optimization[1] has been done by the compiler.
[1] https://en.wikipedia.org/wiki/Constant_folding#Constant_propagation
IIRC, commit 66f69b219716 ("perf probe: Support DW_AT_const_value constant value")
should support this case.
Thank you,
>
> > > The system is idle, so I hope it's not just a dropped event. Any ideas
> > > what I am doing wrong?
> >
> > I don't think it's a dropped event. It's not busy enough. But the compiler
> > making a copy of the function (by inlining or what not) can cause something
> > like this to happen.
>
> I didn't expect that the compiler does inlining and not inlining at the
> same time. Again what learned :)
>
> Thanks,
> Daniel
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering
2025-01-15 23:48 ` Masami Hiramatsu
@ 2025-01-16 12:04 ` Daniel Wagner
2025-01-17 8:52 ` Masami Hiramatsu
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Wagner @ 2025-01-16 12:04 UTC (permalink / raw)
To: Masami Hiramatsu; +Cc: Steven Rostedt, linux-trace-kernel
On Thu, Jan 16, 2025 at 08:48:18AM +0900, Masami Hiramatsu wrote:
> Have you tried to use `perf probe`? Even if it is inlined, you can find them
> by
>
> perf probe -a nvme_change_ctrl_state new_state
I had to build my own perf, as the version from the distro didn't like
my kernel and just crashed. Thus I added a bunch libraries to my build
system. The resulting perf installed the probe and the missing events
show up:
<idle>-0 [000] ..s1. 245.406139: nvme_complete_rq: nvme1: qid=0, cmdid=36865, res=0x0, retries=0, flags=0x0, status=0x371
<idle>-0 [000] .Ns1. 245.407192: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x2
kworker/u33:0-65 [000] .N... 245.407258: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x2
kworker/u33:1-78 [000] ..... 245.421290: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x3
I double checked my previous setup with kprobes and now the missing
events show up as well. It looks like my setup changes makes clang to
not inline this function anymore, or there is no additional debug info
available?
> But perhaps, some of them are inlined with constant numbers (maybe you can
> find them as nvme_change_ctrl_state.constprop.XXX in kallsyms)
# cat /proc/kallsyms | grep nvme_change_ctrl_state
ffffffffc03dc04d r __kstrtab_nvme_change_ctrl_state [nvme_core]
ffffffffc03dc064 r __kstrtabns_nvme_change_ctrl_state [nvme_core]
ffffffffc03ca130 r __ksymtab_nvme_change_ctrl_state [nvme_core]
ffffffffc0405de0 t __pfx_nvme_change_ctrl_state [nvme_core]
ffffffffc0405df0 t nvme_change_ctrl_state [nvme_core]
This is with the new kernel, so after installing the libraries for the
perf build.
> If my guess is correct, it is not inlined but the constant propagation
> optimization[1] has been done by the compiler.
>
> [1] https://en.wikipedia.org/wiki/Constant_folding#Constant_propagation
>
> IIRC, commit 66f69b219716 ("perf probe: Support DW_AT_const_value constant value")
> should support this case.
Okay, this commit is already in (v5.4), so it was always in.
Anyway, I can't reproduce anymore.
Daniel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: kprobe not always triggering
2025-01-16 12:04 ` Daniel Wagner
@ 2025-01-17 8:52 ` Masami Hiramatsu
0 siblings, 0 replies; 8+ messages in thread
From: Masami Hiramatsu @ 2025-01-17 8:52 UTC (permalink / raw)
To: Daniel Wagner; +Cc: Steven Rostedt, linux-trace-kernel
On Thu, 16 Jan 2025 13:04:18 +0100
Daniel Wagner <dwagner@suse.de> wrote:
> On Thu, Jan 16, 2025 at 08:48:18AM +0900, Masami Hiramatsu wrote:
> > Have you tried to use `perf probe`? Even if it is inlined, you can find them
> > by
> >
> > perf probe -a nvme_change_ctrl_state new_state
>
> I had to build my own perf, as the version from the distro didn't like
> my kernel and just crashed. Thus I added a bunch libraries to my build
> system. The resulting perf installed the probe and the missing events
> show up:
>
> <idle>-0 [000] ..s1. 245.406139: nvme_complete_rq: nvme1: qid=0, cmdid=36865, res=0x0, retries=0, flags=0x0, status=0x371
> <idle>-0 [000] .Ns1. 245.407192: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x2
> kworker/u33:0-65 [000] .N... 245.407258: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x2
> kworker/u33:1-78 [000] ..... 245.421290: nvme_change_ctrl_state: (nvme_change_ctrl_state+0x0/0x180 [nvme_core]) new_state=0x3
>
Nice.
> I double checked my previous setup with kprobes and now the missing
> events show up as well. It looks like my setup changes makes clang to
> not inline this function anymore, or there is no additional debug info
> available?
Hmm, I don't think (don't hope) the clang change the optimization
silently if debuginfo is enabled. This is a kind of heisenbug :(
>
> > But perhaps, some of them are inlined with constant numbers (maybe you can
> > find them as nvme_change_ctrl_state.constprop.XXX in kallsyms)
>
> # cat /proc/kallsyms | grep nvme_change_ctrl_state
> ffffffffc03dc04d r __kstrtab_nvme_change_ctrl_state [nvme_core]
> ffffffffc03dc064 r __kstrtabns_nvme_change_ctrl_state [nvme_core]
> ffffffffc03ca130 r __ksymtab_nvme_change_ctrl_state [nvme_core]
> ffffffffc0405de0 t __pfx_nvme_change_ctrl_state [nvme_core]
> ffffffffc0405df0 t nvme_change_ctrl_state [nvme_core]
>
> This is with the new kernel, so after installing the libraries for the
> perf build.
IIRC, constprop.XXX is gcc's suffix, clang has different one.
>
> > If my guess is correct, it is not inlined but the constant propagation
> > optimization[1] has been done by the compiler.
> >
> > [1] https://en.wikipedia.org/wiki/Constant_folding#Constant_propagation
> >
> > IIRC, commit 66f69b219716 ("perf probe: Support DW_AT_const_value constant value")
> > should support this case.
>
> Okay, this commit is already in (v5.4), so it was always in.
>
> Anyway, I can't reproduce anymore.
Feel free to notice if you see similar issue.
Thank you,
>
> Daniel
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-01-17 8:52 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-15 8:26 kprobe not always triggering Daniel Wagner
2025-01-15 15:17 ` Steven Rostedt
2025-01-15 16:29 ` Daniel Wagner
2025-01-15 16:37 ` Steven Rostedt
2025-01-15 16:50 ` Daniel Wagner
2025-01-15 23:48 ` Masami Hiramatsu
2025-01-16 12:04 ` Daniel Wagner
2025-01-17 8:52 ` Masami Hiramatsu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox