* [PATCH v3 1/5] ftrace: Fix accounting of adding subops to a manager ops
2025-02-20 20:20 [PATCH v3 0/5] ftrace: Fix fprobe with function graph accounting Steven Rostedt
@ 2025-02-20 20:20 ` Steven Rostedt
2025-02-20 23:39 ` Masami Hiramatsu
2025-02-20 20:20 ` [PATCH v3 2/5] ftrace: Do not add duplicate entries in subops " Steven Rostedt
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2025-02-20 20:20 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Heiko Carstens, Sven Schnelle, Vasily Gorbik, Alexander Gordeev,
stable
From: Steven Rostedt <rostedt@goodmis.org>
Function graph uses a subops and manager ops mechanism to attach to
ftrace. The manager ops connects to ftrace, and the functions it connects
to are defined by the list of subops that it manages.
The function hash that defines what the above ops attaches to limits the
functions to attach if the hash has any content. If the hash is empty, it
means to trace all functions.
The creation of the manager ops hash is done by iterating over all the
subops hashes. If any of the subops hashes is empty, it means that the
manager ops hash must trace all functions as well.
The issue is in the creation of the manager ops. When a second subops is
attached, a new hash is created by starting it as NULL and adding the
subops one at a time. But the NULL hash is mistaken for an empty hash, and
once an empty hash is found, it stops the loop of subops and just enables
all functions.
# echo "f:myevent1 kernel_clone" >> /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
kernel_clone (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
# echo "f:myevent2 schedule_timeout" >> /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
trace_initcall_start_cb (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
run_init_process (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
try_to_run_init_process (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
x86_pmu_show_pmu_cap (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
cleanup_rapl_pmus (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
uncore_free_pcibus_map (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
uncore_types_exit (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
uncore_pci_exit.part.0 (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
kvm_shutdown (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
vmx_dump_msrs (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
vmx_cleanup_l1d_flush (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
[..]
Fix this by initializing the new hash to NULL and, if the hash is NULL, do
not treat it as an empty hash but instead allocate it by copying the content
of the first subops. Then on subsequent iterations, the new hash will not
be NULL, but will hold the content of the previous subops. If that first
subops attached to all functions, then the new hash may assume that the
manager ops also needs to attach to all functions.
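As a rough illustration of the accounting described above, here is a
userspace toy model, not the kernel code. Every toy_* name and the
fixed-size array are assumptions made up for this sketch. The only
property borrowed from ftrace is that both a NULL hash and a hash with
no entries read as "empty", and that an empty filter hash means "trace
every function". The point is that a NULL accumulator has to be told
apart from an accumulator that is genuinely empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX 8	/* hypothetical capacity, for the sketch only */

struct toy_hash {
	int count;			/* 0 means "trace every function" */
	unsigned long ips[TOY_MAX];
};

/* Like ftrace_hash_empty(): a NULL hash also reads as empty */
static bool toy_hash_empty(const struct toy_hash *h)
{
	return !h || h->count == 0;
}

static bool toy_contains(const struct toy_hash *h, unsigned long ip)
{
	for (int i = 0; i < h->count; i++)
		if (h->ips[i] == ip)
			return true;
	return false;
}

/* Union @new_hash into *@hash; a NULL *@hash means "not started yet" */
static int toy_append(struct toy_hash **hash, const struct toy_hash *new_hash)
{
	assert(hash != NULL);

	if (*hash) {
		/* An empty hash already traces everything */
		if (toy_hash_empty(*hash))
			return 0;
	} else {
		*hash = calloc(1, sizeof(**hash));
		if (!*hash)
			return -1;
	}

	/* If new_hash traces everything, so must the result */
	if (toy_hash_empty(new_hash)) {
		(*hash)->count = 0;
		return 0;
	}

	for (int i = 0; i < new_hash->count; i++) {
		unsigned long ip = new_hash->ips[i];

		if (toy_contains(*hash, ip))
			continue;
		if ((*hash)->count == TOY_MAX)
			return -1;
		(*hash)->ips[(*hash)->count++] = ip;
	}
	return 0;
}

/* Build the manager hash as the union of all subops hashes */
static struct toy_hash *toy_union(const struct toy_hash **subs, int nr)
{
	struct toy_hash *new_hash = NULL;

	for (int i = 0; i < nr; i++) {
		if (toy_append(&new_hash, subs[i]) < 0) {
			free(new_hash);
			return NULL;
		}
		/* Nothing can widen "everything", so stop early */
		if (toy_hash_empty(new_hash))
			break;
	}
	/* NULL is reserved for failure, so hand back an empty hash instead */
	if (!new_hash)
		new_hash = calloc(1, sizeof(*new_hash));
	return new_hash;
}
```

In this model, two subops that each filter on one distinct address give
a union with two entries, while a list that contains one empty subops
gives an empty ("everything") result.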
Cc: stable@vger.kernel.org
Fixes: 5fccc7552ccbc ("ftrace: Add subops logic to allow one ops to manage many")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v2: https://lore.kernel.org/20250219220510.888959028@goodmis.org
- Have append_hashes() return EMPTY_HASH and not NULL if the resulting
new hash is empty.
kernel/trace/ftrace.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 728ecda6e8d4..bec54dc27204 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3220,15 +3220,22 @@ static struct ftrace_hash *copy_hash(struct ftrace_hash *src)
* The filter_hash updates uses just the append_hash() function
* and the notrace_hash does not.
*/
-static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash)
+static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash,
+ int size_bits)
{
struct ftrace_func_entry *entry;
int size;
int i;
- /* An empty hash does everything */
- if (ftrace_hash_empty(*hash))
- return 0;
+ if (*hash) {
+ /* An empty hash does everything */
+ if (ftrace_hash_empty(*hash))
+ return 0;
+ } else {
+ *hash = alloc_ftrace_hash(size_bits);
+ if (!*hash)
+ return -ENOMEM;
+ }
/* If new_hash has everything make hash have everything */
if (ftrace_hash_empty(new_hash)) {
@@ -3292,16 +3299,18 @@ static int intersect_hash(struct ftrace_hash **hash, struct ftrace_hash *new_has
/* Return a new hash that has a union of all @ops->filter_hash entries */
static struct ftrace_hash *append_hashes(struct ftrace_ops *ops)
{
- struct ftrace_hash *new_hash;
+ struct ftrace_hash *new_hash = NULL;
struct ftrace_ops *subops;
+ int size_bits;
int ret;
- new_hash = alloc_ftrace_hash(ops->func_hash->filter_hash->size_bits);
- if (!new_hash)
- return NULL;
+ if (ops->func_hash->filter_hash)
+ size_bits = ops->func_hash->filter_hash->size_bits;
+ else
+ size_bits = FTRACE_HASH_DEFAULT_BITS;
list_for_each_entry(subops, &ops->subop_list, list) {
- ret = append_hash(&new_hash, subops->func_hash->filter_hash);
+ ret = append_hash(&new_hash, subops->func_hash->filter_hash, size_bits);
if (ret < 0) {
free_ftrace_hash(new_hash);
return NULL;
@@ -3310,7 +3319,8 @@ static struct ftrace_hash *append_hashes(struct ftrace_ops *ops)
if (ftrace_hash_empty(new_hash))
break;
}
- return new_hash;
+ /* Can't return NULL as that means this failed */
+ return new_hash ? : EMPTY_HASH;
}
/* Make @ops trace evenything except what all its subops do not trace */
@@ -3505,7 +3515,8 @@ int ftrace_startup_subops(struct ftrace_ops *ops, struct ftrace_ops *subops, int
filter_hash = alloc_and_copy_ftrace_hash(size_bits, ops->func_hash->filter_hash);
if (!filter_hash)
return -ENOMEM;
- ret = append_hash(&filter_hash, subops->func_hash->filter_hash);
+ ret = append_hash(&filter_hash, subops->func_hash->filter_hash,
+ size_bits);
if (ret < 0) {
free_ftrace_hash(filter_hash);
return ret;
--
2.47.2
* Re: [PATCH v3 1/5] ftrace: Fix accounting of adding subops to a manager ops
From: Masami Hiramatsu @ 2025-02-20 23:39 UTC
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Heiko Carstens, Sven Schnelle,
Vasily Gorbik, Alexander Gordeev, stable
On Thu, 20 Feb 2025 15:20:10 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Function graph uses a subops and manager ops mechanism to attach to
> ftrace. The manager ops connects to ftrace, and the functions it connects
> to are defined by the list of subops that it manages.
>
> The function hash that defines what the above ops attaches to limits the
> functions to attach if the hash has any content. If the hash is empty, it
> means to trace all functions.
>
> The creation of the manager ops hash is done by iterating over all the
> subops hashes. If any of the subops hashes is empty, it means that the
> manager ops hash must trace all functions as well.
>
> The issue is in the creation of the manager ops. When a second subops is
> attached, a new hash is created by starting it as NULL and adding the
> subops one at a time. But the NULL hash is mistaken for an empty hash, and
> once an empty hash is found, it stops the loop of subops and just enables
> all functions.
>
> # echo "f:myevent1 kernel_clone" >> /sys/kernel/tracing/dynamic_events
> # cat /sys/kernel/tracing/enabled_functions
> kernel_clone (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
>
> # echo "f:myevent2 schedule_timeout" >> /sys/kernel/tracing/dynamic_events
> # cat /sys/kernel/tracing/enabled_functions
> trace_initcall_start_cb (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> run_init_process (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> try_to_run_init_process (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> x86_pmu_show_pmu_cap (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> cleanup_rapl_pmus (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> uncore_free_pcibus_map (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> uncore_types_exit (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> uncore_pci_exit.part.0 (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> kvm_shutdown (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> vmx_dump_msrs (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> vmx_cleanup_l1d_flush (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
> [..]
>
> Fix this by initializing the new hash to NULL and, if the hash is NULL, do
> not treat it as an empty hash but instead allocate it by copying the content
> of the first subops. Then on subsequent iterations, the new hash will not
> be NULL, but will hold the content of the previous subops. If that first
> subops attached to all functions, then the new hash may assume that the
> manager ops also needs to attach to all functions.
>
Looks good to me.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Thanks,
> Cc: stable@vger.kernel.org
> Fixes: 5fccc7552ccbc ("ftrace: Add subops logic to allow one ops to manage many")
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> Changes since v2: https://lore.kernel.org/20250219220510.888959028@goodmis.org
>
> - Have append_hashes() return EMPTY_HASH and not NULL if the resulting
> new hash is empty.
>
> kernel/trace/ftrace.c | 33 ++++++++++++++++++++++-----------
> 1 file changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 728ecda6e8d4..bec54dc27204 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -3220,15 +3220,22 @@ static struct ftrace_hash *copy_hash(struct ftrace_hash *src)
> * The filter_hash updates uses just the append_hash() function
> * and the notrace_hash does not.
> */
> -static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash)
> +static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash,
> + int size_bits)
> {
> struct ftrace_func_entry *entry;
> int size;
> int i;
>
> - /* An empty hash does everything */
> - if (ftrace_hash_empty(*hash))
> - return 0;
> + if (*hash) {
> + /* An empty hash does everything */
> + if (ftrace_hash_empty(*hash))
> + return 0;
> + } else {
> + *hash = alloc_ftrace_hash(size_bits);
> + if (!*hash)
> + return -ENOMEM;
> + }
>
> /* If new_hash has everything make hash have everything */
> if (ftrace_hash_empty(new_hash)) {
> @@ -3292,16 +3299,18 @@ static int intersect_hash(struct ftrace_hash **hash, struct ftrace_hash *new_has
> /* Return a new hash that has a union of all @ops->filter_hash entries */
> static struct ftrace_hash *append_hashes(struct ftrace_ops *ops)
> {
> - struct ftrace_hash *new_hash;
> + struct ftrace_hash *new_hash = NULL;
> struct ftrace_ops *subops;
> + int size_bits;
> int ret;
>
> - new_hash = alloc_ftrace_hash(ops->func_hash->filter_hash->size_bits);
> - if (!new_hash)
> - return NULL;
> + if (ops->func_hash->filter_hash)
> + size_bits = ops->func_hash->filter_hash->size_bits;
> + else
> + size_bits = FTRACE_HASH_DEFAULT_BITS;
>
> list_for_each_entry(subops, &ops->subop_list, list) {
> - ret = append_hash(&new_hash, subops->func_hash->filter_hash);
> + ret = append_hash(&new_hash, subops->func_hash->filter_hash, size_bits);
> if (ret < 0) {
> free_ftrace_hash(new_hash);
> return NULL;
> @@ -3310,7 +3319,8 @@ static struct ftrace_hash *append_hashes(struct ftrace_ops *ops)
> if (ftrace_hash_empty(new_hash))
> break;
> }
> - return new_hash;
> + /* Can't return NULL as that means this failed */
> + return new_hash ? : EMPTY_HASH;
> }
>
> /* Make @ops trace evenything except what all its subops do not trace */
> @@ -3505,7 +3515,8 @@ int ftrace_startup_subops(struct ftrace_ops *ops, struct ftrace_ops *subops, int
> filter_hash = alloc_and_copy_ftrace_hash(size_bits, ops->func_hash->filter_hash);
> if (!filter_hash)
> return -ENOMEM;
> - ret = append_hash(&filter_hash, subops->func_hash->filter_hash);
> + ret = append_hash(&filter_hash, subops->func_hash->filter_hash,
> + size_bits);
> if (ret < 0) {
> free_ftrace_hash(filter_hash);
> return ret;
> --
> 2.47.2
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* [PATCH v3 2/5] ftrace: Do not add duplicate entries in subops manager ops
From: Steven Rostedt @ 2025-02-20 20:20 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Heiko Carstens, Sven Schnelle, Vasily Gorbik, Alexander Gordeev,
stable
From: Steven Rostedt <rostedt@goodmis.org>
Check if a function is already in the manager ops of a subops. A manager
ops contains multiple subops, and if two or more subops are tracing the
same function, the manager ops only needs a single entry in its hash.
Cc: stable@vger.kernel.org
Fixes: 4f554e955614f ("ftrace: Add ftrace_set_filter_ips function")
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/ftrace.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index bec54dc27204..6b0c25761ccb 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5718,6 +5718,9 @@ __ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
return -ENOENT;
free_hash_entry(hash, entry);
return 0;
+ } else if (__ftrace_lookup_ip(hash, ip) != NULL) {
+ /* Already exists */
+ return 0;
}
entry = add_hash_entry(hash, ip);
--
2.47.2
* [PATCH v3 3/5] fprobe: Always unregister fgraph function from ops
From: Steven Rostedt @ 2025-02-20 20:20 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Heiko Carstens, Sven Schnelle, Vasily Gorbik, Alexander Gordeev,
stable
From: Steven Rostedt <rostedt@goodmis.org>
When the last fprobe is removed, it calls unregister_ftrace_graph() to
remove the graph_ops from function graph. The issue is that when it does so,
it returns before removing the function from its graph ops via
ftrace_set_filter_ips(). This leaves the last function lingering in the
fprobe's fgraph ops, and if a probe is added later it also enables that last
function (even though the callback will just drop it, it does add unneeded
overhead to make that call).
# echo "f:myevent1 kernel_clone" >> /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
kernel_clone (1) tramp: 0xffffffffc02f3000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
# echo "f:myevent2 schedule_timeout" >> /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
kernel_clone (1) tramp: 0xffffffffc02f3000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
schedule_timeout (1) tramp: 0xffffffffc02f3000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
# > /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
# echo "f:myevent3 kmem_cache_free" >> /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
kmem_cache_free (1) tramp: 0xffffffffc0219000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
schedule_timeout (1) tramp: 0xffffffffc0219000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
The above enabled a fprobe on kernel_clone, and then on schedule_timeout.
The content of the enabled_functions shows the functions that have a
callback attached to them. The fprobe attached to those functions
properly. Then the fprobes were cleared, and enabled_functions was empty
after that. But after adding a fprobe on kmem_cache_free, the
enabled_functions shows that the schedule_timeout was attached again. This
is because it was still left in the fprobe ops that is used to tell
function graph what functions it wants callbacks from.
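The control-flow difference can be shown with a toy model in plain C.
This is only a sketch under stated assumptions: struct toy_graph_ops and
both toy_remove_ips_*() helpers are invented here, and the filter hash
is reduced to a simple entry count.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_graph_ops {
	int active;		/* fprobes currently using function graph */
	bool registered;	/* registered with function graph? */
	int nr_filtered;	/* entries left in the toy filter hash */
};

/* Old shape: returns right after unregistering, filter is never cleared */
static void toy_remove_ips_buggy(struct toy_graph_ops *g, int num)
{
	assert(g->active > 0);

	g->active--;
	if (!g->active) {
		g->registered = false;
		return;
	}
	g->nr_filtered -= num;
}

/* New shape: unregister if needed, then always drop the entries */
static void toy_remove_ips_fixed(struct toy_graph_ops *g, int num)
{
	assert(g->active > 0);

	g->active--;
	if (!g->active)
		g->registered = false;
	g->nr_filtered -= num;
}
```

Starting from one active fprobe with one filtered function, the buggy
variant ends with nr_filtered still at 1, which matches the lingering
schedule_timeout entry in the transcript above. The fixed variant ends
with 0.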
Cc: stable@vger.kernel.org
Fixes: 4346ba1604093 ("fprobe: Rewrite fprobe on function-graph tracer")
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/fprobe.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 2560b312ad57..62e8f7d56602 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -403,11 +403,9 @@ static void fprobe_graph_remove_ips(unsigned long *addrs, int num)
lockdep_assert_held(&fprobe_mutex);
fprobe_graph_active--;
- if (!fprobe_graph_active) {
- /* Q: should we unregister it ? */
+ /* Q: should we unregister it ? */
+ if (!fprobe_graph_active)
unregister_ftrace_graph(&fprobe_graph_ops);
- return;
- }
ftrace_set_filter_ips(&fprobe_graph_ops.ops, addrs, num, 1, 0);
}
--
2.47.2
* [PATCH v3 4/5] fprobe: Fix accounting of when to unregister from function graph
From: Steven Rostedt @ 2025-02-20 20:20 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Heiko Carstens, Sven Schnelle, Vasily Gorbik, Alexander Gordeev,
stable
From: Steven Rostedt <rostedt@goodmis.org>
When adding a new fprobe, it will update the function hash to the
functions the fprobe is attached to and register with function graph to
have it call the registered functions. The fprobe_graph_active variable
keeps track of the number of fprobes that are using function graph.
If two fprobes attach to the same function, it increments the
fprobe_graph_active for each of them. But when they are removed, the first
fprobe to be removed will see that the function it is attached to is also
used by another fprobe and it will not remove that function from
function_graph. The logic will skip decrementing the fprobe_graph_active
variable.
This causes the fprobe_graph_active variable to not go to zero when all
fprobes are removed, and in doing so it does not unregister from
function graph. As the fgraph ops hash will now be empty, and an empty
filter hash means all functions are enabled, this triggers function graph
to add a callback to the fprobe infrastructure for every function!
# echo "f:myevent1 kernel_clone" >> /sys/kernel/tracing/dynamic_events
# echo "f:myevent2 kernel_clone%return" >> /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
kernel_clone (1) tramp: 0xffffffffc0024000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
# > /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
trace_initcall_start_cb (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
run_init_process (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
try_to_run_init_process (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
x86_pmu_show_pmu_cap (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
cleanup_rapl_pmus (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
uncore_free_pcibus_map (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
uncore_types_exit (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
uncore_pci_exit.part.0 (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
kvm_shutdown (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
vmx_dump_msrs (1) tramp: 0xffffffffc0026000 (function_trace_call+0x0/0x170) ->function_trace_call+0x0/0x170
[..]
# cat /sys/kernel/tracing/enabled_functions | wc -l
54702
If a fprobe is being removed and all its functions are also traced by
other fprobes, still decrement the fprobe_graph_active counter.
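A toy model of the counter may help here. It is a userspace sketch, not
the fprobe code: struct toy_fprobe_state, the toy_* helpers and the
skip_when_shared flag are assumptions for illustration, and only a
single shared function is modeled. With skip_when_shared set, the
removal path is skipped whenever no address is exclusively owned, which
is the pre-patch behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_fprobe_state {
	int graph_active;	/* models fprobe_graph_active */
	int users;		/* fprobes attached to the one shared function */
	int filtered;		/* entries in the toy filter hash (0 or 1) */
	bool registered;	/* registered with function graph? */
};

static void toy_register(struct toy_fprobe_state *s)
{
	if (s->users++ == 0)
		s->filtered = 1;
	if (s->graph_active++ == 0)
		s->registered = true;
}

/* Always drop one reference; only touch the filter if @num is non-zero */
static void toy_remove_ips(struct toy_fprobe_state *s, int num)
{
	s->graph_active--;
	if (!s->graph_active)
		s->registered = false;
	if (num)
		s->filtered -= num;
}

static void toy_unregister(struct toy_fprobe_state *s, bool skip_when_shared)
{
	int count;

	assert(s->users > 0);

	/* The address is only removable once its last user goes away */
	count = (--s->users == 0) ? 1 : 0;

	if (skip_when_shared && !count)
		return;		/* pre-patch: counter is never decremented */

	toy_remove_ips(s, count);
}
```

Registering two probes on the same function and removing both with
skip_when_shared set leaves graph_active at 1, still registered, with an
empty filter, which in ftrace terms means every function. Without the
skip, the counter reaches zero and the toy ops is unregistered.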
Cc: stable@vger.kernel.org
Fixes: 4346ba1604093 ("fprobe: Rewrite fprobe on function-graph tracer")
Closes: https://lore.kernel.org/all/20250217114918.10397-A-hca@linux.ibm.com/
Reported-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/fprobe.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 62e8f7d56602..33082c4e8154 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -407,7 +407,8 @@ static void fprobe_graph_remove_ips(unsigned long *addrs, int num)
if (!fprobe_graph_active)
unregister_ftrace_graph(&fprobe_graph_ops);
- ftrace_set_filter_ips(&fprobe_graph_ops.ops, addrs, num, 1, 0);
+ if (num)
+ ftrace_set_filter_ips(&fprobe_graph_ops.ops, addrs, num, 1, 0);
}
static int symbols_cmp(const void *a, const void *b)
@@ -677,8 +678,7 @@ int unregister_fprobe(struct fprobe *fp)
}
del_fprobe_hash(fp);
- if (count)
- fprobe_graph_remove_ips(addrs, count);
+ fprobe_graph_remove_ips(addrs, count);
kfree_rcu(hlist_array, rcu);
fp->hlist_array = NULL;
--
2.47.2
* [PATCH v3 5/5] selftests/ftrace: Update fprobe test to check enabled_functions file
From: Steven Rostedt @ 2025-02-20 20:20 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Heiko Carstens, Sven Schnelle, Vasily Gorbik, Alexander Gordeev
From: Steven Rostedt <rostedt@goodmis.org>
A few bugs were found in the fprobe accounting logic and in its use of
the function graph infrastructure. Update the fprobe selftest to catch
those bugs in case they or something similar show up in the future.
The test now checks the enabled_functions file which shows all the
functions attached to ftrace or fgraph. When enabling a fprobe, make sure
that its corresponding function is also added to that file. Also enable
two more fprobes to make sure that the fprobe logic works properly
with multiple probes.
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
.../test.d/dynevent/add_remove_fprobe.tc | 54 +++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
index dc25bcf4f9e2..449f9d8be746 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
@@ -7,12 +7,38 @@ echo 0 > events/enable
echo > dynamic_events
PLACE=$FUNCTION_FORK
+PLACE2="kmem_cache_free"
+PLACE3="schedule_timeout"
echo "f:myevent1 $PLACE" >> dynamic_events
+
+# Make sure the event is attached and is the only one
+grep -q $PLACE enabled_functions
+cnt=`cat enabled_functions | wc -l`
+if [ $cnt -ne 1 ]; then
+ exit_fail
+fi
+
echo "f:myevent2 $PLACE%return" >> dynamic_events
+# It should till be the only attached function
+cnt=`cat enabled_functions | wc -l`
+if [ $cnt -ne 1 ]; then
+ exit_fail
+fi
+
+# add another event
+echo "f:myevent3 $PLACE2" >> dynamic_events
+
+grep -q $PLACE2 enabled_functions
+cnt=`cat enabled_functions | wc -l`
+if [ $cnt -ne 2 ]; then
+ exit_fail
+fi
+
grep -q myevent1 dynamic_events
grep -q myevent2 dynamic_events
+grep -q myevent3 dynamic_events
test -d events/fprobes/myevent1
test -d events/fprobes/myevent2
@@ -21,6 +47,34 @@ echo "-:myevent2" >> dynamic_events
grep -q myevent1 dynamic_events
! grep -q myevent2 dynamic_events
+# should still have 2 left
+cnt=`cat enabled_functions | wc -l`
+if [ $cnt -ne 2 ]; then
+ exit_fail
+fi
+
echo > dynamic_events
+# Should have none left
+cnt=`cat enabled_functions | wc -l`
+if [ $cnt -ne 0 ]; then
+ exit_fail
+fi
+
+echo "f:myevent4 $PLACE" >> dynamic_events
+
+# Should only have one enabled
+cnt=`cat enabled_functions | wc -l`
+if [ $cnt -ne 1 ]; then
+ exit_fail
+fi
+
+echo > dynamic_events
+
+# Should have none left
+cnt=`cat enabled_functions | wc -l`
+if [ $cnt -ne 0 ]; then
+ exit_fail
+fi
+
clear_trace
--
2.47.2
* Re: [PATCH v3 5/5] selftests/ftrace: Update fprobe test to check enabled_functions file
From: Heiko Carstens @ 2025-02-26 10:50 UTC
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Sven Schnelle, Vasily Gorbik,
Alexander Gordeev
On Thu, Feb 20, 2025 at 03:20:14PM -0500, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> A few bugs were found in the fprobe accounting logic and in its use of
> the function graph infrastructure. Update the fprobe selftest to catch
> those bugs in case they or something similar show up in the future.
>
> The test now checks the enabled_functions file which shows all the
> functions attached to ftrace or fgraph. When enabling a fprobe, make sure
> that its corresponding function is also added to that file. Also enable
> two more fprobes to make sure that the fprobe logic works properly
> with multiple probes.
>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Tested-by: Heiko Carstens <hca@linux.ibm.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> .../test.d/dynevent/add_remove_fprobe.tc | 54 +++++++++++++++++++
> 1 file changed, 54 insertions(+)
>
> diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
> index dc25bcf4f9e2..449f9d8be746 100644
> --- a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
> +++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
> @@ -7,12 +7,38 @@ echo 0 > events/enable
> echo > dynamic_events
>
> PLACE=$FUNCTION_FORK
> +PLACE2="kmem_cache_free"
> +PLACE3="schedule_timeout"
>
> echo "f:myevent1 $PLACE" >> dynamic_events
> +
> +# Make sure the event is attached and is the only one
> +grep -q $PLACE enabled_functions
> +cnt=`cat enabled_functions | wc -l`
> +if [ $cnt -ne 1 ]; then
> + exit_fail
> +fi
Bah.. :) this doesn't work always, since at least with Fedora 41 the
assumption that there are zero enabled functions before this test is
executed is not necessarily true:
# cat tracing/enabled_functions
free_user_ns (1) R
bpf_lsm_path_mkdir (1) R D M tramp: ftrace_regs_caller+0x0/0x68 (call_direct_funcs+0x0/0x20)
direct-->bpf_trampoline_6442505669+0x0/0x148
bpf_lsm_path_mknod (1) R D M tramp: ftrace_regs_caller+0x0/0x68 (call_direct_funcs+0x0/0x20)
direct-->bpf_trampoline_6442505671+0x0/0x14e
...
I didn't stumble across this before, since I tried a monolithic kernel
without modules when verifying your series; and then there aren't any
enabled functions. But with modules there are.
This could be worked around for example with something like the patch
below (against linux-next). But no idea what your preferred way to
handle this would be.
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
index 449f9d8be746..b0f24c57b8e1 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
@@ -10,12 +10,14 @@ PLACE=$FUNCTION_FORK
PLACE2="kmem_cache_free"
PLACE3="schedule_timeout"
+ocnt=`cat enabled_functions | wc -l`
+
echo "f:myevent1 $PLACE" >> dynamic_events
# Make sure the event is attached and is the only one
grep -q $PLACE enabled_functions
cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne 1 ]; then
+if [ $cnt -ne $((ocnt + 1)) ]; then
exit_fail
fi
@@ -23,7 +25,7 @@ echo "f:myevent2 $PLACE%return" >> dynamic_events
# It should till be the only attached function
cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne 1 ]; then
+if [ $cnt -ne $((ocnt + 1)) ]; then
exit_fail
fi
@@ -32,7 +34,7 @@ echo "f:myevent3 $PLACE2" >> dynamic_events
grep -q $PLACE2 enabled_functions
cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne 2 ]; then
+if [ $cnt -ne $((ocnt + 2)) ]; then
exit_fail
fi
@@ -49,7 +51,7 @@ grep -q myevent1 dynamic_events
# should still have 2 left
cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne 2 ]; then
+if [ $cnt -ne $((ocnt + 2)) ]; then
exit_fail
fi
@@ -57,7 +59,7 @@ echo > dynamic_events
# Should have none left
cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne 0 ]; then
+if [ $cnt -ne $ocnt ]; then
exit_fail
fi
@@ -65,7 +67,7 @@ echo "f:myevent4 $PLACE" >> dynamic_events
# Should only have one enabled
cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne 1 ]; then
+if [ $cnt -ne $((ocnt + 1)) ]; then
exit_fail
fi
@@ -73,7 +75,7 @@ echo > dynamic_events
# Should have none left
cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne 0 ]; then
+if [ $cnt -ne $ocnt ]; then
exit_fail
fi
* Re: [PATCH v3 5/5] selftests/ftrace: Update fprobe test to check enabled_functions file
2025-02-26 10:50 ` Heiko Carstens
@ 2025-02-26 13:47 ` Steven Rostedt
2025-02-26 13:57 ` Heiko Carstens
0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2025-02-26 13:47 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Sven Schnelle, Vasily Gorbik,
Alexander Gordeev
On Wed, 26 Feb 2025 11:50:28 +0100
Heiko Carstens <hca@linux.ibm.com> wrote:
> Bah.. :) this doesn't work always, since at least with Fedora 41 the
> assumption that there are zero enabled functions before this test is
> executed is not necessarily true:
>
> # cat tracing/enabled_functions
> free_user_ns (1) R
> bpf_lsm_path_mkdir (1) R D M tramp: ftrace_regs_caller+0x0/0x68 (call_direct_funcs+0x0/0x20)
> direct-->bpf_trampoline_6442505669+0x0/0x148
> bpf_lsm_path_mknod (1) R D M tramp: ftrace_regs_caller+0x0/0x68 (call_direct_funcs+0x0/0x20)
> direct-->bpf_trampoline_6442505671+0x0/0x14e
After I submitted the patches, I then remembered that some user space tools
add BPF programs that attach to functions, and those will show up in the
enabled_functions table (that's a feature as it is always good to know what
is modifying your kernel!). And I figured it would break this test.
I decided to wait until someone complains about it before fixing it ;-)
> ...
>
> I didn't stumble across this before, since I tried a monolithic kernel
> without modules when verifying your series; and then there aren't any
> enabled functions. But with modules there are.
>
> This could be worked around for example with something like the patch
> below (against linux-next). But no idea what your preferred way to
> handle this would be.
Actually, when I thought about fixing this, your patch is pretty much what
I was thinking of doing.
-- Steve
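For readers following the thread, the baseline-relative check discussed above can be sketched in isolation. This is only an illustration, not the patch that was eventually sent: the helper names `count_lines` and `check_delta` are hypothetical, and a temporary file stands in for `enabled_functions` so the snippet runs without tracefs or root.

```shell
#!/bin/sh
# Hypothetical helpers; they are not part of the selftest or of the patch above.

# Print the number of lines in a file such as enabled_functions.
# The arithmetic expansion strips any padding that wc may emit.
count_lines() {
    echo $(( $(wc -l < "$1") ))
}

# Succeed only if the file holds exactly baseline + expected lines.
check_delta() {
    file=$1 baseline=$2 expected=$3
    cnt=$(count_lines "$file")
    [ "$cnt" -eq $((baseline + expected)) ]
}

# Demo: a temporary file stands in for enabled_functions (assumption).
tmp=$(mktemp)
printf 'free_user_ns (1) R\n' > "$tmp"   # entry present before the test starts
ocnt=$(count_lines "$tmp")               # record the baseline first
printf 'kernel_clone (1)\n' >> "$tmp"    # stands in for attaching one fprobe
check_delta "$tmp" "$ocnt" 1 && echo "one new entry on top of the baseline"
rm -f "$tmp"
```

Because the comparison is against a recorded baseline rather than an absolute number, entries that were already present before the test starts (for example from BPF programs attached by user space) do not affect the result, provided they do not change while the test runs.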
* Re: [PATCH v3 5/5] selftests/ftrace: Update fprobe test to check enabled_functions file
2025-02-26 13:47 ` Steven Rostedt
@ 2025-02-26 13:57 ` Heiko Carstens
0 siblings, 0 replies; 10+ messages in thread
From: Heiko Carstens @ 2025-02-26 13:57 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Sven Schnelle, Vasily Gorbik,
Alexander Gordeev
On Wed, Feb 26, 2025 at 08:47:59AM -0500, Steven Rostedt wrote:
> On Wed, 26 Feb 2025 11:50:28 +0100
> Heiko Carstens <hca@linux.ibm.com> wrote:
> > # cat tracing/enabled_functions
> > free_user_ns (1) R
> > bpf_lsm_path_mkdir (1) R D M tramp: ftrace_regs_caller+0x0/0x68 (call_direct_funcs+0x0/0x20)
> > direct-->bpf_trampoline_6442505669+0x0/0x148
> > bpf_lsm_path_mknod (1) R D M tramp: ftrace_regs_caller+0x0/0x68 (call_direct_funcs+0x0/0x20)
> > direct-->bpf_trampoline_6442505671+0x0/0x14e
>
> After I submitted the patches, I then remembered that some user space tools
> add BPF programs that attach to functions, and those will show up in the
> enabled_functions table (that's a feature as it is always good to know what
> is modifying your kernel!). And I figured it would break this test.
>
> I decided to wait until someone complains about it before fixing it ;-)
...
> >
> > This could be worked around for example with something like the patch
> > below (against linux-next). But no idea what your preferred way to
> > handle this would be.
>
> Actually, when I thought about fixing this, your patch is pretty much what
> I was thinking of doing.
Ok, I'll send a proper patch then.