* [for-linus][PATCH 0/2] tracing: Fixes for v6.8-rc3
@ 2024-02-09 10:22 Steven Rostedt
2024-02-09 10:22 ` [for-linus][PATCH 1/2] ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default Steven Rostedt
2024-02-09 10:22 ` [for-linus][PATCH 2/2] tracing: Fix wasted memory in saved_cmdlines logic Steven Rostedt
0 siblings, 2 replies; 4+ messages in thread
From: Steven Rostedt @ 2024-02-09 10:22 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
Tracing fixes for v6.8-rc3:
- Fix broken direct trampolines being called when another callback is
attached the same function. ARM 64 does not support FTRACE_WITH_REGS, and
when it added direct trampoline calls from ftrace, it removed the
"WITH_REGS" flag from the ftrace_ops for direct trampolines. This broke
x86 as x86 requires direct trampolines to have WITH_REGS. This wasn't
noticed because direct trampolines work as long as the function it is
attached to is not shared with other callbacks (like the function tracer).
When there's other callbacks, a helper trampoline is called, to call all
the non direct callbacks and when it returns, the direct trampoline is
called. For x86, the direct trampoline sets a flag in the regs field to
tell the x86 specific code to call the direct trampoline. But this only
works if the ftrace_ops had WITH_REGS set. ARM does things differently
that does not require this. For now, set WITH_REGS if the arch supports
WITH_REGS (which ARM does not), and this makes it work for both ARM64 and
x86.
- Fix wasted memory in the saved_cmdlines logic.
The saved_cmdlines is a cache that maps PIDs to COMMs that tracing can
use. Most trace events only save the PID in the event. The saved_cmdlines
file lists PIDs to COMMs so that the tracing tools can show an actual name
and not just a PID for each event. There's an array of PIDs that map to a
small set of saved COMM strings. The array is set to PID_MAX_DEFAULT which
is usually set to 32768. When a PID comes in, it will add itself to this
array along with the index into the COMM array (note if the system allows
more than PID_MAX_DEFAULT, this cache is similar to cache lines as an
update of a PID that has the same PID_MAX_DEFAULT bits set will flush out
another task with the same matching bits set).
A while ago, the size of this cache was changed to be dynamic and the
array was moved into a structure and created with kmalloc(). But this
new structure had the size of 131104 bytes, or 0x20020 in hex. As kmalloc
allocates in powers of two, it was actually allocating 0x40000 bytes
(262144) leaving 131040 bytes of wasted memory. The last element of this
structure was a pointer to the COMM string array which defaulted to just
saving 128 COMMS.
By changing the last field of this structure to a variable length string,
and just having it round up to fill the allocated memory, the default
size of the saved COMM cache is now 8190. This not only uses the wasted
space, but actually saves space by removing the extra allocation for the
COMM names.
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace/urgent
Head SHA1: f2d3d1fa979e0a8e418a03e96902a9e6b4a6e9ae
Masami Hiramatsu (Google) (1):
ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
Steven Rostedt (Google) (1):
tracing: Fix wasted memory in saved_cmdlines logic
----
kernel/trace/ftrace.c | 10 +++++++
kernel/trace/trace.c | 73 ++++++++++++++++++++++++---------------------------
2 files changed, 44 insertions(+), 39 deletions(-)
^ permalink raw reply [flat|nested] 4+ messages in thread
* [for-linus][PATCH 1/2] ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
2024-02-09 10:22 [for-linus][PATCH 0/2] tracing: Fixes for v6.8-rc3 Steven Rostedt
@ 2024-02-09 10:22 ` Steven Rostedt
2024-02-09 10:22 ` [for-linus][PATCH 2/2] tracing: Fix wasted memory in saved_cmdlines logic Steven Rostedt
1 sibling, 0 replies; 4+ messages in thread
From: Steven Rostedt @ 2024-02-09 10:22 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Florent Revest, Jiri Olsa
From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
The commit 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS
and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there
are multiple ftrace_ops at the same function, but since the x86 only
support to jump to direct_call from ftrace_regs_caller, when we set
the function tracer on the same target function on x86, ftrace-direct
does not work as below (this actually works on arm64.)
At first, insmod ftrace-direct.ko to put a direct_call on
'wake_up_process()'.
# insmod kernel/samples/ftrace/ftrace-direct.ko
# less trace
...
<idle>-0 [006] ..s1. 564.686958: my_direct_func: waking up rcu_preempt-17
<idle>-0 [007] ..s1. 564.687836: my_direct_func: waking up kcompactd0-63
<idle>-0 [006] ..s1. 564.690926: my_direct_func: waking up rcu_preempt-17
<idle>-0 [006] ..s1. 564.696872: my_direct_func: waking up rcu_preempt-17
<idle>-0 [007] ..s1. 565.191982: my_direct_func: waking up kcompactd0-63
Setup a function filter to the 'wake_up_process' too, and enable it.
# cd /sys/kernel/tracing/
# echo wake_up_process > set_ftrace_filter
# echo function > current_tracer
# less trace
...
<idle>-0 [006] ..s3. 686.180972: wake_up_process <-call_timer_fn
<idle>-0 [006] ..s3. 686.186919: wake_up_process <-call_timer_fn
<idle>-0 [002] ..s3. 686.264049: wake_up_process <-call_timer_fn
<idle>-0 [002] d.h6. 686.515216: wake_up_process <-kick_pool
<idle>-0 [002] d.h6. 686.691386: wake_up_process <-kick_pool
Then, only function tracer is shown on x86.
But if you enable 'kprobe on ftrace' event (which uses SAVE_REGS flag)
on the same function, it is shown again.
# echo 'p wake_up_process' >> dynamic_events
# echo 1 > events/kprobes/p_wake_up_process_0/enable
# echo > trace
# less trace
...
<idle>-0 [006] ..s2. 2710.345919: p_wake_up_process_0: (wake_up_process+0x4/0x20)
<idle>-0 [006] ..s3. 2710.345923: wake_up_process <-call_timer_fn
<idle>-0 [006] ..s1. 2710.345928: my_direct_func: waking up rcu_preempt-17
<idle>-0 [006] ..s2. 2710.349931: p_wake_up_process_0: (wake_up_process+0x4/0x20)
<idle>-0 [006] ..s3. 2710.349934: wake_up_process <-call_timer_fn
<idle>-0 [006] ..s1. 2710.349937: my_direct_func: waking up rcu_preempt-17
To fix this issue, use SAVE_REGS flag for multiple ftrace_ops flag of
direct_call by default.
Link: https://lore.kernel.org/linux-trace-kernel/170484558617.178953.1590516949390270842.stgit@devnote2
Fixes: 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS")
Cc: stable@vger.kernel.org
Cc: Florent Revest <revest@chromium.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/ftrace.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b01ae7d36021..c060d5b47910 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5325,7 +5325,17 @@ static LIST_HEAD(ftrace_direct_funcs);
static int register_ftrace_function_nolock(struct ftrace_ops *ops);
+/*
+ * If there are multiple ftrace_ops, use SAVE_REGS by default, so that direct
+ * call will be jumped from ftrace_regs_caller. Only if the architecture does
+ * not support ftrace_regs_caller but direct_call, use SAVE_ARGS so that it
+ * jumps from ftrace_caller for multiple ftrace_ops.
+ */
+#ifndef HAVE_DYNAMIC_FTRACE_WITH_REGS
#define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS)
+#else
+#define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)
+#endif
static int check_direct_multi(struct ftrace_ops *ops)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [for-linus][PATCH 2/2] tracing: Fix wasted memory in saved_cmdlines logic
2024-02-09 10:22 [for-linus][PATCH 0/2] tracing: Fixes for v6.8-rc3 Steven Rostedt
2024-02-09 10:22 ` [for-linus][PATCH 1/2] ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default Steven Rostedt
@ 2024-02-09 10:22 ` Steven Rostedt
2024-02-09 11:08 ` Steven Rostedt
1 sibling, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2024-02-09 10:22 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Vincent Donnefort, Sven Schnelle, Mete Durlu
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
While looking at improving the saved_cmdlines cache I found a huge amount
of wasted memory that should be used for the cmdlines.
The tracing data saves pids during the trace. At sched switch, if a trace
occurred, it will save the comm of the task that did the trace. This is
saved in a "cache" that maps pids to comms and exposed to user space via
the /sys/kernel/tracing/saved_cmdlines file. Currently it only caches by
default 128 comms.
The structure that uses this creates an array to store the pids using
PID_MAX_DEFAULT (which is usually set to 32768). This causes the structure
to be of the size of 131104 bytes on 64 bit machines.
In hex: 131104 = 0x20020, and since the kernel allocates generic memory in
powers of two, the kernel would allocate 0x40000 or 262144 bytes to store
this structure. That leaves 131040 bytes of wasted space.
Worse, the structure points to an allocated array to store the comm names,
which is 16 bytes times the amount of names to save (currently 128), which
is 2048 bytes. Instead of allocating a separate array, make the structure
end with a variable length string and use the extra space for that.
This is similar to a recommendation that Linus had made about eventfs_inode names:
https://lore.kernel.org/all/20240130190355.11486-5-torvalds@linux-foundation.org/
Instead of allocating a separate string array to hold the saved comms,
have the structure end with: char saved_cmdlines[]; and round up to the
next power of two over sizeof(struct saved_cmdline_buffers) + num_cmdlines * TASK_COMM_LEN
It will use this extra space for the saved_cmdline portion.
Now, instead of saving only 128 comms by default, by using this wasted
space at the end of the structure it can save over 8000 comms and even
saves space by removing the need for allocating the other array.
Link: https://lore.kernel.org/linux-trace-kernel/20240208105328.7e73f71d@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Mete Durlu <meted@linux.ibm.com>
Fixes: 939c7a4f04fcd ("tracing: Introduce saved_cmdlines_size file")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 73 +++++++++++++++++++++-----------------------
1 file changed, 34 insertions(+), 39 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2a7c6fd934e9..ea6d13ff256c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2320,7 +2320,7 @@ struct saved_cmdlines_buffer {
unsigned *map_cmdline_to_pid;
unsigned cmdline_num;
int cmdline_idx;
- char *saved_cmdlines;
+ char saved_cmdlines[];
};
static struct saved_cmdlines_buffer *savedcmd;
@@ -2334,47 +2334,54 @@ static inline void set_cmdline(int idx, const char *cmdline)
strncpy(get_saved_cmdlines(idx), cmdline, TASK_COMM_LEN);
}
-static int allocate_cmdlines_buffer(unsigned int val,
- struct saved_cmdlines_buffer *s)
+static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
+{
+ int order = get_order(sizeof(*s) + s->cmdline_num * TASK_COMM_LEN);
+
+ kfree(s->map_cmdline_to_pid);
+ free_pages((unsigned long)s, order);
+}
+
+static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
{
+ struct saved_cmdlines_buffer *s;
+ struct page *page;
+ int orig_size, size;
+ int order;
+
+ /* Figure out how much is needed to hold the given number of cmdlines */
+ orig_size = sizeof(*s) + val * TASK_COMM_LEN;
+ order = get_order(orig_size);
+ size = 1 << (order + PAGE_SHIFT);
+ page = alloc_pages(GFP_KERNEL, order);
+ if (!page)
+ return NULL;
+
+ s = page_address(page);
+ memset(s, 0, sizeof(*s));
+
+ /* Round up to actual allocation */
+ val = (size - sizeof(*s)) / TASK_COMM_LEN;
+ s->cmdline_num = val;
+
s->map_cmdline_to_pid = kmalloc_array(val,
sizeof(*s->map_cmdline_to_pid),
GFP_KERNEL);
- if (!s->map_cmdline_to_pid)
- return -ENOMEM;
-
- s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);
- if (!s->saved_cmdlines) {
- kfree(s->map_cmdline_to_pid);
- return -ENOMEM;
- }
s->cmdline_idx = 0;
- s->cmdline_num = val;
memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
sizeof(s->map_pid_to_cmdline));
memset(s->map_cmdline_to_pid, NO_CMDLINE_MAP,
val * sizeof(*s->map_cmdline_to_pid));
- return 0;
+ return s;
}
static int trace_create_savedcmd(void)
{
- int ret;
-
- savedcmd = kmalloc(sizeof(*savedcmd), GFP_KERNEL);
- if (!savedcmd)
- return -ENOMEM;
-
- ret = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT, savedcmd);
- if (ret < 0) {
- kfree(savedcmd);
- savedcmd = NULL;
- return -ENOMEM;
- }
+ savedcmd = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT);
- return 0;
+ return savedcmd ? 0 : -ENOMEM;
}
int is_tracing_stopped(void)
@@ -6056,26 +6063,14 @@ tracing_saved_cmdlines_size_read(struct file *filp, char __user *ubuf,
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
}
-static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
-{
- kfree(s->saved_cmdlines);
- kfree(s->map_cmdline_to_pid);
- kfree(s);
-}
-
static int tracing_resize_saved_cmdlines(unsigned int val)
{
struct saved_cmdlines_buffer *s, *savedcmd_temp;
- s = kmalloc(sizeof(*s), GFP_KERNEL);
+ s = allocate_cmdlines_buffer(val);
if (!s)
return -ENOMEM;
- if (allocate_cmdlines_buffer(val, s) < 0) {
- kfree(s);
- return -ENOMEM;
- }
-
preempt_disable();
arch_spin_lock(&trace_cmdline_lock);
savedcmd_temp = savedcmd;
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [for-linus][PATCH 2/2] tracing: Fix wasted memory in saved_cmdlines logic
2024-02-09 10:22 ` [for-linus][PATCH 2/2] tracing: Fix wasted memory in saved_cmdlines logic Steven Rostedt
@ 2024-02-09 11:08 ` Steven Rostedt
0 siblings, 0 replies; 4+ messages in thread
From: Steven Rostedt @ 2024-02-09 11:08 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Vincent Donnefort, Sven Schnelle, Mete Durlu
On Fri, 09 Feb 2024 05:22:22 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> s->map_cmdline_to_pid = kmalloc_array(val,
> sizeof(*s->map_cmdline_to_pid),
> GFP_KERNEL);
> - if (!s->map_cmdline_to_pid)
> - return -ENOMEM;
> -
The above was accidentally removed. Putting it back (with an update for
the new algorithm) and retesting now.
-- Steve
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-02-09 11:08 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-09 10:22 [for-linus][PATCH 0/2] tracing: Fixes for v6.8-rc3 Steven Rostedt
2024-02-09 10:22 ` [for-linus][PATCH 1/2] ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default Steven Rostedt
2024-02-09 10:22 ` [for-linus][PATCH 2/2] tracing: Fix wasted memory in saved_cmdlines logic Steven Rostedt
2024-02-09 11:08 ` Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox