Linux Trace Kernel
From: David Laight <david.laight.linux@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: "Masami Hiramatsu" <mhiramat@kernel.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	"Michal Koutný" <mkoutny@suse.com>
Subject: Re: [PATCH 2/2] tracing: Keep pid and comm[] in the same structure
Date: Tue, 30 Jun 2026 11:01:56 +0100
Message-ID: <20260630110156.5314e2e6@pumpkin>
In-Reply-To: <20260629164912.4c1c2855@robin>

On Mon, 29 Jun 2026 16:49:12 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 26 Jun 2026 22:23:56 +0100
> David Laight <david.laight.linux@gmail.com> wrote:
> 
> > Rather than have two separate dynamic arrays on the end of struct
> > saved_commandlines_buffer, have a single dynamic array where each
> > entry contains the pid and associated task->comm[].
> > This simplifies the initialisation and lookup.
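A minimal userspace sketch of the combined layout, for illustration only. The names (cmdline_entry, cmdline_buf, cmdline_store(), cmdline_lookup()), the 32768-slot pid map and the 16-byte comm are assumptions, not the patch's actual code. Each entry keeps the pid next to its comm[], so a lookup checks one entry instead of cross-checking two parallel arrays.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define COMM_LEN	16		/* assumed to match TASK_COMM_LEN */
#define PID_MAP_SIZE	32768		/* hypothetical stand-in for PID_MAX_DEFAULT */
#define NO_ENTRY	0xffffffffu	/* map value meaning "no saved comm" (32-bit unsigned) */

/* One entry holds the pid and its comm together. */
struct cmdline_entry {
	unsigned int pid;
	char comm[COMM_LEN];
};

struct cmdline_buf {
	unsigned int map_pid_to_entry[PID_MAP_SIZE];
	unsigned int num_entries;
	struct cmdline_entry entries[];	/* the single dynamic array */
};

static struct cmdline_buf *cmdline_buf_alloc(unsigned int n)
{
	struct cmdline_buf *b = calloc(1, sizeof(*b) + n * sizeof(struct cmdline_entry));

	if (!b)
		return NULL;
	/* 0xff in every byte makes each map slot equal to NO_ENTRY */
	memset(b->map_pid_to_entry, 0xff, sizeof(b->map_pid_to_entry));
	b->num_entries = n;
	return b;
}

/* Save comm for pid in entry idx and point the pid map at that entry. */
static void cmdline_store(struct cmdline_buf *b, unsigned int idx,
			  unsigned int pid, const char *comm)
{
	assert(idx < b->num_entries);
	b->entries[idx].pid = pid;
	strncpy(b->entries[idx].comm, comm, COMM_LEN - 1);
	b->entries[idx].comm[COMM_LEN - 1] = '\0';
	b->map_pid_to_entry[pid % PID_MAP_SIZE] = idx;
}

/* Return the saved comm, or NULL if none is saved or the entry was reused. */
static const char *cmdline_lookup(const struct cmdline_buf *b, unsigned int pid)
{
	unsigned int idx = b->map_pid_to_entry[pid % PID_MAP_SIZE];

	if (idx == NO_ENTRY || b->entries[idx].pid != pid)
		return NULL;
	return b->entries[idx].comm;
}
```

Overwriting an entry with a new pid leaves the old pid's map slot stale; the pid check in cmdline_lookup() is what catches that.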
> > 
> > Don't bother trying to initialise the pid field to a non-zero value;
> > it only matters in the tracing_saved_cmdlines_seq_ops code.
> > Allocate entry [0] first so that the tracing_saved_cmdlines_seq_ops
> > code can just index the array with the file offset.
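One way to read the "entry [0] first" point, sketched in userspace with hypothetical names (cmdline_ring, ring_next_slot(), ring_at()). If slots are handed out 0, 1, 2, ... and the array starts zeroed, a seq_file-style reader can use the file offset directly as the index and stop at the first zero pid. This assumes pid 0 is never saved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define COMM_LEN 16	/* assumed to match TASK_COMM_LEN */

struct cmdline_entry {
	unsigned int pid;	/* 0 means "never used" (assumes pid 0 is never saved) */
	char comm[COMM_LEN];
};

struct cmdline_ring {
	unsigned int num;	/* number of entries */
	unsigned int next;	/* next slot to hand out; starts at 0 */
	struct cmdline_entry entries[];
};

static struct cmdline_ring *ring_alloc(unsigned int num)
{
	struct cmdline_ring *r;

	assert(num > 0);	/* ring_next_slot() divides by num */
	/* calloc() zeroes every pid, so no explicit "unused" marker is needed */
	r = calloc(1, sizeof(*r) + num * sizeof(struct cmdline_entry));
	if (r)
		r->num = num;
	return r;
}

/* Hand out slots 0, 1, 2, ... and wrap, so entry [0] is allocated first. */
static unsigned int ring_next_slot(struct cmdline_ring *r)
{
	unsigned int slot = r->next;

	r->next = (r->next + 1) % r->num;
	return slot;
}

/*
 * seq_file-style lookup: the file offset is the array index. As long as
 * every slot handed out is filled with a non-zero pid, the used slots form
 * a prefix of the array, so the first zero pid marks the end of the data.
 */
static const struct cmdline_entry *ring_at(const struct cmdline_ring *r, size_t pos)
{
	if (pos >= r->num || r->entries[pos].pid == 0)
		return NULL;
	return &r->entries[pos];
}
```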
> > 
> > The code now uses the correct size when determining the page 'order'
> > to free the structure. The smaller size will always give the same
> > 'order'.
> > 
> > Signed-off-by: David Laight <david.laight.linux@gmail.com>
> > ---
> > 
> > Is there any reason why this code uses alloc_pages() rather
> > than vmalloc()?  
> 
> It's been a long time since I worked on this, but IIRC, it was to keep
> the pressure down on the TLB when tracing. It updates at every
> sched_switch that has a trace event occurring, so I likely used normal
> pages, which are part of the huge pages the kernel sets up and don't
> affect the TLB as much. vmalloc does have an impact on TLB pressure,
> and tracing should always try to avoid that.

Isn't this a cache so that the pid numbers can be converted to strings
when the trace is read out after the actual process has exited?
That would mean the cache doesn't need to be updated on every trace
request - it might be enough to just save on process exit and look up the
pid itself for running processes (the whole thing relies on pids not
being reused).
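A rough sketch of the save-on-exit idea, assuming a toy task table in place of the kernel's pid lookup; tasks[], exit_cache[], find_live_task(), on_task_exit() and pid_to_comm() are all hypothetical. Running tasks are resolved directly, and only exited tasks use cache slots. The "<...>" fallback mirrors what ftrace prints for an unknown comm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COMM_LEN	16	/* assumed to match TASK_COMM_LEN */
#define CACHE_SLOTS	64	/* hypothetical; small because only exited tasks are cached */
#define MAX_TASKS	8	/* toy stand-in for the kernel's task list */

struct task {
	unsigned int pid;
	char comm[COMM_LEN];
	int alive;
};

struct exited {
	unsigned int pid;
	char comm[COMM_LEN];
};

static struct task tasks[MAX_TASKS];		/* hypothetical live-task table */
static struct exited exit_cache[CACHE_SLOTS];	/* comm[] saved at exit */

static struct task *find_live_task(unsigned int pid)
{
	for (size_t i = 0; i < MAX_TASKS; i++)
		if (tasks[i].alive && tasks[i].pid == pid)
			return &tasks[i];
	return NULL;
}

/* Hypothetical exit hook: the only place a comm[] is copied into the cache. */
static void on_task_exit(struct task *t)
{
	struct exited *e = &exit_cache[t->pid % CACHE_SLOTS];

	assert(t->alive);
	e->pid = t->pid;
	memcpy(e->comm, t->comm, COMM_LEN);
	t->alive = 0;
}

/* Trace read side: prefer the live task, fall back to the exit cache. */
static const char *pid_to_comm(unsigned int pid)
{
	struct task *t = find_live_task(pid);
	const struct exited *e = &exit_cache[pid % CACHE_SLOTS];

	if (t)
		return t->comm;
	if (e->pid == pid)
		return e->comm;
	return "<...>";
}
```

One trade-off: a live lookup returns the task's current comm, so a task that called exec() after the event would be shown under its new name.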

> 
> > map_pid_to_cmdline[] is 64k*sizeof(int) so the whole structure
> > expands to 512k with about 64k/20 (about 3200) pid entries even
> > though the default is 128.  
> 
> That's because it is not dynamic. That array needs to be able to hold
> most PIDs. The default is 128, but the entry count is expanded to use
> whatever space is left in the allocation that holds the full
> map_pid_to_cmdline. The real default on architectures with 4096-byte
> pages is 6552 entries.

That is double my 'quick calculation' - but both are a lot of entries.
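A worked version of the sizing rule as a userspace sketch. The page size (4096), the 20-byte entry (a 4-byte pid plus a 16-byte comm) and the 131096-byte header (a 32769-entry unsigned int PID map plus some bookkeeping fields) are assumptions, and order_for() models get_order() rather than calling it. It also checks the patch description's claim that recomputing the order from the smaller, actually-used size gives the same order.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096UL	/* assumed 4 KiB pages */

/* Smallest order with (PAGE_BYTES << order) >= size, modelling get_order(). */
static unsigned int order_for(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((PAGE_BYTES << order) < size)
		order++;
	return order;
}

/*
 * Round the request up to a power-of-two number of pages, then let the
 * entry array fill whatever is left after the fixed-size header (which
 * is dominated by the PID map).
 */
static unsigned int entries_that_fit(size_t header, size_t entry, unsigned int requested)
{
	size_t block = PAGE_BYTES << order_for(header + (size_t)requested * entry);

	return (unsigned int)((block - header) / entry);
}

/*
 * Free-path check: the order computed from the used size
 * (header + fit * entry) equals the allocation order, because
 * requested <= fit and header + fit * entry <= block.
 */
static int free_order_matches(size_t header, size_t entry, unsigned int requested)
{
	unsigned int fit = entries_that_fit(header, entry, requested);

	return order_for(header + (size_t)fit * entry) ==
	       order_for(header + (size_t)requested * entry);
}
```

With those assumptions, a request for 128 entries needs 133656 bytes, which rounds up to an order-6 (256 KiB) block, and (262144 - 131096) / 20 leaves room for 6552 entries, in line with the figure quoted above.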

> > AFAICT there is only one copy of the data - so it could be static.
> > Perhaps with pointers to map_pid_to_cmdline[] and (after this patch)
> > pid_comm[], both of which could be separately resized.
> 
> map_pid_to_cmdline[] is sized to hold PID_MAX_DEFAULT PIDs to
> avoid collisions. I wouldn't resize it.

If comm[] is only saved on process exit, you'd likely get away with far
fewer entries - getting collisions for processes that have exited is
rather unlikely.
(I wonder if I could make that work.)
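A sketch of the "single static copy" suggestion, with hypothetical names and no locking (the real code would need it). The pid map and the pid/comm entry array are reached through pointers, so either can be reallocated on its own, for example shrinking the entry array if only exited tasks were saved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define COMM_LEN 16	/* assumed to match TASK_COMM_LEN */

struct cmdline_entry {
	unsigned int pid;
	char comm[COMM_LEN];
};

/* Hypothetical single static instance; the two arrays are sized independently. */
static struct {
	unsigned int *map_pid_to_entry;	/* indexed by pid % map_size */
	unsigned int map_size;
	struct cmdline_entry *entries;	/* pid + comm pairs */
	unsigned int num_entries;
} saved;

/* Replace the pid map; every slot starts as "no entry" (0xffffffff). */
static int resize_map(unsigned int size)
{
	unsigned int *m;

	assert(size > 0);
	m = malloc(size * sizeof(*m));
	if (!m)
		return -1;
	memset(m, 0xff, size * sizeof(*m));
	free(saved.map_pid_to_entry);
	saved.map_pid_to_entry = m;
	saved.map_size = size;
	return 0;
}

/* Replace only the entry array; the pid map keeps its size. */
static int resize_entries(unsigned int n)
{
	struct cmdline_entry *e;

	assert(n > 0);
	e = calloc(n, sizeof(*e));
	if (!e)
		return -1;
	free(saved.entries);
	saved.entries = e;
	saved.num_entries = n;
	/* old indexes in the map may now be out of range or stale: invalidate them */
	if (saved.map_pid_to_entry)
		memset(saved.map_pid_to_entry, 0xff,
		       saved.map_size * sizeof(*saved.map_pid_to_entry));
	return 0;
}
```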

Does that memory get allocated at boot time?
512k is a lot to allocate for a feature that won't usually be used.
OTOH you won't reliably get that much contiguous memory later on.
Deferring to a later time (maybe as late as the first tracing_on())
might be more reasonable - but that would have to use vmalloc().
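A sketch of deferring the allocation to the first enable, using a hypothetical hook name and calloc() as a stand-in for vzalloc(). vmalloc-style memory only has to be virtually contiguous, which is what makes a late allocation of this size realistic, at the cost of the TLB pressure mentioned earlier in the thread. It does not address the boot-time tracing case, which would need the hook to run early enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define COMM_LEN	16	/* assumed to match TASK_COMM_LEN */
#define PID_MAP_SIZE	32768	/* hypothetical stand-in for PID_MAX_DEFAULT */

struct cmdline_entry {
	unsigned int pid;
	char comm[COMM_LEN];
};

struct cmdline_cache {
	unsigned int *map;		/* pid -> entry index */
	struct cmdline_entry *entries;
	unsigned int num_entries;
};

static struct cmdline_cache *cache;	/* NULL until tracing is first enabled */

/* Hypothetical hook called from the first tracing_on(); later calls are no-ops. */
static bool cmdline_cache_enable(unsigned int num_entries)
{
	struct cmdline_cache *c;

	assert(num_entries > 0);
	if (cache)
		return true;	/* already allocated by an earlier enable */
	c = calloc(1, sizeof(*c));
	if (!c)
		return false;
	/* calloc() stands in for vzalloc(): neither needs physically contiguous pages */
	c->map = calloc(PID_MAP_SIZE, sizeof(*c->map));
	c->entries = calloc(num_entries, sizeof(*c->entries));
	if (!c->map || !c->entries) {
		free(c->map);
		free(c->entries);
		free(c);
		return false;
	}
	c->num_entries = num_entries;
	cache = c;
	return true;
}
```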

I'm also not sure about the code that lets you trace from boot.
That must be able to initialise early - but I'm not sure how early.

	David

> 
> > 
> > I also noticed that map_pid_to_cmdline[] contains indexes into
> > pid_comm[]; restricting these to 16 bits would halve the data area.
> 
> Hmm, yeah, this could be useful, as it doesn't appear one could make
> saved_cmdline_size greater than 65536 (or even close to that).
> 
> -- Steve
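A sketch of the 16-bit index idea, with hypothetical names. The map stores uint16_t indexes and reserves UINT16_MAX as the "no entry" value, so with 32768 pid slots it shrinks from 128 KiB to 64 KiB and caps the entry array at 65535 entries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PID_MAP_SIZE	32768		/* hypothetical stand-in for PID_MAX_DEFAULT */
#define NO_ENTRY16	UINT16_MAX	/* reserved: "no saved comm" */
#define MAX_ENTRIES16	NO_ENTRY16	/* valid indexes are 0..65534 */

static uint16_t map16[PID_MAP_SIZE];

_Static_assert(sizeof(map16) == PID_MAP_SIZE * 2, "16-bit map uses 2 bytes per pid");

static void map16_init(void)
{
	memset(map16, 0xff, sizeof(map16));	/* every slot becomes NO_ENTRY16 */
}

static void map16_set(unsigned int pid, unsigned int idx)
{
	assert(idx < MAX_ENTRIES16);	/* the entry count must stay below 65535 */
	map16[pid % PID_MAP_SIZE] = (uint16_t)idx;
}

/* Return the entry index for pid, or -1 if nothing is saved. */
static int map16_get(unsigned int pid)
{
	uint16_t v = map16[pid % PID_MAP_SIZE];

	return v == NO_ENTRY16 ? -1 : v;
}
```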


Thread overview: 7+ messages
2026-06-26 21:23 [PATCH rfc 0/2] Improvements to ftrace comm[] handling David Laight
2026-06-26 21:23 ` [PATCH 1/2] tracing: Embed 'char comm[16]' in a structure David Laight
2026-06-29 20:26   ` Steven Rostedt
2026-06-30  9:26     ` David Laight
2026-06-26 21:23 ` [PATCH 2/2] tracing: Keep pid and comm[] in the same structure David Laight
2026-06-29 20:49   ` Steven Rostedt
2026-06-30 10:01     ` David Laight [this message]
