public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: Dmitry Ilvokhin <d@ilvokhin.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND] sched/stats: Optimize /proc/schedstat printing
Date: Wed, 29 Oct 2025 15:55:13 +0100	[thread overview]
Message-ID: <20251029145513.GO3245006@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <aQIoySXrIVcKXXGS@shell.ilvokhin.com>

On Wed, Oct 29, 2025 at 02:46:33PM +0000, Dmitry Ilvokhin wrote:
> On Wed, Oct 29, 2025 at 03:07:55PM +0100, Peter Zijlstra wrote:
> > On Wed, Oct 29, 2025 at 01:07:15PM +0000, Dmitry Ilvokhin wrote:
> > > Function seq_printf() supports a rich format string for printing
> > > decimals, but /proc/schedstat does not need it, since the majority
> > > of the data is space-separated decimals. Use seq_put_decimal_ull()
> > > instead as a faster alternative.
> > > 
> > > Performance counter stats (truncated) for sh -c 'cat /proc/schedstat >
> > > /dev/null' before and after applying the patch, from a machine with
> > > 72 CPUs, are below.
> > > 
> > > Before:
> > > 
> > >       2.94 msec task-clock               #    0.820 CPUs utilized
> > >          1      context-switches         #  340.551 /sec
> > >          0      cpu-migrations           #    0.000 /sec
> > >        340      page-faults              #  115.787 K/sec
> > > 10,327,200      instructions             #    1.89  insn per cycle
> > >                                          #    0.10  stalled cycles per insn
> > >  5,458,307      cycles                   #    1.859 GHz
> > >  1,052,733      stalled-cycles-frontend  #   19.29% frontend cycles idle
> > >  2,066,321      branches                 #  703.687 M/sec
> > >     25,621      branch-misses            #    1.24% of all branches
> > > 
> > > 0.00357974 +- 0.00000209 seconds time elapsed  ( +-  0.06% )
> > > 
> > > After:
> > > 
> > >       2.50 msec task-clock              #    0.785 CPUs utilized
> > >          1      context-switches        #  399.780 /sec
> > >          0      cpu-migrations          #    0.000 /sec
> > >        340      page-faults             #  135.925 K/sec
> > >  7,371,867      instructions            #    1.59  insn per cycle
> > >                                         #    0.13  stalled cycles per insn
> > >  4,647,053      cycles                  #    1.858 GHz
> > >    986,487      stalled-cycles-frontend #   21.23% frontend cycles idle
> > >  1,591,374      branches                #  636.199 M/sec
> > >     28,973      branch-misses           #    1.82% of all branches
> > > 
> > > 0.00318461 +- 0.00000295 seconds time elapsed  ( +-  0.09% )
> > > 
> > > This is ~11% (relative) improvement in time elapsed.
> > 
> > Yeah, but who cares? Why do we want less obvious code for a silly stats
> > file?
> 
> Thanks for the feedback, Peter.
> 
> Fair point that /proc/schedstat isn’t a hot path in the kernel itself,
> but it is a hot path for monitoring software (Prometheus for example).

Aliens! I like Xenomorphs :-) But I doubt that's what you're talking
about.

> In large fleets, these files are polled periodically (often every few
> seconds) on every machine. The cumulative overhead adds up quickly
> across thousands of nodes, so reducing the cost of generating these
> stats does have a measurable operational impact. With the ongoing trend
> toward higher core counts per machine, this cost becomes even more
> noticeable over time.
> 
> I've tried to keep the code as readable as possible, but I understand if
> you think an ~11% improvement isn't worth the added complexity. If you
> have suggestions for making the code cleaner or the intent clearer, I’d
> be happy to rework it.

What are they doing this for? I would much rather rework all this such
that all the schedstat crap becomes tracepoints and all the existing
cruft optional consumers of that.

Like I argued here:

  https://lkml.kernel.org/r/20250703141800.GX1613200@noisy.programming.kicks-ass.net

Then people can consume them however makes most sense, ideally with a
binary interface if it is high bandwidth.


Thread overview: 6+ messages
2025-10-29 13:07 [PATCH RESEND] sched/stats: Optimize /proc/schedstat printing Dmitry Ilvokhin
2025-10-29 14:07 ` Peter Zijlstra
2025-10-29 14:46   ` Dmitry Ilvokhin
2025-10-29 14:55     ` Peter Zijlstra [this message]
2025-10-29 15:49       ` Dmitry Ilvokhin
2025-11-05 15:04         ` Dmitry Ilvokhin
