All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Changwoo Min <multics69@gmail.com>
Cc: tj@kernel.org, void@manifault.com, mingo@redhat.com,
	peterz@infradead.org, changwoo@igalia.com, kernel-dev@igalia.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 0/6] sched_ext: Support high-performance monotonically non-decreasing clock
Date: Mon, 9 Dec 2024 10:51:44 +0100	[thread overview]
Message-ID: <Z1a9sKyELhH-e4lJ@gpd3> (raw)
In-Reply-To: <20241209061531.257531-1-changwoo@igalia.com>

On Mon, Dec 09, 2024 at 03:15:25PM +0900, Changwoo Min wrote:
> Many BPF schedulers (such as scx_central, scx_lavd, scx_rusty, scx_bpfland,
> and scx_flash) frequently call bpf_ktime_get_ns() for tracking tasks' runtime
> properties. If supported, bpf_ktime_get_ns() eventually reads a hardware
> timestamp counter (TSC). However, reading a hardware TSC is not
> performant in some hardware platforms, degrading IPC.
> 
> This patchset addresses the performance problem of reading hardware TSC
> by leveraging the rq clock in the scheduler core, introducing a
> scx_bpf_now_ns() function for BPF schedulers. Whenever the rq clock
> is fresh and valid, scx_bpf_now_ns() provides the rq clock, which is
> already updated by the scheduler core (update_rq_clock), so it can reduce
> reading the hardware TSC.
> 
> When the rq lock is released (rq_unpin_lock), the rq clock is invalidated,
> so a subsequent scx_bpf_now_ns() call gets the fresh sched_clock for the caller.
> 
> In addition, scx_bpf_now_ns() guarantees the clock is monotonically
> non-decreasing for the same CPU, so the clock cannot go backward
> in the same CPU.
> 
> Using scx_bpf_now_ns() reduces the number of reading hardware TSC
> by 40-70% (65% for scx_lavd, 58% for scx_bpfland, and 43% for scx_rusty)
> for the following benchmark:
> 
>     perf bench -f simple sched messaging -t -g 20 -l 6000
> 
> The patchset begins by managing the status of rq clock in the scheduler
> core, then implementing scx_bpf_now_ns(), and finally applying it to the
> BPF schedulers.

I left a few comments, but overall it looks good to me. I also ran some
tests with this applied and a modified scx_bpfland to use the new
scx_bpf_now_ns(), no issue to report, therefore:

Acked-by: Andrea Righi <arighi@nvidia.com>

> 
> ChangwLog v3 -> v4:
>   - Separate the code relocation related to scx_enabled() into a
>     separate patch.
>   - Remove scx_rq_clock_stale() after (or before) ops.running() and
>     ops.update_idle() calls
>   - Rename scx_bpf_clock_get_ns() into scx_bpf_now_ns() and revise it to
>     address the comments
>   - Move the per-CPU variable holding a prev clock into scx_rq
>     (rq->scx.prev_clock)
>   - Add a comment describing when the clock could go backward in
>     scx_bpf_now_ns()
>   - Rebase the code to the tip of Tejun's sched_ext repo (for-next
>     branch)
> 
> ChangeLog v2 -> v3:
>   - To avoid unnecessarily modifying cache lines, scx_rq_clock_update()
>     and scx_rq_clock_stale() update the clock and flags only when a
>     sched_ext scheduler is enabled.
> 
> ChangeLog v1 -> v2:
>   - Rename SCX_RQ_CLK_UPDATED to SCX_RQ_CLK_VALID to denote the validity
>     of an rq clock clearly.
>   - Rearrange the clock and flags fields in struct scx_rq to make sure
>     they are in the same cacheline to minimize the cache misses 
>   - Add an additional explanation to the commit message in the 2/5 patch
>     describing when the rq clock will be reused with an example.
>   - Fix typos
>   - Rebase the code to the tip of Tejun's sched_ext repo
> 
> Changwoo Min (6):
>   sched_ext: Relocate scx_enabled() related code
>   sched_ext: Implement scx_rq_clock_update/stale()
>   sched_ext: Manage the validity of scx_rq_clock
>   sched_ext: Implement scx_bpf_now_ns()
>   sched_ext: Add scx_bpf_now_ns() for BPF scheduler
>   sched_ext: Replace bpf_ktime_get_ns() to scx_bpf_now_ns()
> 
>  kernel/sched/core.c                      |  6 +-
>  kernel/sched/ext.c                       | 73 ++++++++++++++++++++++++
>  kernel/sched/sched.h                     | 52 ++++++++++++-----
>  tools/sched_ext/include/scx/common.bpf.h |  1 +
>  tools/sched_ext/include/scx/compat.bpf.h |  5 ++
>  tools/sched_ext/scx_central.bpf.c        |  4 +-
>  tools/sched_ext/scx_flatcg.bpf.c         |  2 +-
>  7 files changed, 124 insertions(+), 19 deletions(-)
> 
> -- 
> 2.47.1
> 

-Andrea

      parent reply	other threads:[~2024-12-09  9:51 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-09  6:15 [PATCH v4 0/6] sched_ext: Support high-performance monotonically non-decreasing clock Changwoo Min
2024-12-09  6:15 ` [PATCH v4 1/6] sched_ext: Relocate scx_enabled() related code Changwoo Min
2024-12-11  7:27   ` Tejun Heo
2024-12-11  7:37     ` Tejun Heo
2024-12-09  6:15 ` [PATCH v4 2/6] sched_ext: Implement scx_rq_clock_update/stale() Changwoo Min
2024-12-09  9:40   ` Andrea Righi
2024-12-10  7:21     ` Changwoo Min
2024-12-11  7:43   ` Tejun Heo
2024-12-13  1:16     ` Changwoo Min
2024-12-09  6:15 ` [PATCH v4 3/6] sched_ext: Manage the validity of scx_rq_clock Changwoo Min
2024-12-09  6:15 ` [PATCH v4 4/6] sched_ext: Implement scx_bpf_now_ns() Changwoo Min
2024-12-11  8:14   ` Tejun Heo
2024-12-13  1:41     ` Changwoo Min
2024-12-11  9:32   ` Peter Zijlstra
2024-12-13  2:01     ` Changwoo Min
2024-12-09  6:15 ` [PATCH v4 5/6] sched_ext: Add scx_bpf_now_ns() for BPF scheduler Changwoo Min
2024-12-09  6:15 ` [PATCH v4 6/6] sched_ext: Replace bpf_ktime_get_ns() to scx_bpf_now_ns() Changwoo Min
2024-12-09  9:51 ` Andrea Righi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z1a9sKyELhH-e4lJ@gpd3 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=kernel-dev@igalia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=multics69@gmail.com \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.