Re: [PATCH v8 2/6] sched_ext: Implement scx_bpf_now()

From: Peter Zijlstra <peterz@infradead.org>
To: Changwoo Min <changwoo@igalia.com>
Cc: tj@kernel.org, void@manifault.com, arighi@nvidia.com,
	mingo@redhat.com, kernel-dev@igalia.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8 2/6] sched_ext: Implement scx_bpf_now()
Date: Fri, 10 Jan 2025 09:31:02 +0100
Message-ID: <20250110083102.GA4213@noisy.programming.kicks-ass.net>
In-Reply-To: <20250109131456.7055-3-changwoo@igalia.com>

On Thu, Jan 09, 2025 at 10:14:52PM +0900, Changwoo Min wrote:
> Returns a high-performance monotonically non-decreasing clock for the current
> CPU. The clock returned is in nanoseconds.
> 
> It provides the following properties:
> 
> 1) High performance: Many BPF schedulers call bpf_ktime_get_ns() frequently
>  to account for execution time and track tasks' runtime properties.
>  Unfortunately, on some hardware platforms, bpf_ktime_get_ns() -- which
>  eventually reads a hardware timestamp counter -- is neither performant nor
>  scalable. scx_bpf_now() aims to provide a high-performance clock by
>  using the rq clock in the scheduler core whenever possible.
> 
> 2) High enough resolution for the BPF scheduler use cases: In most BPF
>  scheduler use cases, the required clock resolution is lower than the most
>  accurate hardware clock (e.g., rdtsc in x86). scx_bpf_now() basically
>  uses the rq clock in the scheduler core whenever it is valid. It considers
>  that the rq clock is valid from the time the rq clock is updated
>  (update_rq_clock) until the rq is unlocked (rq_unpin_lock).
> 
> 3) Monotonically non-decreasing clock for the same CPU: scx_bpf_now()
>  guarantees the clock never goes backward when values are compared on
>  the same CPU. When comparing clocks on different CPUs, there is no such
>  guarantee -- the clock can go backward. The clock is monotonically
>  *non-decreasing* rather than strictly increasing because two
>  scx_bpf_now() calls on the same CPU return the same value while the
>  rq clock remains valid.
> 
> An rq clock becomes valid when it is updated using update_rq_clock()
> and invalidated when the rq is unlocked using rq_unpin_lock().
> 
> Let's suppose the following timeline in the scheduler core:
> 
>    T1. rq_lock(rq)
>    T2. update_rq_clock(rq)
>    T3. a sched_ext BPF operation
>    T4. rq_unlock(rq)
>    T5. a sched_ext BPF operation
>    T6. rq_lock(rq)
>    T7. update_rq_clock(rq)
> 
> For [T2, T4), we consider that rq clock is valid (SCX_RQ_CLK_VALID is
> set), so scx_bpf_now() calls during [T2, T4) (including T3) will
> return the rq clock updated at T2. For duration [T4, T7), when a BPF
> scheduler can still call scx_bpf_now() (T5), we consider the rq clock
> is invalid (SCX_RQ_CLK_VALID is unset at T4). So when calling
> scx_bpf_now() at T5, we will return a fresh clock value by calling
> sched_clock_cpu() internally. Also, to prevent getting outdated rq clocks
> from a previous scx scheduler, invalidate all the rq clocks when unloading
> a BPF scheduler.
> 
> One example of calling scx_bpf_now() when the rq clock is invalid
> (like T5) is in scx_central [1]. The scx_central scheduler uses a BPF
> timer for preemptive scheduling. Every millisecond, the timer callback
> checks whether the currently running tasks have exceeded their timeslice.
> At the beginning of the BPF timer callback (central_timerfn in
> scx_central.bpf.c), scx_central gets the current time. When the BPF timer
> callback runs, the rq clock could be invalid, as at T5. In this case,
> scx_bpf_now() returns a fresh clock value rather than the old one (T2).
> 
> [1] https://github.com/sched-ext/scx/blob/main/scheds/c/scx_central.bpf.c
> 
> Signed-off-by: Changwoo Min <changwoo@igalia.com>

This one looks good, thanks!

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
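
The validity window in the quoted timeline (T1 to T7) can be sketched as a
small userspace model. This is an illustration under stated assumptions, not
the code from the patch: every model_* name is hypothetical, model_hw_ns
stands in for whatever sched_clock_cpu() reads, and clk_valid stands in for
SCX_RQ_CLK_VALID. Locking and preemption are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; model_* names are not kernel symbols. */
struct model_rq {
	uint64_t clock;    /* value cached by the last model_update_rq_clock() */
	bool clk_valid;    /* stands in for SCX_RQ_CLK_VALID */
};

/* Stand-in for the per-CPU hardware clock behind sched_clock_cpu(). */
static uint64_t model_hw_ns;

static uint64_t model_sched_clock_cpu(void)
{
	return model_hw_ns;
}

/* T2 / T7: refresh the cached clock and mark it valid. */
static void model_update_rq_clock(struct model_rq *rq)
{
	rq->clock = model_sched_clock_cpu();
	rq->clk_valid = true;
}

/* T4: unlocking the rq invalidates the cached clock. */
static void model_rq_unlock(struct model_rq *rq)
{
	rq->clk_valid = false;
}

/* Cached value while the rq clock is valid, otherwise a fresh read. */
static uint64_t model_now(const struct model_rq *rq)
{
	return rq->clk_valid ? rq->clock : model_sched_clock_cpu();
}

/* Replays T1..T7 on one CPU and returns the value observed at T5. */
static uint64_t model_replay_timeline(void)
{
	struct model_rq rq = { 0 };
	uint64_t t3a, t3b, t5;

	model_hw_ns = 1000;
	model_update_rq_clock(&rq);          /* T2 */

	model_hw_ns = 1200;                  /* time passes under the lock */
	t3a = model_now(&rq);                /* T3 */
	model_hw_ns = 1300;
	t3b = model_now(&rq);                /* T3, second call */
	assert(t3a == 1000 && t3b == 1000);  /* same value within [T2, T4) */

	model_rq_unlock(&rq);                /* T4 */
	model_hw_ns = 1500;
	t5 = model_now(&rq);                 /* T5: fresh read */
	assert(t5 == 1500 && t5 >= t3b);     /* did not go backward */

	model_update_rq_clock(&rq);          /* T7 */
	assert(model_now(&rq) == 1500);      /* cached again */
	return t5;
}
```

In this model the two calls at T3 return the value cached at T2 even though
the underlying clock has advanced, and the call at T5 returns a fresh value.
The result stays non-decreasing on one CPU only as long as the underlying
per-CPU clock itself does not go backward, which the model simply assumes.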

Thread overview: 10+ messages
2025-01-09 13:14 [PATCH v8 0/6] sched_ext: Support high-performance monotonically non-decreasing clock Changwoo Min
2025-01-09 13:14 ` [PATCH v8 1/6] sched_ext: Relocate scx_enabled() related code Changwoo Min
2025-01-09 13:14 ` [PATCH v8 2/6] sched_ext: Implement scx_bpf_now() Changwoo Min
2025-01-10  8:31   ` Peter Zijlstra [this message]
2025-01-09 13:14 ` [PATCH v8 3/6] sched_ext: Add scx_bpf_now() for BPF scheduler Changwoo Min
2025-01-09 13:14 ` [PATCH v8 4/6] sched_ext: Add time helpers for BPF schedulers Changwoo Min
2025-01-09 13:14 ` [PATCH v8 5/6] sched_ext: Replace bpf_ktime_get_ns() to scx_bpf_now() Changwoo Min
2025-01-09 13:14 ` [PATCH v8 6/6] sched_ext: Use time helpers in BPF schedulers Changwoo Min
2025-01-10  9:08 ` [PATCH v8 0/6] sched_ext: Support high-performance monotonically non-decreasing clock Andrea Righi
2025-01-10 18:30 ` Tejun Heo