Re: [PATCH v6 2/6] sched_ext: Implement scx_bpf_now_ns()

From: Andrea Righi <arighi@nvidia.com>
To: Changwoo Min <multics69@gmail.com>
Cc: tj@kernel.org, void@manifault.com, mingo@redhat.com,
	peterz@infradead.org, changwoo@igalia.com, kernel-dev@igalia.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 2/6] sched_ext: Implement scx_bpf_now_ns()
Date: Fri, 20 Dec 2024 22:30:07 +0100
Message-ID: <Z2Xh3x8CCLPardgm@gpd3>
In-Reply-To: <20241220062025.27724-3-changwoo@igalia.com>

Hi Changwoo,

On Fri, Dec 20, 2024 at 03:20:21PM +0900, Changwoo Min wrote:
...
> +/**
> + * scx_bpf_now_ns - Returns a high-performance monotonically non-decreasing
> + * clock for the current CPU. The clock returned is in nanoseconds.
> + *
> + * It provides the following properties:
> + *
> + * 1) High performance: Many BPF schedulers call bpf_ktime_get_ns() frequently
> + *  to account for execution time and track tasks' runtime properties.
> + *  Unfortunately, in some hardware platforms, bpf_ktime_get_ns() -- which
> + *  eventually reads a hardware timestamp counter -- is neither performant nor
> + *  scalable. scx_bpf_now_ns() aims to provide a high-performance clock by
> + *  using the rq clock in the scheduler core whenever possible.
> + *
> + * 2) High enough resolution for the BPF scheduler use cases: In most BPF
> + *  scheduler use cases, the required clock resolution is lower than the most
> + *  accurate hardware clock (e.g., rdtsc in x86). scx_bpf_now_ns() basically
> + *  uses the rq clock in the scheduler core whenever it is valid. It considers
> + *  that the rq clock is valid from the time the rq clock is updated
> + *  (update_rq_clock) until the rq is unlocked (rq_unpin_lock).
> + *
> + * 3) Monotonically non-decreasing clock for the same CPU: scx_bpf_now_ns()
> + *  guarantees the clock never goes backward when comparing them in the same
> + *  CPU. On the other hand, when comparing clocks in different CPUs, there
> + *  is no such guarantee -- the clock can go backward. It provides a
> + *  monotonically *non-decreasing* clock so that it would provide the same
> + *  clock values in two different scx_bpf_now_ns() calls in the same CPU
> + *  during the same period of when the rq clock is valid.
> + */
> +__bpf_kfunc u64 scx_bpf_now_ns(void)
> +{
> +	struct rq *rq;
> +	u64 clock;
> +
> +	preempt_disable();
> +
> +	/*
> +	 * If the rq clock is valid, use the cached rq clock.
> +	 * Otherwise, return a fresh rq glock.

s/glock/clock/

> +	 *
> +	 * Note that scx_bpf_now_ns() is re-entrant between a process
> +	 * context and an interrupt context (e.g., timer interrupt).
> +	 * However, we don't need to consider the race between them
> +	 * because such race is not observable from a caller.
> +	 */
> +	rq = this_rq();
> +	clock = READ_ONCE(rq->scx.clock);
> +
> +	if (!(READ_ONCE(rq->scx.flags) & SCX_RQ_CLK_VALID)) {
> +		clock = sched_clock_cpu(cpu_of(rq));
> +
> +		/*
> +		 * The rq clock is updated outside of the rq lock.
> +		 * In this case, keep the updated rq clock invalid so the next
> +		 * kfunc call outside the rq lock gets a fresh rq clock.
> +		 */
> +		scx_rq_clock_update(rq, clock, false);
> +	}

I was wondering if we could use a special value for clock (like ~0ULL or
similar) to mark the clock as invalid.

This way, we could get rid of the extra READ_ONCE(rq->scx.flags) logic for
checking the clock validity. And if the actual clock happens to match the
special value, we'd simply re-read the TSC, which shouldn't be a big issue
in theory.
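
To make the idea concrete, here is a rough user-space sketch of the
sentinel check. Everything in it (struct fake_rq, SCX_CLK_INVALID,
fake_sched_clock() and the other fake_* helpers) is made up for
illustration and is not part of the patch; the fake counter only stands in
for sched_clock_cpu() so the snippet can run outside the kernel:

```c
#include <stdint.h>

/* Hypothetical sentinel: a cached clock equal to this means "invalid". */
#define SCX_CLK_INVALID (~0ULL)

/* Stand-in for the per-rq state (rq->scx.clock in the quoted patch). */
struct fake_rq {
	uint64_t clock;
};

/* Stand-in for sched_clock_cpu(): a fake counter that always advances. */
static uint64_t fake_hw_clock = 1000;

static uint64_t fake_sched_clock(void)
{
	fake_hw_clock += 10;
	return fake_hw_clock;
}

/* Storing a real timestamp marks the cached clock as valid. */
static void fake_rq_clock_update(struct fake_rq *rq, uint64_t now)
{
	rq->clock = now;
}

/* Storing the sentinel marks it invalid; no separate flags word needed. */
static void fake_rq_clock_invalidate(struct fake_rq *rq)
{
	rq->clock = SCX_CLK_INVALID;
}

static uint64_t fake_now_ns(struct fake_rq *rq)
{
	uint64_t clock = rq->clock;	/* would be READ_ONCE() in the kernel */

	/*
	 * A single load decides validity. If a real timestamp ever equals
	 * the sentinel, we just take the slow path once more.
	 */
	if (clock == SCX_CLK_INVALID)
		clock = fake_sched_clock();

	return clock;
}
```

One difference from the quoted code: there, the invalid path also stores
the fresh value via scx_rq_clock_update(rq, clock, false) while keeping
the clock marked invalid. A single sentinel field cannot hold that value
and stay invalid at the same time, so the sketch skips the store.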

That said, I'm not sure if this would yield any real performance benefits,
so the current approach is probably fine as it is. Feel free to ignore
this.

-Andrea

Thread overview: 15+ messages
2024-12-20  6:20 [PATCH v6 0/6] sched_ext: Support high-performance monotonically non-decreasing clock Changwoo Min
2024-12-20  6:20 ` [PATCH v6 1/6] sched_ext: Relocate scx_enabled() related code Changwoo Min
2024-12-20  6:20 ` [PATCH v6 2/6] sched_ext: Implement scx_bpf_now_ns() Changwoo Min
2024-12-20 21:30   ` Andrea Righi [this message]
2024-12-22  4:32     ` Changwoo Min
2024-12-24 21:47   ` Tejun Heo
2024-12-27  0:38     ` Changwoo Min
2024-12-20  6:20 ` [PATCH v6 3/6] sched_ext: Add scx_bpf_now_ns() for BPF scheduler Changwoo Min
2024-12-20  6:20 ` [PATCH v6 4/6] sched_ext: Add time helpers for BPF schedulers Changwoo Min
2024-12-24 21:49   ` Tejun Heo
2024-12-27  0:32     ` Changwoo Min
2024-12-20  6:20 ` [PATCH v6 5/6] sched_ext: Replace bpf_ktime_get_ns() to scx_bpf_now_ns() Changwoo Min
2024-12-20  6:20 ` [PATCH v6 6/6] sched_ext: Use time helpers in BPF schedulers Changwoo Min
2024-12-20 22:29 ` [PATCH v6 0/6] sched_ext: Support high-performance monotonically non-decreasing clock Andrea Righi
2024-12-22  6:37   ` Changwoo Min
