From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	linux-kernel@vger.kernel.org, sched-ext@meta.com
Subject: Re: [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks
Date: Fri, 7 Feb 2025 23:36:08 +0100
Message-ID: <Z6aK2J4F7fnzugxs@gpd3>
In-Reply-To: <Z6Zz79AUub1YuHwD@slm.duckdns.org>

Hi Tejun,

On Fri, Feb 07, 2025 at 10:58:23AM -1000, Tejun Heo wrote:
> Migration disabled tasks are special and pinned to their previous CPUs. They
> tripped up some unsuspecting BPF schedulers as their ->nr_cpus_allowed may
> not agree with the bits set in ->cpus_ptr. Make it easier for BPF schedulers
> by automatically dispatching them to the pinned local DSQs by default. If a
> BPF scheduler wants to handle migration disabled tasks explicitly, it can
> set SCX_OPS_ENQ_MIGRATION_DISABLED.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  kernel/sched/ext.c |   23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -123,6 +123,19 @@ enum scx_ops_flags {
>  	SCX_OPS_SWITCH_PARTIAL	= 1LLU << 3,
>  
>  	/*
> +	 * A migration disabled task can only execute on its current CPU. By
> +	 * default, such tasks are automatically put on the CPU's local DSQ with
> +	 * the default slice on enqueue. If this ops flag is set, they also go
> +	 * through ops.enqueue().
> +	 *
> +	 * A migration disabled task never invokes ops.select_cpu() as it can
> +	 * only select the current CPU. Also, p->cpus_ptr will only contain its
> +	 * current CPU while p->nr_cpus_allowed keeps tracking p->user_cpus_ptr
> +	 * and thus may disagree with cpumask_weight(p->cpus_ptr).
> +	 */
> +	SCX_OPS_ENQ_MIGRATION_DISABLED = 1LLU << 4,
> +
> +	/*
>  	 * CPU cgroup support flags
>  	 */
>  	SCX_OPS_HAS_CGROUP_WEIGHT = 1LLU << 16,	/* cpu.weight */
> @@ -130,6 +143,7 @@ enum scx_ops_flags {
>  	SCX_OPS_ALL_FLAGS	= SCX_OPS_KEEP_BUILTIN_IDLE |
>  				  SCX_OPS_ENQ_LAST |
>  				  SCX_OPS_ENQ_EXITING |
> +				  SCX_OPS_ENQ_MIGRATION_DISABLED |
>  				  SCX_OPS_SWITCH_PARTIAL |
>  				  SCX_OPS_HAS_CGROUP_WEIGHT,
>  };
> @@ -882,6 +896,7 @@ static bool scx_warned_zero_slice;
>  
>  static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_last);
>  static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_exiting);
> +static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_migration_disabled);
>  static DEFINE_STATIC_KEY_FALSE(scx_ops_cpu_preempt);
>  static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
>  
> @@ -2014,6 +2029,11 @@ static void do_enqueue_task(struct rq *r
>  	    unlikely(p->flags & PF_EXITING))
>  		goto local;
>  
> +	/* see %SCX_OPS_ENQ_MIGRATION_DISABLED */
> +	if (!static_branch_unlikely(&scx_ops_enq_migration_disabled) &&
> +	    is_migration_disabled(p))
> +		goto local;

Maybe not in this patch set, but it'd be nice to have an event counter for
this, as skipping ops.enqueue() might introduce latency issues. Such
feedback could help determine whether we need to enable
SCX_OPS_ENQ_MIGRATION_DISABLED in some schedulers.
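
To make the idea concrete, here is a rough userspace model of the enqueue
decision with such a counter. This is only a sketch, not kernel code: the
flag value comes from the quoted patch, while struct enq_task, struct
enq_events, the counter name and model_enqueue() are hypothetical names
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag value as defined in the quoted patch. */
#define SCX_OPS_ENQ_MIGRATION_DISABLED (1ULL << 4)

/* Hypothetical task model: only the state this sketch needs. */
struct enq_task {
	bool migration_disabled;
};

/* Hypothetical event counters; the field name is illustrative only. */
struct enq_events {
	uint64_t enq_skip_migration_disabled;
};

enum enq_path {
	ENQ_PATH_LOCAL,		/* bypass ops.enqueue(), use the local DSQ */
	ENQ_PATH_OPS_ENQUEUE,	/* hand the task to the BPF scheduler */
};

/*
 * Mirrors the check added to do_enqueue_task(): without the ops flag, a
 * migration disabled task goes straight to the local DSQ. The counter
 * records how often that shortcut is taken.
 */
static enum enq_path model_enqueue(uint64_t ops_flags,
				   const struct enq_task *p,
				   struct enq_events *ev)
{
	assert(p && ev);

	if (!(ops_flags & SCX_OPS_ENQ_MIGRATION_DISABLED) &&
	    p->migration_disabled) {
		ev->enq_skip_migration_disabled++;
		return ENQ_PATH_LOCAL;
	}

	return ENQ_PATH_OPS_ENQUEUE;
}
```

A scheduler that sees this counter grow quickly under a latency sensitive
workload would be a candidate for setting the flag and handling these
tasks in ops.enqueue() itself.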

I'm also a bit conflicted about whether the default should be on or off.
We're changing the previous behavior, but OTOH this is going to prevent some
potential breakage (due to the nr_cpus_allowed mismatch) and server
workloads are going to benefit from it, so there seem to be more
pros than cons to dispatching migration disabled tasks directly by default.
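
For reference, the nr_cpus_allowed mismatch can be modeled in plain
userspace C. This is a simplified sketch under stated assumptions: a
64-bit mask stands in for p->cpus_ptr, and struct mask_task and the
model_*() helpers are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two task fields discussed in the patch. */
struct mask_task {
	uint64_t cpus_mask;	/* stands in for p->cpus_ptr */
	int nr_cpus_allowed;	/* stands in for p->nr_cpus_allowed */
};

/* Count set bits, playing the role of cpumask_weight(). */
static int model_mask_weight(uint64_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Model of disabling migration: the effective mask shrinks to the current
 * CPU while nr_cpus_allowed is deliberately left untouched, as described
 * in the comment added by the patch.
 */
static void model_migrate_disable(struct mask_task *p, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	p->cpus_mask = 1ULL << cpu;
}

/* A scheduler that trusts the count misses the pinning ... */
static bool model_pinned_by_count(const struct mask_task *p)
{
	return p->nr_cpus_allowed == 1;
}

/* ... while one that checks the mask weight sees it. */
static bool model_pinned_by_mask(const struct mask_task *p)
{
	return model_mask_weight(p->cpus_mask) == 1;
}
```

With a task allowed on four CPUs, calling model_migrate_disable() leaves
nr_cpus_allowed at 4 while the mask weight drops to 1, which is the kind
of disagreement that tripped up some schedulers.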

I also ran a quick test with this and it looks good, so:

Acked-by: Andrea Righi <arighi@nvidia.com>

-Andrea
