public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
	longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com,
	void@manifault.com, arighi@nvidia.com, changwoo@igalia.com,
	cgroups@vger.kernel.org, sched-ext@lists.linux.dev,
	liuwenfang@honor.com, tglx@linutronix.de
Subject: Re: [PATCH 13/14] sched: Add {DE,EN}QUEUE_LOCKED
Date: Thu, 11 Sep 2025 10:40:06 -1000	[thread overview]
Message-ID: <aMMzpnyx__ZgZGRc@slm.duckdns.org> (raw)
In-Reply-To: <20250911094240.GW3289052@noisy.programming.kicks-ass.net>

Hello,

On Thu, Sep 11, 2025 at 11:42:40AM +0200, Peter Zijlstra wrote:
...
> I didn't immediately see how to do that. Doesn't that
> list_for_each_entry_safe_reverse() rely on rq->lock to retain integrity?

Ah, sorry, I was thinking it was iterating scx_tasks list. Yes, as
implemented, it needs to hold rq lock throughout.

> Moreover, since the goal is to allow:
> 
>  __schedule()
>    lock(rq->lock);
>    next = pick_task() := pick_task_scx()
>      lock(dsq->lock);
>      p = some_dsq_task(dsq);
>      task_unlink_from_dsq(p, dsq);
>      set_task_cpu(p, cpu_of(rq));
>      move_task_to_local_dsq(p, ...);
>      return p;
> 
> without dropping rq->lock, by relying on dsq->lock to serialize things,
> I don't see how we can retain the runnable list at all.
>
> And at this point, I'm not sure I understand ext well enough to know
> what this bypass stuff does at all, let alone suggest means to
> re architect this.

Bypass mode is enabled when the kernel side can't trust the BPF scheduling
anymore and wants to fall back to dumb FIFO scheduling to guarantee forward
progress (e.g. so that we can switch back to fair).

It comes down to flipping scx_rq_bypassing() on, which makes scheduling
paths bypass most BPF parts and fall back to FIFO behavior, and then making
sure every thread is on FIFO behavior. The latter part is what the loop is
doing. It scans all currently runnable tasks and dequeues and re-enqueues
them. As scx_rq_bypass() is true at this point, if a task were queued on the
BPF side, the cycling takes it out of the BPF side and puts it on the
fallback FIFO queue.

If we want to get rid of the locking requirement:

- Walk scx_tasks list which is iterated with a cursor and allows dropping
  locks while iterating. However, on some hardware, there are cases where
  CPUs are extremely slowed down from BPF scheduler making bad decisions and
  causing a lot of sync cacheline pingponging across e.g. NUMA nodes. As
  scx_bypass() is what's supposed to extricate the system from this state,
  walking all tasks while going through each's locking probably isn't going
  to be great.

- We can update ->runnable_list iteration to allow dropping rq lock e.g.
  with a cursor based iteration. Maybe some code can be shared with
  scx_tasks iteration. Cycling through locks still isn't going to be great
  but here it's likely a lot fewer of them at least.

Neither option is great. Leave it as-is for now?

Thanks.

-- 
tejun

  reply	other threads:[~2025-09-11 20:40 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-10 15:44 [PATCH 00/14] sched: Support shared runqueue locking Peter Zijlstra
2025-09-10 15:44 ` [PATCH 01/14] sched: Employ sched_change guards Peter Zijlstra
2025-09-11  9:06   ` K Prateek Nayak
2025-09-11  9:55     ` Peter Zijlstra
2025-09-11 10:10       ` Peter Zijlstra
2025-09-11 10:37         ` K Prateek Nayak
2025-10-06 15:21   ` Shrikanth Hegde
2025-10-06 18:14     ` Peter Zijlstra
2025-10-07  5:12       ` Shrikanth Hegde
2025-10-07  9:34         ` Peter Zijlstra
2025-10-16  9:33       ` [tip: sched/core] sched: Mandate shared flags for sched_change tip-bot2 for Peter Zijlstra
2025-09-10 15:44 ` [PATCH 02/14] sched: Re-arrange the {EN,DE}QUEUE flags Peter Zijlstra
2025-09-10 15:44 ` [PATCH 03/14] sched: Fold sched_class::switch{ing,ed}_{to,from}() into the change pattern Peter Zijlstra
2025-09-10 15:44 ` [PATCH 04/14] sched: Cleanup sched_delayed handling for class switches Peter Zijlstra
2025-09-10 15:44 ` [PATCH 05/14] sched: Move sched_class::prio_changed() into the change pattern Peter Zijlstra
2025-09-11  1:44   ` Tejun Heo
2025-09-10 15:44 ` [PATCH 06/14] sched: Fix migrate_disable_switch() locking Peter Zijlstra
2025-09-10 15:44 ` [PATCH 07/14] sched: Fix do_set_cpus_allowed() locking Peter Zijlstra
2025-10-30  0:12   ` Mark Brown
2025-10-30  9:07     ` Peter Zijlstra
2025-10-30 12:47       ` Mark Brown
2025-09-10 15:44 ` [PATCH 08/14] sched: Rename do_set_cpus_allowed() Peter Zijlstra
2025-09-10 15:44 ` [PATCH 09/14] sched: Make __do_set_cpus_allowed() use the sched_change pattern Peter Zijlstra
2025-09-10 15:44 ` [PATCH 10/14] sched: Add locking comments to sched_class methods Peter Zijlstra
2025-09-10 15:44 ` [PATCH 11/14] sched: Add flags to {put_prev,set_next}_task() methods Peter Zijlstra
2025-09-10 15:44 ` [PATCH 12/14] sched: Add shared runqueue locking to __task_rq_lock() Peter Zijlstra
2025-09-12  0:19   ` Tejun Heo
2025-09-12 11:54     ` Peter Zijlstra
2025-09-12 14:11       ` Peter Zijlstra
2025-09-12 17:56       ` Tejun Heo
2025-09-15  8:38         ` Peter Zijlstra
2025-09-16 22:29           ` Tejun Heo
2025-09-16 22:41             ` Tejun Heo
2025-09-25  8:35               ` Peter Zijlstra
2025-09-25 21:43                 ` Tejun Heo
2025-09-26  9:59                   ` Peter Zijlstra
2025-09-26 16:48                     ` Tejun Heo
2025-09-26 10:36                   ` Peter Zijlstra
2025-09-26 21:39                     ` Tejun Heo
2025-09-29 10:06                       ` Peter Zijlstra
2025-09-30 23:49                         ` Tejun Heo
2025-10-01 11:54                           ` Peter Zijlstra
2025-10-02 23:32                             ` Tejun Heo
2025-09-10 15:44 ` [PATCH 13/14] sched: Add {DE,EN}QUEUE_LOCKED Peter Zijlstra
2025-09-11  2:01   ` Tejun Heo
2025-09-11  9:42     ` Peter Zijlstra
2025-09-11 20:40       ` Tejun Heo [this message]
2025-09-12 14:19         ` Peter Zijlstra
2025-09-12 16:32           ` Tejun Heo
2025-09-13 22:32             ` Tejun Heo
2025-09-15  8:48               ` Peter Zijlstra
2025-09-25 13:10             ` Peter Zijlstra
2025-09-25 15:40               ` Tejun Heo
2025-09-25 15:53                 ` Peter Zijlstra
2025-09-25 18:44                   ` Tejun Heo
2025-09-10 15:44 ` [PATCH 14/14] sched/ext: Implement p->srq_lock support Peter Zijlstra
2025-09-10 16:07   ` Peter Zijlstra
2025-09-10 17:32 ` [PATCH 00/14] sched: Support shared runqueue locking Andrea Righi
2025-09-10 18:19   ` Peter Zijlstra
2025-09-10 18:35   ` Peter Zijlstra
2025-09-10 19:00     ` Andrea Righi
2025-09-11  9:58     ` Peter Zijlstra
2025-09-11 14:51       ` Andrea Righi
2025-09-11 14:00   ` Peter Zijlstra
2025-09-11 14:30     ` Peter Zijlstra
2025-09-11 14:48       ` Andrea Righi
2025-09-18 15:15 ` Christian Loehle
2025-09-25  9:00   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aMMzpnyx__ZgZGRc@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=arighi@nvidia.com \
    --cc=bsegall@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=changwoo@igalia.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liuwenfang@honor.com \
    --cc=longman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=mkoutny@suse.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=void@manifault.com \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox