From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
	longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com,
	void@manifault.com, arighi@nvidia.com, changwoo@igalia.com,
	cgroups@vger.kernel.org, sched-ext@lists.linux.dev,
	liuwenfang@honor.com, tglx@linutronix.de
Subject: Re: [PATCH 13/14] sched: Add {DE,EN}QUEUE_LOCKED
Date: Thu, 25 Sep 2025 15:10:25 +0200
Message-ID: <20250925131025.GA4067720@noisy.programming.kicks-ass.net>
In-Reply-To: <aMRLIEtmcWc0XNmg@slm.duckdns.org>

On Fri, Sep 12, 2025 at 06:32:32AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Fri, Sep 12, 2025 at 04:19:04PM +0200, Peter Zijlstra wrote:
> ...
> > Ah, but I think we *have* to change it :/ The thing is that with the new
> > pick you can change 'rq' without holding the source rq->lock. So we
> > can't maintain this list.
> > 
> > Could something like so work?
> > 
> > 	scoped_guard (rcu) for_each_process_thread(g, p) {
> > 		if (p->flags & PF_EXITING || p->sched_class != ext_sched_class)
> > 			continue;
> > 
> > 		guard(task_rq_lock)(p);
> > 		scoped_guard (sched_change, p) {
> > 			/* no-op */
> > 		}
> > 	}	
> 
> Yeah, or I can make scx_tasks iteration smarter so that it can skip through
> the list for tasks which aren't runnable. As long as it doesn't do lock ops
> on every task, it should be fine. I think this is solvable one way or
> another. Let's continue in the other subthread.

Well, either this or a smarter scx_tasks iterator will result in lock ops
for every task; that is unavoidable if we want the normal p->pi_lock,
rq->lock (dsq->lock) taken for every sched_change caller.

I have the below, which I would like to include in the series so that I
can clean up all that DEQUEUE_LOCKED stuff a bit, this being the only
sched_change that's 'weird'.

An added 'bonus' is of course one less user of the runnable_list.

(Also, I have to note, for_each_cpu with preemption disabled is asking
for trouble; machines with enormous core counts are no longer super
esoteric.)
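
To make the lock nesting above concrete, here is a rough userspace model
of a per-task walk. It is only a sketch and not kernel code: pthread
mutexes stand in for p->pi_lock and rq->lock, the compiler's cleanup
attribute (GCC/Clang) stands in for guard(), and every name in it
(model_task, model_rq, MODEL_GUARD, model_bypass_walk) is made up for
illustration. It shows that the outer lock is taken for every task, that
the inner lock nests inside it, and that 'continue' still drops the
outer lock.

```c
/* Userspace model only: pthread mutexes stand in for p->pi_lock and rq->lock. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_rq {
	pthread_mutex_t lock;
	int nr_cycled;		/* dequeue/enqueue cycles performed on this rq */
};

struct model_task {
	pthread_mutex_t pi_lock;
	struct model_rq *rq;
	int is_ext;		/* stands in for p->sched_class == &ext_sched_class */
	int runnable;		/* stands in for TASK_RUNNING / TASK_WAKING */
};

/* Cleanup handler: runs whenever the guarded variable leaves its scope. */
static void model_unlock(pthread_mutex_t **m)
{
	if (*m)
		pthread_mutex_unlock(*m);
}

/* Lock now, unlock automatically at the end of the enclosing scope. */
#define MODEL_GUARD(name, m) \
	pthread_mutex_t *name __attribute__((cleanup(model_unlock))) = (m); \
	pthread_mutex_lock(name)

/* Walk all tasks; returns the number of lock acquisitions performed. */
static int model_bypass_walk(struct model_task *tasks, size_t nr)
{
	int lock_ops = 0;

	for (size_t i = 0; i < nr; i++) {
		struct model_task *p = &tasks[i];

		MODEL_GUARD(pi, &p->pi_lock);	/* outer lock, taken for every task */
		lock_ops++;

		if (!p->is_ext || !p->runnable)
			continue;		/* the guard drops pi_lock here */

		assert(p->rq != NULL);
		MODEL_GUARD(rq, &p->rq->lock);	/* inner lock, nests inside pi_lock */
		lock_ops++;

		p->rq->nr_cycled++;		/* stands in for the deq/enq cycle */
	}	/* guards release in reverse order: rq lock first, then pi_lock */

	return lock_ops;
}
```

With three tasks of which only one is a runnable ext task, the walk takes
four locks in total: three outer ones plus one inner one. The outer lock
per task is the cost that neither approach avoids.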

--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4817,6 +4817,7 @@ static void scx_bypass(bool bypass)
 {
 	static DEFINE_RAW_SPINLOCK(bypass_lock);
 	static unsigned long bypass_timestamp;
+	struct task_struct *g, *p;
 	struct scx_sched *sch;
 	unsigned long flags;
 	int cpu;
@@ -4849,16 +4850,16 @@ static void scx_bypass(bool bypass)
 	 * queued tasks are re-queued according to the new scx_rq_bypassing()
 	 * state. As an optimization, walk each rq's runnable_list instead of
 	 * the scx_tasks list.
-	 *
-	 * This function can't trust the scheduler and thus can't use
-	 * cpus_read_lock(). Walk all possible CPUs instead of online.
+	 */
+
+	/*
+	 * XXX online_mask is stable due to !preempt (per bypass_lock)
+	 * so could this be for_each_online_cpu() ?
 	 */
 	for_each_possible_cpu(cpu) {
 		struct rq *rq = cpu_rq(cpu);
-		struct task_struct *p, *n;
 
 		raw_spin_rq_lock(rq);
-
 		if (bypass) {
 			WARN_ON_ONCE(rq->scx.flags & SCX_RQ_BYPASSING);
 			rq->scx.flags |= SCX_RQ_BYPASSING;
@@ -4866,36 +4867,33 @@ static void scx_bypass(bool bypass)
 			WARN_ON_ONCE(!(rq->scx.flags & SCX_RQ_BYPASSING));
 			rq->scx.flags &= ~SCX_RQ_BYPASSING;
 		}
+		raw_spin_rq_unlock(rq);
+	}
+
+	/* implicit RCU section due to bypass_lock */
+	for_each_process_thread(g, p) {
+		unsigned int state;
 
-		/*
-		 * We need to guarantee that no tasks are on the BPF scheduler
-		 * while bypassing. Either we see enabled or the enable path
-		 * sees scx_rq_bypassing() before moving tasks to SCX.
-		 */
-		if (!scx_enabled()) {
-			raw_spin_rq_unlock(rq);
+		guard(raw_spinlock)(&p->pi_lock);
+		if (p->flags & PF_EXITING || p->sched_class != &ext_sched_class)
+			continue;
+
+		state = READ_ONCE(p->__state);
+		if (state != TASK_RUNNING && state != TASK_WAKING)
 			continue;
-		}
 
-		/*
-		 * The use of list_for_each_entry_safe_reverse() is required
-		 * because each task is going to be removed from and added back
-		 * to the runnable_list during iteration. Because they're added
-		 * to the tail of the list, safe reverse iteration can still
-		 * visit all nodes.
-		 */
-		list_for_each_entry_safe_reverse(p, n, &rq->scx.runnable_list,
-						 scx.runnable_node) {
-			/* cycling deq/enq is enough, see the function comment */
-			scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) {
-				/* nothing */ ;
-			}
+		guard(__task_rq_lock)(p);
+		scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) {
+			/* nothing */ ;
 		}
+	}
 
-		/* resched to restore ticks and idle state */
-		if (cpu_online(cpu) || cpu == smp_processor_id())
-			resched_curr(rq);
+	/* implicit !preempt section due to bypass_lock */
+	for_each_online_cpu(cpu) {
+		struct rq *rq = cpu_rq(cpu);
 
+		raw_spin_rq_lock(rq);
+		resched_curr(rq);
 		raw_spin_rq_unlock(rq);
 	}
 
