All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aaron Lu <ziqianlu@bytedance.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <vschneid@redhat.com>,
	Ben Segall <bsegall@google.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Xi Wang <xii@google.com>,
	linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mel Gorman <mgorman@suse.de>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Chuyi Zhou <zhouchuyi@bytedance.com>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	Florian Bezdeka <florian.bezdeka@siemens.com>,
	Paul McKenney <paulmck@kernel.org>
Subject: Re: [PATCH 2/7] sched/fair: prepare throttle path for task based throttle
Date: Fri, 23 May 2025 19:17:22 +0800	[thread overview]
Message-ID: <20250523111016.GA1240558@bytedance> (raw)
In-Reply-To: <20250523105222.GJ24938@noisy.programming.kicks-ass.net>

On Fri, May 23, 2025 at 12:52:22PM +0200, Peter Zijlstra wrote:
> On Fri, May 23, 2025 at 05:53:50PM +0800, Aaron Lu wrote:
> > On Thu, May 22, 2025 at 08:40:02PM +0800, Aaron Lu wrote:
> > > On Thu, May 22, 2025 at 01:54:18PM +0200, Peter Zijlstra wrote:
> > > > On Thu, May 22, 2025 at 07:44:55PM +0800, Aaron Lu wrote:
> > > > > On Thu, May 22, 2025 at 12:48:43PM +0200, Peter Zijlstra wrote:
> > > > > > On Tue, May 20, 2025 at 06:41:05PM +0800, Aaron Lu wrote:
> > > > > > 
> > > > > > >  static void throttle_cfs_rq_work(struct callback_head *work)
> > > > > > >  {
> > > > > > > +	struct task_struct *p = container_of(work, struct task_struct, sched_throttle_work);
> > > > > > > +	struct sched_entity *se;
> > > > > > > +	struct cfs_rq *cfs_rq;
> > > > > > > +	struct rq *rq;
> > > > > > > +
> > > > > > > +	WARN_ON_ONCE(p != current);
> > > > > > > +	p->sched_throttle_work.next = &p->sched_throttle_work;
> > > > > > > +
> > > > > > > +	/*
> > > > > > > +	 * If task is exiting, then there won't be a return to userspace, so we
> > > > > > > +	 * don't have to bother with any of this.
> > > > > > > +	 */
> > > > > > > +	if ((p->flags & PF_EXITING))
> > > > > > > +		return;
> > > > > > > +
> > > > > > > +	scoped_guard(task_rq_lock, p) {
> > > > > > > +		se = &p->se;
> > > > > > > +		cfs_rq = cfs_rq_of(se);
> > > > > > > +
> > > > > > > +		/* Raced, forget */
> > > > > > > +		if (p->sched_class != &fair_sched_class)
> > > > > > > +			return;
> > > > > > > +
> > > > > > > +		/*
> > > > > > > +		 * If not in limbo, then either replenish has happened or this
> > > > > > > +		 * task got migrated out of the throttled cfs_rq, move along.
> > > > > > > +		 */
> > > > > > > +		if (!cfs_rq->throttle_count)
> > > > > > > +			return;
> > > > > > > +		rq = scope.rq;
> > > > > > > +		update_rq_clock(rq);
> > > > > > > +		WARN_ON_ONCE(!list_empty(&p->throttle_node));
> > > > > > > +		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
> > > > > > > +		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
> > > > > > > +		resched_curr(rq);
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	cond_resched_tasks_rcu_qs();
> > > > > > >  }
> > > > > > 
> > > > > > What's that cond_resched thing about? The general plan is to make
> > > > > > cond_resched go away.
> > > > > 
> > > > > Got it.
> > > > > 
> > > > > The purpose is to let throttled task schedule and also mark a task rcu
> > > > > quiescent state. Without this cond_resched_tasks_rcu_qs(), this task
> > > > > will be scheduled by cond_resched() in task_work_run() and since that is
> > > > > a preempt schedule, it didn't mark a task rcu quiescent state.
> > > > > 
> > > > > Any suggestion here? Perhaps a plain schedule()? Thanks.
> > > > 
> > > > I am confused, this is task_work_run(), that is ran from
> > > > exit_to_user_mode_loop(), which contains a schedule().
> > >
> > 
> > I should probably have added that the schedule() call contained in
> > exit_to_user_mode_loop() is early in that loop, where the to-be-throttled
> > task doesn't have need_resched bit set yet.
> 
> No, but if it does get set, it will get picked up at:
> 
> 	ti_work = read_thread_flags();
> 
> and since TIF_NEED_RESCHED is part of EXIT_TO_USER_MODE_WORK, we'll get
> another cycle, and do the schedule() thing.
> 
> > > There is a cond_resched() in task_work_run() loop:
> > > 
> > > 		do {
> > > 			next = work->next;
> > > 			work->func(work);
> > > 			work = next;
> > > 			cond_resched();
> > > 		} while (work);
> 
> That cond_resched() is equally going away.

Good to know this.

As long as this cond_resched() goes away, there is no need for an
explicite schedule() or any of its other forms in this throttle task
work.

> > > And when this throttle work returns with need_resched bit set,
> > > cond_resched() will cause a schedule but that didn't mark a task
> > > quiescent state...
> > 
> > Another approach I can think of is to add a test of task_is_throttled()
> > in rcu_tasks_is_holdout(). I remembered when I tried this before, I can
> > hit the following path:
> 
> So this really is about task_rcu needing something? Let me go look at
> task-rcu.

Yes. I found this problem when using bpftrace to profile something and
bpftrace couldn't start untill the test is finished :)

I'm assuming bpftrace need those throttled tasks properly mark a qs
state. With this change here, bpftrace can start normally when the test
is running.

> 
> So AFAICT, exit_to_user_mode_loop() will do schedule(), which will call
> __schedule(SM_NONE), which then will have preempt = false and call:
> rcu_note_context_switch(false) which in turn will do:
> rcu_task_rq(current, false).
> 
> This should be sufficient, no?

Yes, as long as that cond_resched() in task_work_run() loop is gone.

I'll also give it a test and will let you know if I find anything
unexpected.

Thanks,
Aaron

  reply	other threads:[~2025-05-23 11:17 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-20 10:41 [PATCH 0/7] Defer throttle when task exits to user Aaron Lu
2025-05-20 10:41 ` [PATCH 1/7] sched/fair: Add related data structure for task based throttle Aaron Lu
2025-05-21  8:48   ` Chengming Zhou
2025-05-20 10:41 ` [PATCH 2/7] sched/fair: prepare throttle path " Aaron Lu
2025-05-20 12:02   ` Florian Bezdeka
2025-05-21  6:37     ` Aaron Lu
2025-05-21 11:51       ` Aaron Lu
2025-05-21  9:01   ` Chengming Zhou
2025-05-21  9:21     ` [External] " Aaron Lu
2025-05-22 11:43       ` Chengming Zhou
2025-05-23  8:03         ` Aaron Lu
2025-05-22 10:48   ` Peter Zijlstra
2025-05-22 11:44     ` Aaron Lu
2025-05-22 11:54       ` Peter Zijlstra
2025-05-22 12:40         ` Aaron Lu
2025-05-23  9:53           ` Aaron Lu
2025-05-23 10:52             ` Peter Zijlstra
2025-05-23 11:17               ` Aaron Lu [this message]
2025-05-22 11:07   ` Peter Zijlstra
2025-05-23  7:40     ` Aaron Lu
2025-05-29 11:51       ` Aaron Lu
2025-05-30  5:36         ` K Prateek Nayak
2025-05-30 11:02           ` Aaron Lu
2025-05-23 12:35   ` Peter Zijlstra
2025-05-20 10:41 ` [PATCH 3/7] sched/fair: prepare unthrottle " Aaron Lu
2025-05-20 10:41 ` [PATCH 4/7] sched/fair: Take care of group/affinity/sched_class change for throttled task Aaron Lu
2025-05-22 12:03   ` Peter Zijlstra
2025-05-22 12:49     ` Aaron Lu
2025-05-23 14:59       ` Peter Zijlstra
2025-05-26 11:36         ` Aaron Lu
2025-05-27  6:58           ` Aaron Lu
2025-05-27 11:19             ` K Prateek Nayak
2025-05-27 11:54               ` Aaron Lu
2025-05-27 14:16                 ` K Prateek Nayak
2025-05-23  2:43   ` Chengming Zhou
2025-05-23  7:56     ` Aaron Lu
2025-05-23  9:13       ` Chengming Zhou
2025-05-23  9:42         ` Aaron Lu
2025-05-23  9:53           ` Chengming Zhou
2025-05-23 11:59             ` Aaron Lu
2025-05-26 13:14               ` Chengming Zhou
2025-05-20 10:41 ` [PATCH 5/7] sched/fair: switch to task based throttle model Aaron Lu
2025-05-20 10:41 ` [PATCH 6/7] sched/fair: task based throttle time accounting Aaron Lu
2025-05-20 10:41 ` [PATCH 7/7] sched/fair: get rid of throttled_lb_pair() Aaron Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250523111016.GA1240558@bytedance \
    --to=ziqianlu@bytedance.com \
    --cc=bsegall@google.com \
    --cc=chengming.zhou@linux.dev \
    --cc=dietmar.eggemann@arm.com \
    --cc=florian.bezdeka@siemens.com \
    --cc=jan.kiszka@siemens.com \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=xii@google.com \
    --cc=zhouchuyi@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.