Re: [GIT PULL] sched_ext: Initial pull request for v6.11

From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, David Vernet <void@manifault.com>,
	Ingo Molnar <mingo@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [GIT PULL] sched_ext: Initial pull request for v6.11
Date: Tue, 6 Aug 2024 23:10:02 +0200
Message-ID: <20240806211002.GA37996@noisy.programming.kicks-ass.net>
In-Reply-To: <ZqmVG9ZiktN6bnm0@slm.duckdns.org>

On Tue, Jul 30, 2024 at 03:36:27PM -1000, Tejun Heo wrote:
> Hello,
> 
> On Wed, Jul 24, 2024 at 10:52:21AM +0200, Peter Zijlstra wrote:
> ...
> > So pick_task() came from the SCHED_CORE crud, which does a remote pick
> > and as such isn't able to do a put -- remote is still running its
> > current etc.
> > 
> > So pick_task() *SHOULD* already be considering its current and pick
> > that if it is a better candidate than whatever is on the queue.
> > 
> > If we have a pick_task() that doesn't do that, it's a pre-existing bug
> > and needs fixing anyhow.
> 
> Right, I don't think it affects SCX in any significant way. Either way
> should be fine.

So I just looked at things. And considering we currently want to have:

  pick_next_task := pick_task() + set_next_task(.first = true)

and want, once those other patches move put_prev_task() around, to make
pick_next_task() fully optional, it looks to me like you're not quite
there yet. Notably:

> +static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first)
> +{
> +	if (p->scx.flags & SCX_TASK_QUEUED) {
> +		/*
> +		 * Core-sched might decide to execute @p before it is
> +		 * dispatched. Call ops_dequeue() to notify the BPF scheduler.
> +		 */
> +		ops_dequeue(p, SCX_DEQ_CORE_SCHED_EXEC);
> +		dispatch_dequeue(rq, p);
> +	}
> +
> +	p->se.exec_start = rq_clock_task(rq);
> +
> +	/* see dequeue_task_scx() on why we skip when !QUEUED */
> +	if (SCX_HAS_OP(running) && (p->scx.flags & SCX_TASK_QUEUED))
> +		SCX_CALL_OP_TASK(SCX_KF_REST, running, p);
> +
> +	clr_task_runnable(p, true);
> +
> +	/*
> +	 * @p is getting newly scheduled or got kicked after someone updated its
> +	 * slice. Refresh whether tick can be stopped. See scx_can_stop_tick().
> +	 */
> +	if ((p->scx.slice == SCX_SLICE_INF) !=
> +	    (bool)(rq->scx.flags & SCX_RQ_CAN_STOP_TICK)) {
> +		if (p->scx.slice == SCX_SLICE_INF)
> +			rq->scx.flags |= SCX_RQ_CAN_STOP_TICK;
> +		else
> +			rq->scx.flags &= ~SCX_RQ_CAN_STOP_TICK;
> +
> +		sched_update_tick_dependency(rq);
> +
> +		/*
> +		 * For now, let's refresh the load_avgs just when transitioning
> +		 * in and out of nohz. In the future, we might want to add a
> +		 * mechanism which calls the following periodically on
> +		 * tick-stopped CPUs.
> +		 */
> +		update_other_load_avgs(rq);
> +	}
> +}

> +static struct task_struct *pick_next_task_scx(struct rq *rq)
> +{
> +	struct task_struct *p;
> +
> +#ifndef CONFIG_SMP
> +	/* UP workaround - see the comment at the head of put_prev_task_scx() */
> +	if (unlikely(rq->curr->sched_class != &ext_sched_class))
> +		balance_one(rq, rq->curr, true);
> +#endif

(should already be gone in your latest branch)

> +
> +	p = first_local_task(rq);
> +	if (!p)
> +		return NULL;
> +
> +	set_next_task_scx(rq, p, true);
> +
> +	if (unlikely(!p->scx.slice)) {
> +		if (!scx_ops_bypassing() && !scx_warned_zero_slice) {
> +			printk_deferred(KERN_WARNING "sched_ext: %s[%d] has zero slice in pick_next_task_scx()\n",
> +					p->comm, p->pid);
> +			scx_warned_zero_slice = true;
> +		}
> +		p->scx.slice = SCX_SLICE_DFL;
> +	}

This condition should probably move to set_next_task_scx(.first = true).

> +
> +	return p;
> +}
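
To make the suggested move concrete, here is a minimal userspace sketch, not
kernel code. It models the zero-slice fixup living in set_next_task() under
the first flag, so that the generic pick_task() + set_next_task(.first = true)
composition no longer needs a class-specific pick_next_task(). The struct
layouts, SLICE_DFL value, warned_zero_slice flag and the unsuffixed function
names are stand-ins invented for illustration; the scx_ops_bypassing() check
from the patch is left out to keep the model small.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins for the kernel types; names and values are illustrative. */
#define SLICE_DFL 20000000ULL	/* assumed default slice, in ns */

struct task {
	const char *comm;
	int pid;
	unsigned long long slice;
};

struct rq {
	struct task *local_head;	/* models first_local_task(rq) */
	struct task *curr;
};

static bool warned_zero_slice;

/* The zero-slice fixup sits here, guarded by @first. */
static void set_next_task(struct rq *rq, struct task *p, bool first)
{
	if (first && !p->slice) {
		if (!warned_zero_slice) {
			fprintf(stderr, "%s[%d] has zero slice\n",
				p->comm, p->pid);
			warned_zero_slice = true;
		}
		p->slice = SLICE_DFL;
	}
	rq->curr = p;
}

/* pick_task() only chooses a candidate; it has no side effects. */
static struct task *pick_task(struct rq *rq)
{
	return rq->local_head;
}

/* Generic composition: pick_task() + set_next_task(.first = true). */
static struct task *pick_next_task(struct rq *rq)
{
	struct task *p = pick_task(rq);

	if (p)
		set_next_task(rq, p, true);
	return p;
}
```

With the fixup inside set_next_task(), any caller that goes through the
generic composition gets the same slice refresh, which is the property the
comment above is after.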

> +/**
> + * pick_task_scx - Pick a candidate task for core-sched
> + * @rq: rq to pick the candidate task from
> + *
> + * Core-sched calls this function on each SMT sibling to determine the next
> + * tasks to run on the SMT siblings. balance_one() has been called on all
> + * siblings and put_prev_task_scx() has been called only for the current CPU.
> + *
> + * As put_prev_task_scx() hasn't been called on remote CPUs, we can't just look
> + * at the first task in the local dsq. @rq->curr has to be considered explicitly
> + * to mimic %SCX_TASK_BAL_KEEP.
> + */
> +static struct task_struct *pick_task_scx(struct rq *rq)
> +{
> +	struct task_struct *curr = rq->curr;
> +	struct task_struct *first = first_local_task(rq);
> +
> +	if (curr->scx.flags & SCX_TASK_QUEUED) {
> +		/* is curr the only runnable task? */
> +		if (!first)
> +			return curr;
> +
> +		/*
> +		 * Does curr trump first? We can always go by core_sched_at for
> +		 * this comparison as it represents global FIFO ordering when
> +		 * the default core-sched ordering is used and local-DSQ FIFO
> +		 * ordering otherwise.
> +		 *
> +		 * We can have a task with an earlier timestamp on the DSQ. For
> +		 * example, when a current task is preempted by a sibling
> +		 * picking a different cookie, the task would be requeued at the
> +		 * head of the local DSQ with an earlier timestamp than the
> +		 * core-sched picked next task. Besides, the BPF scheduler may
> +		 * dispatch any tasks to the local DSQ anytime.
> +		 */
> +		if (curr->scx.slice && time_before64(curr->scx.core_sched_at,
> +						     first->scx.core_sched_at))
> +			return curr;
> +	}

And the above condition seems a little core_sched specific. Is that
suitable for the primary pick function?

> +
> +	return first;	/* this may be %NULL */
> +}
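
For reference while weighing that question, the decision pick_task_scx() makes
can be isolated in a small userspace model. This is only a sketch: struct task
here is a hypothetical three-field stand-in for the scx fields used above,
before64() mirrors the wraparound-safe comparison idea behind time_before64(),
and the NULL check on curr exists only because the model has no rq. It does
not settle whether core_sched_at is the right key for the primary pick; it
just shows what the condition does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the scx fields consulted by pick_task_scx(). */
struct task {
	bool queued;		/* models p->scx.flags & SCX_TASK_QUEUED */
	uint64_t slice;		/* models p->scx.slice */
	uint64_t core_sched_at;	/* models p->scx.core_sched_at */
};

/* Wraparound-safe "a is before b", same idea as time_before64(). */
static bool before64(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Prefer @curr over @first when it is queued and has the older timestamp. */
static struct task *pick_task_model(struct task *curr, struct task *first)
{
	if (curr && curr->queued) {
		/* curr is the only runnable task */
		if (!first)
			return curr;

		/* curr still has slice left and was queued earlier */
		if (curr->slice &&
		    before64(curr->core_sched_at, first->core_sched_at))
			return curr;
	}

	return first;	/* may be NULL */
}
```

Note that curr wins only when it still has slice left and its timestamp is
older, so the outcome depends entirely on how core_sched_at is maintained
outside of core-sched.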
