public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <andrea.righi@linux.dev>
To: Tejun Heo <tj@kernel.org>
Cc: void@manifault.com, kernel-team@meta.com,
	linux-kernel@vger.kernel.org,
	Daniel Hodges <hodges.daniel.scott@gmail.com>,
	Changwoo Min <multics69@gmail.com>,
	Dan Schatzberg <schatzberg.dan@gmail.com>
Subject: Re: [PATCH 10/11] sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()
Date: Sat, 31 Aug 2024 23:15:01 +0200
Message-ID: <ZtOH1YlEgyP45UkU@gpd3>
In-Reply-To: <ZtNC6l9nUEPnneag@slm.duckdns.org>

On Sat, Aug 31, 2024 at 06:20:58AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Sat, Aug 31, 2024 at 04:30:57PM +0200, Andrea Righi wrote:
> ...
> > > @@ -5511,7 +5516,7 @@ __bpf_kfunc void scx_bpf_dispatch(struct task_struct *p, u64 dsq_id, u64 slice,
> > >   * scx_bpf_dispatch_vtime - Dispatch a task into the vtime priority queue of a DSQ
> > >   * @p: task_struct to dispatch
> > >   * @dsq_id: DSQ to dispatch to
> > > - * @slice: duration @p can run for in nsecs
> > > + * @slice: duration @p can run for in nsecs, 0 to keep the current value
> > >   * @vtime: @p's ordering inside the vtime-sorted queue of the target DSQ
> > 
> > Maybe allow to keep the current vtime if 0 is passed, similar to slice?
> 
> It's tricky as 0 is a valid vtime. It's unlikely but depending on how vtime
> is defined, it may wrap in a practical amount of time. More on this below.

Ok.
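
To make the point about 0 being a valid vtime concrete, here is a
minimal userspace sketch, assuming a u64 vtime that is allowed to wrap.
The helper names vtime_before() and pick_vtime() are illustrative
assumptions and not part of the patch; pick_vtime() models the
hypothetical "0 keeps the current vtime" rule only to show why it would
be ambiguous.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe ordering: true if vtime a is before b, even across u64 wrap. */
static inline bool vtime_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * Hypothetical "0 means keep the current vtime" rule. A task whose
 * vtime legitimately wrapped around to 0 cannot be told apart from a
 * caller asking to keep the old value.
 */
static inline uint64_t pick_vtime(uint64_t cur, uint64_t requested)
{
	return requested ? requested : cur;
}
```

With this comparison a vtime of UINT64_MAX orders before 0, so 0 is
just another point on the timeline rather than a free sentinel.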

> 
> ...
> > > +	/*
> > > +	 * Can be called from either ops.dispatch() locking this_rq() or any
> > > +	 * context where no rq lock is held. If latter, lock @p's task_rq which
> > > +	 * we'll likely need anyway.
> > > +	 */
> > 
> > About locking, I was wondering if we could provide a similar API
> > (scx_bpf_dispatch_lock()?) to use scx_bpf_dispatch() from any context
> > and not necessarily from ops.select_cpu() / ops.enqueue() or
> > ops.dispatch().
> > 
> > This would be really useful for user-space schedulers, since we could
> > use scx_bpf_dispatch() directly and get rid of the
> > BPF_MAP_TYPE_RINGBUFFER complexity.
> 
> One difference between scx_bpf_dispatch() and scx_bpf_dispatch_from_dsq() is
> that the former is designed to be safe to call from any context under any
> locks by doing the actual dispatches asynchronously. This is primarily to
> allow scx_bpf_dispatch() to be called under BPF locks as they are used to
> transfer the ownership of tasks from the BPF side to the kernel side. This
> makes it more difficult to make scx_bpf_dispatch() more flexible. The way
> BPF locks are currently developing, we might not have to worry about killing
> the system through deadlocks but it'd still be very prone to soft deadlocks
> that kill the BPF scheduler if implemented synchronously. Maybe the solution
> here is bouncing to an irq_work or something. I'll think more on it.

Got it. The idea was to reduce complexity in the user-space schedulers,
but if that requires more complexity in the kernel, it's probably not a
good idea.

Moreover, using BPF_MAP_TYPE_RINGBUFFER is really fast now and the
overhead is pretty close to zero, so maybe we can keep this as a
low-priority todo.
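
As a rough illustration of the "record now, dispatch later" idea
described in the quoted text above, here is a toy userspace model. It is
only a sketch under stated assumptions: struct dispatch_buf,
record_dispatch() and flush_dispatches() are hypothetical names, and
this is not how the kernel implements scx_bpf_dispatch().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* One recorded request: which task (by pid here) goes to which DSQ. */
struct pending_dispatch {
	int32_t pid;
	uint64_t dsq_id;
};

struct dispatch_buf {
	struct pending_dispatch ents[MAX_PENDING];
	size_t nr;
};

/* Only records the request, so it never needs to take queue locks. */
static bool record_dispatch(struct dispatch_buf *buf, int32_t pid,
			    uint64_t dsq_id)
{
	if (buf->nr >= MAX_PENDING)
		return false;
	buf->ents[buf->nr].pid = pid;
	buf->ents[buf->nr].dsq_id = dsq_id;
	buf->nr++;
	return true;
}

/*
 * Runs later, from a context where committing is safe. Copies the
 * pending requests to @out (room for MAX_PENDING entries) in the order
 * they were recorded and empties the buffer.
 */
static size_t flush_dispatches(struct dispatch_buf *buf,
			       struct pending_dispatch *out)
{
	size_t done = buf->nr;

	for (size_t i = 0; i < done; i++)
		out[i] = buf->ents[i];
	buf->nr = 0;
	return done;
}
```

The recording side stays trivial and lock-free in this model, which is
the property that makes it safe to call from any context; the cost is
that the caller cannot observe the result of the dispatch immediately.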

> 
> ...
> > > +__bpf_kfunc bool scx_bpf_dispatch_from_dsq(struct bpf_iter_scx_dsq *it__iter,
> > > +					   struct task_struct *p, u64 dsq_id,
> > > +					   u64 slice, u64 enq_flags)
> > > +{
> > > +	return scx_dispatch_from_dsq((struct bpf_iter_scx_dsq_kern *)it__iter,
> > > +				     p, dsq_id, slice, 0, enq_flags);
> > > +}
> > > +
> > > +/**
> > > + * scx_bpf_dispatch_vtime_from_dsq - Move a task from DSQ iteration to a PRIQ DSQ
> > > + * @it__iter: DSQ iterator in progress
> > > + * @p: task to transfer
> > > + * @dsq_id: DSQ to move @p to
> > > + * @slice: duration @p can run for in nsecs, 0 to keep the current value
> > > + * @vtime: @p's ordering inside the vtime-sorted queue of the target DSQ
> > > + * @enq_flags: SCX_ENQ_*
> > 
> > Hm... can we pass 6 arguments to a kfunc? I think we're limited to 5,
> > unless I'm missing something here.
> 
> Hah, I actually don't know and didn't test the vtime variant. Maybe I should
> just drop the @slice and @vtime. They can be set by the caller explicitly
> before calling these kfuncs anyway although there are some concerns around
> ownership (ie. the caller can't be sure that the task has already been
> dispatched by someone else before scx_bpf_dispatch_from_dsq() commits). Or
> maybe I should pack the optional arguments into a struct. I'll think more
> about it.

IMHO we can simply drop them. Introducing a separate struct would make
the API a bit inconsistent with scx_bpf_dispatch() (and I don't think we
want to change scx_bpf_dispatch() as well for that).

About the ownership, true... maybe we can accept a bit of fuzziness
in this case, also considering that this race can happen only when using
scx_bpf_dispatch_from_dsq().
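
For comparison, the struct-packing alternative mentioned in the quoted
text could look roughly like the userspace sketch below, assuming the
five-argument limit discussed above. Everything here is hypothetical:
struct toy_task, struct toy_dispatch_opts and
toy_dispatch_vtime_from_dsq() are made-up names used only to show the
shape of the interface, not a proposed kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the task fields involved; not struct task_struct. */
struct toy_task {
	uint64_t slice;
	uint64_t vtime;
	uint64_t dsq_id;
};

/* Hypothetical container for the optional arguments. */
struct toy_dispatch_opts {
	uint64_t slice;		/* 0 keeps the task's current slice */
	uint64_t vtime;		/* only used when set_vtime is true */
	bool set_vtime;		/* explicit flag, because 0 is a valid vtime */
};

/* Five parameters in total: iterator, task, DSQ id, options, flags. */
static bool toy_dispatch_vtime_from_dsq(void *it, struct toy_task *p,
					uint64_t dsq_id,
					const struct toy_dispatch_opts *opts,
					uint64_t enq_flags)
{
	(void)it;		/* iterator state is not modeled here */
	(void)enq_flags;	/* flags are not modeled here */

	if (!p)
		return false;
	if (opts) {
		if (opts->slice)
			p->slice = opts->slice;
		if (opts->set_vtime)
			p->vtime = opts->vtime;
	}
	p->dsq_id = dsq_id;
	return true;
}
```

Passing NULL for opts corresponds to the simpler variant where the
arguments are dropped and the caller sets slice and vtime on the task
beforehand.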

Thanks,
-Andrea
