Re: [PATCH V2 1/3] Sched: Scheduler time slice extension

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"mathieu.desnoyers@efficios.com" <mathieu.desnoyers@efficios.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>
Subject: Re: [PATCH V2 1/3] Sched: Scheduler time slice extension
Date: Fri, 25 Apr 2025 08:53:57 +0200	[thread overview]
Message-ID: <20250425065357.eiwSSvff@linutronix.de> (raw)
In-Reply-To: <DE11FCAF-4686-44AC-82AD-F0672FE450E1@oracle.com>

On 2025-04-25 01:19:07 [+0000], Prakash Sangappa wrote:
> > On Apr 24, 2025, at 7:13 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > 
> > On 2025-04-18 19:34:08 [+0000], Prakash Sangappa wrote:
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> > …
> >> @@ -930,6 +931,9 @@ struct task_struct {
> >> struct plist_node pushable_tasks;
> >> struct rb_node pushable_dl_tasks;
> >> #endif
> >> +#ifdef CONFIG_RSEQ
> >> + unsigned rseq_sched_delay:1;
> >> +#endif
> > 
> > There should be somewhere a bitfield already which you could use without
> > the ifdef. Then you could use IS_ENABLED() if you want to save some code
> > if RSEQ is not enabled.
> 
> I suppose we could. 
> Patch 1 is pretty much what PeterZ posted, hope he will comment on it.

If it is "pretty much what PeterZ posted" why did he not receive any
credit for it?

> Could it be moved below here, call it sched_time_delay, or some variant of this name?

I don't mind the name. The point is to add to an existing group instead
of starting a new u32 bit field.

> struct task_struct {
> ..
> #ifdef CONFIG_TASK_DELAY_ACCT
>         /* delay due to memory thrashing */
>         unsigned                        in_thrashing:1;
> #endif
>         unsigned                        sched_time_delay:1;
> ..
> }
> 
> This field will be for scheduler time extension use only. Mainly updated in the context of the current thread. 
> Do we even need to use IS_ENABLED(CONFIG_RSEQ) to access?

Well, if you want to avoid the code in the !CONFIG_RSEQ then yes.

> >> struct mm_struct *mm;
> >> struct mm_struct *active_mm;
> >> --- a/include/uapi/linux/rseq.h
> >> +++ b/include/uapi/linux/rseq.h
> > …
> >> @@ -128,6 +131,8 @@ struct rseq {
> >> * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
> >> *     Inhibit instruction sequence block restart on migration for
> >> *     this thread.
> >> + * - RSEQ_CS_DELAY_RESCHED
> >> + *     Try delay resched...
> > 
> > Delay resched up to $time for $kind-of-stats under $conditions.
> 
> Will add some comment like
>  “Delay resched for upto 50us.  Checked when thread is about to be preempted"
> 
> With the tunable added in the subsequent patch, will change ‘50us' it to the tunable name.

A proper descritption of the flag would nice. The current state is that
I can derive move from the constant than from the comment.

> >> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> >> index 6b7ff1bc1b9b..944027d14198 100644
> >> --- a/kernel/entry/common.c
> >> +++ b/kernel/entry/common.c
> > …
> >> @@ -99,8 +100,12 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> >> 
> >> local_irq_enable_exit_to_user(ti_work);
> >> 
> >> - if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
> >> - schedule();
> >> + if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
> > 
> > couldn't we restrict this to _TIF_NEED_RESCHED_LAZY? That way we would
> > still schedule immediately for any SCHED_FIFO/RR/DL tasks and do this
> > delay only for everything else such as SCHED_OTHER/…
> 
> 
> Wasn’t this the entire discussion about whether to limit it to SCHE_OTHER or not?
> Will defer it to Peter.

Oh. But this still deserves a trace point for this manoeuvre. A trace
would show you a wakeup, the need-resched bit will be shown and then it
will vanish later and people might wonder where did it go.

> > 
> >> +       if (irq && rseq_delay_resched())
> >> +       clear_tsk_need_resched(current);
> >> +       else
> >> +       schedule();
> >> + }
> >> 
> >> if (ti_work & _TIF_UPROBE)
> >> uprobe_notify_resume(regs);
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 165c90ba64ea..cee50e139723 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -823,6 +823,7 @@ void update_rq_clock(struct rq *rq)
> >> 
> >> static void hrtick_clear(struct rq *rq)
> >> {
> >> + rseq_delay_resched_tick();
> > 
> > This is called from __schedule(). If you set the need-resched flag here,
> > it gets removed shortly after. Do I miss something?
> 
> hrtick_clear() is also called when the cpu is being removed in sched_cpu_dying().
> We need to set resched there?

Do we? My understanding is that the NEED_RESCHED flag gets removed once
and then RSEQ_CS_DELAY_RESCHED gets set. RSEQ_CS_DELAY_RESCHED in turn
gets cleared in the scheduler once task leaves the CPU. Once the task
left the CPU then there is no need to set the bit. The sched_cpu_dying()
is the HP thread so if that one is on the CPU then the user task is
gone.

How does this delay thingy work with HZ=100 vs HZ=1000? Like what is the
most you could get in extra time? I could imagine that if a second task
gets on the runqueue and you skip the wake up but the runtime is used up
then the HZ tick should set NEED_RESCHED again and the following HZ tick
should force the schedule point.

> Thanks for your comments.
> -Prakash.

Sebastian

next prev parent reply	other threads:[~2025-04-25  6:54 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-18 19:34 [PATCH V2 0/3] Scheduler time slice extension Prakash Sangappa
2025-04-18 19:34 ` [PATCH V2 1/3] Sched: " Prakash Sangappa
2025-04-24 14:13   ` Sebastian Andrzej Siewior
2025-04-25  1:19     ` Prakash Sangappa
2025-04-25  3:48       ` K Prateek Nayak
2025-04-25 17:07         ` Prakash Sangappa
2025-04-25  6:53       ` Sebastian Andrzej Siewior [this message]
2025-04-25 18:55         ` Prakash Sangappa
2025-04-18 19:34 ` [PATCH V2 2/3] Sched: Tunable to specify duration of " Prakash Sangappa
2025-04-18 19:34 ` [PATCH V2 3/3] Sched: Add scheduler stat for cpu " Prakash Sangappa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250425065357.eiwSSvff@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=peterz@infradead.org \
    --cc=prakash.sangappa@oracle.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.