linux-kernel.vger.kernel.org archive mirror
* [PATCH V5 0/6] Scheduler time slice extension
@ 2025-06-03 23:36 Prakash Sangappa
From: Prakash Sangappa @ 2025-06-03 23:36 UTC
  To: linux-kernel
  Cc: peterz, rostedt, mathieu.desnoyers, tglx, bigeasy, kprateek.nayak,
	vineethr

A user thread can get preempted in the middle of executing a critical
section in user space while holding locks, which can have an undesirable
effect on performance. Having a way for the thread to request additional
execution time on the cpu, so that it can complete the critical section,
will be useful in such scenarios. The request can be made by setting a bit
in mapped memory that the kernel can also access, so that it can check the
request and grant extra execution time on the cpu.
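
As a rough user-space illustration of the request bit described above, the
sketch below models the shared flags word as a plain atomic variable. The
names demo_shared_flags, DEMO_DELAY_RESCHED_BIT and the demo_* helpers are
hypothetical and exist only for this example; they are not the interface
added by the series, which places the bit in the rseq structure.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit value, chosen for illustration only. */
#define DEMO_DELAY_RESCHED_BIT (1u << 0)

/* Stand-in for the flags word in memory shared with the kernel. */
static _Atomic uint32_t demo_shared_flags;

/* Set the request bit before entering a critical section. */
static inline void demo_request_extension(void)
{
	atomic_fetch_or_explicit(&demo_shared_flags, DEMO_DELAY_RESCHED_BIT,
				 memory_order_relaxed);
}

/* Clear the request bit once the critical section is complete. */
static inline void demo_clear_extension(void)
{
	atomic_fetch_and_explicit(&demo_shared_flags, ~DEMO_DELAY_RESCHED_BIT,
				  memory_order_relaxed);
}

/* Report whether the request bit is currently set. */
static inline int demo_extension_requested(void)
{
	uint32_t flags = atomic_load_explicit(&demo_shared_flags,
					      memory_order_relaxed);

	return (flags & DEMO_DELAY_RESCHED_BIT) != 0;
}
```

A caller would wrap its lock/unlock pair with demo_request_extension() and
demo_clear_extension(), so the bit is only set while locks are held.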

There have been a couple of proposals[1][2] for such a feature, which attempt
to address the above scenario by granting one extra tick of execution time.
In patch thread [1] posted by Steven Rostedt, there is ample discussion about
the need for this feature.

However, the concern has been that this can lead to abuse. One extra tick can
be a long time (about a millisecond or more). In response, Peter Zijlstra
posted a prototype solution[5], which grants only a 50us execution time
extension. This is achieved with the help of a timer started on that cpu at
the time of granting extra execution time. When the timer fires, the thread
will be preempted if it is still running.
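
The grant-then-timer behaviour described above can be modelled in a few
lines of plain C. This is only a simplified sketch and not the kernel code:
struct demo_task and both functions are hypothetical, time is passed in as a
nanosecond counter, and clearing the request on grant (so that only one
extension is given per request) is an assumption of this model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task state for this model only. */
struct demo_task {
	bool wants_extension;   /* request bit set by user space */
	bool extension_granted; /* an extension is currently in effect */
	uint64_t deadline_ns;   /* time at which the timer fires */
};

/*
 * Called when the task is about to be preempted. Returns true if the
 * preemption is deferred and a timer is armed for now_ns + ext_ns.
 */
static bool demo_try_delay_resched(struct demo_task *t, uint64_t now_ns,
				   uint64_t ext_ns)
{
	if (!t->wants_extension || t->extension_granted)
		return false;

	t->wants_extension = false;	/* assumption: one grant per request */
	t->extension_granted = true;
	t->deadline_ns = now_ns + ext_ns;
	return true;
}

/*
 * Models the timer expiry. Returns true if the task must be preempted
 * now, i.e. the deadline has passed and the task is still running.
 */
static bool demo_timer_fired(struct demo_task *t, uint64_t now_ns,
			     bool still_running)
{
	if (!t->extension_granted || now_ns < t->deadline_ns)
		return false;

	t->extension_granted = false;
	return still_running;
}
```

With ext_ns set to 50000 this corresponds to the 50us extension of the
prototype; a task that has already left the cpu when the timer fires is
simply left alone.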

This patchset implements the above solution as suggested, using the
restartable sequences (rseq) structure for the API. Refer to [3][4] for
further discussion.

v5:
- Added #ifdef CONFIG_RSEQ and CONFIG_PROC_SYSCTL for sysctl tunable
  changes (patch 3).
- Added #ifdef CONFIG_RSEQ for scheduler stat changes (patch 4).
- Removed deprecated flags from the supported flags returned, as
  pointed out by Mathieu Desnoyers(patch 6).
- Added IS_ENABLED(CONFIG_SCHED_HRTICK) check before returning supported
  delay resched flags.

v4:
https://lore.kernel.org/all/20250513214554.4160454-1-prakash.sangappa@oracle.com
- Changed default sched delay extension time to 30us
- Added patch to indicate to userspace if the thread got preempted in
  the extended cpu time granted. Uses another bit in rseq cs flags for it.
  This should help the application to check and avoid having to call a
  system call to yield cpu, especially sched_yield() as pointed out
  by Steven Rostedt.
- Moved tracepoint call towards end of exit_to_user_mode_loop().
- Added a pr_warn() message when the 'sched_preempt_delay_us' tunable is
  set higher than the default value of 30us.
- Patch to add an API to query if sched time extension feature is supported.
  A new flag for the sys_rseq flags argument, called
  'RSEQ_FLAG_QUERY_CS_FLAGS', is added, as suggested by Mathieu Desnoyers.
  It returns a bitmask of all the supported rseq cs flags in the rseq->flags
  field.
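
  For illustration, an application could consume the returned bitmask and
  the "got preempted" bit roughly as sketched here. The bit values and
  DEMO_* names are hypothetical placeholders (the real definitions are in
  include/uapi/linux/rseq.h of this series), and the extension_granted
  argument stands for whatever means the application uses to know that an
  extension was granted, which is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define DEMO_CS_FLAG_DELAY_RESCHED (1u << 3)
#define DEMO_CS_FLAG_RESCHEDULED   (1u << 4)

/* True if the queried bitmask advertises the delay resched flag. */
static bool demo_delay_resched_supported(uint32_t supported_mask)
{
	return (supported_mask & DEMO_CS_FLAG_DELAY_RESCHED) != 0;
}

/*
 * Decide at the end of the critical section whether a yielding system
 * call is worth making: only if an extension was granted and the thread
 * has not already been preempted during the extended time.
 */
static bool demo_yield_needed(uint32_t cs_flags, bool extension_granted)
{
	if (!extension_granted)
		return false;

	return (cs_flags & DEMO_CS_FLAG_RESCHEDULED) == 0;
}
```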

v3:
https://lore.kernel.org/all/20250502015955.3146733-1-prakash.sangappa@oracle.com
- Addressing review comments by Sebastian and Prateek.
  * Rename rseq_sched_delay -> sched_time_delay. Move its place in
    struct task_struct near other bits so it fits in existing word.
  * Use IS_ENABLED(CONFIG_RSEQ) instead of #ifdef to access
    'sched_time_delay'.
  * removed rseq_delay_resched_tick() call from hrtick_clear().
  * Introduced a patch to add a tracepoint in exit_to_user_mode_loop(),
    suggested by Sebastian.
  * Added comments to describe RSEQ_CS_FLAG_DELAY_RESCHED flag.

v2:
https://lore.kernel.org/all/20250418193410.2010058-1-prakash.sangappa@oracle.com/
- Based on discussions in [3], expecting the user application to call
  sched_yield() to yield the cpu at the end of the critical section may not
  be advisable, as pointed out by Linus.

  So added a check in the return path from a system call to reschedule if a
  time slice extension was granted to the thread. The check could as well be
  in the syscall enter path from user mode.
  This would allow the application thread to call any system call to yield
  the cpu. Which system call should be suggested? getppid(2) works.

  Do we still need the change in sched_yield() to reschedule when the thread
  has current->rseq_sched_delay set?

- Added patch to introduce a sysctl tunable parameter to specify the duration
  of the time slice extension in microseconds (us), called
  'sched_preempt_delay_us'.
  Can take a value in the range 0 to 100. Default is set to 50us.
  Setting this tunable to 0 disables the scheduler time slice extension feature.
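
  The tunable semantics above (range 0 to 100, with 0 meaning disabled) can
  be summarised by the small sketch below. The helper names and the
  DEMO_DELAY_US_MAX constant are hypothetical and only restate the rules
  from this changelog entry; they are not the sysctl handler in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of the tunable as stated in the changelog. */
#define DEMO_DELAY_US_MAX 100u

/* True if the value is within the accepted range 0..100. */
static bool demo_delay_us_valid(unsigned int us)
{
	return us <= DEMO_DELAY_US_MAX;
}

/* A value of 0 turns the time slice extension feature off. */
static bool demo_extension_enabled(unsigned int us)
{
	return us != 0;
}

/* Convert the tunable to nanoseconds, e.g. for arming a timer. */
static uint64_t demo_delay_ns(unsigned int us)
{
	return (uint64_t)us * 1000u;
}
```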

v1: 
https://lore.kernel.org/all/20250215005414.224409-1-prakash.sangappa@oracle.com/


[1] https://lore.kernel.org/lkml/20231025054219.1acaa3dd@gandalf.local.home/
[2] https://lore.kernel.org/lkml/1395767870-28053-1-git-send-email-khalid.aziz@oracle.com/
[3] https://lore.kernel.org/all/20250131225837.972218232@goodmis.org/
[4] https://lore.kernel.org/all/20241113000126.967713-1-prakash.sangappa@oracle.com/
[5] https://lore.kernel.org/lkml/20231030132949.GA38123@noisy.programming.kicks-ass.net/
[6] https://lore.kernel.org/all/1631147036-13597-1-git-send-email-prakash.sangappa@oracle.com/

Prakash Sangappa (6):
  Sched: Scheduler time slice extension
  Sched: Indicate if thread got rescheduled
  Sched: Tunable to specify duration of time slice extension
  Sched: Add scheduler stat for cpu time slice extension
  Sched: Add tracepoint for sched time slice extension
  Add API to query supported rseq cs flags

 include/linux/entry-common.h | 11 +++--
 include/linux/sched.h        | 30 +++++++++++
 include/trace/events/sched.h | 28 +++++++++++
 include/uapi/linux/rseq.h    | 19 +++++++
 kernel/entry/common.c        | 27 ++++++++--
 kernel/rseq.c                | 96 ++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c          | 60 ++++++++++++++++++++++
 kernel/sched/debug.c         |  4 ++
 kernel/sched/syscalls.c      |  5 ++
 9 files changed, 272 insertions(+), 8 deletions(-)

-- 
2.43.5


Thread overview: 22+ messages
2025-06-03 23:36 [PATCH V5 0/6] Scheduler time slice extension Prakash Sangappa
2025-06-03 23:36 ` [PATCH V5 1/6] Sched: " Prakash Sangappa
2025-06-04 14:31   ` Steven Rostedt
2025-06-04 14:54     ` Sebastian Andrzej Siewior
2025-06-04 17:29       ` Prakash Sangappa
2025-06-04 19:23         ` Sebastian Andrzej Siewior
2025-06-09 20:55           ` Steven Rostedt
2025-06-09 21:33             ` Steven Rostedt
2025-06-10 16:31               ` Prakash Sangappa
2025-06-10 16:40                 ` Steven Rostedt
2025-07-01  0:48                   ` Prakash Sangappa
2025-06-09 21:52             ` Steven Rostedt
2025-06-09 22:06               ` Steven Rostedt
2025-06-10 15:40               ` Prakash Sangappa
2025-06-04 17:09     ` Prakash Sangappa
2025-06-03 23:36 ` [PATCH V5 2/6] Sched: Indicate if thread got rescheduled Prakash Sangappa
2025-06-03 23:36 ` [PATCH V5 3/6] Sched: Tunable to specify duration of time slice extension Prakash Sangappa
2025-06-03 23:36 ` [PATCH V5 4/6] Sched: Add scheduler stat for cpu " Prakash Sangappa
2025-06-03 23:36 ` [PATCH V5 5/6] Sched: Add tracepoint for sched " Prakash Sangappa
2025-06-04 14:36   ` Steven Rostedt
2025-06-04 17:10     ` Prakash Sangappa
2025-06-03 23:36 ` [PATCH V5 6/6] Add API to query supported rseq cs flags Prakash Sangappa
