public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] sched/fair: Fix NEXT_BUDDY panic and warning
@ 2024-11-27  5:56 Adam Li
  2024-11-27  5:56 ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled Adam Li
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Adam Li @ 2024-11-27  5:56 UTC (permalink / raw)
  To: peterz, mingo, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr, Adam Li

When running Specjbb workload with NEXT_BUDDY enabled, kernel warning,
rcu stall and panic may be triggered.

The kernel panic is triggered in pick_next_entity() if pick_eevdf()
returns NULL.

In patch 1 ("Fix warning if NEXT_BUDDY enabled"), if sched_delayed is set
we do not set next buddy. After the patch, kernel warning, rcu stall and
panic disappear. However to avoid panic, we still check return value of
pick_eevdf().

The 'last' and 'skip' buddy are obsoleted by EEVDF. Update the comments in
pick_next_entity().

Detail log bellow:
[  124.972623] ------------[ cut here ]------------
[  124.977300] cfs_rq->next->sched_delayed
[  124.977310] WARNING: CPU: 51 PID: 2150 at kernel/sched/fair.c:5621 pick_task_fair+0x130/0x150
[  125.049547] CPU: 51 UID: 0 PID: 2150 Comm: kworker/51:1 Tainted: G            E      6.12.0.adam+ #1
[  125.058678] Tainted: [E]=UNSIGNED_MODULE
[  125.062591] Hardware name: IEI NF5280R7/Mitchell MB, BIOS 4.4.3.1 10/16/2024
[  125.069629] Workqueue:  0x0 (mm_percpu_wq)
[  125.073721] pstate: 634000c9 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  125.080671] pc : pick_task_fair+0x130/0x150
[  125.084841] lr : pick_task_fair+0x130/0x150
[  125.089013] sp : ffff8000ab41bc10
[  125.092315] x29: ffff8000ab41bc10 x28: 0000000000000000 x27: 0000000000000000
[  125.099440] x26: ffff000123bd8788 x25: 0000000000000402 x24: 0000000000000001
[  125.106565] x23: ffff000123bd8000 x22: ffff007dfef5cd00 x21: ffff007dfef5cd80
[  125.113689] x20: ffff007dfef5cd80 x19: ffff2001ab20a780 x18: 0000000000000006
[  125.120815] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000ab41b5e0
[  125.127938] x14: 0000000000000000 x13: 646579616c65645f x12: 64656863733e2d74
[  125.135062] x11: fffffffffc000000 x10: ffff207dfac9b420 x9 : ffff80008014ed60
[  125.142185] x8 : 00000000ffdfffff x7 : ffff207dfac80000 x6 : 000000000000122c
[  125.149309] x5 : ffff007dfef49408 x4 : 40000000ffe0122c x3 : ffff807d7d673000
[  125.156433] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000123bd8000
[  125.163561] Call trace:
[  125.165996]  pick_task_fair+0x130/0x150 (P)
[  125.170167]  pick_task_fair+0x130/0x150 (L)
[  125.174338]  pick_next_task_fair+0x48/0x3c0
[  125.178512]  __pick_next_task+0x4c/0x220
[  125.182426]  pick_next_task+0x44/0x980
[  125.186163]  __schedule+0x3d0/0x628
[  125.189645]  schedule+0x3c/0xe0
[  125.192776]  worker_thread+0x1a4/0x368
[  125.196516]  kthread+0xfc/0x110
[  125.199647]  ret_from_fork+0x10/0x20
[  125.203213] ---[ end trace 0000000000000000 ]---
[  125.207818] ------------[ cut here ]------------

[  211.151849] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  211.159759] rcu:     (detected by 141, t=15003 jiffies, g=5629, q=26516 ncpus=384)
[  211.169780] rcu: All QSes seen, last rcu_preempt kthread activity 15002 (4294943634-4294928632), jiffies_till_next_fqs=2, root ->qsmask 0x0
[  211.185062] rcu: rcu_preempt kthread timer wakeup didn't happen for 15004 jiffies! g5629 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0
[  211.199043] rcu:     Possible timer handling issue on cpu=352 timer-softirq=1091
[  211.208943] rcu: rcu_preempt kthread starved for 15012 jiffies! g5629 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=352
[  211.222141] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[  211.234037] rcu: RCU grace-period kthread stack dump:
[  211.241854] task:rcu_preempt     state:R  running task     stack:0     pid:17    tgid:17    ppid:2      flags:0x00000008
[  211.255487] Call trace:
[  211.260698]  __switch_to+0xf0/0x150 (T)
[  211.267299]  __schedule+0x238/0x628
[  211.273553]  schedule+0x3c/0xe0
[  211.279459]  schedule_timeout+0x88/0x108
[  211.286147]  rcu_gp_fqs_loop+0x158/0x4d0
[  211.292835]  rcu_gp_kthread+0x164/0x198
[  211.299436]  kthread+0xfc/0x110
[  211.305342]  ret_from_fork+0x10/0x20
[  211.311684] rcu: Stack dump where RCU GP kthread last ran:
[  211.319940] Sending NMI from CPU 141 to CPUs 352:
[  211.327411] NMI backtrace for cpu 352
[  211.333835] CPU: 352 UID: 0 PID: 0 Comm: swapper/352 Tainted: G        W   E      6.12.0.adam+ #1
[  211.345466] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[  211.353021] Hardware name: IEI NF5280R7/Mitchell MB, BIOS 4.4.3.1 10/16/2024
[  211.362834] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  211.372557] pc : cpuidle_enter_state+0xcc/0x4f0
[  211.379851] lr : cpuidle_enter_state+0xc0/0x4f0
[  211.387147] sp : ffff8000878c3d70
[  211.393226] x29: ffff8000878c3d70 x28: 0000000000000000 x27: 0000000000000000
[  211.403125] x26: 0000000000000000 x25: 0000000000000000 x24: 0000003133e26b84
[  211.413023] x23: 0000000000000000 x22: ffff800082092d98 x21: 00000031341844e8
[  211.422922] x20: 0000000000000000 x19: ffff20011d459800 x18: 0000000000000000
[  211.432820] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  211.442719] x14: 0000000000000000 x13: ffff800081dcf030 x12: 0000000000000001
[  211.452619] x11: 0000003131433c14 x10: 071c71c71c71c71c x9 : ffff8000810b5900
[  211.462517] x8 : 00000000003bf790 x7 : ffff207e031d57e4 x6 : 0000002023a22338
[  211.472416] x5 : 1fffffffffffffff x4 : 0000000000000015 x3 : 000000000030e5a8
[  211.482314] x2 : ffffa07d818ed000 x1 : ffff207e031d6d00 x0 : 0000000000000000
[  211.492213] Call trace:
[  211.497426]  cpuidle_enter_state+0xcc/0x4f0 (P)
[  211.504719]  cpuidle_enter_state+0xc0/0x4f0 (L)
[  211.512013]  cpuidle_enter+0x40/0x60
[  211.518351]  cpuidle_idle_call+0x130/0x1c8
[  211.525212]  do_idle+0xec/0xf8
[  211.531033]  cpu_startup_entry+0x40/0x50
[  211.537719]  secondary_start_kernel+0xe0/0x120
[  211.544926]  __secondary_switched+0xc0/0xc8

[  297.371198] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000051
[  297.406112] CPU: 116 UID: 0 PID: 10328 Comm: Grizzly-worker( Tainted: G        W   E      6.12.0.adam+ #1
[  297.414884] Mem abort info:
[  297.424437] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[  297.427219]   ESR = 0x0000000096000005
[  297.431997] Hardware name: IEI NF5280R7/Mitchell MB, BIOS 4.4.3.1 10/16/2024
[  297.435734]   EC = 0x25: DABT (current EL), IL = 32 bits
[  297.442770] pstate: a34000c9 (NzCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  297.448069]   SET = 0, FnV = 0
[  297.455018] pc : pick_task_fair+0x50/0x150
[  297.458060]   EA = 0, S1PTW = 0
[  297.462144] lr : pick_task_fair+0x50/0x150
[  297.465274]   FSC = 0x05: level 1 translation fault
[  297.469358] sp : ffff800101d93ae0
[  297.474223] Data abort info:
[  297.477526] x29: ffff800101d93ae0
[  297.480395]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  297.480395]  x28: 0000000000000009
[  297.483703]  x27: 0000000000000000
[  297.489177]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  297.492567]
[  297.495956]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  297.500996] x26: ffff006da4381b08
[  297.502477] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000198b3b000
[  297.507777]  x25: 0000000000000080
[  297.511080] [0000000000000051] pgd=08000001c0636403
[  297.517509]  x24: 0000000000000001
[  297.520899] , p4d=08000001c0636403
[  297.525765]
[  297.529155] , pud=0000000000000000
[  297.532545] x23: ffff006da4381380
[  297.534025]
[  297.537415]  x22: ffff007dff7fed00 x21: ffff007dff7fed80
[  297.547496] x20: ffff000167f60c00 x19: 0000000000000000 x18: 0000000000000006
[  297.554621] x17: ffff8000820b3be8 x16: 0000000087c17f9e x15: ffff800083d53690
[  297.561745] x14: 0000000000000004 x13: ffff800081df4ac8 x12: 0000000000000000
[  297.568868] x11: ffff200111a3f0b0 x10: ffff200111a3efc8 x9 : ffff800080109e48
[  297.575992] x8 : 00000000000000b8 x7 : 0000000000000074 x6 : 0000000000000002
[  297.583115] x5 : 0000000000000002 x4 : 0000000000000002 x3 : 0000000000000000
[  297.590239] x2 : fffffffffffffff0 x1 : 0000000000000000 x0 : 0000000000000000
[  297.597362] Call trace:
[  297.599795]  pick_task_fair+0x50/0x150 (P)
[  297.603879]  pick_task_fair+0x50/0x150 (L)
[  297.607963]  pick_next_task_fair+0x30/0x3c0
[  297.612134]  __pick_next_task+0x4c/0x220
[  297.616045]  pick_next_task+0x44/0x980
[  297.619782]  __schedule+0x3d0/0x628
[  297.623259]  do_task_dead+0x50/0x60
[  297.626736]  do_exit+0x28c/0x410
[  297.629955]  do_group_exit+0x3c/0xa0
[  297.633518]  get_signal+0x8c4/0x8d0
[  297.636996]  do_signal+0x9c/0x270
[  297.640299]  do_notify_resume+0xe0/0x198
[  297.644212]  el0_svc+0xf4/0x170
[  297.647342]  el0t_64_sync_handler+0x10c/0x138
[  297.651687]  el0t_64_sync+0x1ac/0x1b0
[  297.655339] Code: d503201f 1400002a aa1403e0 97ffda0b (39414401)
[  297.661439] ---[ end trace 0000000000000000 ]---
[  297.726593] Kernel panic - not syncing: Oops: Fatal exception

v2:
  Follow Christian Loehle's suggestion, revise commit message.
  Add patch to check return value of pick_eevdf() in pick_next_entity().

v1:
   https://lore.kernel.org/all/20241125021222.356881-1-adamli@os.amperecomputing.com/

Adam Li (3):
  sched/fair: Fix warning if NEXT_BUDDY enabled
  sched/fair: Fix panic if pick_eevdf() returns NULL
  sched/fair: Update comments regarding last and skip buddy

 kernel/sched/fair.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-27  5:56 [PATCH v2 0/3] sched/fair: Fix NEXT_BUDDY panic and warning Adam Li
@ 2024-11-27  5:56 ` Adam Li
  2024-11-28  7:29   ` K Prateek Nayak
  2024-11-27  5:56 ` [PATCH v2 2/3] sched/fair: Fix panic if pick_eevdf() returns NULL Adam Li
  2024-11-27  5:56 ` [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy Adam Li
  2 siblings, 1 reply; 24+ messages in thread
From: Adam Li @ 2024-11-27  5:56 UTC (permalink / raw)
  To: peterz, mingo, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr, Adam Li

Enabling NEXT_BUDDY triggers warning, and rcu stall:

[  124.977300] cfs_rq->next->sched_delayed
[  124.977310] WARNING: CPU: 51 PID: 2150 at kernel/sched/fair.c:5621 pick_task_fair+0x130/0x150
[  125.049547] CPU: 51 UID: 0 PID: 2150 Comm: kworker/51:1 Tainted: G            E      6.12.0.adam+ #1
<snip>
[  125.163561] Call trace:
[  125.165996]  pick_task_fair+0x130/0x150 (P)
[  125.170167]  pick_task_fair+0x130/0x150 (L)
[  125.174338]  pick_next_task_fair+0x48/0x3c0
[  125.178512]  __pick_next_task+0x4c/0x220
[  125.182426]  pick_next_task+0x44/0x980
[  125.186163]  __schedule+0x3d0/0x628
[  125.189645]  schedule+0x3c/0xe0
[  125.192776]  worker_thread+0x1a4/0x368
[  125.196516]  kthread+0xfc/0x110
[  125.199647]  ret_from_fork+0x10/0x20
[  125.203213] ---[ end trace 0000000000000000 ]---
<snip>
[  211.151849] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  211.159759] rcu:     (detected by 141, t=15003 jiffies, g=5629, q=26516 ncpus=384)
<snip>

Do not set next buddy if sched_delayed is set.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fbdca89c677f..cd1188b7f3df 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8748,6 +8748,8 @@ static void set_next_buddy(struct sched_entity *se)
 			return;
 		if (se_is_idle(se))
 			return;
+		if (se->sched_delayed)
+			return;
 		cfs_rq_of(se)->next = se;
 	}
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/3] sched/fair: Fix panic if pick_eevdf() returns NULL
  2024-11-27  5:56 [PATCH v2 0/3] sched/fair: Fix NEXT_BUDDY panic and warning Adam Li
  2024-11-27  5:56 ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled Adam Li
@ 2024-11-27  5:56 ` Adam Li
  2024-11-29  9:18   ` Peter Zijlstra
  2024-11-27  5:56 ` [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy Adam Li
  2 siblings, 1 reply; 24+ messages in thread
From: Adam Li @ 2024-11-27  5:56 UTC (permalink / raw)
  To: peterz, mingo, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr, Adam Li

pick_eevdf() may return NULL, which triggers NULL pointer
dereference at pick_next_entity():
	struct sched_entity *se = pick_eevdf(cfs_rq);
	if (se->sched_delayed)

    [  297.371198] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000051
    [  297.406112] CPU: 116 UID: 0 PID: 10328 Comm: Grizzly-worker( Tainted: G        W   E      6.12.0.adam+ #1
    [  297.597362] Call trace:
    [  297.599795]  pick_task_fair+0x50/0x150 (P)
    [  297.603879]  pick_task_fair+0x50/0x150 (L)
    [  297.607963]  pick_next_task_fair+0x30/0x3c0
    [  297.612134]  __pick_next_task+0x4c/0x220
    [  297.616045]  pick_next_task+0x44/0x980
    [  297.619782]  __schedule+0x3d0/0x628
    [  297.623259]  do_task_dead+0x50/0x60
    [  297.626736]  do_exit+0x28c/0x410
    [  297.629955]  do_group_exit+0x3c/0xa0
    [  297.633518]  get_signal+0x8c4/0x8d0
    [  297.636996]  do_signal+0x9c/0x270
    [  297.640299]  do_notify_resume+0xe0/0x198
    [  297.644212]  el0_svc+0xf4/0x170
    [  297.647342]  el0t_64_sync_handler+0x10c/0x138
    [  297.651687]  el0t_64_sync+0x1ac/0x1b0
    [  297.655339] Code: d503201f 1400002a aa1403e0 97ffda0b (39414401)
    [  297.661439] ---[ end trace 0000000000000000 ]---
    [  297.726593] Kernel panic - not syncing: Oops: Fatal exception

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd1188b7f3df..d5a3b5589e4e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5623,7 +5623,7 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 	}
 
 	struct sched_entity *se = pick_eevdf(cfs_rq);
-	if (se->sched_delayed) {
+	if (se && se->sched_delayed) {
 		dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
 		/*
 		 * Must not reference @se again, see __block_task().
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy
  2024-11-27  5:56 [PATCH v2 0/3] sched/fair: Fix NEXT_BUDDY panic and warning Adam Li
  2024-11-27  5:56 ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled Adam Li
  2024-11-27  5:56 ` [PATCH v2 2/3] sched/fair: Fix panic if pick_eevdf() returns NULL Adam Li
@ 2024-11-27  5:56 ` Adam Li
  2025-03-13  8:30   ` Madadi Vineeth Reddy
  2 siblings, 1 reply; 24+ messages in thread
From: Adam Li @ 2024-11-27  5:56 UTC (permalink / raw)
  To: peterz, mingo, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr, Adam Li

Commit 5e963f2bd465 ("sched/fair: Commit to EEVDF") removed the "last"
and "skip" buddy. Update comments in pick_next_entity().

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
---
 kernel/sched/fair.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d5a3b5589e4e..259c56dcdff6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5602,17 +5602,11 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 
 static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags);
 
-/*
- * Pick the next process, keeping these things in mind, in this order:
- * 1) keep things fair between processes/task groups
- * 2) pick the "next" process, since someone really wants that to run
- * 3) pick the "last" process, for cache locality
- * 4) do not run the "skip" process, if something else is available
- */
 static struct sched_entity *
 pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 {
 	/*
+	 * Pick the "next" buddy, since someone really wants that to run.
 	 * Enabling NEXT_BUDDY will affect latency but not fairness.
 	 */
 	if (sched_feat(NEXT_BUDDY) &&
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-27  5:56 ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled Adam Li
@ 2024-11-28  7:29   ` K Prateek Nayak
  2024-11-29  3:21     ` Adam Li
                       ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: K Prateek Nayak @ 2024-11-28  7:29 UTC (permalink / raw)
  To: Adam Li, peterz, mingo, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr

Hello Adam,

On 11/27/2024 11:26 AM, Adam Li wrote:
> Enabling NEXT_BUDDY triggers warning, and rcu stall:
> 
> [  124.977300] cfs_rq->next->sched_delayed

I could reproduce this with a run of "perf bench sched messaging" but
given that we hit this warning, it also means that either
set_next_buddy() has incorrectly set a delayed entity as next buddy, or
clear_next_buddy() did not clear a delayed entity.

I also see PSI splats like:

     psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0

but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
the flags it is trying to clear
"(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
possible if you have picked a dequeued entity for running before its
wakeup, which is also perhaps why the "nr_running" computation goes awry
and pick_eevdf() returns NULL (which it should never since
pick_next_entity() is only called when rq->cfs.nr_running is > 0)

> [  124.977310] WARNING: CPU: 51 PID: 2150 at kernel/sched/fair.c:5621 pick_task_fair+0x130/0x150
> [  125.049547] CPU: 51 UID: 0 PID: 2150 Comm: kworker/51:1 Tainted: G            E      6.12.0.adam+ #1
> <snip>
> [  125.163561] Call trace:
> [  125.165996]  pick_task_fair+0x130/0x150 (P)
> [  125.170167]  pick_task_fair+0x130/0x150 (L)
> [  125.174338]  pick_next_task_fair+0x48/0x3c0
> [  125.178512]  __pick_next_task+0x4c/0x220
> [  125.182426]  pick_next_task+0x44/0x980
> [  125.186163]  __schedule+0x3d0/0x628
> [  125.189645]  schedule+0x3c/0xe0
> [  125.192776]  worker_thread+0x1a4/0x368
> [  125.196516]  kthread+0xfc/0x110
> [  125.199647]  ret_from_fork+0x10/0x20
> [  125.203213] ---[ end trace 0000000000000000 ]---
> <snip>
> [  211.151849] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  211.159759] rcu:     (detected by 141, t=15003 jiffies, g=5629, q=26516 ncpus=384)
> <snip>
> 
> Do not set next buddy if sched_delayed is set.
> 
> Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
> Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
> ---
>   kernel/sched/fair.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fbdca89c677f..cd1188b7f3df 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8748,6 +8748,8 @@ static void set_next_buddy(struct sched_entity *se)
>   			return;
>   		if (se_is_idle(se))
>   			return;
> +		if (se->sched_delayed)
> +			return;

I tried to put a SCHED_WARN_ON() here to track where this comes from and
seems like it is usually from attach_task() in the load balancing path
pulling a delayed task which is set as the next buddy in
check_preempt_wakeup_fair()

Can you please try the following diff instead of the first two patches
and see if you still hit these warnings, stalls, and pick_eevdf()
returning NULL?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff7cae9274c5..61e74eb5af22 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5478,6 +5478,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
  	bool sleep = flags & DEQUEUE_SLEEP;
  
  	update_curr(cfs_rq);
+	clear_buddies(cfs_rq, se);
  
  	if (flags & DEQUEUE_DELAYED) {
  		SCHED_WARN_ON(!se->sched_delayed);
@@ -5520,8 +5521,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
  
  	update_stats_dequeue_fair(cfs_rq, se, flags);
  
-	clear_buddies(cfs_rq, se);
-
  	update_entity_lag(cfs_rq, se);
  	if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
  		se->deadline -= se->vruntime;
@@ -8767,7 +8766,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
  	if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
  		return;
  
-	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK)) {
+	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
  		set_next_buddy(pse);
  	}
  
--

If you are still encountering pick_eevdf() returning NULL, there could
be a larger issues (with eligibility computation, etc.) that the second
patch can hide which can lead to bigger problems later. Thank you.

>   		cfs_rq_of(se)->next = se;
>   	}
>   }

-- 
Thanks and Regards,
Prateek


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-28  7:29   ` K Prateek Nayak
@ 2024-11-29  3:21     ` Adam Li
  2024-11-29  4:28       ` K Prateek Nayak
  2024-11-29  9:55     ` Peter Zijlstra
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Adam Li @ 2024-11-29  3:21 UTC (permalink / raw)
  To: K Prateek Nayak, peterz, mingo, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr

On 11/28/2024 3:29 PM, K Prateek Nayak wrote:
> Hello Adam,
> 
Hi Prateek,
Thanks for comments.

> On 11/27/2024 11:26 AM, Adam Li wrote:
>> Enabling NEXT_BUDDY triggers warning, and rcu stall:
>>
>> [  124.977300] cfs_rq->next->sched_delayed
> 
> I could reproduce this with a run of "perf bench sched messaging" but
> given that we hit this warning, it also means that either
> set_next_buddy() has incorrectly set a delayed entity as next buddy, or
> clear_next_buddy() did not clear a delayed entity.
> 
Yes. The logic of this patch is a delayed entity should not be set as next buddy.

> I also see PSI splats like:
> 
>     psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
> 
> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
> the flags it is trying to clear
> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
> possible if you have picked a dequeued entity for running before its
> wakeup, which is also perhaps why the "nr_running" computation goes awry
> and pick_eevdf() returns NULL (which it should never since
> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
IIUC, one path for pick_eevdf() to return NULL is:
pick_eevdf():
<snip>
	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
		curr = NULL; <--- curr is set to NULL
<snip>
found:
	if (!best || (curr && entity_before(curr, best)))
		best = curr; <--- curr and best are both NULL

	return best;  <--- return NULL

>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index fbdca89c677f..cd1188b7f3df 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8748,6 +8748,8 @@ static void set_next_buddy(struct sched_entity *se)
>>               return;
>>           if (se_is_idle(se))
>>               return;
>> +        if (se->sched_delayed)
>> +            return;
> 
> I tried to put a SCHED_WARN_ON() here to track where this comes from and
> seems like it is usually from attach_task() in the load balancing path
> pulling a delayed task which is set as the next buddy in
> check_preempt_wakeup_fair()
> 
> Can you please try the following diff instead of the first two patches
> and see if you still hit these warnings, stalls, and pick_eevdf()
> returning NULL?
> 
Tested. Run specjbb with NEXT_BUDDY enabled, warnings, stalls and panic disappear.

Regards,
-adam

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-29  3:21     ` Adam Li
@ 2024-11-29  4:28       ` K Prateek Nayak
  2024-11-29  7:40         ` Adam Li
  0 siblings, 1 reply; 24+ messages in thread
From: K Prateek Nayak @ 2024-11-29  4:28 UTC (permalink / raw)
  To: Adam Li, peterz, mingo, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr

Hello Adam,

On 11/29/2024 8:51 AM, Adam Li wrote:
> On 11/28/2024 3:29 PM, K Prateek Nayak wrote:
>> Hello Adam,
>>
> Hi Prateek,
> Thanks for comments.
> 
>> On 11/27/2024 11:26 AM, Adam Li wrote:
>>> Enabling NEXT_BUDDY triggers warning, and rcu stall:
>>>
>>> [  124.977300] cfs_rq->next->sched_delayed
>>
>> I could reproduce this with a run of "perf bench sched messaging" but
>> given that we hit this warning, it also means that either
>> set_next_buddy() has incorrectly set a delayed entity as next buddy, or
>> clear_next_buddy() did not clear a delayed entity.
>>
> Yes. The logic of this patch is a delayed entity should not be set as next buddy.
> 
>> I also see PSI splats like:
>>
>>      psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
>>
>> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
>> the flags it is trying to clear
>> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
>> possible if you have picked a dequeued entity for running before its
>> wakeup, which is also perhaps why the "nr_running" computation goes awry
>> and pick_eevdf() returns NULL (which it should never since
>> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
> IIUC, one path for pick_eevdf() to return NULL is:
> pick_eevdf():
> <snip>
> 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> 		curr = NULL; <--- curr is set to NULL

"on_rq" is only cleared when the entity is dequeued so "curr" is in fact
going to sleep (proper sleep) and we've reached at pick_eevdf(),
otherwise, if "curr" is not eligible, there is at least one more tasks
on the cfs_rq which implies best has be found and will be non-null.

> <snip>
> found:
> 	if (!best || (curr && entity_before(curr, best)))
> 		best = curr; <--- curr and best are both NULL

Say "curr" is going to sleep, and there is no "best", in which case
"curr" is already blocked and "cfs_rq->nr_running" should be 0 and it
should have not reached pick_eevdf() in the first place since
pick_next_entity() is only called by pick_task_fair() if
"cfs_rq->nr_running" is non-zero.

So as long as "cfs_rq->nr_running" is non-zero, pick_eevdf() should
return a valid runnable entity. Failure to do so perhaps points to
"entity_eligible()" check going sideways somewhere or a bug in
"nr_running" accounting.

Chenyu had proposed a similar fix long back in
https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
but the consensus was it was covering up a larger problem which
then boiled down to avg_vruntime being computed incorrectly
https://lore.kernel.org/lkml/ZiAWTU5xb%2FJMn%2FHs@chenyu5-mobl2/

> 
> 	return best;  <--- return NULL
> 
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index fbdca89c677f..cd1188b7f3df 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8748,6 +8748,8 @@ static void set_next_buddy(struct sched_entity *se)
>>>                return;
>>>            if (se_is_idle(se))
>>>                return;
>>> +        if (se->sched_delayed)
>>> +            return;
>>
>> I tried to put a SCHED_WARN_ON() here to track where this comes from and
>> seems like it is usually from attach_task() in the load balancing path
>> pulling a delayed task which is set as the next buddy in
>> check_preempt_wakeup_fair()
>>
>> Can you please try the following diff instead of the first two patches
>> and see if you still hit these warnings, stalls, and pick_eevdf()
>> returning NULL?
>>
> Tested. Run specjbb with NEXT_BUDDY enabled, warnings, stalls and panic disappear.

Thank you for testing. I'll let Peter come back on which approach he
prefers :)

> 
> Regards,
> -adam

-- 
Thanks and Regards,
Prateek


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-29  4:28       ` K Prateek Nayak
@ 2024-11-29  7:40         ` Adam Li
  2024-11-29  8:00           ` K Prateek Nayak
  0 siblings, 1 reply; 24+ messages in thread
From: Adam Li @ 2024-11-29  7:40 UTC (permalink / raw)
  To: K Prateek Nayak, peterz, mingo, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr

On 11/29/2024 12:28 PM, K Prateek Nayak wrote:

>>> I also see PSI splats like:
>>>
>>>      psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
>>>
>>> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
>>> the flags it is trying to clear
>>> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
>>> possible if you have picked a dequeued entity for running before its
>>> wakeup, which is also perhaps why the "nr_running" computation goes awry
>>> and pick_eevdf() returns NULL (which it should never since
>>> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
>> IIUC, one path for pick_eevdf() to return NULL is:
>> pick_eevdf():
>> <snip>
>>     if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
>>         curr = NULL; <--- curr is set to NULL
> 
> "on_rq" is only cleared when the entity is dequeued so "curr" is in fact
> going to sleep (proper sleep) and we've reached at pick_eevdf(),
> otherwise, if "curr" is not eligible, there is at least one more tasks
> on the cfs_rq which implies best has be found and will be non-null.
> 
if curr->sched_delayed == 1, the condition: '(curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))'
can be true. Please correct me if wrong.

>> <snip>
>> found:
>>     if (!best || (curr && entity_before(curr, best)))
>>         best = curr; <--- curr and best are both NULL
> 
> Say "curr" is going to sleep, and there is no "best", in which case
> "curr" is already blocked and "cfs_rq->nr_running" should be 0 and it
> should have not reached pick_eevdf() in the first place since
> pick_next_entity() is only called by pick_task_fair() if
> "cfs_rq->nr_running" is non-zero.
> 
> So as long as "cfs_rq->nr_running" is non-zero, pick_eevdf() should
> return a valid runnable entity. Failure to do so perhaps points to
> "entity_eligible()" check going sideways somewhere or a bug in
> "nr_running" accounting.
> 
> Chenyu had proposed a similar fix long back in
> https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
> but the consensus was it was covering up a larger problem which
> then boiled down to avg_vruntime being computed incorrectly
> https://lore.kernel.org/lkml/ZiAWTU5xb%2FJMn%2FHs@chenyu5-mobl2/
> 
Thanks for the information.
From the timeline, it seems the issue is before 152e11f6df29 ("sched/fair: Implement delayed dequeue").
DELAY_DEQUEUE may introduce risk for pick_eevdf() return NULL.

After patch 1 ("Fix warning if NEXT_BUDDY enabled"), the NULL pointer panic disappears.
Patch 2 ("Fix panic if pick_eevdf() returns NULL") is a safe guard.

Thanks,
-adam


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-29  7:40         ` Adam Li
@ 2024-11-29  8:00           ` K Prateek Nayak
  0 siblings, 0 replies; 24+ messages in thread
From: K Prateek Nayak @ 2024-11-29  8:00 UTC (permalink / raw)
  To: Adam Li, peterz, mingo, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr

Hello Adam,

On 11/29/2024 1:10 PM, Adam Li wrote:
> On 11/29/2024 12:28 PM, K Prateek Nayak wrote:
> 
>>>> I also see PSI splats like:
>>>>
>>>>       psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
>>>>
>>>> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
>>>> the flags it is trying to clear
>>>> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
>>>> possible if you have picked a dequeued entity for running before its
>>>> wakeup, which is also perhaps why the "nr_running" computation goes awry
>>>> and pick_eevdf() returns NULL (which it should never since
>>>> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
>>> IIUC, one path for pick_eevdf() to return NULL is:
>>> pick_eevdf():
>>> <snip>
>>>      if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
>>>          curr = NULL; <--- curr is set to NULL
>>
>> "on_rq" is only cleared when the entity is dequeued so "curr" is in fact
>> going to sleep (proper sleep) and we've reached at pick_eevdf(),
>> otherwise, if "curr" is not eligible, there is at least one more tasks
>> on the cfs_rq which implies best has be found and will be non-null.
>>
> if curr->sched_delayed == 1, the condition: '(curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))'
> can be true. Please correct me if wrong.

If curr is sched_delayed, that means it is going to sleep and it is
ineligible but it can only be ineligible if there is at least one more
task with a lower vruntime and hence best must be found. A delayed
entity will also not decrement the "nr_running" and it'll be queued back
from put_prev_entity() to be picked off later.

Essentially, I believe curr can never be ineligible in absence of other
eligible tasks.

> 
>>> <snip>
>>> found:
>>>      if (!best || (curr && entity_before(curr, best)))
>>>          best = curr; <--- curr and best are both NULL
>>
>> Say "curr" is going to sleep, and there is no "best", in which case
>> "curr" is already blocked and "cfs_rq->nr_running" should be 0 and it
>> should have not reached pick_eevdf() in the first place since
>> pick_next_entity() is only called by pick_task_fair() if
>> "cfs_rq->nr_running" is non-zero.
>>
>> So as long as "cfs_rq->nr_running" is non-zero, pick_eevdf() should
>> return a valid runnable entity. Failure to do so perhaps points to
>> "entity_eligible()" check going sideways somewhere or a bug in
>> "nr_running" accounting.
>>
>> Chenyu had proposed a similar fix long back in
>> https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
>> but the consensus was it was covering up a larger problem which
>> then boiled down to avg_vruntime being computed incorrectly
>> https://lore.kernel.org/lkml/ZiAWTU5xb%2FJMn%2FHs@chenyu5-mobl2/
>>
> Thanks for the information.
>  From the timeline, it seems the issue is before 152e11f6df29 ("sched/fair: Implement delayed dequeue").
> DELAY_DEQUEUE may introduce risk for pick_eevdf() return NULL.

Ideally it shouldn't since delayed entities are still captured in
"cfs_rq->nr_running" and they'll eventually become eligible and be
picked off but I may be wrong and I hope someone corrects me in which
case :)

> 
> After patch 1 ("Fix warning if NEXT_BUDDY enabled"), the NULL pointer panic disappears.
> Patch 2 ("Fix panic if pick_eevdf() returns NULL") is a safe guard.
> 
> Thanks,
> -adam
> 

-- 
Thanks and Regards,
Prateek


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/3] sched/fair: Fix panic if pick_eevdf() returns NULL
  2024-11-27  5:56 ` [PATCH v2 2/3] sched/fair: Fix panic if pick_eevdf() returns NULL Adam Li
@ 2024-11-29  9:18   ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2024-11-29  9:18 UTC (permalink / raw)
  To: Adam Li
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

On Wed, Nov 27, 2024 at 05:56:09AM +0000, Adam Li wrote:
> pick_eevdf() may return NULL, which triggers NULL pointer
> dereference at pick_next_entity():
> 	struct sched_entity *se = pick_eevdf(cfs_rq);
> 	if (se->sched_delayed)
> 
>     [  297.371198] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000051
>     [  297.406112] CPU: 116 UID: 0 PID: 10328 Comm: Grizzly-worker( Tainted: G        W   E      6.12.0.adam+ #1
>     [  297.597362] Call trace:
>     [  297.599795]  pick_task_fair+0x50/0x150 (P)
>     [  297.603879]  pick_task_fair+0x50/0x150 (L)
>     [  297.607963]  pick_next_task_fair+0x30/0x3c0
>     [  297.612134]  __pick_next_task+0x4c/0x220
>     [  297.616045]  pick_next_task+0x44/0x980

Well, if you look at pick_task_fair() you'll see we'll not get there
unless cfs_rq->nr_running is non-zero, at which point pick_eevdf()
really should NOT be returning NULL.

If it does, something else is broken.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-28  7:29   ` K Prateek Nayak
  2024-11-29  3:21     ` Adam Li
@ 2024-11-29  9:55     ` Peter Zijlstra
  2024-11-29 10:15       ` [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task() Peter Zijlstra
  2024-11-29 17:46       ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled K Prateek Nayak
  2024-12-03 16:05     ` Peter Zijlstra
  2024-12-09 11:00     ` [tip: sched/core] sched/fair: Fix NEXT_BUDDY tip-bot2 for K Prateek Nayak
  3 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2024-11-29  9:55 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Adam Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

On Thu, Nov 28, 2024 at 12:59:54PM +0530, K Prateek Nayak wrote:

> I tried to put a SCHED_WARN_ON() here to track where this comes from and
> seems like it is usually from attach_task() in the load balancing path
> pulling a delayed task which is set as the next buddy in
> check_preempt_wakeup_fair()
> 
> Can you please try the following diff instead of the first two patches
> and see if you still hit these warnings, stalls, and pick_eevdf()
> returning NULL?
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff7cae9274c5..61e74eb5af22 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5478,6 +5478,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	bool sleep = flags & DEQUEUE_SLEEP;
>  	update_curr(cfs_rq);
> +	clear_buddies(cfs_rq, se);
>  	if (flags & DEQUEUE_DELAYED) {
>  		SCHED_WARN_ON(!se->sched_delayed);
> @@ -5520,8 +5521,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	update_stats_dequeue_fair(cfs_rq, se, flags);
> -	clear_buddies(cfs_rq, se);
> -
>  	update_entity_lag(cfs_rq, se);
>  	if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
>  		se->deadline -= se->vruntime;

So this puts the clear_buddies() before the whole delayed thing, and
should be sufficient afaict, no?

> @@ -8767,7 +8766,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>  	if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
>  		return;
> -	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK)) {
> +	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
>  		set_next_buddy(pse);
>  	}

But then this should never happen, which is after a wakeup, p and the
whole hierarchy up should be runnable at this point.

Or should I go find more wake-up juice and try again :-)



Anyway..  I'm sure I started a patch series cleaning up the whole next
buddy thing months ago (there's more problems here), but I can't seem to
find it in a hurry :/




^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task()
  2024-11-29  9:55     ` Peter Zijlstra
@ 2024-11-29 10:15       ` Peter Zijlstra
  2024-11-29 10:18         ` Peter Zijlstra
                           ` (2 more replies)
  2024-11-29 17:46       ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled K Prateek Nayak
  1 sibling, 3 replies; 24+ messages in thread
From: Peter Zijlstra @ 2024-11-29 10:15 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Adam Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

On Fri, Nov 29, 2024 at 10:55:00AM +0100, Peter Zijlstra wrote:

> Anyway..  I'm sure I started a patch series cleaning up the whole next
> buddy thing months ago (there's more problems here), but I can't seem to
> find it in a hurry :/

There was this..

---
Subject: sched/fair: Untangle NEXT_BUDDY and pick_next_task()
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Nov 29 10:36:59 CET 2024

There are 3 sites using set_next_buddy() and only one is conditional
on NEXT_BUDDY, the other two sites are unconditional; to note:

  - yield_to_task()
  - cgroup dequeue / pick optimization

However, having NEXT_BUDDY control both the wakeup-preemption and the
picking side of things means its near useless.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/fair.c     |    4 ++--
 kernel/sched/features.h |    9 +++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5613,9 +5613,9 @@ static struct sched_entity *
 pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 {
 	/*
-	 * Enabling NEXT_BUDDY will affect latency but not fairness.
+	 * Picking the ->next buddy will affect latency but not fairness.
 	 */
-	if (sched_feat(NEXT_BUDDY) &&
+	if (sched_feat(PICK_BUDDY) &&
 	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next)) {
 		/* ->next will never be delayed */
 		SCHED_WARN_ON(cfs_rq->next->sched_delayed);
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -32,6 +32,15 @@ SCHED_FEAT(PREEMPT_SHORT, true)
 SCHED_FEAT(NEXT_BUDDY, false)
 
 /*
+ * Allow completely ignoring cfs_rq->next; which can be set from various
+ * places:
+ *   - NEXT_BUDDY (wakeup preemption)
+ *   - yield_to_task()
+ *   - cgroup dequeue / pick
+ */
+SCHED_FEAT(PICK_BUDDY, true)
+
+/*
  * Consider buddies to be cache hot, decreases the likeliness of a
  * cache buddy being migrated away, increases cache locality.
  */
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task()
  2024-11-29 10:15       ` [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task() Peter Zijlstra
@ 2024-11-29 10:18         ` Peter Zijlstra
  2024-11-29 10:37           ` Adam Li
  2024-12-06  6:47         ` Adam Li
  2024-12-09 11:00         ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2024-11-29 10:18 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Adam Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

On Fri, Nov 29, 2024 at 11:15:41AM +0100, Peter Zijlstra wrote:
> On Fri, Nov 29, 2024 at 10:55:00AM +0100, Peter Zijlstra wrote:
> 
> > Anyway..  I'm sure I started a patch series cleaning up the whole next
> > buddy thing months ago (there's more problems here), but I can't seem to
> > find it in a hurry :/
> 
> There was this..

And this I think.

Adam, what was the reason you were enabling NEXT_BUDDY in the first
place?

I think someone (Ingo?) was proposing we kill the wakeup preempt thing;
and I suspect you don't actually care about that but instead want either
the cgroup or the yield_to_task()/KVM thing working.

---
Subject: sched/fair: Add CGROUP_BUDDY feature
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Nov 29 10:49:45 CET 2024

Add a feature to toggle the cgroup optimization.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/fair.c     |    3 ++-
 kernel/sched/features.h |    8 +++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7126,7 +7126,8 @@ static int dequeue_entities(struct rq *r
 			 * Bias pick_next to pick a task from this cfs_rq, as
 			 * p is sleeping when it is within its sched_slice.
 			 */
-			if (task_sleep && se && !throttled_hierarchy(cfs_rq))
+			if (sched_feat(CGROUP_BUDDY) &&
+			    task_sleep && se && !throttled_hierarchy(cfs_rq))
 				set_next_buddy(se);
 			break;
 		}
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -32,11 +32,17 @@ SCHED_FEAT(PREEMPT_SHORT, true)
 SCHED_FEAT(NEXT_BUDDY, false)
 
 /*
+ * Optimization for cgroup scheduling where a dequeue + pick tries
+ * to share as much of the hierarchy as possible.
+ */
+SCHED_FEAT(CGROUP_BUDDY, true)
+
+/*
  * Allow completely ignoring cfs_rq->next; which can be set from various
  * places:
  *   - NEXT_BUDDY (wakeup preemption)
  *   - yield_to_task()
- *   - cgroup dequeue / pick
+ *   - CGROUP_BUDDY (cgroup dequeue / pick)
  */
 SCHED_FEAT(PICK_BUDDY, true)
 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task()
  2024-11-29 10:18         ` Peter Zijlstra
@ 2024-11-29 10:37           ` Adam Li
  2024-11-29 11:45             ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Adam Li @ 2024-11-29 10:37 UTC (permalink / raw)
  To: Peter Zijlstra, K Prateek Nayak
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

On 11/29/2024 6:18 PM, Peter Zijlstra wrote:
> On Fri, Nov 29, 2024 at 11:15:41AM +0100, Peter Zijlstra wrote:
>> On Fri, Nov 29, 2024 at 10:55:00AM +0100, Peter Zijlstra wrote:
>>
>>> Anyway..  I'm sure I started a patch series cleaning up the whole next
>>> buddy thing months ago (there's more problems here), but I can't seem to
>>> find it in a hurry :/
>>
>> There was this..
> 
> And this I think.
> 
> Adam, what was the reason you were enabling NEXT_BUDDY in the first
> place?
> 
Hi Peter,

I am tuning Specjbb critical-jOPS, which is latency sensitive.
NEXT_BUDDY affects schedule latency so I tried to enable NEXT_BUDDY.
However Specjbb critical-jOPS drops with NEXT_BUDDY enabled (after my patch fixing panic).

I will test your new NEXT_BUDDY patches.

> I think someone (Ingo?) was proposing we kill the wakeup preempt thing;
> and I suspect you don't actually care about that but instead want either
> the cgroup or the yield_to_task()/KVM thing working.
> 
> ---
> Subject: sched/fair: Add CGROUP_BUDDY feature
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri Nov 29 10:49:45 CET 2024
> 
> Add a feature to toggle the cgroup optimization.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/fair.c     |    3 ++-
>  kernel/sched/features.h |    8 +++++++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7126,7 +7126,8 @@ static int dequeue_entities(struct rq *r
>  			 * Bias pick_next to pick a task from this cfs_rq, as
>  			 * p is sleeping when it is within its sched_slice.
>  			 */
> -			if (task_sleep && se && !throttled_hierarchy(cfs_rq))
> +			if (sched_feat(CGROUP_BUDDY) &&
> +			    task_sleep && se && !throttled_hierarchy(cfs_rq))
>  				set_next_buddy(se);
>  			break;
>  		}
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -32,11 +32,17 @@ SCHED_FEAT(PREEMPT_SHORT, true)
>  SCHED_FEAT(NEXT_BUDDY, false)
>  
>  /*
> + * Optimization for cgroup scheduling where a dequeue + pick tries
> + * to share as much of the hierarchy as possible.
> + */
> +SCHED_FEAT(CGROUP_BUDDY, true)
> +
> +/*
>   * Allow completely ignoring cfs_rq->next; which can be set from various
>   * places:
>   *   - NEXT_BUDDY (wakeup preemption)
>   *   - yield_to_task()
> - *   - cgroup dequeue / pick
> + *   - CGROUP_BUDDY (cgroup dequeue / pick)
>   */
>  SCHED_FEAT(PICK_BUDDY, true)
>  

Thanks,
-adam

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task()
  2024-11-29 10:37           ` Adam Li
@ 2024-11-29 11:45             ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2024-11-29 11:45 UTC (permalink / raw)
  To: Adam Li
  Cc: K Prateek Nayak, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, patches, cl, christian.loehle, vineethr

On Fri, Nov 29, 2024 at 06:37:06PM +0800, Adam Li wrote:
> On 11/29/2024 6:18 PM, Peter Zijlstra wrote:
> > On Fri, Nov 29, 2024 at 11:15:41AM +0100, Peter Zijlstra wrote:
> >> On Fri, Nov 29, 2024 at 10:55:00AM +0100, Peter Zijlstra wrote:
> >>
> >>> Anyway..  I'm sure I started a patch series cleaning up the whole next
> >>> buddy thing months ago (there's more problems here), but I can't seem to
> >>> find it in a hurry :/
> >>
> >> There was this..
> > 
> > And this I think.
> > 
> > Adam, what was the reason you were enabling NEXT_BUDDY in the first
> > place?
> > 
> Hi Peter,
> 
> I am tuning Specjbb critical-jOPS, which is latency sensitive.

There is a lot to latency, sometimes it's best to not preempt. I think
Prateek has found a fair number of workloads where SCHED_BATCH has been
helpful.

> NEXT_BUDDY affects schedule latency so I tried to enable NEXT_BUDDY.
> However Specjbb critical-jOPS drops with NEXT_BUDDY enabled (after my patch fixing panic).

Yes, picking outside of the EEVDF policy can make worse decisions for
latency.

The yield_to_task() can help performance for KVM (the only user AFAIK
-- oh DMA fences seem to also use it these days).

And the CGROUP_BUDDY thing can sometimes help when using cgroups.

But the wakeup thing is very situational -- it's disabled for a reason.
Unfortunately it seems to also have disabled the other users, which
wasn't intended.
	
> I will test your new NEXT_BUDDY patches.

We still need Prateek's fix. That ensures a delayed task will ever end
up being ->next.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-29  9:55     ` Peter Zijlstra
  2024-11-29 10:15       ` [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task() Peter Zijlstra
@ 2024-11-29 17:46       ` K Prateek Nayak
  2024-11-29 17:53         ` K Prateek Nayak
  1 sibling, 1 reply; 24+ messages in thread
From: K Prateek Nayak @ 2024-11-29 17:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adam Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

Hello Peter,

On 11/29/2024 3:25 PM, Peter Zijlstra wrote:
> On Thu, Nov 28, 2024 at 12:59:54PM +0530, K Prateek Nayak wrote:
> 
>> I tried to put a SCHED_WARN_ON() here to track where this comes from and
>> seems like it is usually from attach_task() in the load balancing path
>> pulling a delayed task which is set as the next buddy in
>> check_preempt_wakeup_fair()
>>
>> Can you please try the following diff instead of the first two patches
>> and see if you still hit these warnings, stalls, and pick_eevdf()
>> returning NULL?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ff7cae9274c5..61e74eb5af22 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5478,6 +5478,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>   	bool sleep = flags & DEQUEUE_SLEEP;
>>   	update_curr(cfs_rq);
>> +	clear_buddies(cfs_rq, se);
>>   	if (flags & DEQUEUE_DELAYED) {
>>   		SCHED_WARN_ON(!se->sched_delayed);
>> @@ -5520,8 +5521,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>   	update_stats_dequeue_fair(cfs_rq, se, flags);
>> -	clear_buddies(cfs_rq, se);
>> -
>>   	update_entity_lag(cfs_rq, se);
>>   	if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
>>   		se->deadline -= se->vruntime;
> 
> So this puts the clear_buddies() before the whole delayed thing, and
> should be sufficient afaict, no?
> 
>> @@ -8767,7 +8766,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>   	if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
>>   		return;
>> -	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK)) {
>> +	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
>>   		set_next_buddy(pse);
>>   	}
> 
> But then this should never happen, which is after a wakeup, p and the
> whole hierarchy up should be runnable at this point.
> 
> Or should I go find more wake-up juice and try again :-)

The motivation there was this splat from my testing:

     Kernel panic - not syncing: Encountered delayed entity in check_preempt_wakeup_fair()
     CPU: 190 UID: 1000 PID: 4215 Comm: sched-messaging Tainted: G        W          6.12.0-rc4-test+ #156
     Tainted: [W]=WARN
     Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
     Call Trace:
      <TASK>
      panic+0x399/0x3f0
      check_preempt_wakeup_fair+0x21b/0x220
      wakeup_preempt+0x64/0x70
      sched_balance_rq+0x970/0x12e0
      ? update_load_avg+0x7e/0x7e0
      sched_balance_newidle+0x1e2/0x490
      pick_next_task_fair+0x32/0x3b0
      __pick_next_task+0x3d/0x1a0
      __schedule+0x1da/0x1540
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? aa_file_perm+0x121/0x4d0
      schedule+0x28/0x110
      pipe_read+0x345/0x470
      ? __pfx_autoremove_wake_function+0x10/0x10
      vfs_read+0x2f1/0x330
      ksys_read+0xaf/0xe0
      do_syscall_64+0x6f/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? current_time+0x31/0xf0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? atime_needs_update+0x9c/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? touch_atime+0x1e/0x100
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? pipe_read+0x3ec/0x470
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? vfs_read+0x2f1/0x330
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? ksys_read+0xcc/0xe0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? syscall_exit_to_user_mode+0x51/0x1a0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? do_syscall_64+0x7b/0x110
      ? ksys_read+0xcc/0xe0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? syscall_exit_to_user_mode+0x51/0x1a0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? do_syscall_64+0x7b/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? ksys_read+0xcc/0xe0
      ? do_syscall_64+0x7b/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? syscall_exit_to_user_mode+0x51/0x1a0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? do_syscall_64+0x7b/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? do_syscall_64+0x7b/0x110
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
     RIP: 0033:0x7fa8b851481c
     Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 c1 f7 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 2f c2 f7 ff 48
     RSP: 002b:00007fa8ae0d8cb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
     RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa8b851481c
     RDX: 0000000000000064 RSI: 00007fa8ae0d8cf0 RDI: 0000000000000027
     RBP: 00007fa8ae0d8d90 R08: 0000000000000000 R09: 00007ffee0fdf97f
     R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000064
     R13: 00007fa8ae0d8cf0 R14: 00000000000005aa R15: 000055c2e80accb0
      </TASK>
--

newidle balance pulls a delayed entity which goes through the
check_preempt_wakeup_fair() path in attach_task() and is set as the next
buddy. On a second thought this is perhaps not required since even if
this delayed entity is picked, it'll go thorough a full dequeue and the
clear_buddies() change above should take care of it.

> 
> 
> 
> Anyway..  I'm sure I started a patch series cleaning up the whole next
> buddy thing months ago (there's more problems here), but I can't seem to
> find it in a hurry :/
> 
> 
> 

-- 
Thanks and Regards,
Prateek


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-29 17:46       ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled K Prateek Nayak
@ 2024-11-29 17:53         ` K Prateek Nayak
  0 siblings, 0 replies; 24+ messages in thread
From: K Prateek Nayak @ 2024-11-29 17:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adam Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

On 11/29/2024 11:16 PM, K Prateek Nayak wrote:
> [..snip..]
> newidle balance pulls a delayed entity which goes through the
> check_preempt_wakeup_fair() path in attach_task() and is set as the next
> buddy. On a second thought this is perhaps not required since even if
> this delayed entity is picked, it'll go thorough a full dequeue and the
> clear_buddies() change above should take care of it.

That said, it'll still trigger the following warning in
pick_next_entity():

     /* ->next will never be delayed */
     SCHED_WARN_ON(cfs_rq->next->sched_delayed);

which is also why I added that check in check_preempt_wakeup_fair() :)

-- 
Thanks and Regards,
Prateek


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-11-28  7:29   ` K Prateek Nayak
  2024-11-29  3:21     ` Adam Li
  2024-11-29  9:55     ` Peter Zijlstra
@ 2024-12-03 16:05     ` Peter Zijlstra
  2024-12-03 16:48       ` K Prateek Nayak
  2024-12-09 11:00     ` [tip: sched/core] sched/fair: Fix NEXT_BUDDY tip-bot2 for K Prateek Nayak
  3 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2024-12-03 16:05 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Adam Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

On Thu, Nov 28, 2024 at 12:59:54PM +0530, K Prateek Nayak wrote:

> Can you please try the following diff instead of the first two patches
> and see if you still hit these warnings, stalls, and pick_eevdf()
> returning NULL?
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff7cae9274c5..61e74eb5af22 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5478,6 +5478,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	bool sleep = flags & DEQUEUE_SLEEP;
>  	update_curr(cfs_rq);
> +	clear_buddies(cfs_rq, se);
>  	if (flags & DEQUEUE_DELAYED) {
>  		SCHED_WARN_ON(!se->sched_delayed);
> @@ -5520,8 +5521,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	update_stats_dequeue_fair(cfs_rq, se, flags);
> -	clear_buddies(cfs_rq, se);
> -
>  	update_entity_lag(cfs_rq, se);
>  	if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
>  		se->deadline -= se->vruntime;
> @@ -8767,7 +8766,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>  	if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
>  		return;
> -	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK)) {
> +	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
>  		set_next_buddy(pse);
>  	}


Prateek, I've presumed your SoB on this change:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/urgent&id=d1139307fe97ffefcf90212772f7516732a11034

Holler if you want it modified.

Thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
  2024-12-03 16:05     ` Peter Zijlstra
@ 2024-12-03 16:48       ` K Prateek Nayak
  0 siblings, 0 replies; 24+ messages in thread
From: K Prateek Nayak @ 2024-12-03 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adam Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

Hello Peter,

On 12/3/2024 9:35 PM, Peter Zijlstra wrote:
> On Thu, Nov 28, 2024 at 12:59:54PM +0530, K Prateek Nayak wrote:
> 
>> Can you please try the following diff instead of the first two patches
>> and see if you still hit these warnings, stalls, and pick_eevdf()
>> returning NULL?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ff7cae9274c5..61e74eb5af22 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5478,6 +5478,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>   	bool sleep = flags & DEQUEUE_SLEEP;
>>   	update_curr(cfs_rq);
>> +	clear_buddies(cfs_rq, se);
>>   	if (flags & DEQUEUE_DELAYED) {
>>   		SCHED_WARN_ON(!se->sched_delayed);
>> @@ -5520,8 +5521,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>   	update_stats_dequeue_fair(cfs_rq, se, flags);
>> -	clear_buddies(cfs_rq, se);
>> -
>>   	update_entity_lag(cfs_rq, se);
>>   	if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
>>   		se->deadline -= se->vruntime;
>> @@ -8767,7 +8766,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>   	if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
>>   		return;
>> -	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK)) {
>> +	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
>>   		set_next_buddy(pse);
>>   	}
> 
> 
> Prateek, I've presumed your SoB on this change:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/urgent&id=d1139307fe97ffefcf90212772f7516732a11034

No objections from my side! While at it, perhaps you can also squash in:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ed4af8be76b..eadcd64c03e9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5519,8 +5519,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
  
  		if (sched_feat(DELAY_DEQUEUE) && delay &&
  		    !entity_eligible(cfs_rq, se)) {
-			if (cfs_rq->next == se)
-				cfs_rq->next = NULL;
  			update_load_avg(cfs_rq, se, 0);
  			set_delayed(se);
  			return false;
--

Since we do a clear_buddy() upfront, we no longer need this special case
for delayed entities. Tested it on top of queue:sched/urgent with
hackbench and I didn't run into any problems / splats. Thank you.

> 
> Holler if you want it modified.
> 
> Thanks!

-- 
Thanks and Regards,
Prateek


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task()
  2024-11-29 10:15       ` [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task() Peter Zijlstra
  2024-11-29 10:18         ` Peter Zijlstra
@ 2024-12-06  6:47         ` Adam Li
  2024-12-09 11:00         ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 24+ messages in thread
From: Adam Li @ 2024-12-06  6:47 UTC (permalink / raw)
  To: Peter Zijlstra, K Prateek Nayak
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, vineethr

On 11/29/2024 6:15 PM, Peter Zijlstra wrote:
> On Fri, Nov 29, 2024 at 10:55:00AM +0100, Peter Zijlstra wrote:
> 
>> Anyway..  I'm sure I started a patch series cleaning up the whole next
>> buddy thing months ago (there's more problems here), but I can't seem to
>> find it in a hurry :/
> 
> There was this..
> 
Hi Peter and Prateek,

I tested the two patches on 6.13-rc1 + patch "sched/fair: Fix NEXT_BUDDY"
(https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/urgent&id=d1139307fe97ffefcf90212772f7516732a11034)
 
1) sched/fair: Untangle NEXT_BUDDY and pick_next_task()
2) sched/fair: Add CGROUP_BUDDY feature

With all 2^3=8 combinations: (NO_)NEXT_BUDDY, (NO_)CGROUP_BUDDY, (NO_)PICK_BUDDY,
there is no warning or panic. There is no significant performance difference for
Specjbb workload.

And there is no much performance difference before and after the two patches.

Before the patches, I think the default setting 'NO_NEXT_BUDDY' logically
equals to 'NO_PICK_BUDDY && CGROUP_BUDDY && NO_NEXT_BUDDY'.
After the patches, the default becomes 'PICK_BUDDY && CGROUP_BUDDY && NO_NEXT_BUDDY'.

Thanks,
-adam
> ---
> Subject: sched/fair: Untangle NEXT_BUDDY and pick_next_task()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri Nov 29 10:36:59 CET 2024
> 
> There are 3 sites using set_next_buddy() and only one is conditional
> on NEXT_BUDDY, the other two sites are unconditional; to note:
> 
>   - yield_to_task()
>   - cgroup dequeue / pick optimization
> 
> However, having NEXT_BUDDY control both the wakeup-preemption and the
> picking side of things means its near useless.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/fair.c     |    4 ++--
>  kernel/sched/features.h |    9 +++++++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5613,9 +5613,9 @@ static struct sched_entity *
>  pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
>  {
>  	/*
> -	 * Enabling NEXT_BUDDY will affect latency but not fairness.
> +	 * Picking the ->next buddy will affect latency but not fairness.
>  	 */
> -	if (sched_feat(NEXT_BUDDY) &&
> +	if (sched_feat(PICK_BUDDY) &&
>  	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next)) {
>  		/* ->next will never be delayed */
>  		SCHED_WARN_ON(cfs_rq->next->sched_delayed);
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -32,6 +32,15 @@ SCHED_FEAT(PREEMPT_SHORT, true)
>  SCHED_FEAT(NEXT_BUDDY, false)
>  
>  /*
> + * Allow completely ignoring cfs_rq->next; which can be set from various
> + * places:
> + *   - NEXT_BUDDY (wakeup preemption)
> + *   - yield_to_task()
> + *   - cgroup dequeue / pick
> + */
> +SCHED_FEAT(PICK_BUDDY, true)
> +
> +/*
>   * Consider buddies to be cache hot, decreases the likeliness of a
>   * cache buddy being migrated away, increases cache locality.
>   */
>>
>>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [tip: sched/core] sched/fair: Untangle NEXT_BUDDY and pick_next_task()
  2024-11-29 10:15       ` [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task() Peter Zijlstra
  2024-11-29 10:18         ` Peter Zijlstra
  2024-12-06  6:47         ` Adam Li
@ 2024-12-09 11:00         ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 24+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2024-12-09 11:00 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     2a77e4be12cb58bbf774e7c717c8bb80e128b7a4
Gitweb:        https://git.kernel.org/tip/2a77e4be12cb58bbf774e7c717c8bb80e128b7a4
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 29 Nov 2024 11:15:41 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 09 Dec 2024 11:48:13 +01:00

sched/fair: Untangle NEXT_BUDDY and pick_next_task()

There are 3 sites using set_next_buddy() and only one is conditional
on NEXT_BUDDY, the other two sites are unconditional; to note:

  - yield_to_task()
  - cgroup dequeue / pick optimization

However, having NEXT_BUDDY control both the wakeup-preemption and the
picking side of things means its near useless.

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241129101541.GA33464@noisy.programming.kicks-ass.net
---
 kernel/sched/fair.c     |  4 ++--
 kernel/sched/features.h |  9 +++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b505d3d..2c4ebfc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5630,9 +5630,9 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 	struct sched_entity *se;
 
 	/*
-	 * Enabling NEXT_BUDDY will affect latency but not fairness.
+	 * Picking the ->next buddy will affect latency but not fairness.
 	 */
-	if (sched_feat(NEXT_BUDDY) &&
+	if (sched_feat(PICK_BUDDY) &&
 	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next)) {
 		/* ->next will never be delayed */
 		SCHED_WARN_ON(cfs_rq->next->sched_delayed);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index a3d331d..3c12d9f 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -32,6 +32,15 @@ SCHED_FEAT(PREEMPT_SHORT, true)
 SCHED_FEAT(NEXT_BUDDY, false)
 
 /*
+ * Allow completely ignoring cfs_rq->next; which can be set from various
+ * places:
+ *   - NEXT_BUDDY (wakeup preemption)
+ *   - yield_to_task()
+ *   - cgroup dequeue / pick
+ */
+SCHED_FEAT(PICK_BUDDY, true)
+
+/*
  * Consider buddies to be cache hot, decreases the likeliness of a
  * cache buddy being migrated away, increases cache locality.
  */

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip: sched/core] sched/fair: Fix NEXT_BUDDY
  2024-11-28  7:29   ` K Prateek Nayak
                       ` (2 preceding siblings ...)
  2024-12-03 16:05     ` Peter Zijlstra
@ 2024-12-09 11:00     ` tip-bot2 for K Prateek Nayak
  3 siblings, 0 replies; 24+ messages in thread
From: tip-bot2 for K Prateek Nayak @ 2024-12-09 11:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Adam Li, K Prateek Nayak, Peter Zijlstra (Intel), x86,
	linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     493afbd187c4c9cc1642792c0d9ba400c3d6d90d
Gitweb:        https://git.kernel.org/tip/493afbd187c4c9cc1642792c0d9ba400c3d6d90d
Author:        K Prateek Nayak <kprateek.nayak@amd.com>
AuthorDate:    Thu, 28 Nov 2024 12:59:54 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 09 Dec 2024 11:48:09 +01:00

sched/fair: Fix NEXT_BUDDY

Adam reports that enabling NEXT_BUDDY insta triggers a WARN in
pick_next_entity().

Moving clear_buddies() up before the delayed dequeue bits ensures
no ->next buddy becomes delayed. Further ensure no new ->next buddy
ever starts as delayed.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: Adam Li <adamli@os.amperecomputing.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Adam Li <adamli@os.amperecomputing.com>
Link: https://lkml.kernel.org/r/670a0d54-e398-4b1f-8a6e-90784e2fdf89@amd.com
---
 kernel/sched/fair.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 05b8f1e..9d7a2dd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5478,6 +5478,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	bool sleep = flags & DEQUEUE_SLEEP;
 
 	update_curr(cfs_rq);
+	clear_buddies(cfs_rq, se);
 
 	if (flags & DEQUEUE_DELAYED) {
 		SCHED_WARN_ON(!se->sched_delayed);
@@ -5494,8 +5495,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 		if (sched_feat(DELAY_DEQUEUE) && delay &&
 		    !entity_eligible(cfs_rq, se)) {
-			if (cfs_rq->next == se)
-				cfs_rq->next = NULL;
 			update_load_avg(cfs_rq, se, 0);
 			se->sched_delayed = 1;
 			return false;
@@ -5520,8 +5519,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	update_stats_dequeue_fair(cfs_rq, se, flags);
 
-	clear_buddies(cfs_rq, se);
-
 	update_entity_lag(cfs_rq, se);
 	if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
 		se->deadline -= se->vruntime;
@@ -8774,7 +8771,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int 
 	if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
 		return;
 
-	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK)) {
+	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
 		set_next_buddy(pse);
 	}
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy
  2024-11-27  5:56 ` [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy Adam Li
@ 2025-03-13  8:30   ` Madadi Vineeth Reddy
  2025-03-14  2:53     ` Adam Li
  0 siblings, 1 reply; 24+ messages in thread
From: Madadi Vineeth Reddy @ 2025-03-13  8:30 UTC (permalink / raw)
  To: Adam Li
  Cc: peterz, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle, Madadi Vineeth Reddy

Hi Adam,

On 27/11/24 11:26, Adam Li wrote:
> Commit 5e963f2bd465 ("sched/fair: Commit to EEVDF") removed the "last"
> and "skip" buddy. Update comments in pick_next_entity().
> 
> Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
> Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
> ---
>  kernel/sched/fair.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d5a3b5589e4e..259c56dcdff6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5602,17 +5602,11 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  
>  static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags);
>  
> -/*
> - * Pick the next process, keeping these things in mind, in this order:
> - * 1) keep things fair between processes/task groups
> - * 2) pick the "next" process, since someone really wants that to run
> - * 3) pick the "last" process, for cache locality
> - * 4) do not run the "skip" process, if something else is available
> - */
>  static struct sched_entity *
>  pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
>  {
>  	/*
> +	 * Pick the "next" buddy, since someone really wants that to run.
>  	 * Enabling NEXT_BUDDY will affect latency but not fairness.
>  	 */
>  	if (sched_feat(NEXT_BUDDY) &&

There is one more reference to LAST_BUDDY in check_preempt_wakeup_fair.

Regarding pick_next_entity, the first two points are still valid, so only
points 3 and 4 could be removed?

Something like below

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9dafb374d76d..379dbcbb24e9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5578,8 +5578,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags);
  * Pick the next process, keeping these things in mind, in this order:
  * 1) keep things fair between processes/task groups
  * 2) pick the "next" process, since someone really wants that to run
- * 3) pick the "last" process, for cache locality
- * 4) do not run the "skip" process, if something else is available
  */
 static struct sched_entity *
 pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
@@ -8780,9 +8778,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
         *
         * Note: this also catches the edge-case of curr being in a throttled
         * group (e.g. via set_curr_task), since update_curr() (in the
-        * enqueue of curr) will have resulted in resched being set.  This
-        * prevents us from potentially nominating it as a false LAST_BUDDY
-        * below.
+        * enqueue of curr) will have resulted in resched being set.
         */
        if (test_tsk_need_resched(rq->curr))
                return;

Thanks,
Madadi Vineeth Reddy

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy
  2025-03-13  8:30   ` Madadi Vineeth Reddy
@ 2025-03-14  2:53     ` Adam Li
  0 siblings, 0 replies; 24+ messages in thread
From: Adam Li @ 2025-03-14  2:53 UTC (permalink / raw)
  To: Madadi Vineeth Reddy
  Cc: peterz, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, linux-kernel, patches, cl,
	christian.loehle

Reviewed-by: Adam Li <adamli@os.amperecomputing.com>

Thanks,
-adam
On 3/13/2025 4:30 PM, Madadi Vineeth Reddy wrote:
> Hi Adam,
> 
> On 27/11/24 11:26, Adam Li wrote:
>> Commit 5e963f2bd465 ("sched/fair: Commit to EEVDF") removed the "last"
>> and "skip" buddy. Update comments in pick_next_entity().
>>
>> Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
>> Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
>> ---
>>  kernel/sched/fair.c | 8 +-------
>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index d5a3b5589e4e..259c56dcdff6 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5602,17 +5602,11 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>  
>>  static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags);
>>  
>> -/*
>> - * Pick the next process, keeping these things in mind, in this order:
>> - * 1) keep things fair between processes/task groups
>> - * 2) pick the "next" process, since someone really wants that to run
>> - * 3) pick the "last" process, for cache locality
>> - * 4) do not run the "skip" process, if something else is available
>> - */
>>  static struct sched_entity *
>>  pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
>>  {
>>  	/*
>> +	 * Pick the "next" buddy, since someone really wants that to run.
>>  	 * Enabling NEXT_BUDDY will affect latency but not fairness.
>>  	 */
>>  	if (sched_feat(NEXT_BUDDY) &&
> 
> There is one more reference to LAST_BUDDY in check_preempt_wakeup_fair.
> 
> Regarding pick_next_entity, the first two points are still valid, so only
> points 3 and 4 could be removed?
> 
> Something like below
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9dafb374d76d..379dbcbb24e9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5578,8 +5578,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags);
>   * Pick the next process, keeping these things in mind, in this order:
>   * 1) keep things fair between processes/task groups
>   * 2) pick the "next" process, since someone really wants that to run
> - * 3) pick the "last" process, for cache locality
> - * 4) do not run the "skip" process, if something else is available
>   */
>  static struct sched_entity *
>  pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
> @@ -8780,9 +8778,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>          *
>          * Note: this also catches the edge-case of curr being in a throttled
>          * group (e.g. via set_curr_task), since update_curr() (in the
> -        * enqueue of curr) will have resulted in resched being set.  This
> -        * prevents us from potentially nominating it as a false LAST_BUDDY
> -        * below.
> +        * enqueue of curr) will have resulted in resched being set.
>          */
>         if (test_tsk_need_resched(rq->curr))
>                 return;
> 
> Thanks,
> Madadi Vineeth Reddy


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-03-14  2:53 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-27  5:56 [PATCH v2 0/3] sched/fair: Fix NEXT_BUDDY panic and warning Adam Li
2024-11-27  5:56 ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled Adam Li
2024-11-28  7:29   ` K Prateek Nayak
2024-11-29  3:21     ` Adam Li
2024-11-29  4:28       ` K Prateek Nayak
2024-11-29  7:40         ` Adam Li
2024-11-29  8:00           ` K Prateek Nayak
2024-11-29  9:55     ` Peter Zijlstra
2024-11-29 10:15       ` [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task() Peter Zijlstra
2024-11-29 10:18         ` Peter Zijlstra
2024-11-29 10:37           ` Adam Li
2024-11-29 11:45             ` Peter Zijlstra
2024-12-06  6:47         ` Adam Li
2024-12-09 11:00         ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2024-11-29 17:46       ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled K Prateek Nayak
2024-11-29 17:53         ` K Prateek Nayak
2024-12-03 16:05     ` Peter Zijlstra
2024-12-03 16:48       ` K Prateek Nayak
2024-12-09 11:00     ` [tip: sched/core] sched/fair: Fix NEXT_BUDDY tip-bot2 for K Prateek Nayak
2024-11-27  5:56 ` [PATCH v2 2/3] sched/fair: Fix panic if pick_eevdf() returns NULL Adam Li
2024-11-29  9:18   ` Peter Zijlstra
2024-11-27  5:56 ` [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy Adam Li
2025-03-13  8:30   ` Madadi Vineeth Reddy
2025-03-14  2:53     ` Adam Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox