public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
@ 2025-08-27  2:22 xupengbo
  2025-11-28 11:54 ` Aaron Lu
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: xupengbo @ 2025-08-27  2:22 UTC (permalink / raw)
  To: ziqianlu, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Aaron Lu, David Vernet,
	linux-kernel
  Cc: xupengbo, cgroups, xupengbo1029

When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows.

1. Due to the 1ms update period limitation in update_tg_load_avg(), there
is a possibility that the reduced load_avg is not updated to tg->load_avg
when a task migrates out.
2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
function cfs_rq_is_decayed() does not check whether
cfs->tg_load_avg_contrib is null. Consequently, in some cases,
__update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
updated to tg->load_avg.

Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.

Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: xupengbo <xupengbo@oppo.com>
---
Changes:
v1 -> v2: 
- Another option to fix the bug. Check cfs_rq->tg_load_avg_contrib in 
cfs_rq_is_decayed() to avoid early removal from the leaf_cfs_rq_list.
- Link to v1 : https://lore.kernel.org/cgroups/20250804130326.57523-1-xupengbo@oppo.com/
v2 -> v3:
- Check if cfs_rq->tg_load_avg_contrib is 0 derectly.
- Link to v2 : https://lore.kernel.org/cgroups/20250805144121.14871-1-xupengbo@oppo.com/
v3 -> v4:
- Fix typo
- Link to v3 : https://lore.kernel.org/cgroups/20250826075743.19106-1-xupengbo@oppo.com/
v4 -> v5:
- Amend the commit message
- Link to v4 : https://lore.kernel.org/cgroups/20250826084854.25956-1-xupengbo@oppo.com/

After some preliminary discussion and analysis, I think it is feasible to
directly check if cfs_rq->tg_load_avg_contrib is 0 in cfs_rq_is_decay().
So patch v3 was submitted.

Please send emails to a different email address <xupengbo1029@163.com>
after September 3, 2025, after that date <xupengbo@oppo.com> will expire
for personal reasons.

Thanks,
Xu Pengbo
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b173a059315c..81b7df87f1ce 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4062,6 +4062,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;
 
+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
 

base-commit: fab1beda7597fac1cecc01707d55eadb6bbe773c
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
  2025-08-27  2:22 [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
@ 2025-11-28 11:54 ` Aaron Lu
  2025-11-28 13:40   ` Peter Zijlstra
  2025-12-03 18:25 ` [tip: sched/urgent] " tip-bot2 for xupengbo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Aaron Lu @ 2025-11-28 11:54 UTC (permalink / raw)
  To: xupengbo, Ingo Molnar, Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, David Vernet,
	linux-kernel, cgroups

Hello,

On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> When a task is migrated out, there is a probability that the tg->load_avg
> value will become abnormal. The reason is as follows.
> 
> 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> is a possibility that the reduced load_avg is not updated to tg->load_avg
> when a task migrates out.
> 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> function cfs_rq_is_decayed() does not check whether
> cfs->tg_load_avg_contrib is null. Consequently, in some cases,
> __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> updated to tg->load_avg.
> 
> Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> which fixes the case (2.) mentioned above.
> 
> Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> Tested-by: Aaron Lu <ziqianlu@bytedance.com>
> Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: xupengbo <xupengbo@oppo.com>

I wonder if there are any more concerns about this patch? If no, I hope
this fix can be merged. It's a rare case but it does happen for some
specific setup.

Sorry if this is a bad timing, but I just hit an oncall where this exact
problem occurred so I suppose it's worth a ping :)

Best regards,
Aaron

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
  2025-11-28 11:54 ` Aaron Lu
@ 2025-11-28 13:40   ` Peter Zijlstra
  2025-11-28 14:15     ` Aaron Lu
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2025-11-28 13:40 UTC (permalink / raw)
  To: Aaron Lu
  Cc: xupengbo, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, David Vernet, linux-kernel, cgroups

On Fri, Nov 28, 2025 at 07:54:45PM +0800, Aaron Lu wrote:
> Hello,
> 
> On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> > When a task is migrated out, there is a probability that the tg->load_avg
> > value will become abnormal. The reason is as follows.
> > 
> > 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> > is a possibility that the reduced load_avg is not updated to tg->load_avg
> > when a task migrates out.
> > 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> > calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> > function cfs_rq_is_decayed() does not check whether
> > cfs->tg_load_avg_contrib is null. Consequently, in some cases,
> > __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> > updated to tg->load_avg.
> > 
> > Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> > which fixes the case (2.) mentioned above.
> > 
> > Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> > Tested-by: Aaron Lu <ziqianlu@bytedance.com>
> > Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
> > Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> > Signed-off-by: xupengbo <xupengbo@oppo.com>
> 
> I wonder if there are any more concerns about this patch? If no, I hope
> this fix can be merged. It's a rare case but it does happen for some
> specific setup.
> 
> Sorry if this is a bad timing, but I just hit an oncall where this exact
> problem occurred so I suppose it's worth a ping :)

Totally missed it. Seems okay, let me go queue the thing.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
  2025-11-28 13:40   ` Peter Zijlstra
@ 2025-11-28 14:15     ` Aaron Lu
  0 siblings, 0 replies; 7+ messages in thread
From: Aaron Lu @ 2025-11-28 14:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: xupengbo, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, David Vernet, linux-kernel, cgroups

On Fri, Nov 28, 2025 at 02:40:17PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 28, 2025 at 07:54:45PM +0800, Aaron Lu wrote:
> > Hello,
> > 
> > On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> > > When a task is migrated out, there is a probability that the tg->load_avg
> > > value will become abnormal. The reason is as follows.
> > > 
> > > 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> > > is a possibility that the reduced load_avg is not updated to tg->load_avg
> > > when a task migrates out.
> > > 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> > > calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> > > function cfs_rq_is_decayed() does not check whether
> > > cfs->tg_load_avg_contrib is null. Consequently, in some cases,
> > > __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> > > updated to tg->load_avg.
> > > 
> > > Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> > > which fixes the case (2.) mentioned above.
> > > 
> > > Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> > > Tested-by: Aaron Lu <ziqianlu@bytedance.com>
> > > Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
> > > Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > Signed-off-by: xupengbo <xupengbo@oppo.com>
> > 
> > I wonder if there are any more concerns about this patch? If no, I hope
> > this fix can be merged. It's a rare case but it does happen for some
> > specific setup.
> > 
> > Sorry if this is a bad timing, but I just hit an oncall where this exact
> > problem occurred so I suppose it's worth a ping :)
> 
> Totally missed it. Seems okay, let me go queue the thing.

Thanks Peter!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip: sched/urgent] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
  2025-08-27  2:22 [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
  2025-11-28 11:54 ` Aaron Lu
@ 2025-12-03 18:25 ` tip-bot2 for xupengbo
  2025-12-03 18:31 ` tip-bot2 for xupengbo
  2025-12-06  9:10 ` tip-bot2 for xupengbo
  3 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for xupengbo @ 2025-12-03 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: xupengbo, Peter Zijlstra (Intel), Ingo Molnar, Aaron Lu,
	Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     36c26a1f1f510b23ab81db176c90921305fae669
Gitweb:        https://git.kernel.org/tip/36c26a1f1f510b23ab81db176c90921305fae669
Author:        xupengbo <xupengbo@oppo.com>
AuthorDate:    Wed, 27 Aug 2025 10:22:07 +08:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 02 Dec 2025 15:25:00 +01:00

sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out

When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows:

1. Due to the 1ms update period limitation in update_tg_load_avg(), there
   is a possibility that the reduced load_avg is not updated to tg->load_avg
   when a task migrates out.

2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
   calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
   function cfs_rq_is_decayed() does not check whether
   cfs->tg_load_avg_contrib is null. Consequently, in some cases,
   __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
   updated to tg->load_avg.

Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.

Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Signed-off-by: xupengbo <xupengbo@oppo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20250827022208.14487-1-xupengbo@oppo.com
---
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 769d7b7..da46c31 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4034,6 +4034,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;
 
+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [tip: sched/urgent] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
  2025-08-27  2:22 [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
  2025-11-28 11:54 ` Aaron Lu
  2025-12-03 18:25 ` [tip: sched/urgent] " tip-bot2 for xupengbo
@ 2025-12-03 18:31 ` tip-bot2 for xupengbo
  2025-12-06  9:10 ` tip-bot2 for xupengbo
  3 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for xupengbo @ 2025-12-03 18:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: xupengbo, Peter Zijlstra (Intel), Ingo Molnar, Aaron Lu,
	Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     3dc7ae575aa1a32971565d9aaf784e6050dae959
Gitweb:        https://git.kernel.org/tip/3dc7ae575aa1a32971565d9aaf784e6050dae959
Author:        xupengbo <xupengbo@oppo.com>
AuthorDate:    Wed, 27 Aug 2025 10:22:07 +08:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 03 Dec 2025 19:26:22 +01:00

sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out

When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows:

1. Due to the 1ms update period limitation in update_tg_load_avg(), there
   is a possibility that the reduced load_avg is not updated to tg->load_avg
   when a task migrates out.

2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
   calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
   function cfs_rq_is_decayed() does not check whether
   cfs->tg_load_avg_contrib is null. Consequently, in some cases,
   __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
   updated to tg->load_avg.

Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.

Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Signed-off-by: xupengbo <xupengbo@oppo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20250827022208.14487-1-xupengbo@oppo.com
---
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 00a32c9..a31d88e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4034,6 +4034,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;
 
+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [tip: sched/urgent] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
  2025-08-27  2:22 [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
                   ` (2 preceding siblings ...)
  2025-12-03 18:31 ` tip-bot2 for xupengbo
@ 2025-12-06  9:10 ` tip-bot2 for xupengbo
  3 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for xupengbo @ 2025-12-06  9:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: xupengbo, Peter Zijlstra (Intel), Ingo Molnar, Aaron Lu,
	Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     ca125231dd29fc0678dd3622e9cdea80a51dffe4
Gitweb:        https://git.kernel.org/tip/ca125231dd29fc0678dd3622e9cdea80a51dffe4
Author:        xupengbo <xupengbo@oppo.com>
AuthorDate:    Wed, 27 Aug 2025 10:22:07 +08:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Dec 2025 10:03:13 +01:00

sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out

When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows:

1. Due to the 1ms update period limitation in update_tg_load_avg(), there
   is a possibility that the reduced load_avg is not updated to tg->load_avg
   when a task migrates out.

2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
   calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
   function cfs_rq_is_decayed() does not check whether
   cfs->tg_load_avg_contrib is null. Consequently, in some cases,
   __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
   updated to tg->load_avg.

Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.

Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Signed-off-by: xupengbo <xupengbo@oppo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20250827022208.14487-1-xupengbo@oppo.com
---
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 769d7b7..da46c31 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4034,6 +4034,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;
 
+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-12-06  9:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-27  2:22 [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
2025-11-28 11:54 ` Aaron Lu
2025-11-28 13:40   ` Peter Zijlstra
2025-11-28 14:15     ` Aaron Lu
2025-12-03 18:25 ` [tip: sched/urgent] " tip-bot2 for xupengbo
2025-12-03 18:31 ` tip-bot2 for xupengbo
2025-12-06  9:10 ` tip-bot2 for xupengbo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox