* [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
@ 2025-08-27 2:22 xupengbo
2025-11-28 11:54 ` Aaron Lu
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: xupengbo @ 2025-08-27 2:22 UTC (permalink / raw)
To: ziqianlu, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Aaron Lu, David Vernet,
linux-kernel
Cc: xupengbo, cgroups, xupengbo1029
When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows.
1. Due to the 1ms update period limitation in update_tg_load_avg(), there
is a possibility that the reduced load_avg is not updated to tg->load_avg
when a task migrates out.
2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
function cfs_rq_is_decayed() does not check whether
cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
__update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
updated to tg->load_avg.
Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.
Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: xupengbo <xupengbo@oppo.com>
---
Changes:
v1 -> v2:
- Another option to fix the bug. Check cfs_rq->tg_load_avg_contrib in
cfs_rq_is_decayed() to avoid early removal from the leaf_cfs_rq_list.
- Link to v1 : https://lore.kernel.org/cgroups/20250804130326.57523-1-xupengbo@oppo.com/
v2 -> v3:
- Check if cfs_rq->tg_load_avg_contrib is 0 directly.
- Link to v2 : https://lore.kernel.org/cgroups/20250805144121.14871-1-xupengbo@oppo.com/
v3 -> v4:
- Fix typo
- Link to v3 : https://lore.kernel.org/cgroups/20250826075743.19106-1-xupengbo@oppo.com/
v4 -> v5:
- Amend the commit message
- Link to v4 : https://lore.kernel.org/cgroups/20250826084854.25956-1-xupengbo@oppo.com/
After some preliminary discussion and analysis, I think it is feasible to
directly check if cfs_rq->tg_load_avg_contrib is 0 in cfs_rq_is_decayed().
So patch v3 was submitted.
Please send emails to a different address, <xupengbo1029@163.com>, after
September 3, 2025. After that date <xupengbo@oppo.com> will expire for
personal reasons.
Thanks,
Xu Pengbo
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b173a059315c..81b7df87f1ce 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4062,6 +4062,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;

+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
base-commit: fab1beda7597fac1cecc01707d55eadb6bbe773c
--
2.43.0
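To make the sequence described in the commit message easier to follow, here is a minimal user-space sketch of it. This is not kernel code: the `toy_*` names, the reduced structs, and the timings are assumptions made for illustration, and the real update_tg_load_avg() also applies a delta threshold and uses an atomic add, both of which are left out here. It only models the 1ms rate limit, the stale tg_load_avg_contrib, and the effect of the extra check in cfs_rq_is_decayed().

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NSEC_PER_MSEC 1000000ULL

/* Hypothetical, heavily reduced stand-ins for task_group and cfs_rq. */
struct toy_tg {
	long load_avg;			/* models tg->load_avg */
};

struct toy_cfs_rq {
	struct toy_tg *tg;
	long load_avg;			/* models cfs_rq->avg.load_avg */
	long tg_load_avg_contrib;	/* last value folded into tg->load_avg */
	uint64_t last_update_ns;	/* time of the last tg->load_avg update */
	bool on_list;			/* still on the leaf_cfs_rq_list? */
};

/* Rate-limited propagation: at most one update per millisecond. */
static void toy_update_tg_load_avg(struct toy_cfs_rq *rq, uint64_t now_ns)
{
	long delta;

	if (now_ns - rq->last_update_ns < TOY_NSEC_PER_MSEC)
		return;			/* skipped, contrib may now be stale */

	delta = rq->load_avg - rq->tg_load_avg_contrib;
	if (delta) {
		rq->tg->load_avg += delta;
		rq->tg_load_avg_contrib = rq->load_avg;
		rq->last_update_ns = now_ns;
	}
}

/* check_contrib == true models the behaviour with this patch applied. */
static bool toy_cfs_rq_is_decayed(const struct toy_cfs_rq *rq, bool check_contrib)
{
	if (rq->load_avg)
		return false;
	if (check_contrib && rq->tg_load_avg_contrib)
		return false;
	return true;
}

/* One pass of the blocked-load update over a single cfs_rq. */
static void toy_update_blocked(struct toy_cfs_rq *rq, uint64_t now_ns,
			       bool check_contrib)
{
	if (!rq->on_list)
		return;			/* already removed, never updated again */

	toy_update_tg_load_avg(rq, now_ns);
	if (toy_cfs_rq_is_decayed(rq, check_contrib))
		rq->on_list = false;
}

/* Returns the load left in the group after the last task has migrated out. */
static long toy_simulate(bool check_contrib)
{
	struct toy_tg tg = { .load_avg = 0 };
	struct toy_cfs_rq rq = { .tg = &tg, .on_list = true };
	uint64_t now = 10 * TOY_NSEC_PER_MSEC;

	rq.load_avg = 1024;			/* one task running here */
	toy_update_tg_load_avg(&rq, now);	/* tg.load_avg becomes 1024 */

	now += TOY_NSEC_PER_MSEC / 2;		/* task leaves 0.5ms later */
	rq.load_avg = 0;
	toy_update_tg_load_avg(&rq, now);	/* rate limited, no update */
	toy_update_blocked(&rq, now, check_contrib);

	now += 4 * TOY_NSEC_PER_MSEC;		/* a later blocked-load pass */
	toy_update_blocked(&rq, now, check_contrib);

	return tg.load_avg;
}
```

In this model, toy_simulate(false) leaves 1024 in the group's load because the cfs_rq is dropped from the list before its contribution is subtracted, while toy_simulate(true) keeps the cfs_rq on the list until the next update succeeds and ends at 0.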
* Re: [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
2025-08-27 2:22 [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
@ 2025-11-28 11:54 ` Aaron Lu
2025-11-28 13:40 ` Peter Zijlstra
2025-12-03 18:25 ` [tip: sched/urgent] " tip-bot2 for xupengbo
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Aaron Lu @ 2025-11-28 11:54 UTC (permalink / raw)
To: xupengbo, Ingo Molnar, Peter Zijlstra
Cc: Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Ben Segall, Mel Gorman, Valentin Schneider, David Vernet,
linux-kernel, cgroups
Hello,
On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> When a task is migrated out, there is a probability that the tg->load_avg
> value will become abnormal. The reason is as follows.
>
> 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> is a possibility that the reduced load_avg is not updated to tg->load_avg
> when a task migrates out.
> 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> function cfs_rq_is_decayed() does not check whether
> cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
> __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> updated to tg->load_avg.
>
> Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> which fixes the case (2.) mentioned above.
>
> Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> Tested-by: Aaron Lu <ziqianlu@bytedance.com>
> Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: xupengbo <xupengbo@oppo.com>
Are there any remaining concerns about this patch? If not, I hope this fix
can be merged. It's a rare case, but it does happen in some specific
setups.
Sorry if this is bad timing, but I just hit an on-call incident where this
exact problem occurred, so I suppose it's worth a ping :)
Best regards,
Aaron
* Re: [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
2025-11-28 11:54 ` Aaron Lu
@ 2025-11-28 13:40 ` Peter Zijlstra
2025-11-28 14:15 ` Aaron Lu
0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2025-11-28 13:40 UTC (permalink / raw)
To: Aaron Lu
Cc: xupengbo, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, David Vernet, linux-kernel, cgroups
On Fri, Nov 28, 2025 at 07:54:45PM +0800, Aaron Lu wrote:
> Hello,
>
> On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> > When a task is migrated out, there is a probability that the tg->load_avg
> > value will become abnormal. The reason is as follows.
> >
> > 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> > is a possibility that the reduced load_avg is not updated to tg->load_avg
> > when a task migrates out.
> > 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> > calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> > function cfs_rq_is_decayed() does not check whether
> > cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
> > __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> > updated to tg->load_avg.
> >
> > Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> > which fixes the case (2.) mentioned above.
> >
> > Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> > Tested-by: Aaron Lu <ziqianlu@bytedance.com>
> > Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
> > Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> > Signed-off-by: xupengbo <xupengbo@oppo.com>
>
> Are there any remaining concerns about this patch? If not, I hope this fix
> can be merged. It's a rare case, but it does happen in some specific
> setups.
>
> Sorry if this is bad timing, but I just hit an on-call incident where this
> exact problem occurred, so I suppose it's worth a ping :)
Totally missed it. Seems okay, let me go queue the thing.
* Re: [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
2025-11-28 13:40 ` Peter Zijlstra
@ 2025-11-28 14:15 ` Aaron Lu
0 siblings, 0 replies; 7+ messages in thread
From: Aaron Lu @ 2025-11-28 14:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: xupengbo, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, David Vernet, linux-kernel, cgroups
On Fri, Nov 28, 2025 at 02:40:17PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 28, 2025 at 07:54:45PM +0800, Aaron Lu wrote:
> > Hello,
> >
> > On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> > > When a task is migrated out, there is a probability that the tg->load_avg
> > > value will become abnormal. The reason is as follows.
> > >
> > > 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> > > is a possibility that the reduced load_avg is not updated to tg->load_avg
> > > when a task migrates out.
> > > 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> > > calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> > > function cfs_rq_is_decayed() does not check whether
> > > cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
> > > __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> > > updated to tg->load_avg.
> > >
> > > Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> > > which fixes the case (2.) mentioned above.
> > >
> > > Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> > > Tested-by: Aaron Lu <ziqianlu@bytedance.com>
> > > Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
> > > Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > Signed-off-by: xupengbo <xupengbo@oppo.com>
> >
> > Are there any remaining concerns about this patch? If not, I hope this fix
> > can be merged. It's a rare case, but it does happen in some specific
> > setups.
> >
> > Sorry if this is bad timing, but I just hit an on-call incident where this
> > exact problem occurred, so I suppose it's worth a ping :)
>
> Totally missed it. Seems okay, let me go queue the thing.
Thanks Peter!
* [tip: sched/urgent] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
2025-08-27 2:22 [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
2025-11-28 11:54 ` Aaron Lu
@ 2025-12-03 18:25 ` tip-bot2 for xupengbo
2025-12-03 18:31 ` tip-bot2 for xupengbo
2025-12-06 9:10 ` tip-bot2 for xupengbo
3 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for xupengbo @ 2025-12-03 18:25 UTC (permalink / raw)
To: linux-tip-commits
Cc: xupengbo, Peter Zijlstra (Intel), Ingo Molnar, Aaron Lu,
Vincent Guittot, x86, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 36c26a1f1f510b23ab81db176c90921305fae669
Gitweb: https://git.kernel.org/tip/36c26a1f1f510b23ab81db176c90921305fae669
Author: xupengbo <xupengbo@oppo.com>
AuthorDate: Wed, 27 Aug 2025 10:22:07 +08:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 02 Dec 2025 15:25:00 +01:00
sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows:
1. Due to the 1ms update period limitation in update_tg_load_avg(), there
is a possibility that the reduced load_avg is not updated to tg->load_avg
when a task migrates out.
2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
function cfs_rq_is_decayed() does not check whether
cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
__update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
updated to tg->load_avg.
Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.
Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Signed-off-by: xupengbo <xupengbo@oppo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20250827022208.14487-1-xupengbo@oppo.com
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 769d7b7..da46c31 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4034,6 +4034,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;

+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
* [tip: sched/urgent] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
2025-08-27 2:22 [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
2025-11-28 11:54 ` Aaron Lu
2025-12-03 18:25 ` [tip: sched/urgent] " tip-bot2 for xupengbo
@ 2025-12-03 18:31 ` tip-bot2 for xupengbo
2025-12-06 9:10 ` tip-bot2 for xupengbo
3 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for xupengbo @ 2025-12-03 18:31 UTC (permalink / raw)
To: linux-tip-commits
Cc: xupengbo, Peter Zijlstra (Intel), Ingo Molnar, Aaron Lu,
Vincent Guittot, x86, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 3dc7ae575aa1a32971565d9aaf784e6050dae959
Gitweb: https://git.kernel.org/tip/3dc7ae575aa1a32971565d9aaf784e6050dae959
Author: xupengbo <xupengbo@oppo.com>
AuthorDate: Wed, 27 Aug 2025 10:22:07 +08:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 03 Dec 2025 19:26:22 +01:00
sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows:
1. Due to the 1ms update period limitation in update_tg_load_avg(), there
is a possibility that the reduced load_avg is not updated to tg->load_avg
when a task migrates out.
2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
function cfs_rq_is_decayed() does not check whether
cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
__update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
updated to tg->load_avg.
Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.
Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Signed-off-by: xupengbo <xupengbo@oppo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20250827022208.14487-1-xupengbo@oppo.com
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 00a32c9..a31d88e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4034,6 +4034,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;

+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
* [tip: sched/urgent] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
2025-08-27 2:22 [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
` (2 preceding siblings ...)
2025-12-03 18:31 ` tip-bot2 for xupengbo
@ 2025-12-06 9:10 ` tip-bot2 for xupengbo
3 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for xupengbo @ 2025-12-06 9:10 UTC (permalink / raw)
To: linux-tip-commits
Cc: xupengbo, Peter Zijlstra (Intel), Ingo Molnar, Aaron Lu,
Vincent Guittot, x86, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: ca125231dd29fc0678dd3622e9cdea80a51dffe4
Gitweb: https://git.kernel.org/tip/ca125231dd29fc0678dd3622e9cdea80a51dffe4
Author: xupengbo <xupengbo@oppo.com>
AuthorDate: Wed, 27 Aug 2025 10:22:07 +08:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Dec 2025 10:03:13 +01:00
sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows:
1. Due to the 1ms update period limitation in update_tg_load_avg(), there
is a possibility that the reduced load_avg is not updated to tg->load_avg
when a task migrates out.
2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
function cfs_rq_is_decayed() does not check whether
cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
__update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
updated to tg->load_avg.
Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.
Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Signed-off-by: xupengbo <xupengbo@oppo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20250827022208.14487-1-xupengbo@oppo.com
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 769d7b7..da46c31 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4034,6 +4034,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;

+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }