* Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
[not found] <20250715071658.267-4-ziqianlu@bytedance.com>
@ 2025-07-15 23:29 ` kernel test robot
2025-07-16 6:57 ` Aaron Lu
0 siblings, 1 reply; 5+ messages in thread
From: kernel test robot @ 2025-07-15 23:29 UTC (permalink / raw)
To: Aaron Lu, Valentin Schneider, Ben Segall, K Prateek Nayak,
Peter Zijlstra, Chengming Zhou, Josh Don, Ingo Molnar,
Vincent Guittot, Xi Wang
Cc: llvm, oe-kbuild-all, linux-kernel, Juri Lelli, Dietmar Eggemann,
Steven Rostedt, Mel Gorman, Chuyi Zhou, Jan Kiszka,
Florian Bezdeka, Songtang Liu
Hi Aaron,
kernel test robot noticed the following build warnings:
[auto build test WARNING on tip/sched/core]
[also build test WARNING on next-20250715]
[cannot apply to linus/master v6.16-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Aaron-Lu/sched-fair-Add-related-data-structure-for-task-based-throttle/20250715-152307
base: tip/sched/core
patch link: https://lore.kernel.org/r/20250715071658.267-4-ziqianlu%40bytedance.com
patch subject: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
config: i386-buildonly-randconfig-006-20250716 (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507160730.0cXkgs0S-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> kernel/sched/fair.c:5456:33: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
5456 | cfs_rq->pelt_clock_throttled = 1;
| ^ ~
kernel/sched/fair.c:5971:32: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
5971 | cfs_rq->pelt_clock_throttled = 1;
| ^ ~
kernel/sched/fair.c:6014:20: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
6014 | cfs_rq->throttled = 1;
| ^ ~
3 warnings generated.
vim +/int +5456 kernel/sched/fair.c
5372
5373 static bool
5374 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
5375 {
5376 bool sleep = flags & DEQUEUE_SLEEP;
5377 int action = UPDATE_TG;
5378
5379 update_curr(cfs_rq);
5380 clear_buddies(cfs_rq, se);
5381
5382 if (flags & DEQUEUE_DELAYED) {
5383 WARN_ON_ONCE(!se->sched_delayed);
5384 } else {
5385 bool delay = sleep;
5386 /*
5387 * DELAY_DEQUEUE relies on spurious wakeups, special task
5388 * states must not suffer spurious wakeups, excempt them.
5389 */
5390 if (flags & DEQUEUE_SPECIAL)
5391 delay = false;
5392
5393 WARN_ON_ONCE(delay && se->sched_delayed);
5394
5395 if (sched_feat(DELAY_DEQUEUE) && delay &&
5396 !entity_eligible(cfs_rq, se)) {
5397 update_load_avg(cfs_rq, se, 0);
5398 set_delayed(se);
5399 return false;
5400 }
5401 }
5402
5403 if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
5404 action |= DO_DETACH;
5405
5406 /*
5407 * When dequeuing a sched_entity, we must:
5408 * - Update loads to have both entity and cfs_rq synced with now.
5409 * - For group_entity, update its runnable_weight to reflect the new
5410 * h_nr_runnable of its group cfs_rq.
5411 * - Subtract its previous weight from cfs_rq->load.weight.
5412 * - For group entity, update its weight to reflect the new share
5413 * of its group cfs_rq.
5414 */
5415 update_load_avg(cfs_rq, se, action);
5416 se_update_runnable(se);
5417
5418 update_stats_dequeue_fair(cfs_rq, se, flags);
5419
5420 update_entity_lag(cfs_rq, se);
5421 if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
5422 se->deadline -= se->vruntime;
5423 se->rel_deadline = 1;
5424 }
5425
5426 if (se != cfs_rq->curr)
5427 __dequeue_entity(cfs_rq, se);
5428 se->on_rq = 0;
5429 account_entity_dequeue(cfs_rq, se);
5430
5431 /* return excess runtime on last dequeue */
5432 return_cfs_rq_runtime(cfs_rq);
5433
5434 update_cfs_group(se);
5435
5436 /*
5437 * Now advance min_vruntime if @se was the entity holding it back,
5438 * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
5439 * put back on, and if we advance min_vruntime, we'll be placed back
5440 * further than we started -- i.e. we'll be penalized.
5441 */
5442 if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
5443 update_min_vruntime(cfs_rq);
5444
5445 if (flags & DEQUEUE_DELAYED)
5446 finish_delayed_dequeue_entity(se);
5447
5448 if (cfs_rq->nr_queued == 0) {
5449 update_idle_cfs_rq_clock_pelt(cfs_rq);
5450 #ifdef CONFIG_CFS_BANDWIDTH
5451 if (throttled_hierarchy(cfs_rq)) {
5452 struct rq *rq = rq_of(cfs_rq);
5453
5454 list_del_leaf_cfs_rq(cfs_rq);
5455 cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
> 5456 cfs_rq->pelt_clock_throttled = 1;
5457 }
5458 #endif
5459 }
5460
5461 return true;
5462 }
5463
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
2025-07-15 23:29 ` [PATCH v3 3/5] sched/fair: Switch to task based throttle model kernel test robot
@ 2025-07-16 6:57 ` Aaron Lu
2025-07-16 7:40 ` Philip Li
2025-07-16 11:27 ` [PATCH v3 " Peter Zijlstra
0 siblings, 2 replies; 5+ messages in thread
From: Aaron Lu @ 2025-07-16 6:57 UTC (permalink / raw)
To: kernel test robot
Cc: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra,
Chengming Zhou, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang,
llvm, oe-kbuild-all, linux-kernel, Juri Lelli, Dietmar Eggemann,
Steven Rostedt, Mel Gorman, Chuyi Zhou, Jan Kiszka,
Florian Bezdeka, Songtang Liu
On Wed, Jul 16, 2025 at 07:29:37AM +0800, kernel test robot wrote:
> Hi Aaron,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on tip/sched/core]
> [also build test WARNING on next-20250715]
> [cannot apply to linus/master v6.16-rc6]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Aaron-Lu/sched-fair-Add-related-data-structure-for-task-based-throttle/20250715-152307
> base: tip/sched/core
> patch link: https://lore.kernel.org/r/20250715071658.267-4-ziqianlu%40bytedance.com
> patch subject: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
> config: i386-buildonly-randconfig-006-20250716 (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/config)
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202507160730.0cXkgs0S-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> >> kernel/sched/fair.c:5456:33: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> 5456 | cfs_rq->pelt_clock_throttled = 1;
> | ^ ~
Thanks for the report.
I don't think this will affect correctness since both cfs_rq's throttled
and pelt_clock_throttled fields are used as true(not 0) or false(0). I
used bitfield for them to save some space.
Change their types to either unsigned int or bool should cure this
warning, I suppose bool looks more clear?
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dbe52e18b93a0..434f816a56701 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -737,8 +737,8 @@ struct cfs_rq {
u64 throttled_clock_pelt_time;
u64 throttled_clock_self;
u64 throttled_clock_self_time;
- int throttled:1;
- int pelt_clock_throttled:1;
+ bool throttled:1;
+ bool pelt_clock_throttled:1;
int throttle_count;
struct list_head throttled_list;
struct list_head throttled_csd_list;
Hi LKP,
I tried using clang-19 but couldn't reproduce this warning and I don't
have clang-20 at hand. Can you please apply the above hunk on top of
this series and see if that warning is gone? Thanks.
Best regards,
Aaron
> kernel/sched/fair.c:5971:32: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> 5971 | cfs_rq->pelt_clock_throttled = 1;
> | ^ ~
> kernel/sched/fair.c:6014:20: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> 6014 | cfs_rq->throttled = 1;
> | ^ ~
> 3 warnings generated.
>
>
> vim +/int +5456 kernel/sched/fair.c
>
> 5372
> 5373 static bool
> 5374 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> 5375 {
> 5376 bool sleep = flags & DEQUEUE_SLEEP;
> 5377 int action = UPDATE_TG;
> 5378
> 5379 update_curr(cfs_rq);
> 5380 clear_buddies(cfs_rq, se);
> 5381
> 5382 if (flags & DEQUEUE_DELAYED) {
> 5383 WARN_ON_ONCE(!se->sched_delayed);
> 5384 } else {
> 5385 bool delay = sleep;
> 5386 /*
> 5387 * DELAY_DEQUEUE relies on spurious wakeups, special task
> 5388 * states must not suffer spurious wakeups, excempt them.
> 5389 */
> 5390 if (flags & DEQUEUE_SPECIAL)
> 5391 delay = false;
> 5392
> 5393 WARN_ON_ONCE(delay && se->sched_delayed);
> 5394
> 5395 if (sched_feat(DELAY_DEQUEUE) && delay &&
> 5396 !entity_eligible(cfs_rq, se)) {
> 5397 update_load_avg(cfs_rq, se, 0);
> 5398 set_delayed(se);
> 5399 return false;
> 5400 }
> 5401 }
> 5402
> 5403 if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
> 5404 action |= DO_DETACH;
> 5405
> 5406 /*
> 5407 * When dequeuing a sched_entity, we must:
> 5408 * - Update loads to have both entity and cfs_rq synced with now.
> 5409 * - For group_entity, update its runnable_weight to reflect the new
> 5410 * h_nr_runnable of its group cfs_rq.
> 5411 * - Subtract its previous weight from cfs_rq->load.weight.
> 5412 * - For group entity, update its weight to reflect the new share
> 5413 * of its group cfs_rq.
> 5414 */
> 5415 update_load_avg(cfs_rq, se, action);
> 5416 se_update_runnable(se);
> 5417
> 5418 update_stats_dequeue_fair(cfs_rq, se, flags);
> 5419
> 5420 update_entity_lag(cfs_rq, se);
> 5421 if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
> 5422 se->deadline -= se->vruntime;
> 5423 se->rel_deadline = 1;
> 5424 }
> 5425
> 5426 if (se != cfs_rq->curr)
> 5427 __dequeue_entity(cfs_rq, se);
> 5428 se->on_rq = 0;
> 5429 account_entity_dequeue(cfs_rq, se);
> 5430
> 5431 /* return excess runtime on last dequeue */
> 5432 return_cfs_rq_runtime(cfs_rq);
> 5433
> 5434 update_cfs_group(se);
> 5435
> 5436 /*
> 5437 * Now advance min_vruntime if @se was the entity holding it back,
> 5438 * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
> 5439 * put back on, and if we advance min_vruntime, we'll be placed back
> 5440 * further than we started -- i.e. we'll be penalized.
> 5441 */
> 5442 if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
> 5443 update_min_vruntime(cfs_rq);
> 5444
> 5445 if (flags & DEQUEUE_DELAYED)
> 5446 finish_delayed_dequeue_entity(se);
> 5447
> 5448 if (cfs_rq->nr_queued == 0) {
> 5449 update_idle_cfs_rq_clock_pelt(cfs_rq);
> 5450 #ifdef CONFIG_CFS_BANDWIDTH
> 5451 if (throttled_hierarchy(cfs_rq)) {
> 5452 struct rq *rq = rq_of(cfs_rq);
> 5453
> 5454 list_del_leaf_cfs_rq(cfs_rq);
> 5455 cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
> > 5456 cfs_rq->pelt_clock_throttled = 1;
> 5457 }
> 5458 #endif
> 5459 }
> 5460
> 5461 return true;
> 5462 }
> 5463
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
2025-07-16 6:57 ` Aaron Lu
@ 2025-07-16 7:40 ` Philip Li
2025-07-16 11:15 ` [PATCH v3 update " Aaron Lu
2025-07-16 11:27 ` [PATCH v3 " Peter Zijlstra
1 sibling, 1 reply; 5+ messages in thread
From: Philip Li @ 2025-07-16 7:40 UTC (permalink / raw)
To: Aaron Lu
Cc: kernel test robot, Valentin Schneider, Ben Segall,
K Prateek Nayak, Peter Zijlstra, Chengming Zhou, Josh Don,
Ingo Molnar, Vincent Guittot, Xi Wang, llvm, oe-kbuild-all,
linux-kernel, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
Mel Gorman, Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu
On Wed, Jul 16, 2025 at 02:57:07PM +0800, Aaron Lu wrote:
> On Wed, Jul 16, 2025 at 07:29:37AM +0800, kernel test robot wrote:
> > Hi Aaron,
> >
> > kernel test robot noticed the following build warnings:
> >
> > [auto build test WARNING on tip/sched/core]
> > [also build test WARNING on next-20250715]
> > [cannot apply to linus/master v6.16-rc6]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Aaron-Lu/sched-fair-Add-related-data-structure-for-task-based-throttle/20250715-152307
> > base: tip/sched/core
> > patch link: https://lore.kernel.org/r/20250715071658.267-4-ziqianlu%40bytedance.com
> > patch subject: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
> > config: i386-buildonly-randconfig-006-20250716 (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/config)
> > compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202507160730.0cXkgs0S-lkp@intel.com/
> >
> > All warnings (new ones prefixed by >>):
> >
> > >> kernel/sched/fair.c:5456:33: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> > 5456 | cfs_rq->pelt_clock_throttled = 1;
> > | ^ ~
>
> Thanks for the report.
>
> I don't think this will affect correctness since both cfs_rq's throttled
> and pelt_clock_throttled fields are used as true(not 0) or false(0). I
> used bitfield for them to save some space.
>
> Change their types to either unsigned int or bool should cure this
> warning, I suppose bool looks more clear?
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index dbe52e18b93a0..434f816a56701 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -737,8 +737,8 @@ struct cfs_rq {
> u64 throttled_clock_pelt_time;
> u64 throttled_clock_self;
> u64 throttled_clock_self_time;
> - int throttled:1;
> - int pelt_clock_throttled:1;
> + bool throttled:1;
> + bool pelt_clock_throttled:1;
> int throttle_count;
> struct list_head throttled_list;
> struct list_head throttled_csd_list;
>
> Hi LKP,
>
> I tried using clang-19 but couldn't reproduce this warning and I don't
> have clang-20 at hand. Can you please apply the above hunk on top of
> this series and see if that warning is gone? Thanks.
Got it, is it possible to give a try for the reproduce step [1], which can
download clang-20 to local dir? If it has issue, we will follow up to check
as early as possible.
[1] https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce
>
> Best regards,
> Aaron
>
> > kernel/sched/fair.c:5971:32: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> > 5971 | cfs_rq->pelt_clock_throttled = 1;
> > | ^ ~
> > kernel/sched/fair.c:6014:20: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> > 6014 | cfs_rq->throttled = 1;
> > | ^ ~
> > 3 warnings generated.
> >
> >
> > vim +/int +5456 kernel/sched/fair.c
> >
> > 5372
> > 5373 static bool
> > 5374 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> > 5375 {
> > 5376 bool sleep = flags & DEQUEUE_SLEEP;
> > 5377 int action = UPDATE_TG;
> > 5378
> > 5379 update_curr(cfs_rq);
> > 5380 clear_buddies(cfs_rq, se);
> > 5381
> > 5382 if (flags & DEQUEUE_DELAYED) {
> > 5383 WARN_ON_ONCE(!se->sched_delayed);
> > 5384 } else {
> > 5385 bool delay = sleep;
> > 5386 /*
> > 5387 * DELAY_DEQUEUE relies on spurious wakeups, special task
> > 5388 * states must not suffer spurious wakeups, excempt them.
> > 5389 */
> > 5390 if (flags & DEQUEUE_SPECIAL)
> > 5391 delay = false;
> > 5392
> > 5393 WARN_ON_ONCE(delay && se->sched_delayed);
> > 5394
> > 5395 if (sched_feat(DELAY_DEQUEUE) && delay &&
> > 5396 !entity_eligible(cfs_rq, se)) {
> > 5397 update_load_avg(cfs_rq, se, 0);
> > 5398 set_delayed(se);
> > 5399 return false;
> > 5400 }
> > 5401 }
> > 5402
> > 5403 if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
> > 5404 action |= DO_DETACH;
> > 5405
> > 5406 /*
> > 5407 * When dequeuing a sched_entity, we must:
> > 5408 * - Update loads to have both entity and cfs_rq synced with now.
> > 5409 * - For group_entity, update its runnable_weight to reflect the new
> > 5410 * h_nr_runnable of its group cfs_rq.
> > 5411 * - Subtract its previous weight from cfs_rq->load.weight.
> > 5412 * - For group entity, update its weight to reflect the new share
> > 5413 * of its group cfs_rq.
> > 5414 */
> > 5415 update_load_avg(cfs_rq, se, action);
> > 5416 se_update_runnable(se);
> > 5417
> > 5418 update_stats_dequeue_fair(cfs_rq, se, flags);
> > 5419
> > 5420 update_entity_lag(cfs_rq, se);
> > 5421 if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
> > 5422 se->deadline -= se->vruntime;
> > 5423 se->rel_deadline = 1;
> > 5424 }
> > 5425
> > 5426 if (se != cfs_rq->curr)
> > 5427 __dequeue_entity(cfs_rq, se);
> > 5428 se->on_rq = 0;
> > 5429 account_entity_dequeue(cfs_rq, se);
> > 5430
> > 5431 /* return excess runtime on last dequeue */
> > 5432 return_cfs_rq_runtime(cfs_rq);
> > 5433
> > 5434 update_cfs_group(se);
> > 5435
> > 5436 /*
> > 5437 * Now advance min_vruntime if @se was the entity holding it back,
> > 5438 * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
> > 5439 * put back on, and if we advance min_vruntime, we'll be placed back
> > 5440 * further than we started -- i.e. we'll be penalized.
> > 5441 */
> > 5442 if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
> > 5443 update_min_vruntime(cfs_rq);
> > 5444
> > 5445 if (flags & DEQUEUE_DELAYED)
> > 5446 finish_delayed_dequeue_entity(se);
> > 5447
> > 5448 if (cfs_rq->nr_queued == 0) {
> > 5449 update_idle_cfs_rq_clock_pelt(cfs_rq);
> > 5450 #ifdef CONFIG_CFS_BANDWIDTH
> > 5451 if (throttled_hierarchy(cfs_rq)) {
> > 5452 struct rq *rq = rq_of(cfs_rq);
> > 5453
> > 5454 list_del_leaf_cfs_rq(cfs_rq);
> > 5455 cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
> > > 5456 cfs_rq->pelt_clock_throttled = 1;
> > 5457 }
> > 5458 #endif
> > 5459 }
> > 5460
> > 5461 return true;
> > 5462 }
> > 5463
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 update 3/5] sched/fair: Switch to task based throttle model
2025-07-16 7:40 ` Philip Li
@ 2025-07-16 11:15 ` Aaron Lu
0 siblings, 0 replies; 5+ messages in thread
From: Aaron Lu @ 2025-07-16 11:15 UTC (permalink / raw)
To: Philip Li
Cc: kernel test robot, Valentin Schneider, Ben Segall,
K Prateek Nayak, Peter Zijlstra, Chengming Zhou, Josh Don,
Ingo Molnar, Vincent Guittot, Xi Wang, llvm, oe-kbuild-all,
linux-kernel, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
Mel Gorman, Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu
On Wed, Jul 16, 2025 at 03:40:26PM +0800, Philip Li wrote:
> On Wed, Jul 16, 2025 at 02:57:07PM +0800, Aaron Lu wrote:
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index dbe52e18b93a0..434f816a56701 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -737,8 +737,8 @@ struct cfs_rq {
> > u64 throttled_clock_pelt_time;
> > u64 throttled_clock_self;
> > u64 throttled_clock_self_time;
> > - int throttled:1;
> > - int pelt_clock_throttled:1;
> > + bool throttled:1;
> > + bool pelt_clock_throttled:1;
> > int throttle_count;
> > struct list_head throttled_list;
> > struct list_head throttled_csd_list;
> >
> > Hi LKP,
> >
> > I tried using clang-19 but couldn't reproduce this warning and I don't
> > have clang-20 at hand. Can you please apply the above hunk on top of
> > this series and see if that warning is gone? Thanks.
>
> Got it, is it possible to give a try for the reproduce step [1], which can
> download clang-20 to local dir? If it has issue, we will follow up to check
> as early as possible.
I managed to install clang-20 and can see this warning. With the above
hunk applied, the warning is gone.
Here is the updated patch3:
From: Valentin Schneider <vschneid@redhat.com>
Date: Fri, 23 May 2025 15:30:15 +0800
Subject: [PATCH v3 update 3/5] sched/fair: Switch to task based throttle model
In current throttle model, when a cfs_rq is throttled, its entity will
be dequeued from cpu's rq, making tasks attached to it not able to run,
thus achiveing the throttle target.
This has a drawback though: assume a task is a reader of percpu_rwsem
and is waiting. When it gets woken, it can not run till its task group's
next period comes, which can be a relatively long time. Waiting writer
will have to wait longer due to this and it also makes further reader
build up and eventually trigger task hung.
To improve this situation, change the throttle model to task based, i.e.
when a cfs_rq is throttled, record its throttled status but do not remove
it from cpu's rq. Instead, for tasks that belong to this cfs_rq, when
they get picked, add a task work to them so that when they return
to user, they can be dequeued there. In this way, tasks throttled will
not hold any kernel resources. And on unthrottle, enqueue back those
tasks so they can continue to run.
Throttled cfs_rq's PELT clock is handled differently now: previously the
cfs_rq's PELT clock is stopped once it entered throttled state but since
now tasks(in kernel mode) can continue to run, change the behaviour to
stop PELT clock when the throttled cfs_rq has no tasks left.
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Chengming Zhou <chengming.zhou@linux.dev> # tag on pick
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
---
v3 update:
- fix compiler warning about int bit-field.
kernel/sched/fair.c | 336 ++++++++++++++++++++++---------------------
kernel/sched/pelt.h | 4 +-
kernel/sched/sched.h | 3 +-
3 files changed, 176 insertions(+), 167 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 54c2a4df6a5d1..0eeea7f2e693d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5285,18 +5285,23 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
if (cfs_rq->nr_queued == 1) {
check_enqueue_throttle(cfs_rq);
- if (!throttled_hierarchy(cfs_rq)) {
- list_add_leaf_cfs_rq(cfs_rq);
- } else {
+ list_add_leaf_cfs_rq(cfs_rq);
#ifdef CONFIG_CFS_BANDWIDTH
+ if (throttled_hierarchy(cfs_rq)) {
struct rq *rq = rq_of(cfs_rq);
if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
cfs_rq->throttled_clock = rq_clock(rq);
if (!cfs_rq->throttled_clock_self)
cfs_rq->throttled_clock_self = rq_clock(rq);
-#endif
+
+ if (cfs_rq->pelt_clock_throttled) {
+ cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
+ cfs_rq->throttled_clock_pelt;
+ cfs_rq->pelt_clock_throttled = 0;
+ }
}
+#endif
}
}
@@ -5335,8 +5340,6 @@ static void set_delayed(struct sched_entity *se)
struct cfs_rq *cfs_rq = cfs_rq_of(se);
cfs_rq->h_nr_runnable--;
- if (cfs_rq_throttled(cfs_rq))
- break;
}
}
@@ -5357,8 +5360,6 @@ static void clear_delayed(struct sched_entity *se)
struct cfs_rq *cfs_rq = cfs_rq_of(se);
cfs_rq->h_nr_runnable++;
- if (cfs_rq_throttled(cfs_rq))
- break;
}
}
@@ -5444,8 +5445,18 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
if (flags & DEQUEUE_DELAYED)
finish_delayed_dequeue_entity(se);
- if (cfs_rq->nr_queued == 0)
+ if (cfs_rq->nr_queued == 0) {
update_idle_cfs_rq_clock_pelt(cfs_rq);
+#ifdef CONFIG_CFS_BANDWIDTH
+ if (throttled_hierarchy(cfs_rq)) {
+ struct rq *rq = rq_of(cfs_rq);
+
+ list_del_leaf_cfs_rq(cfs_rq);
+ cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
+ cfs_rq->pelt_clock_throttled = 1;
+ }
+#endif
+ }
return true;
}
@@ -5784,6 +5795,10 @@ static void throttle_cfs_rq_work(struct callback_head *work)
WARN_ON_ONCE(p->throttled || !list_empty(&p->throttle_node));
dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
+ /*
+ * Must not set throttled before dequeue or dequeue will
+ * mistakenly regard this task as an already throttled one.
+ */
p->throttled = true;
resched_curr(rq);
}
@@ -5797,32 +5812,119 @@ void init_cfs_throttle_work(struct task_struct *p)
INIT_LIST_HEAD(&p->throttle_node);
}
+/*
+ * Task is throttled and someone wants to dequeue it again:
+ * it could be sched/core when core needs to do things like
+ * task affinity change, task group change, task sched class
+ * change etc. and in these cases, DEQUEUE_SLEEP is not set;
+ * or the task is blocked after throttled due to freezer etc.
+ * and in these cases, DEQUEUE_SLEEP is set.
+ */
+static void detach_task_cfs_rq(struct task_struct *p);
+static void dequeue_throttled_task(struct task_struct *p, int flags)
+{
+ WARN_ON_ONCE(p->se.on_rq);
+ list_del_init(&p->throttle_node);
+
+ /* task blocked after throttled */
+ if (flags & DEQUEUE_SLEEP) {
+ p->throttled = false;
+ return;
+ }
+
+ /*
+ * task is migrating off its old cfs_rq, detach
+ * the task's load from its old cfs_rq.
+ */
+ if (task_on_rq_migrating(p))
+ detach_task_cfs_rq(p);
+}
+
+static bool enqueue_throttled_task(struct task_struct *p)
+{
+ struct cfs_rq *cfs_rq = cfs_rq_of(&p->se);
+
+ /*
+ * If the throttled task is enqueued to a throttled cfs_rq,
+ * take the fast path by directly put the task on target
+ * cfs_rq's limbo list, except when p is current because
+ * the following race can cause p's group_node left in rq's
+ * cfs_tasks list when it's throttled:
+ *
+ * cpuX cpuY
+ * taskA ret2user
+ * throttle_cfs_rq_work() sched_move_task(taskA)
+ * task_rq_lock acquired
+ * dequeue_task_fair(taskA)
+ * task_rq_lock released
+ * task_rq_lock acquired
+ * task_current_donor(taskA) == true
+ * task_on_rq_queued(taskA) == true
+ * dequeue_task(taskA)
+ * put_prev_task(taskA)
+ * sched_change_group()
+ * enqueue_task(taskA) -> taskA's new cfs_rq
+ * is throttled, go
+ * fast path and skip
+ * actual enqueue
+ * set_next_task(taskA)
+ * __set_next_task_fair(taskA)
+ * list_move(&se->group_node, &rq->cfs_tasks); // bug
+ * schedule()
+ *
+ * And in the above race case, the task's current cfs_rq is in the same
+ * rq as its previous cfs_rq because sched_move_task() doesn't migrate
+ * task so we can use its current cfs_rq to derive rq and test if the
+ * task is current.
+ */
+ if (throttled_hierarchy(cfs_rq) &&
+ !task_current_donor(rq_of(cfs_rq), p)) {
+ list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
+ return true;
+ }
+
+ /* we can't take the fast path, do an actual enqueue*/
+ p->throttled = false;
+ return false;
+}
+
+static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags);
static int tg_unthrottle_up(struct task_group *tg, void *data)
{
struct rq *rq = data;
struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
+ struct task_struct *p, *tmp;
+
+ if (--cfs_rq->throttle_count)
+ return 0;
- cfs_rq->throttle_count--;
- if (!cfs_rq->throttle_count) {
+ if (cfs_rq->pelt_clock_throttled) {
cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
cfs_rq->throttled_clock_pelt;
+ cfs_rq->pelt_clock_throttled = 0;
+ }
- /* Add cfs_rq with load or one or more already running entities to the list */
- if (!cfs_rq_is_decayed(cfs_rq))
- list_add_leaf_cfs_rq(cfs_rq);
+ if (cfs_rq->throttled_clock_self) {
+ u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
- if (cfs_rq->throttled_clock_self) {
- u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
+ cfs_rq->throttled_clock_self = 0;
- cfs_rq->throttled_clock_self = 0;
+ if (WARN_ON_ONCE((s64)delta < 0))
+ delta = 0;
- if (WARN_ON_ONCE((s64)delta < 0))
- delta = 0;
+ cfs_rq->throttled_clock_self_time += delta;
+ }
- cfs_rq->throttled_clock_self_time += delta;
- }
+ /* Re-enqueue the tasks that have been throttled at this level. */
+ list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
+ list_del_init(&p->throttle_node);
+ enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
}
+ /* Add cfs_rq with load or one or more already running entities to the list */
+ if (!cfs_rq_is_decayed(cfs_rq))
+ list_add_leaf_cfs_rq(cfs_rq);
+
return 0;
}
@@ -5851,17 +5953,25 @@ static int tg_throttle_down(struct task_group *tg, void *data)
struct rq *rq = data;
struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
+ if (cfs_rq->throttle_count++)
+ return 0;
+
+
/* group is entering throttled state, stop time */
- if (!cfs_rq->throttle_count) {
- cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
+ WARN_ON_ONCE(cfs_rq->throttled_clock_self);
+ if (cfs_rq->nr_queued)
+ cfs_rq->throttled_clock_self = rq_clock(rq);
+ else {
+ /*
+ * For cfs_rqs that still have entities enqueued, PELT clock
+ * stop happens at dequeue time when all entities are dequeued.
+ */
list_del_leaf_cfs_rq(cfs_rq);
-
- WARN_ON_ONCE(cfs_rq->throttled_clock_self);
- if (cfs_rq->nr_queued)
- cfs_rq->throttled_clock_self = rq_clock(rq);
+ cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
+ cfs_rq->pelt_clock_throttled = 1;
}
- cfs_rq->throttle_count++;
+ WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
return 0;
}
@@ -5869,8 +5979,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
{
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
- struct sched_entity *se;
- long queued_delta, runnable_delta, idle_delta, dequeue = 1;
+ int dequeue = 1;
raw_spin_lock(&cfs_b->lock);
/* This will start the period timer if necessary */
@@ -5893,68 +6002,11 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
if (!dequeue)
return false; /* Throttle no longer required. */
- se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
-
/* freeze hierarchy runnable averages while throttled */
rcu_read_lock();
walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
rcu_read_unlock();
- queued_delta = cfs_rq->h_nr_queued;
- runnable_delta = cfs_rq->h_nr_runnable;
- idle_delta = cfs_rq->h_nr_idle;
- for_each_sched_entity(se) {
- struct cfs_rq *qcfs_rq = cfs_rq_of(se);
- int flags;
-
- /* throttled entity or throttle-on-deactivate */
- if (!se->on_rq)
- goto done;
-
- /*
- * Abuse SPECIAL to avoid delayed dequeue in this instance.
- * This avoids teaching dequeue_entities() about throttled
- * entities and keeps things relatively simple.
- */
- flags = DEQUEUE_SLEEP | DEQUEUE_SPECIAL;
- if (se->sched_delayed)
- flags |= DEQUEUE_DELAYED;
- dequeue_entity(qcfs_rq, se, flags);
-
- if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_delta = cfs_rq->h_nr_queued;
-
- qcfs_rq->h_nr_queued -= queued_delta;
- qcfs_rq->h_nr_runnable -= runnable_delta;
- qcfs_rq->h_nr_idle -= idle_delta;
-
- if (qcfs_rq->load.weight) {
- /* Avoid re-evaluating load for this entity: */
- se = parent_entity(se);
- break;
- }
- }
-
- for_each_sched_entity(se) {
- struct cfs_rq *qcfs_rq = cfs_rq_of(se);
- /* throttled entity or throttle-on-deactivate */
- if (!se->on_rq)
- goto done;
-
- update_load_avg(qcfs_rq, se, 0);
- se_update_runnable(se);
-
- if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_delta = cfs_rq->h_nr_queued;
-
- qcfs_rq->h_nr_queued -= queued_delta;
- qcfs_rq->h_nr_runnable -= runnable_delta;
- qcfs_rq->h_nr_idle -= idle_delta;
- }
-
- /* At this point se is NULL and we are at root level*/
- sub_nr_running(rq, queued_delta);
-done:
/*
* Note: distribution will already see us throttled via the
* throttled-list. rq->lock protects completion.
@@ -5970,9 +6022,20 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
{
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
- struct sched_entity *se;
- long queued_delta, runnable_delta, idle_delta;
- long rq_h_nr_queued = rq->cfs.h_nr_queued;
+ struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
+
+ /*
+ * It's possible we are called with !runtime_remaining due to things
+ * like user changed quota setting(see tg_set_cfs_bandwidth()) or async
+ * unthrottled us with a positive runtime_remaining but other still
+ * running entities consumed those runtime before we reached here.
+ *
+ * Anyway, we can't unthrottle this cfs_rq without any runtime remaining
+ * because any enqueue in tg_unthrottle_up() will immediately trigger a
+ * throttle, which is not supposed to happen on unthrottle path.
+ */
+ if (cfs_rq->runtime_enabled && cfs_rq->runtime_remaining <= 0)
+ return;
se = cfs_rq->tg->se[cpu_of(rq)];
@@ -6002,62 +6065,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
if (list_add_leaf_cfs_rq(cfs_rq_of(se)))
break;
}
- goto unthrottle_throttle;
}
- queued_delta = cfs_rq->h_nr_queued;
- runnable_delta = cfs_rq->h_nr_runnable;
- idle_delta = cfs_rq->h_nr_idle;
- for_each_sched_entity(se) {
- struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
- /* Handle any unfinished DELAY_DEQUEUE business first. */
- if (se->sched_delayed) {
- int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
-
- dequeue_entity(qcfs_rq, se, flags);
- } else if (se->on_rq)
- break;
- enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);
-
- if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_delta = cfs_rq->h_nr_queued;
-
- qcfs_rq->h_nr_queued += queued_delta;
- qcfs_rq->h_nr_runnable += runnable_delta;
- qcfs_rq->h_nr_idle += idle_delta;
-
- /* end evaluation on encountering a throttled cfs_rq */
- if (cfs_rq_throttled(qcfs_rq))
- goto unthrottle_throttle;
- }
-
- for_each_sched_entity(se) {
- struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
- update_load_avg(qcfs_rq, se, UPDATE_TG);
- se_update_runnable(se);
-
- if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_delta = cfs_rq->h_nr_queued;
-
- qcfs_rq->h_nr_queued += queued_delta;
- qcfs_rq->h_nr_runnable += runnable_delta;
- qcfs_rq->h_nr_idle += idle_delta;
-
- /* end evaluation on encountering a throttled cfs_rq */
- if (cfs_rq_throttled(qcfs_rq))
- goto unthrottle_throttle;
- }
-
- /* Start the fair server if un-throttling resulted in new runnable tasks */
- if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
- dl_server_start(&rq->fair_server);
-
- /* At this point se is NULL and we are at root level*/
- add_nr_running(rq, queued_delta);
-
-unthrottle_throttle:
assert_list_leaf_cfs_rq(rq);
/* Determine whether we need to wake up potentially idle CPU: */
@@ -6711,6 +6720,8 @@ static inline void sync_throttle(struct task_group *tg, int cpu) {}
static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
static void task_throttle_setup_work(struct task_struct *p) {}
static bool task_is_throttled(struct task_struct *p) { return false; }
+static void dequeue_throttled_task(struct task_struct *p, int flags) {}
+static bool enqueue_throttled_task(struct task_struct *p) { return false; }
static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
{
@@ -6903,6 +6914,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
int rq_h_nr_queued = rq->cfs.h_nr_queued;
u64 slice = 0;
+ if (unlikely(task_is_throttled(p) && enqueue_throttled_task(p)))
+ return;
+
/*
* The code below (indirectly) updates schedutil which looks at
* the cfs_rq utilization to select a frequency.
@@ -6955,10 +6969,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (cfs_rq_is_idle(cfs_rq))
h_nr_idle = 1;
- /* end evaluation on encountering a throttled cfs_rq */
- if (cfs_rq_throttled(cfs_rq))
- goto enqueue_throttle;
-
flags = ENQUEUE_WAKEUP;
}
@@ -6980,10 +6990,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (cfs_rq_is_idle(cfs_rq))
h_nr_idle = 1;
-
- /* end evaluation on encountering a throttled cfs_rq */
- if (cfs_rq_throttled(cfs_rq))
- goto enqueue_throttle;
}
if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
@@ -7013,7 +7019,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!task_new)
check_update_overutilized_status(rq);
-enqueue_throttle:
assert_list_leaf_cfs_rq(rq);
hrtick_update(rq);
@@ -7068,10 +7073,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
if (cfs_rq_is_idle(cfs_rq))
h_nr_idle = h_nr_queued;
- /* end evaluation on encountering a throttled cfs_rq */
- if (cfs_rq_throttled(cfs_rq))
- return 0;
-
/* Don't dequeue parent if it has other entities besides us */
if (cfs_rq->load.weight) {
slice = cfs_rq_min_slice(cfs_rq);
@@ -7108,10 +7109,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
if (cfs_rq_is_idle(cfs_rq))
h_nr_idle = h_nr_queued;
-
- /* end evaluation on encountering a throttled cfs_rq */
- if (cfs_rq_throttled(cfs_rq))
- return 0;
}
sub_nr_running(rq, h_nr_queued);
@@ -7145,6 +7142,11 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
*/
static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
+ if (unlikely(task_is_throttled(p))) {
+ dequeue_throttled_task(p, flags);
+ return true;
+ }
+
if (!p->se.sched_delayed)
util_est_dequeue(&rq->cfs, p);
@@ -8813,19 +8815,22 @@ static struct task_struct *pick_task_fair(struct rq *rq)
{
struct sched_entity *se;
struct cfs_rq *cfs_rq;
+ struct task_struct *p;
+ bool throttled;
again:
cfs_rq = &rq->cfs;
if (!cfs_rq->nr_queued)
return NULL;
+ throttled = false;
+
do {
/* Might not have done put_prev_entity() */
if (cfs_rq->curr && cfs_rq->curr->on_rq)
update_curr(cfs_rq);
- if (unlikely(check_cfs_rq_runtime(cfs_rq)))
- goto again;
+ throttled |= check_cfs_rq_runtime(cfs_rq);
se = pick_next_entity(rq, cfs_rq);
if (!se)
@@ -8833,7 +8838,10 @@ static struct task_struct *pick_task_fair(struct rq *rq)
cfs_rq = group_cfs_rq(se);
} while (cfs_rq);
- return task_of(se);
+ p = task_of(se);
+ if (unlikely(throttled))
+ task_throttle_setup_work(p);
+ return p;
}
static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool first);
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index 62c3fa543c0f2..f921302dc40fb 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -162,7 +162,7 @@ static inline void update_idle_cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
u64 throttled;
- if (unlikely(cfs_rq->throttle_count))
+ if (unlikely(cfs_rq->pelt_clock_throttled))
throttled = U64_MAX;
else
throttled = cfs_rq->throttled_clock_pelt_time;
@@ -173,7 +173,7 @@ static inline void update_idle_cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
/* rq->task_clock normalized against any time this cfs_rq has spent throttled */
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
- if (unlikely(cfs_rq->throttle_count))
+ if (unlikely(cfs_rq->pelt_clock_throttled))
return cfs_rq->throttled_clock_pelt - cfs_rq->throttled_clock_pelt_time;
return rq_clock_pelt(rq_of(cfs_rq)) - cfs_rq->throttled_clock_pelt_time;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b0c9559992d8a..51d000ab4a70d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -737,7 +737,8 @@ struct cfs_rq {
u64 throttled_clock_pelt_time;
u64 throttled_clock_self;
u64 throttled_clock_self_time;
- int throttled;
+ bool throttled:1;
+ bool pelt_clock_throttled:1;
int throttle_count;
struct list_head throttled_list;
struct list_head throttled_csd_list;
--
2.39.5
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
2025-07-16 6:57 ` Aaron Lu
2025-07-16 7:40 ` Philip Li
@ 2025-07-16 11:27 ` Peter Zijlstra
1 sibling, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2025-07-16 11:27 UTC (permalink / raw)
To: Aaron Lu
Cc: kernel test robot, Valentin Schneider, Ben Segall,
K Prateek Nayak, Chengming Zhou, Josh Don, Ingo Molnar,
Vincent Guittot, Xi Wang, llvm, oe-kbuild-all, linux-kernel,
Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman,
Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu
On Wed, Jul 16, 2025 at 02:57:07PM +0800, Aaron Lu wrote:
> On Wed, Jul 16, 2025 at 07:29:37AM +0800, kernel test robot wrote:
> > Hi Aaron,
> >
> > kernel test robot noticed the following build warnings:
> >
> > [auto build test WARNING on tip/sched/core]
> > [also build test WARNING on next-20250715]
> > [cannot apply to linus/master v6.16-rc6]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Aaron-Lu/sched-fair-Add-related-data-structure-for-task-based-throttle/20250715-152307
> > base: tip/sched/core
> > patch link: https://lore.kernel.org/r/20250715071658.267-4-ziqianlu%40bytedance.com
> > patch subject: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
> > config: i386-buildonly-randconfig-006-20250716 (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/config)
> > compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202507160730.0cXkgs0S-lkp@intel.com/
> >
> > All warnings (new ones prefixed by >>):
> >
> > >> kernel/sched/fair.c:5456:33: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> > 5456 | cfs_rq->pelt_clock_throttled = 1;
> > | ^ ~
Nice warning from clang.
>
> Thanks for the report.
>
> I don't think this will affect correctness since both cfs_rq's throttled
> and pelt_clock_throttled fields are used as true(not 0) or false(0). I
> used bitfield for them to save some space.
>
> Change their types to either unsigned int or bool should cure this
> warning, I suppose bool looks more clear?
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index dbe52e18b93a0..434f816a56701 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -737,8 +737,8 @@ struct cfs_rq {
> u64 throttled_clock_pelt_time;
> u64 throttled_clock_self;
> u64 throttled_clock_self_time;
> - int throttled:1;
> - int pelt_clock_throttled:1;
> + bool throttled:1;
> + bool pelt_clock_throttled:1;
> int throttle_count;
> struct list_head throttled_list;
> struct list_head throttled_csd_list;
Yeah, either this or any unsigned type will do.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-07-16 11:27 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250715071658.267-4-ziqianlu@bytedance.com>
2025-07-15 23:29 ` [PATCH v3 3/5] sched/fair: Switch to task based throttle model kernel test robot
2025-07-16 6:57 ` Aaron Lu
2025-07-16 7:40 ` Philip Li
2025-07-16 11:15 ` [PATCH v3 update " Aaron Lu
2025-07-16 11:27 ` [PATCH v3 " Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).