llvm.lists.linux.dev archive mirror
* Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
       [not found] <20250715071658.267-4-ziqianlu@bytedance.com>
@ 2025-07-15 23:29 ` kernel test robot
  2025-07-16  6:57   ` Aaron Lu
  0 siblings, 1 reply; 5+ messages in thread
From: kernel test robot @ 2025-07-15 23:29 UTC
  To: Aaron Lu, Valentin Schneider, Ben Segall, K Prateek Nayak,
	Peter Zijlstra, Chengming Zhou, Josh Don, Ingo Molnar,
	Vincent Guittot, Xi Wang
  Cc: llvm, oe-kbuild-all, linux-kernel, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Mel Gorman, Chuyi Zhou, Jan Kiszka,
	Florian Bezdeka, Songtang Liu

Hi Aaron,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on next-20250715]
[cannot apply to linus/master v6.16-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Aaron-Lu/sched-fair-Add-related-data-structure-for-task-based-throttle/20250715-152307
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20250715071658.267-4-ziqianlu%40bytedance.com
patch subject: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
config: i386-buildonly-randconfig-006-20250716 (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507160730.0cXkgs0S-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/sched/fair.c:5456:33: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
    5456 |                         cfs_rq->pelt_clock_throttled = 1;
         |                                                      ^ ~
   kernel/sched/fair.c:5971:32: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
    5971 |                 cfs_rq->pelt_clock_throttled = 1;
         |                                              ^ ~
   kernel/sched/fair.c:6014:20: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
    6014 |         cfs_rq->throttled = 1;
         |                           ^ ~
   3 warnings generated.


vim +/int +5456 kernel/sched/fair.c

  5372	
  5373	static bool
  5374	dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
  5375	{
  5376		bool sleep = flags & DEQUEUE_SLEEP;
  5377		int action = UPDATE_TG;
  5378	
  5379		update_curr(cfs_rq);
  5380		clear_buddies(cfs_rq, se);
  5381	
  5382		if (flags & DEQUEUE_DELAYED) {
  5383			WARN_ON_ONCE(!se->sched_delayed);
  5384		} else {
  5385			bool delay = sleep;
  5386			/*
  5387			 * DELAY_DEQUEUE relies on spurious wakeups, special task
  5388			 * states must not suffer spurious wakeups, excempt them.
  5389			 */
  5390			if (flags & DEQUEUE_SPECIAL)
  5391				delay = false;
  5392	
  5393			WARN_ON_ONCE(delay && se->sched_delayed);
  5394	
  5395			if (sched_feat(DELAY_DEQUEUE) && delay &&
  5396			    !entity_eligible(cfs_rq, se)) {
  5397				update_load_avg(cfs_rq, se, 0);
  5398				set_delayed(se);
  5399				return false;
  5400			}
  5401		}
  5402	
  5403		if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
  5404			action |= DO_DETACH;
  5405	
  5406		/*
  5407		 * When dequeuing a sched_entity, we must:
  5408		 *   - Update loads to have both entity and cfs_rq synced with now.
  5409		 *   - For group_entity, update its runnable_weight to reflect the new
  5410		 *     h_nr_runnable of its group cfs_rq.
  5411		 *   - Subtract its previous weight from cfs_rq->load.weight.
  5412		 *   - For group entity, update its weight to reflect the new share
  5413		 *     of its group cfs_rq.
  5414		 */
  5415		update_load_avg(cfs_rq, se, action);
  5416		se_update_runnable(se);
  5417	
  5418		update_stats_dequeue_fair(cfs_rq, se, flags);
  5419	
  5420		update_entity_lag(cfs_rq, se);
  5421		if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
  5422			se->deadline -= se->vruntime;
  5423			se->rel_deadline = 1;
  5424		}
  5425	
  5426		if (se != cfs_rq->curr)
  5427			__dequeue_entity(cfs_rq, se);
  5428		se->on_rq = 0;
  5429		account_entity_dequeue(cfs_rq, se);
  5430	
  5431		/* return excess runtime on last dequeue */
  5432		return_cfs_rq_runtime(cfs_rq);
  5433	
  5434		update_cfs_group(se);
  5435	
  5436		/*
  5437		 * Now advance min_vruntime if @se was the entity holding it back,
  5438		 * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
  5439		 * put back on, and if we advance min_vruntime, we'll be placed back
  5440		 * further than we started -- i.e. we'll be penalized.
  5441		 */
  5442		if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
  5443			update_min_vruntime(cfs_rq);
  5444	
  5445		if (flags & DEQUEUE_DELAYED)
  5446			finish_delayed_dequeue_entity(se);
  5447	
  5448		if (cfs_rq->nr_queued == 0) {
  5449			update_idle_cfs_rq_clock_pelt(cfs_rq);
  5450	#ifdef CONFIG_CFS_BANDWIDTH
  5451			if (throttled_hierarchy(cfs_rq)) {
  5452				struct rq *rq = rq_of(cfs_rq);
  5453	
  5454				list_del_leaf_cfs_rq(cfs_rq);
  5455				cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
> 5456				cfs_rq->pelt_clock_throttled = 1;
  5457			}
  5458	#endif
  5459		}
  5460	
  5461		return true;
  5462	}
  5463	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
  2025-07-15 23:29 ` [PATCH v3 3/5] sched/fair: Switch to task based throttle model kernel test robot
@ 2025-07-16  6:57   ` Aaron Lu
  2025-07-16  7:40     ` Philip Li
  2025-07-16 11:27     ` [PATCH v3 " Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Aaron Lu @ 2025-07-16  6:57 UTC
  To: kernel test robot
  Cc: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra,
	Chengming Zhou, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang,
	llvm, oe-kbuild-all, linux-kernel, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Mel Gorman, Chuyi Zhou, Jan Kiszka,
	Florian Bezdeka, Songtang Liu

On Wed, Jul 16, 2025 at 07:29:37AM +0800, kernel test robot wrote:
> Hi Aaron,
> 
> kernel test robot noticed the following build warnings:
> 
> [auto build test WARNING on tip/sched/core]
> [also build test WARNING on next-20250715]
> [cannot apply to linus/master v6.16-rc6]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Aaron-Lu/sched-fair-Add-related-data-structure-for-task-based-throttle/20250715-152307
> base:   tip/sched/core
> patch link:    https://lore.kernel.org/r/20250715071658.267-4-ziqianlu%40bytedance.com
> patch subject: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
> config: i386-buildonly-randconfig-006-20250716 (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/config)
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202507160730.0cXkgs0S-lkp@intel.com/
> 
> All warnings (new ones prefixed by >>):
> 
> >> kernel/sched/fair.c:5456:33: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
>     5456 |                         cfs_rq->pelt_clock_throttled = 1;
>          |                                                      ^ ~

Thanks for the report.

I don't think this affects correctness, since both cfs_rq's throttled
and pelt_clock_throttled fields are only ever tested as true (non-zero)
or false (0). I used bit-fields for them to save some space.

Changing their type to either unsigned int or bool should cure this
warning; I suppose bool reads more clearly?

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dbe52e18b93a0..434f816a56701 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -737,8 +737,8 @@ struct cfs_rq {
 	u64			throttled_clock_pelt_time;
 	u64			throttled_clock_self;
 	u64			throttled_clock_self_time;
-	int			throttled:1;
-	int			pelt_clock_throttled:1;
+	bool			throttled:1;
+	bool			pelt_clock_throttled:1;
 	int			throttle_count;
 	struct list_head	throttled_list;
 	struct list_head	throttled_csd_list;
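
For reference, a minimal standalone C program (an illustrative sketch,
not from the series; the struct and field names here are made up) shows
what the warning is about: a signed one-bit bit-field can only hold 0
and -1, so storing 1 reads back as -1, although zero/non-zero tests
still behave as expected:

#include <stdio.h>

struct flags {
	int	throttled:1;	/* signed 1-bit field: representable values are 0 and -1 */
};

int main(void)
{
	struct flags f = { 0 };

	f.throttled = 1;	/* clang-20: value changes from 1 to -1 */

	printf("%d\n", f.throttled);		/* prints -1 on gcc/clang */
	printf("%d\n", f.throttled == 1);	/* prints 0: comparing against 1 breaks */
	printf("%d\n", !!f.throttled);		/* prints 1: zero/non-zero tests still work */

	return 0;
}

Built with clang 20, the assignment above should trigger the same
-Wsingle-bit-bitfield-constant-conversion warning quoted in the report.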

Hi LKP,

I tried using clang-19 but couldn't reproduce this warning and I don't
have clang-20 at hand. Can you please apply the above hunk on top of
this series and see if that warning is gone? Thanks.

Best regards,
Aaron

>    kernel/sched/fair.c:5971:32: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
>     5971 |                 cfs_rq->pelt_clock_throttled = 1;
>          |                                              ^ ~
>    kernel/sched/fair.c:6014:20: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
>     6014 |         cfs_rq->throttled = 1;
>          |                           ^ ~
>    3 warnings generated.
> 
> 
> vim +/int +5456 kernel/sched/fair.c
> 
>   5372	
>   5373	static bool
>   5374	dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>   5375	{
>   5376		bool sleep = flags & DEQUEUE_SLEEP;
>   5377		int action = UPDATE_TG;
>   5378	
>   5379		update_curr(cfs_rq);
>   5380		clear_buddies(cfs_rq, se);
>   5381	
>   5382		if (flags & DEQUEUE_DELAYED) {
>   5383			WARN_ON_ONCE(!se->sched_delayed);
>   5384		} else {
>   5385			bool delay = sleep;
>   5386			/*
>   5387			 * DELAY_DEQUEUE relies on spurious wakeups, special task
>   5388			 * states must not suffer spurious wakeups, excempt them.
>   5389			 */
>   5390			if (flags & DEQUEUE_SPECIAL)
>   5391				delay = false;
>   5392	
>   5393			WARN_ON_ONCE(delay && se->sched_delayed);
>   5394	
>   5395			if (sched_feat(DELAY_DEQUEUE) && delay &&
>   5396			    !entity_eligible(cfs_rq, se)) {
>   5397				update_load_avg(cfs_rq, se, 0);
>   5398				set_delayed(se);
>   5399				return false;
>   5400			}
>   5401		}
>   5402	
>   5403		if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
>   5404			action |= DO_DETACH;
>   5405	
>   5406		/*
>   5407		 * When dequeuing a sched_entity, we must:
>   5408		 *   - Update loads to have both entity and cfs_rq synced with now.
>   5409		 *   - For group_entity, update its runnable_weight to reflect the new
>   5410		 *     h_nr_runnable of its group cfs_rq.
>   5411		 *   - Subtract its previous weight from cfs_rq->load.weight.
>   5412		 *   - For group entity, update its weight to reflect the new share
>   5413		 *     of its group cfs_rq.
>   5414		 */
>   5415		update_load_avg(cfs_rq, se, action);
>   5416		se_update_runnable(se);
>   5417	
>   5418		update_stats_dequeue_fair(cfs_rq, se, flags);
>   5419	
>   5420		update_entity_lag(cfs_rq, se);
>   5421		if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
>   5422			se->deadline -= se->vruntime;
>   5423			se->rel_deadline = 1;
>   5424		}
>   5425	
>   5426		if (se != cfs_rq->curr)
>   5427			__dequeue_entity(cfs_rq, se);
>   5428		se->on_rq = 0;
>   5429		account_entity_dequeue(cfs_rq, se);
>   5430	
>   5431		/* return excess runtime on last dequeue */
>   5432		return_cfs_rq_runtime(cfs_rq);
>   5433	
>   5434		update_cfs_group(se);
>   5435	
>   5436		/*
>   5437		 * Now advance min_vruntime if @se was the entity holding it back,
>   5438		 * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
>   5439		 * put back on, and if we advance min_vruntime, we'll be placed back
>   5440		 * further than we started -- i.e. we'll be penalized.
>   5441		 */
>   5442		if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
>   5443			update_min_vruntime(cfs_rq);
>   5444	
>   5445		if (flags & DEQUEUE_DELAYED)
>   5446			finish_delayed_dequeue_entity(se);
>   5447	
>   5448		if (cfs_rq->nr_queued == 0) {
>   5449			update_idle_cfs_rq_clock_pelt(cfs_rq);
>   5450	#ifdef CONFIG_CFS_BANDWIDTH
>   5451			if (throttled_hierarchy(cfs_rq)) {
>   5452				struct rq *rq = rq_of(cfs_rq);
>   5453	
>   5454				list_del_leaf_cfs_rq(cfs_rq);
>   5455				cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
> > 5456				cfs_rq->pelt_clock_throttled = 1;
>   5457			}
>   5458	#endif
>   5459		}
>   5460	
>   5461		return true;
>   5462	}
>   5463	
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
  2025-07-16  6:57   ` Aaron Lu
@ 2025-07-16  7:40     ` Philip Li
  2025-07-16 11:15       ` [PATCH v3 update " Aaron Lu
  2025-07-16 11:27     ` [PATCH v3 " Peter Zijlstra
  1 sibling, 1 reply; 5+ messages in thread
From: Philip Li @ 2025-07-16  7:40 UTC
  To: Aaron Lu
  Cc: kernel test robot, Valentin Schneider, Ben Segall,
	K Prateek Nayak, Peter Zijlstra, Chengming Zhou, Josh Don,
	Ingo Molnar, Vincent Guittot, Xi Wang, llvm, oe-kbuild-all,
	linux-kernel, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Mel Gorman, Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu

On Wed, Jul 16, 2025 at 02:57:07PM +0800, Aaron Lu wrote:
> On Wed, Jul 16, 2025 at 07:29:37AM +0800, kernel test robot wrote:
> > Hi Aaron,
> > 
> > kernel test robot noticed the following build warnings:
> > 
> > [auto build test WARNING on tip/sched/core]
> > [also build test WARNING on next-20250715]
> > [cannot apply to linus/master v6.16-rc6]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> > 
> > url:    https://github.com/intel-lab-lkp/linux/commits/Aaron-Lu/sched-fair-Add-related-data-structure-for-task-based-throttle/20250715-152307
> > base:   tip/sched/core
> > patch link:    https://lore.kernel.org/r/20250715071658.267-4-ziqianlu%40bytedance.com
> > patch subject: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
> > config: i386-buildonly-randconfig-006-20250716 (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/config)
> > compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce)
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202507160730.0cXkgs0S-lkp@intel.com/
> > 
> > All warnings (new ones prefixed by >>):
> > 
> > >> kernel/sched/fair.c:5456:33: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> >     5456 |                         cfs_rq->pelt_clock_throttled = 1;
> >          |                                                      ^ ~
> 
> Thanks for the report.
> 
> I don't think this affects correctness, since both cfs_rq's throttled
> and pelt_clock_throttled fields are only ever tested as true (non-zero)
> or false (0). I used bit-fields for them to save some space.
> 
> Changing their type to either unsigned int or bool should cure this
> warning; I suppose bool reads more clearly?
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index dbe52e18b93a0..434f816a56701 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -737,8 +737,8 @@ struct cfs_rq {
>  	u64			throttled_clock_pelt_time;
>  	u64			throttled_clock_self;
>  	u64			throttled_clock_self_time;
> -	int			throttled:1;
> -	int			pelt_clock_throttled:1;
> +	bool			throttled:1;
> +	bool			pelt_clock_throttled:1;
>  	int			throttle_count;
>  	struct list_head	throttled_list;
>  	struct list_head	throttled_csd_list;
> 
> Hi LKP,
> 
> I tried using clang-19 but couldn't reproduce this warning and I don't
> have clang-20 at hand. Can you please apply the above hunk on top of
> this series and see if that warning is gone? Thanks.

Got it. Is it possible to give the reproduce step [1] a try? It can
download clang-20 to a local dir. If there is any issue with it, we will
follow up and check as early as possible.

[1] https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce

> 
> Best regards,
> Aaron
> 
> >    kernel/sched/fair.c:5971:32: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> >     5971 |                 cfs_rq->pelt_clock_throttled = 1;
> >          |                                              ^ ~
> >    kernel/sched/fair.c:6014:20: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> >     6014 |         cfs_rq->throttled = 1;
> >          |                           ^ ~
> >    3 warnings generated.
> > 
> > 
> > vim +/int +5456 kernel/sched/fair.c
> > 
> >   5372	
> >   5373	static bool
> >   5374	dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> >   5375	{
> >   5376		bool sleep = flags & DEQUEUE_SLEEP;
> >   5377		int action = UPDATE_TG;
> >   5378	
> >   5379		update_curr(cfs_rq);
> >   5380		clear_buddies(cfs_rq, se);
> >   5381	
> >   5382		if (flags & DEQUEUE_DELAYED) {
> >   5383			WARN_ON_ONCE(!se->sched_delayed);
> >   5384		} else {
> >   5385			bool delay = sleep;
> >   5386			/*
> >   5387			 * DELAY_DEQUEUE relies on spurious wakeups, special task
> >   5388			 * states must not suffer spurious wakeups, excempt them.
> >   5389			 */
> >   5390			if (flags & DEQUEUE_SPECIAL)
> >   5391				delay = false;
> >   5392	
> >   5393			WARN_ON_ONCE(delay && se->sched_delayed);
> >   5394	
> >   5395			if (sched_feat(DELAY_DEQUEUE) && delay &&
> >   5396			    !entity_eligible(cfs_rq, se)) {
> >   5397				update_load_avg(cfs_rq, se, 0);
> >   5398				set_delayed(se);
> >   5399				return false;
> >   5400			}
> >   5401		}
> >   5402	
> >   5403		if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
> >   5404			action |= DO_DETACH;
> >   5405	
> >   5406		/*
> >   5407		 * When dequeuing a sched_entity, we must:
> >   5408		 *   - Update loads to have both entity and cfs_rq synced with now.
> >   5409		 *   - For group_entity, update its runnable_weight to reflect the new
> >   5410		 *     h_nr_runnable of its group cfs_rq.
> >   5411		 *   - Subtract its previous weight from cfs_rq->load.weight.
> >   5412		 *   - For group entity, update its weight to reflect the new share
> >   5413		 *     of its group cfs_rq.
> >   5414		 */
> >   5415		update_load_avg(cfs_rq, se, action);
> >   5416		se_update_runnable(se);
> >   5417	
> >   5418		update_stats_dequeue_fair(cfs_rq, se, flags);
> >   5419	
> >   5420		update_entity_lag(cfs_rq, se);
> >   5421		if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
> >   5422			se->deadline -= se->vruntime;
> >   5423			se->rel_deadline = 1;
> >   5424		}
> >   5425	
> >   5426		if (se != cfs_rq->curr)
> >   5427			__dequeue_entity(cfs_rq, se);
> >   5428		se->on_rq = 0;
> >   5429		account_entity_dequeue(cfs_rq, se);
> >   5430	
> >   5431		/* return excess runtime on last dequeue */
> >   5432		return_cfs_rq_runtime(cfs_rq);
> >   5433	
> >   5434		update_cfs_group(se);
> >   5435	
> >   5436		/*
> >   5437		 * Now advance min_vruntime if @se was the entity holding it back,
> >   5438		 * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
> >   5439		 * put back on, and if we advance min_vruntime, we'll be placed back
> >   5440		 * further than we started -- i.e. we'll be penalized.
> >   5441		 */
> >   5442		if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
> >   5443			update_min_vruntime(cfs_rq);
> >   5444	
> >   5445		if (flags & DEQUEUE_DELAYED)
> >   5446			finish_delayed_dequeue_entity(se);
> >   5447	
> >   5448		if (cfs_rq->nr_queued == 0) {
> >   5449			update_idle_cfs_rq_clock_pelt(cfs_rq);
> >   5450	#ifdef CONFIG_CFS_BANDWIDTH
> >   5451			if (throttled_hierarchy(cfs_rq)) {
> >   5452				struct rq *rq = rq_of(cfs_rq);
> >   5453	
> >   5454				list_del_leaf_cfs_rq(cfs_rq);
> >   5455				cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
> > > 5456				cfs_rq->pelt_clock_throttled = 1;
> >   5457			}
> >   5458	#endif
> >   5459		}
> >   5460	
> >   5461		return true;
> >   5462	}
> >   5463	
> > 
> > -- 
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> 


* [PATCH v3 update 3/5] sched/fair: Switch to task based throttle model
  2025-07-16  7:40     ` Philip Li
@ 2025-07-16 11:15       ` Aaron Lu
  0 siblings, 0 replies; 5+ messages in thread
From: Aaron Lu @ 2025-07-16 11:15 UTC
  To: Philip Li
  Cc: kernel test robot, Valentin Schneider, Ben Segall,
	K Prateek Nayak, Peter Zijlstra, Chengming Zhou, Josh Don,
	Ingo Molnar, Vincent Guittot, Xi Wang, llvm, oe-kbuild-all,
	linux-kernel, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Mel Gorman, Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu

On Wed, Jul 16, 2025 at 03:40:26PM +0800, Philip Li wrote:
> On Wed, Jul 16, 2025 at 02:57:07PM +0800, Aaron Lu wrote:
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index dbe52e18b93a0..434f816a56701 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -737,8 +737,8 @@ struct cfs_rq {
> >  	u64			throttled_clock_pelt_time;
> >  	u64			throttled_clock_self;
> >  	u64			throttled_clock_self_time;
> > -	int			throttled:1;
> > -	int			pelt_clock_throttled:1;
> > +	bool			throttled:1;
> > +	bool			pelt_clock_throttled:1;
> >  	int			throttle_count;
> >  	struct list_head	throttled_list;
> >  	struct list_head	throttled_csd_list;
> > 
> > Hi LKP,
> > 
> > I tried using clang-19 but couldn't reproduce this warning and I don't
> > have clang-20 at hand. Can you please apply the above hunk on top of
> > this series and see if that warning is gone? Thanks.
> 
> Got it. Is it possible to give the reproduce step [1] a try? It can
> download clang-20 to a local dir. If there is any issue with it, we will
> follow up and check as early as possible.

I managed to install clang-20 and can see this warning. With the above
hunk applied, the warning is gone.

Here is the updated patch3:

From: Valentin Schneider <vschneid@redhat.com>
Date: Fri, 23 May 2025 15:30:15 +0800
Subject: [PATCH v3 update 3/5] sched/fair: Switch to task based throttle model

In the current throttle model, when a cfs_rq is throttled, its entity is
dequeued from the cpu's rq, making tasks attached to it unable to run
and thus achieving the throttle target.

This has a drawback though: assume a task is a reader of percpu_rwsem
and is waiting. When it gets woken, it cannot run until its task group's
next period comes, which can be a relatively long time. Waiting writers
have to wait longer because of this, and it also lets further readers
build up, eventually triggering a task hung.

To improve this situation, change the throttle model to be task based,
i.e. when a cfs_rq is throttled, record its throttled status but do not
remove it from the cpu's rq. Instead, for tasks that belong to this
cfs_rq, add a task work to them when they get picked, so that when they
return to user space, they can be dequeued there. In this way, throttled
tasks will not hold any kernel resources. On unthrottle, enqueue those
tasks back so they can continue to run.

A throttled cfs_rq's PELT clock is handled differently now: previously,
the cfs_rq's PELT clock was stopped once it entered the throttled state,
but since tasks (in kernel mode) can now continue to run, change the
behaviour to stop the PELT clock only when the throttled cfs_rq has no
tasks left.

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Chengming Zhou <chengming.zhou@linux.dev> # tag on pick
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
---
v3 update:
- fix compiler warning about int bit-field.

 kernel/sched/fair.c  | 336 ++++++++++++++++++++++---------------------
 kernel/sched/pelt.h  |   4 +-
 kernel/sched/sched.h |   3 +-
 3 files changed, 176 insertions(+), 167 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 54c2a4df6a5d1..0eeea7f2e693d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5285,18 +5285,23 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	if (cfs_rq->nr_queued == 1) {
 		check_enqueue_throttle(cfs_rq);
-		if (!throttled_hierarchy(cfs_rq)) {
-			list_add_leaf_cfs_rq(cfs_rq);
-		} else {
+		list_add_leaf_cfs_rq(cfs_rq);
 #ifdef CONFIG_CFS_BANDWIDTH
+		if (throttled_hierarchy(cfs_rq)) {
 			struct rq *rq = rq_of(cfs_rq);
 
 			if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
 				cfs_rq->throttled_clock = rq_clock(rq);
 			if (!cfs_rq->throttled_clock_self)
 				cfs_rq->throttled_clock_self = rq_clock(rq);
-#endif
+
+			if (cfs_rq->pelt_clock_throttled) {
+				cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
+					cfs_rq->throttled_clock_pelt;
+				cfs_rq->pelt_clock_throttled = 0;
+			}
 		}
+#endif
 	}
 }
 
@@ -5335,8 +5340,6 @@ static void set_delayed(struct sched_entity *se)
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
 		cfs_rq->h_nr_runnable--;
-		if (cfs_rq_throttled(cfs_rq))
-			break;
 	}
 }
 
@@ -5357,8 +5360,6 @@ static void clear_delayed(struct sched_entity *se)
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
 		cfs_rq->h_nr_runnable++;
-		if (cfs_rq_throttled(cfs_rq))
-			break;
 	}
 }
 
@@ -5444,8 +5445,18 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (flags & DEQUEUE_DELAYED)
 		finish_delayed_dequeue_entity(se);
 
-	if (cfs_rq->nr_queued == 0)
+	if (cfs_rq->nr_queued == 0) {
 		update_idle_cfs_rq_clock_pelt(cfs_rq);
+#ifdef CONFIG_CFS_BANDWIDTH
+		if (throttled_hierarchy(cfs_rq)) {
+			struct rq *rq = rq_of(cfs_rq);
+
+			list_del_leaf_cfs_rq(cfs_rq);
+			cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
+			cfs_rq->pelt_clock_throttled = 1;
+		}
+#endif
+	}
 
 	return true;
 }
@@ -5784,6 +5795,10 @@ static void throttle_cfs_rq_work(struct callback_head *work)
 		WARN_ON_ONCE(p->throttled || !list_empty(&p->throttle_node));
 		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
 		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
+		/*
+		 * Must not set throttled before dequeue or dequeue will
+		 * mistakenly regard this task as an already throttled one.
+		 */
 		p->throttled = true;
 		resched_curr(rq);
 	}
@@ -5797,32 +5812,119 @@ void init_cfs_throttle_work(struct task_struct *p)
 	INIT_LIST_HEAD(&p->throttle_node);
 }
 
+/*
+ * Task is throttled and someone wants to dequeue it again:
+ * it could be sched/core when core needs to do things like
+ * task affinity change, task group change, task sched class
+ * change etc. and in these cases, DEQUEUE_SLEEP is not set;
+ * or the task is blocked after throttled due to freezer etc.
+ * and in these cases, DEQUEUE_SLEEP is set.
+ */
+static void detach_task_cfs_rq(struct task_struct *p);
+static void dequeue_throttled_task(struct task_struct *p, int flags)
+{
+	WARN_ON_ONCE(p->se.on_rq);
+	list_del_init(&p->throttle_node);
+
+	/* task blocked after throttled */
+	if (flags & DEQUEUE_SLEEP) {
+		p->throttled = false;
+		return;
+	}
+
+	/*
+	 * task is migrating off its old cfs_rq, detach
+	 * the task's load from its old cfs_rq.
+	 */
+	if (task_on_rq_migrating(p))
+		detach_task_cfs_rq(p);
+}
+
+static bool enqueue_throttled_task(struct task_struct *p)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(&p->se);
+
+	/*
+	 * If the throttled task is enqueued to a throttled cfs_rq,
+	 * take the fast path by directly put the task on target
+	 * cfs_rq's limbo list, except when p is current because
+	 * the following race can cause p's group_node left in rq's
+	 * cfs_tasks list when it's throttled:
+	 *
+	 *        cpuX                       cpuY
+	 *   taskA ret2user
+	 *  throttle_cfs_rq_work()    sched_move_task(taskA)
+	 *  task_rq_lock acquired
+	 *  dequeue_task_fair(taskA)
+	 *  task_rq_lock released
+	 *                            task_rq_lock acquired
+	 *                            task_current_donor(taskA) == true
+	 *                            task_on_rq_queued(taskA) == true
+	 *                            dequeue_task(taskA)
+	 *                            put_prev_task(taskA)
+	 *                            sched_change_group()
+	 *                            enqueue_task(taskA) -> taskA's new cfs_rq
+	 *                                                   is throttled, go
+	 *                                                   fast path and skip
+	 *                                                   actual enqueue
+	 *                            set_next_task(taskA)
+	 *                          __set_next_task_fair(taskA)
+	 *                    list_move(&se->group_node, &rq->cfs_tasks); // bug
+	 *  schedule()
+	 *
+	 * And in the above race case, the task's current cfs_rq is in the same
+	 * rq as its previous cfs_rq because sched_move_task() doesn't migrate
+	 * task so we can use its current cfs_rq to derive rq and test if the
+	 * task is current.
+	 */
+	if (throttled_hierarchy(cfs_rq) &&
+	    !task_current_donor(rq_of(cfs_rq), p)) {
+		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
+		return true;
+	}
+
+	/* we can't take the fast path, do an actual enqueue */
+	p->throttled = false;
+	return false;
+}
+
+static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static int tg_unthrottle_up(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
 	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
+	struct task_struct *p, *tmp;
+
+	if (--cfs_rq->throttle_count)
+		return 0;
 
-	cfs_rq->throttle_count--;
-	if (!cfs_rq->throttle_count) {
+	if (cfs_rq->pelt_clock_throttled) {
 		cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
 					     cfs_rq->throttled_clock_pelt;
+		cfs_rq->pelt_clock_throttled = 0;
+	}
 
-		/* Add cfs_rq with load or one or more already running entities to the list */
-		if (!cfs_rq_is_decayed(cfs_rq))
-			list_add_leaf_cfs_rq(cfs_rq);
+	if (cfs_rq->throttled_clock_self) {
+		u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
 
-		if (cfs_rq->throttled_clock_self) {
-			u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
+		cfs_rq->throttled_clock_self = 0;
 
-			cfs_rq->throttled_clock_self = 0;
+		if (WARN_ON_ONCE((s64)delta < 0))
+			delta = 0;
 
-			if (WARN_ON_ONCE((s64)delta < 0))
-				delta = 0;
+		cfs_rq->throttled_clock_self_time += delta;
+	}
 
-			cfs_rq->throttled_clock_self_time += delta;
-		}
+	/* Re-enqueue the tasks that have been throttled at this level. */
+	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
+		list_del_init(&p->throttle_node);
+		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
 	}
 
+	/* Add cfs_rq with load or one or more already running entities to the list */
+	if (!cfs_rq_is_decayed(cfs_rq))
+		list_add_leaf_cfs_rq(cfs_rq);
+
 	return 0;
 }
 
@@ -5851,17 +5953,25 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 	struct rq *rq = data;
 	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
 
+	if (cfs_rq->throttle_count++)
+		return 0;
+
+
 	/* group is entering throttled state, stop time */
-	if (!cfs_rq->throttle_count) {
-		cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
+	WARN_ON_ONCE(cfs_rq->throttled_clock_self);
+	if (cfs_rq->nr_queued)
+		cfs_rq->throttled_clock_self = rq_clock(rq);
+	else {
+		/*
+		 * For cfs_rqs that still have entities enqueued, PELT clock
+		 * stop happens at dequeue time when all entities are dequeued.
+		 */
 		list_del_leaf_cfs_rq(cfs_rq);
-
-		WARN_ON_ONCE(cfs_rq->throttled_clock_self);
-		if (cfs_rq->nr_queued)
-			cfs_rq->throttled_clock_self = rq_clock(rq);
+		cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
+		cfs_rq->pelt_clock_throttled = 1;
 	}
-	cfs_rq->throttle_count++;
 
+	WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
 	return 0;
 }
 
@@ -5869,8 +5979,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	struct sched_entity *se;
-	long queued_delta, runnable_delta, idle_delta, dequeue = 1;
+	int dequeue = 1;
 
 	raw_spin_lock(&cfs_b->lock);
 	/* This will start the period timer if necessary */
@@ -5893,68 +6002,11 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	if (!dequeue)
 		return false;  /* Throttle no longer required. */
 
-	se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
-
 	/* freeze hierarchy runnable averages while throttled */
 	rcu_read_lock();
 	walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
 	rcu_read_unlock();
 
-	queued_delta = cfs_rq->h_nr_queued;
-	runnable_delta = cfs_rq->h_nr_runnable;
-	idle_delta = cfs_rq->h_nr_idle;
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-		int flags;
-
-		/* throttled entity or throttle-on-deactivate */
-		if (!se->on_rq)
-			goto done;
-
-		/*
-		 * Abuse SPECIAL to avoid delayed dequeue in this instance.
-		 * This avoids teaching dequeue_entities() about throttled
-		 * entities and keeps things relatively simple.
-		 */
-		flags = DEQUEUE_SLEEP | DEQUEUE_SPECIAL;
-		if (se->sched_delayed)
-			flags |= DEQUEUE_DELAYED;
-		dequeue_entity(qcfs_rq, se, flags);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued -= queued_delta;
-		qcfs_rq->h_nr_runnable -= runnable_delta;
-		qcfs_rq->h_nr_idle -= idle_delta;
-
-		if (qcfs_rq->load.weight) {
-			/* Avoid re-evaluating load for this entity: */
-			se = parent_entity(se);
-			break;
-		}
-	}
-
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-		/* throttled entity or throttle-on-deactivate */
-		if (!se->on_rq)
-			goto done;
-
-		update_load_avg(qcfs_rq, se, 0);
-		se_update_runnable(se);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued -= queued_delta;
-		qcfs_rq->h_nr_runnable -= runnable_delta;
-		qcfs_rq->h_nr_idle -= idle_delta;
-	}
-
-	/* At this point se is NULL and we are at root level*/
-	sub_nr_running(rq, queued_delta);
-done:
 	/*
 	 * Note: distribution will already see us throttled via the
 	 * throttled-list.  rq->lock protects completion.
@@ -5970,9 +6022,20 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	struct sched_entity *se;
-	long queued_delta, runnable_delta, idle_delta;
-	long rq_h_nr_queued = rq->cfs.h_nr_queued;
+	struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
+
+	/*
+	 * It's possible we are called with !runtime_remaining due to things
+	 * like user changed quota setting(see tg_set_cfs_bandwidth()) or async
+	 * unthrottled us with a positive runtime_remaining but other still
+	 * running entities consumed those runtime before we reached here.
+	 *
+	 * Anyway, we can't unthrottle this cfs_rq without any runtime remaining
+	 * because any enqueue in tg_unthrottle_up() will immediately trigger a
+	 * throttle, which is not supposed to happen on unthrottle path.
+	 */
+	if (cfs_rq->runtime_enabled && cfs_rq->runtime_remaining <= 0)
+		return;
 
 	se = cfs_rq->tg->se[cpu_of(rq)];
 
@@ -6002,62 +6065,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 			if (list_add_leaf_cfs_rq(cfs_rq_of(se)))
 				break;
 		}
-		goto unthrottle_throttle;
 	}
 
-	queued_delta = cfs_rq->h_nr_queued;
-	runnable_delta = cfs_rq->h_nr_runnable;
-	idle_delta = cfs_rq->h_nr_idle;
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
-		/* Handle any unfinished DELAY_DEQUEUE business first. */
-		if (se->sched_delayed) {
-			int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
-
-			dequeue_entity(qcfs_rq, se, flags);
-		} else if (se->on_rq)
-			break;
-		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued += queued_delta;
-		qcfs_rq->h_nr_runnable += runnable_delta;
-		qcfs_rq->h_nr_idle += idle_delta;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(qcfs_rq))
-			goto unthrottle_throttle;
-	}
-
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
-		update_load_avg(qcfs_rq, se, UPDATE_TG);
-		se_update_runnable(se);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued += queued_delta;
-		qcfs_rq->h_nr_runnable += runnable_delta;
-		qcfs_rq->h_nr_idle += idle_delta;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(qcfs_rq))
-			goto unthrottle_throttle;
-	}
-
-	/* Start the fair server if un-throttling resulted in new runnable tasks */
-	if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
-		dl_server_start(&rq->fair_server);
-
-	/* At this point se is NULL and we are at root level*/
-	add_nr_running(rq, queued_delta);
-
-unthrottle_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
 	/* Determine whether we need to wake up potentially idle CPU: */
@@ -6711,6 +6720,8 @@ static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 static void task_throttle_setup_work(struct task_struct *p) {}
 static bool task_is_throttled(struct task_struct *p) { return false; }
+static void dequeue_throttled_task(struct task_struct *p, int flags) {}
+static bool enqueue_throttled_task(struct task_struct *p) { return false; }
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
@@ -6903,6 +6914,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	int rq_h_nr_queued = rq->cfs.h_nr_queued;
 	u64 slice = 0;
 
+	if (unlikely(task_is_throttled(p) && enqueue_throttled_task(p)))
+		return;
+
 	/*
 	 * The code below (indirectly) updates schedutil which looks at
 	 * the cfs_rq utilization to select a frequency.
@@ -6955,10 +6969,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
 
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
-
 		flags = ENQUEUE_WAKEUP;
 	}
 
@@ -6980,10 +6990,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
 	}
 
 	if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
@@ -7013,7 +7019,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	if (!task_new)
 		check_update_overutilized_status(rq);
 
-enqueue_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
 	hrtick_update(rq);
@@ -7068,10 +7073,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
 
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			return 0;
-
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
 			slice = cfs_rq_min_slice(cfs_rq);
@@ -7108,10 +7109,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			return 0;
 	}
 
 	sub_nr_running(rq, h_nr_queued);
@@ -7145,6 +7142,11 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
  */
 static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
+	if (unlikely(task_is_throttled(p))) {
+		dequeue_throttled_task(p, flags);
+		return true;
+	}
+
 	if (!p->se.sched_delayed)
 		util_est_dequeue(&rq->cfs, p);
 
@@ -8813,19 +8815,22 @@ static struct task_struct *pick_task_fair(struct rq *rq)
 {
 	struct sched_entity *se;
 	struct cfs_rq *cfs_rq;
+	struct task_struct *p;
+	bool throttled;
 
 again:
 	cfs_rq = &rq->cfs;
 	if (!cfs_rq->nr_queued)
 		return NULL;
 
+	throttled = false;
+
 	do {
 		/* Might not have done put_prev_entity() */
 		if (cfs_rq->curr && cfs_rq->curr->on_rq)
 			update_curr(cfs_rq);
 
-		if (unlikely(check_cfs_rq_runtime(cfs_rq)))
-			goto again;
+		throttled |= check_cfs_rq_runtime(cfs_rq);
 
 		se = pick_next_entity(rq, cfs_rq);
 		if (!se)
@@ -8833,7 +8838,10 @@ static struct task_struct *pick_task_fair(struct rq *rq)
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);
 
-	return task_of(se);
+	p = task_of(se);
+	if (unlikely(throttled))
+		task_throttle_setup_work(p);
+	return p;
 }
 
 static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool first);
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index 62c3fa543c0f2..f921302dc40fb 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -162,7 +162,7 @@ static inline void update_idle_cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
 {
 	u64 throttled;
 
-	if (unlikely(cfs_rq->throttle_count))
+	if (unlikely(cfs_rq->pelt_clock_throttled))
 		throttled = U64_MAX;
 	else
 		throttled = cfs_rq->throttled_clock_pelt_time;
@@ -173,7 +173,7 @@ static inline void update_idle_cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
 /* rq->task_clock normalized against any time this cfs_rq has spent throttled */
 static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
 {
-	if (unlikely(cfs_rq->throttle_count))
+	if (unlikely(cfs_rq->pelt_clock_throttled))
 		return cfs_rq->throttled_clock_pelt - cfs_rq->throttled_clock_pelt_time;
 
 	return rq_clock_pelt(rq_of(cfs_rq)) - cfs_rq->throttled_clock_pelt_time;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b0c9559992d8a..51d000ab4a70d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -737,7 +737,8 @@ struct cfs_rq {
 	u64			throttled_clock_pelt_time;
 	u64			throttled_clock_self;
 	u64			throttled_clock_self_time;
-	int			throttled;
+	bool			throttled:1;
+	bool			pelt_clock_throttled:1;
 	int			throttle_count;
 	struct list_head	throttled_list;
 	struct list_head	throttled_csd_list;
-- 
2.39.5




* Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
  2025-07-16  6:57   ` Aaron Lu
  2025-07-16  7:40     ` Philip Li
@ 2025-07-16 11:27     ` Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2025-07-16 11:27 UTC
  To: Aaron Lu
  Cc: kernel test robot, Valentin Schneider, Ben Segall,
	K Prateek Nayak, Chengming Zhou, Josh Don, Ingo Molnar,
	Vincent Guittot, Xi Wang, llvm, oe-kbuild-all, linux-kernel,
	Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman,
	Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu

On Wed, Jul 16, 2025 at 02:57:07PM +0800, Aaron Lu wrote:
> On Wed, Jul 16, 2025 at 07:29:37AM +0800, kernel test robot wrote:
> > Hi Aaron,
> > 
> > kernel test robot noticed the following build warnings:
> > 
> > [auto build test WARNING on tip/sched/core]
> > [also build test WARNING on next-20250715]
> > [cannot apply to linus/master v6.16-rc6]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> > 
> > url:    https://github.com/intel-lab-lkp/linux/commits/Aaron-Lu/sched-fair-Add-related-data-structure-for-task-based-throttle/20250715-152307
> > base:   tip/sched/core
> > patch link:    https://lore.kernel.org/r/20250715071658.267-4-ziqianlu%40bytedance.com
> > patch subject: [PATCH v3 3/5] sched/fair: Switch to task based throttle model
> > config: i386-buildonly-randconfig-006-20250716 (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/config)
> > compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250716/202507160730.0cXkgs0S-lkp@intel.com/reproduce)
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202507160730.0cXkgs0S-lkp@intel.com/
> > 
> > All warnings (new ones prefixed by >>):
> > 
> > >> kernel/sched/fair.c:5456:33: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
> >     5456 |                         cfs_rq->pelt_clock_throttled = 1;
> >          |                                                      ^ ~

Nice warning from clang.

> 
> Thanks for the report.
> 
> I don't think this affects correctness, since both cfs_rq's throttled
> and pelt_clock_throttled fields are only ever tested as true (non-zero)
> or false (0). I used bit-fields for them to save some space.
> 
> Changing their type to either unsigned int or bool should cure this
> warning; I suppose bool reads more clearly?
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index dbe52e18b93a0..434f816a56701 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -737,8 +737,8 @@ struct cfs_rq {
>  	u64			throttled_clock_pelt_time;
>  	u64			throttled_clock_self;
>  	u64			throttled_clock_self_time;
> -	int			throttled:1;
> -	int			pelt_clock_throttled:1;
> +	bool			throttled:1;
> +	bool			pelt_clock_throttled:1;
>  	int			throttle_count;
>  	struct list_head	throttled_list;
>  	struct list_head	throttled_csd_list;

Yeah, either this or any unsigned type will do.
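
To double-check that point, a small standalone program (illustrative
only, not part of the patch) confirms that either suggested type stores
and reads back 1, with no truncation warning:

#include <stdbool.h>
#include <stdio.h>

struct flags {
	bool		throttled:1;		/* holds 0 or 1 */
	unsigned int	pelt_clock_throttled:1;	/* holds 0 or 1 */
};

int main(void)
{
	struct flags f = { 0 };

	f.throttled = 1;
	f.pelt_clock_throttled = 1;

	/* both print 1; -Wsingle-bit-bitfield-constant-conversion stays quiet */
	printf("%d %d\n", f.throttled, (int)f.pelt_clock_throttled);

	return 0;
}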


Thread overview: 5+ messages
     [not found] <20250715071658.267-4-ziqianlu@bytedance.com>
2025-07-15 23:29 ` [PATCH v3 3/5] sched/fair: Switch to task based throttle model kernel test robot
2025-07-16  6:57   ` Aaron Lu
2025-07-16  7:40     ` Philip Li
2025-07-16 11:15       ` [PATCH v3 update " Aaron Lu
2025-07-16 11:27     ` [PATCH v3 " Peter Zijlstra
