* [PATCH 0/3] sched: Miscellaneous isolation related cleanups
@ 2024-08-18 23:45 Waiman Long
2024-08-18 23:45 ` [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full Waiman Long
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Waiman Long @ 2024-08-18 23:45 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Frederic Weisbecker
Cc: linux-kernel, Waiman Long
This series contains a number of miscellaneous sched/isolation related
cleanups.
Waiman Long (3):
sched/isolation: Add HK_FLAG_SCHED to nohz_full
sched/fair: Use HK_TYPE_SCHED housekeeping CPUs
sched/isolation: Consolidate housekeeping cpumasks that are always
identical
include/linux/sched/isolation.h | 19 +++++++++++++------
kernel/sched/fair.c | 7 ++-----
kernel/sched/isolation.c | 2 +-
3 files changed, 16 insertions(+), 12 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-08-18 23:45 [PATCH 0/3] sched: Miscellaneous isolation related cleanups Waiman Long
@ 2024-08-18 23:45 ` Waiman Long
2024-09-03 13:10 ` Frederic Weisbecker
2024-08-18 23:45 ` [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs Waiman Long
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2024-08-18 23:45 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Frederic Weisbecker
Cc: linux-kernel, Waiman Long
The HK_FLAG_SCHED/HK_TYPE_SCHED flag is defined and is also used
in kernel/sched/fair.c since commit de201559df87 ("sched/isolation:
Introduce housekeeping flags"). However, the corresponding cpumask isn't
currently updated anywhere. So the mask is always cpu_possible_mask.
Add it in nohz_full setup so that nohz_full CPUs will now be removed
from HK_TYPE_SCHED cpumask.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/sched/isolation.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5891e715f00d..a514994af319 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -196,7 +196,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
unsigned long flags;
flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
- HK_FLAG_MISC | HK_FLAG_KTHREAD;
+ HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_SCHED;
return housekeeping_setup(str, flags);
}
--
2.43.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs
2024-08-18 23:45 [PATCH 0/3] sched: Miscellaneous isolation related cleanups Waiman Long
2024-08-18 23:45 ` [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full Waiman Long
@ 2024-08-18 23:45 ` Waiman Long
2024-09-03 13:12 ` Frederic Weisbecker
2024-08-18 23:45 ` [PATCH 3/3] sched/isolation: Consolidate housekeeping cpumasks that are always identical Waiman Long
2024-08-23 18:23 ` [PATCH 0/3] sched: Miscellaneous isolation related cleanups Waiman Long
3 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2024-08-18 23:45 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Frederic Weisbecker
Cc: linux-kernel, Waiman Long
As the previous commit has enabled the setting of HK_TYPE_SCHED
housekeeping CPUs in nohz_full setup, we can now use the more aptly
named HK_TYPE_SCHED housekeeping CPUs instead of HK_TYPE_MISC.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/sched/fair.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6be618110885..0350667f5ce8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12128,16 +12128,13 @@ static inline int on_null_domain(struct rq *rq)
* - When one of the busy CPUs notices that there may be an idle rebalancing
* needed, they will kick the idle load balancer, which then does idle
* load balancing for all the idle CPUs.
- *
- * - HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED is not set
- * anywhere yet.
*/
static inline int find_new_ilb(void)
{
const struct cpumask *hk_mask;
int ilb_cpu;
- hk_mask = housekeeping_cpumask(HK_TYPE_MISC);
+ hk_mask = housekeeping_cpumask(HK_TYPE_SCHED);
for_each_cpu_and(ilb_cpu, nohz.idle_cpus_mask, hk_mask) {
@@ -12155,7 +12152,7 @@ static inline int find_new_ilb(void)
* Kick a CPU to do the NOHZ balancing, if it is time for it, via a cross-CPU
* SMP function call (IPI).
*
- * We pick the first idle CPU in the HK_TYPE_MISC housekeeping set (if there is one).
+ * We pick the first idle CPU in the HK_TYPE_SCHED housekeeping set (if there is one).
*/
static void kick_ilb(unsigned int flags)
{
--
2.43.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/3] sched/isolation: Consolidate housekeeping cpumasks that are always identical
2024-08-18 23:45 [PATCH 0/3] sched: Miscellaneous isolation related cleanups Waiman Long
2024-08-18 23:45 ` [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full Waiman Long
2024-08-18 23:45 ` [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs Waiman Long
@ 2024-08-18 23:45 ` Waiman Long
2024-08-23 18:23 ` [PATCH 0/3] sched: Miscellaneous isolation related cleanups Waiman Long
3 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2024-08-18 23:45 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Frederic Weisbecker
Cc: linux-kernel, Waiman Long
The housekeeping cpumasks are only set by two boot commandline
parameters: "nohz_full" and "isolcpus". When there is more than one of
"nohz_full" or "isolcpus", the extra ones must have the same CPU list
or the setup will fail partially.
The HK_TYPE_TICK, HK_TYPE_DOMAIN and HK_TYPE_MANAGED_IRQ types are
settable by "isolcpus" and they can be set individually. The other
housekeeping types are all set by "nohz_full" without a way to set them
individually. So they all have identical cpumasks.
There is actually no point to have different cpumasks for these
"nohz_full" only housekeeping types. Consolidate these types to use the
same cpumask by aliasing them to the same value. If there is a need to
set any of them independently in the future, we can break them out to
their own cpumasks again.
With this change, the number of cpumasks in the housekeeping structure
drops from 9 to 4.
Signed-off-by: Waiman Long <longman@redhat.com>
---
include/linux/sched/isolation.h | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 2b461129d1fa..da32b502f0c6 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -7,15 +7,22 @@
#include <linux/tick.h>
enum hk_type {
- HK_TYPE_TIMER,
- HK_TYPE_RCU,
- HK_TYPE_MISC,
- HK_TYPE_SCHED,
HK_TYPE_TICK,
HK_TYPE_DOMAIN,
- HK_TYPE_WQ,
HK_TYPE_MANAGED_IRQ,
- HK_TYPE_KTHREAD,
+ HK_TYPE_NOHZ_FULL,
+
+ /*
+ * The following housekeeping types are only set by the nohz_full
+ * boot commandline option. So they can share the same value.
+ */
+ HK_TYPE_TIMER = HK_TYPE_NOHZ_FULL,
+ HK_TYPE_RCU = HK_TYPE_NOHZ_FULL,
+ HK_TYPE_MISC = HK_TYPE_NOHZ_FULL,
+ HK_TYPE_SCHED = HK_TYPE_NOHZ_FULL,
+ HK_TYPE_WQ = HK_TYPE_NOHZ_FULL,
+ HK_TYPE_KTHREAD = HK_TYPE_NOHZ_FULL,
+
HK_TYPE_MAX
};
--
2.43.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 0/3] sched: Miscellaneous isolation related cleanups
2024-08-18 23:45 [PATCH 0/3] sched: Miscellaneous isolation related cleanups Waiman Long
` (2 preceding siblings ...)
2024-08-18 23:45 ` [PATCH 3/3] sched/isolation: Consolidate housekeeping cpumasks that are always identical Waiman Long
@ 2024-08-23 18:23 ` Waiman Long
3 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2024-08-23 18:23 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Frederic Weisbecker
Cc: linux-kernel
On 8/18/24 19:45, Waiman Long wrote:
> This series contains a number of miscellaneous sched/isolation related
> cleanups.
>
> Waiman Long (3):
> sched/isolation: Add HK_FLAG_SCHED to nohz_full
> sched/fair: Use HK_TYPE_SCHED housekeeping CPUs
> sched/isolation: Consolidate housekeeping cpumasks that are always
> identical
>
> include/linux/sched/isolation.h | 19 +++++++++++++------
> kernel/sched/fair.c | 7 ++-----
> kernel/sched/isolation.c | 2 +-
> 3 files changed, 16 insertions(+), 12 deletions(-)
Ping. Any comment or suggestion on these patches?
Thanks,
Longman
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-08-18 23:45 ` [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full Waiman Long
@ 2024-09-03 13:10 ` Frederic Weisbecker
2024-09-03 13:24 ` Waiman Long
0 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2024-09-03 13:10 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
Le Sun, Aug 18, 2024 at 07:45:18PM -0400, Waiman Long a écrit :
> The HK_FLAG_SCHED/HK_TYPE_SCHED flag is defined and is also used
> in kernel/sched/fair.c since commit de201559df87 ("sched/isolation:
> Introduce housekeeping flags"). However, the corresponding cpumask isn't
> currently updated anywhere. So the mask is always cpu_possible_mask.
>
> Add it in nohz_full setup so that nohz_full CPUs will now be removed
> from HK_TYPE_SCHED cpumask.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/sched/isolation.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 5891e715f00d..a514994af319 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -196,7 +196,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
> unsigned long flags;
>
> flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
> - HK_FLAG_MISC | HK_FLAG_KTHREAD;
> + HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_SCHED;
>
> return housekeeping_setup(str, flags);
> }
find_new_ilb() already has HK_FLAG_MISC to prevent an isolated CPU
from being elected as an ilb. So I think we should simply remove HK_FLAG_SCHED.
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs
2024-08-18 23:45 ` [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs Waiman Long
@ 2024-09-03 13:12 ` Frederic Weisbecker
2024-09-03 13:53 ` Waiman Long
0 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2024-09-03 13:12 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
Le Sun, Aug 18, 2024 at 07:45:19PM -0400, Waiman Long a écrit :
> As the previous commit has enabled the setting of HK_TYPE_SCHED
> housekeeping CPUs in nohz_full setup, we can now use the more aptly
> named HK_TYPE_SCHED housekeeping CPUs instead of HK_TYPE_MISC.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
Can we instead merge HK_FLAG_TICK, HK_FLAG_WQ, HK_FLAG_TIMER, HK_FLAG_RCU,
HK_FLAG_MISC and HK_FLAG_KTHREAD into a single
HK_FLAG_KERNEL_NOISE / HK_TYPE_KERNEL_NOISE ?
Thanks.
> ---
> kernel/sched/fair.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6be618110885..0350667f5ce8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12128,16 +12128,13 @@ static inline int on_null_domain(struct rq *rq)
> * - When one of the busy CPUs notices that there may be an idle rebalancing
> * needed, they will kick the idle load balancer, which then does idle
> * load balancing for all the idle CPUs.
> - *
> - * - HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED is not set
> - * anywhere yet.
> */
> static inline int find_new_ilb(void)
> {
> const struct cpumask *hk_mask;
> int ilb_cpu;
>
> - hk_mask = housekeeping_cpumask(HK_TYPE_MISC);
> + hk_mask = housekeeping_cpumask(HK_TYPE_SCHED);
>
> for_each_cpu_and(ilb_cpu, nohz.idle_cpus_mask, hk_mask) {
>
> @@ -12155,7 +12152,7 @@ static inline int find_new_ilb(void)
> * Kick a CPU to do the NOHZ balancing, if it is time for it, via a cross-CPU
> * SMP function call (IPI).
> *
> - * We pick the first idle CPU in the HK_TYPE_MISC housekeeping set (if there is one).
> + * We pick the first idle CPU in the HK_TYPE_SCHED housekeeping set (if there is one).
> */
> static void kick_ilb(unsigned int flags)
> {
> --
> 2.43.5
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-03 13:10 ` Frederic Weisbecker
@ 2024-09-03 13:24 ` Waiman Long
2024-09-03 21:32 ` Frederic Weisbecker
0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2024-09-03 13:24 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
On 9/3/24 09:10, Frederic Weisbecker wrote:
> Le Sun, Aug 18, 2024 at 07:45:18PM -0400, Waiman Long a écrit :
>> The HK_FLAG_SCHED/HK_TYPE_SCHED flag is defined and is also used
>> in kernel/sched/fair.c since commit de201559df87 ("sched/isolation:
>> Introduce housekeeping flags"). However, the corresponding cpumask isn't
>> currently updated anywhere. So the mask is always cpu_possible_mask.
>>
>> Add it in nohz_full setup so that nohz_full CPUs will now be removed
>> from HK_TYPE_SCHED cpumask.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> kernel/sched/isolation.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>> index 5891e715f00d..a514994af319 100644
>> --- a/kernel/sched/isolation.c
>> +++ b/kernel/sched/isolation.c
>> @@ -196,7 +196,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
>> unsigned long flags;
>>
>> flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
>> - HK_FLAG_MISC | HK_FLAG_KTHREAD;
>> + HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_SCHED;
>>
>> return housekeeping_setup(str, flags);
>> }
> find_new_ilb() already has HK_FLAG_MISC to prevent an isolated CPU
> from being elected as an ilb. So I think we should simply remove HK_FLAG_SCHED.
There is a check for HK_TYPE_SCHED in nohz_balance_enter_idle() and
nohz_newidle_balance(), though it is essentially a no-op as the cpumask
has all the CPUs. If we remove HK_TYPE_SCHED, the question now will be
whether we should remove the checks at these 2 functions or change them
to HK_TYPE_MISC.
Cheers,
Longman
>
> Thanks.
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs
2024-09-03 13:12 ` Frederic Weisbecker
@ 2024-09-03 13:53 ` Waiman Long
2024-09-04 14:54 ` Waiman Long
0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2024-09-03 13:53 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
On 9/3/24 09:12, Frederic Weisbecker wrote:
> Le Sun, Aug 18, 2024 at 07:45:19PM -0400, Waiman Long a écrit :
>> As the previous commit has enabled the setting of HK_TYPE_SCHED
>> housekeeping CPUs in nohz_full setup, we can now use the more aptly
>> named HK_TYPE_SCHED housekeeping CPUs instead of HK_TYPE_MISC.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
> Can we instead merge HK_FLAG_TICK, HK_FLAG_WQ, HK_FLAG_TIMER, HK_FLAG_RCU,
> HK_FLAG_MISC and HK_FLAG_KTHREAD into a single
> HK_FLAG_KERNEL_NOISE / HK_TYPE_KERNEL_NOISE ?
Sure. I am open to new name as I am not good on that.
HK_FLAG_KERNEL_NOISE does make sense.
Cheers,
Longman
>
> Thanks.
>
>> ---
>> kernel/sched/fair.c | 7 ++-----
>> 1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6be618110885..0350667f5ce8 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -12128,16 +12128,13 @@ static inline int on_null_domain(struct rq *rq)
>> * - When one of the busy CPUs notices that there may be an idle rebalancing
>> * needed, they will kick the idle load balancer, which then does idle
>> * load balancing for all the idle CPUs.
>> - *
>> - * - HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED is not set
>> - * anywhere yet.
>> */
>> static inline int find_new_ilb(void)
>> {
>> const struct cpumask *hk_mask;
>> int ilb_cpu;
>>
>> - hk_mask = housekeeping_cpumask(HK_TYPE_MISC);
>> + hk_mask = housekeeping_cpumask(HK_TYPE_SCHED);
>>
>> for_each_cpu_and(ilb_cpu, nohz.idle_cpus_mask, hk_mask) {
>>
>> @@ -12155,7 +12152,7 @@ static inline int find_new_ilb(void)
>> * Kick a CPU to do the NOHZ balancing, if it is time for it, via a cross-CPU
>> * SMP function call (IPI).
>> *
>> - * We pick the first idle CPU in the HK_TYPE_MISC housekeeping set (if there is one).
>> + * We pick the first idle CPU in the HK_TYPE_SCHED housekeeping set (if there is one).
>> */
>> static void kick_ilb(unsigned int flags)
>> {
>> --
>> 2.43.5
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-03 13:24 ` Waiman Long
@ 2024-09-03 21:32 ` Frederic Weisbecker
2024-09-04 1:23 ` Waiman Long
0 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2024-09-03 21:32 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
Le Tue, Sep 03, 2024 at 09:24:08AM -0400, Waiman Long a écrit :
>
> On 9/3/24 09:10, Frederic Weisbecker wrote:
> > Le Sun, Aug 18, 2024 at 07:45:18PM -0400, Waiman Long a écrit :
> > > The HK_FLAG_SCHED/HK_TYPE_SCHED flag is defined and is also used
> > > in kernel/sched/fair.c since commit de201559df87 ("sched/isolation:
> > > Introduce housekeeping flags"). However, the corresponding cpumask isn't
> > > currently updated anywhere. So the mask is always cpu_possible_mask.
> > >
> > > Add it in nohz_full setup so that nohz_full CPUs will now be removed
> > > from HK_TYPE_SCHED cpumask.
> > >
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > ---
> > > kernel/sched/isolation.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > index 5891e715f00d..a514994af319 100644
> > > --- a/kernel/sched/isolation.c
> > > +++ b/kernel/sched/isolation.c
> > > @@ -196,7 +196,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
> > > unsigned long flags;
> > > flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
> > > - HK_FLAG_MISC | HK_FLAG_KTHREAD;
> > > + HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_SCHED;
> > > return housekeeping_setup(str, flags);
> > > }
> > find_new_ilb() already has HK_FLAG_MISC to prevent an isolated CPU
> > from being elected as an ilb. So I think we should simply remove HK_FLAG_SCHED.
>
> There is a check for HK_TYPE_SCHED in nohz_balance_enter_idle() and
> nohz_newidle_balance(), though it is essentially a no-op as the cpumask has
> all the CPUs. If we remove HK_TYPE_SCHED, the question now will be whether
> we should remove the checks at these 2 functions or change them to
> HK_TYPE_MISC.
Just remove those two. They are dead code and the nohz_full handling
of load balancing needs a rethink anyway.
After discussing with Peter lately, the rules should be:
1) If a nohz_full CPU is part of a multi-CPU domain, then it should
be part of load balancing. Peter even says that nohz_full should be
forbidden in this case, because the tick plays a role in the
load balancing.
2) Otherwise, if CPU is not part of a domain or it is the only CPU of all its
domains, then it can be out of the load balancing machinery.
I'm a bit scared about rule 1) because I know there are existing users of
nohz_full on multi-CPU domains... So I feel a bit trapped.
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-03 21:32 ` Frederic Weisbecker
@ 2024-09-04 1:23 ` Waiman Long
2024-09-04 12:44 ` Frederic Weisbecker
0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2024-09-04 1:23 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
On 9/3/24 17:32, Frederic Weisbecker wrote:
> Le Tue, Sep 03, 2024 at 09:24:08AM -0400, Waiman Long a écrit :
>> On 9/3/24 09:10, Frederic Weisbecker wrote:
>>> Le Sun, Aug 18, 2024 at 07:45:18PM -0400, Waiman Long a écrit :
>>>> The HK_FLAG_SCHED/HK_TYPE_SCHED flag is defined and is also used
>>>> in kernel/sched/fair.c since commit de201559df87 ("sched/isolation:
>>>> Introduce housekeeping flags"). However, the corresponding cpumask isn't
>>>> currently updated anywhere. So the mask is always cpu_possible_mask.
>>>>
>>>> Add it in nohz_full setup so that nohz_full CPUs will now be removed
>>>> from HK_TYPE_SCHED cpumask.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>> kernel/sched/isolation.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>>>> index 5891e715f00d..a514994af319 100644
>>>> --- a/kernel/sched/isolation.c
>>>> +++ b/kernel/sched/isolation.c
>>>> @@ -196,7 +196,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
>>>> unsigned long flags;
>>>> flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
>>>> - HK_FLAG_MISC | HK_FLAG_KTHREAD;
>>>> + HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_SCHED;
>>>> return housekeeping_setup(str, flags);
>>>> }
>>> find_new_ilb() already has HK_FLAG_MISC to prevent an isolated CPU
>>> from being elected as an ilb. So I think we should simply remove HK_FLAG_SCHED.
>> There is a check for HK_TYPE_SCHED in nohz_balance_enter_idle() and
>> nohz_newidle_balance(), though it is essentially a no-op as the cpumask has
>> all the CPUs. If we remove HK_TYPE_SCHED, the question now will be whether
>> we should remove the checks at these 2 functions or change them to
>> HK_TYPE_MISC.
> Just remove those two. They are dead code and the nohz_full handling
> of load balancing needs a rethink anyway.
OK, I will modified the patch to remove the dead code.
>
> After discussing with Peter lately, the rules should be:
>
> 1) If a nohz_full CPU is part of a multi-CPU domain, then it should
> be part of load balancing. Peter even says that nohz_full should be
> forbidden in this case, because the tick plays a role in the
> load balancing.
My understand is that most users will use nohz_full together with
isolcpus. So nohz_full CPUs are also isolated and not in a sched domain.
There may still be user setting nohz_full without isolcpus though, but
that should be relatively rare.
Anyway, all these nohz_full/kernel_nose setting will only apply to CPUs
in isolated cpuset partitions which will not be in a sched domain.
>
> 2) Otherwise, if CPU is not part of a domain or it is the only CPU of all its
> domains, then it can be out of the load balancing machinery.
I am aware that a single-cpu domain is the same as being isolated with
no load balancing.
>
> I'm a bit scared about rule 1) because I know there are existing users of
> nohz_full on multi-CPU domains... So I feel a bit trapped.
As stated before, this is not a common use case.
The isolcpus boot option is deprecated, as stated in
kernel-parameters.txt. My plan is to deprecate nohz_full as well once we
are able to make dynamic CPU isolation via cpuset works almost as good
as isolcpus + nohz_full.
Cheers,
Longman
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-04 1:23 ` Waiman Long
@ 2024-09-04 12:44 ` Frederic Weisbecker
2024-09-04 13:04 ` Peter Zijlstra
0 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2024-09-04 12:44 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
Le Tue, Sep 03, 2024 at 09:23:53PM -0400, Waiman Long a écrit :
> > After discussing with Peter lately, the rules should be:
> >
> > 1) If a nohz_full CPU is part of a multi-CPU domain, then it should
> > be part of load balancing. Peter even says that nohz_full should be
> > forbidden in this case, because the tick plays a role in the
> > load balancing.
>
> My understand is that most users will use nohz_full together with isolcpus.
> So nohz_full CPUs are also isolated and not in a sched domain. There may
> still be user setting nohz_full without isolcpus though, but that should be
> relatively rare.
Apparently there are users wanting to use isolation along with automatic
containers deployments such as kubernetes, which doesn't seem to work
well with isolcpus...
>
> Anyway, all these nohz_full/kernel_nose setting will only apply to CPUs in
> isolated cpuset partitions which will not be in a sched domain.
>
> >
> > 2) Otherwise, if CPU is not part of a domain or it is the only CPU of all its
> > domains, then it can be out of the load balancing machinery.
> I am aware that a single-cpu domain is the same as being isolated with no
> load balancing.
By the way is it possible to have a single-cpu domain (sorry I'm a noob here)
or do such CPU always end up on a null domain?
If it is possible, it's interesting to notice that such CPU can become ilb
(as opposed to null domain).
> >
> > I'm a bit scared about rule 1) because I know there are existing users of
> > nohz_full on multi-CPU domains... So I feel a bit trapped.
>
> As stated before, this is not a common use case.
Not sure and anyway it's not a forbidden usecase. But this is anyway outside
the scope of this patchset.
> The isolcpus boot option is deprecated, as stated in kernel-parameters.txt.
We should undeprecate it, apparently it's still widely used. Perhaps by people
who can't afford to use cpusets/cgroups.
> My plan is to deprecate nohz_full as well once we are able to make dynamic
> CPU isolation via cpuset works almost as good as isolcpus + nohz_full.
You can't really deprecate such a kernel boot option unfortunately. Believe me
I wish we could.
Thanks.
> Cheers,
> Longman
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-04 12:44 ` Frederic Weisbecker
@ 2024-09-04 13:04 ` Peter Zijlstra
2024-09-04 13:41 ` Waiman Long
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Peter Zijlstra @ 2024-09-04 13:04 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
On Wed, Sep 04, 2024 at 02:44:26PM +0200, Frederic Weisbecker wrote:
> Le Tue, Sep 03, 2024 at 09:23:53PM -0400, Waiman Long a écrit :
> > > After discussing with Peter lately, the rules should be:
> > >
> > > 1) If a nohz_full CPU is part of a multi-CPU domain, then it should
> > > be part of load balancing. Peter even says that nohz_full should be
> > > forbidden in this case, because the tick plays a role in the
> > > load balancing.
> >
> > My understand is that most users will use nohz_full together with isolcpus.
> > So nohz_full CPUs are also isolated and not in a sched domain. There may
> > still be user setting nohz_full without isolcpus though, but that should be
> > relatively rare.
>
> Apparently there are users wanting to use isolation along with automatic
> containers deployments such as kubernetes, which doesn't seem to work
> well with isolcpus...
I've been proposing to get rid of isolcpus for at least the last 15
years or so. There just isn't a good reason to ever use it. We were
close and then the whole NOHZ_FULL thing came along.
You can create single CPU partitions using cpusets dynamically.
> > Anyway, all these nohz_full/kernel_nose setting will only apply to CPUs in
> > isolated cpuset partitions which will not be in a sched domain.
> >
> > >
> > > 2) Otherwise, if CPU is not part of a domain or it is the only CPU of all its
> > > domains, then it can be out of the load balancing machinery.
> > I am aware that a single-cpu domain is the same as being isolated with no
> > load balancing.
>
> By the way is it possible to have a single-cpu domain (sorry I'm a noob here)
> or do such CPU always end up on a null domain?
IIRC they always end up with the null domain; but its been a while. It
simply doesn't make much sense to have a 1 cpu domain. The way the
topology code works is by always building the full domain tree, and then
throwing away all levels that do not contribute, and in the 1 cpu case,
that would be all of them.
Look for 'degenerate' in kernel/sched/topology.c.
> > >
> > > I'm a bit scared about rule 1) because I know there are existing users of
> > > nohz_full on multi-CPU domains... So I feel a bit trapped.
> >
> > As stated before, this is not a common use case.
>
> Not sure and anyway it's not a forbidden usecase. But this is anyway outside
> the scope of this patchset.
Most crucially, it is a completely broken setup. It doesn't actually
work well.
Taking it away will force people to fix their broken. That's a good
thing, no?
> > The isolcpus boot option is deprecated, as stated in kernel-parameters.txt.
>
> We should undeprecate it, apparently it's still widely used. Perhaps by people
> who can't afford to use cpusets/cgroups.
What is the actual problem with using cpusets? At the very least the
whole nohz_full thing needs to be moved into cpusets so it isn't a fixed
boot time thing anymore.
> > My plan is to deprecate nohz_full as well once we are able to make dynamic
> > CPU isolation via cpuset works almost as good as isolcpus + nohz_full.
>
> You can't really deprecate such a kernel boot option unfortunately. Believe me
> I wish we could.
Why not? As I said, the only thing that's kept it around, and worse,
made it more popular again, is this nohz_full nonsense. That never
should've used isolcpus, but that's not something we can do anything
about now.
Rigid, boot time only things are teh suck.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-04 13:04 ` Peter Zijlstra
@ 2024-09-04 13:41 ` Waiman Long
2024-09-04 13:44 ` Phil Auld
2024-09-06 12:36 ` Frederic Weisbecker
2 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2024-09-04 13:41 UTC (permalink / raw)
To: Peter Zijlstra, Frederic Weisbecker
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
linux-kernel
On 9/4/24 09:04, Peter Zijlstra wrote:
> On Wed, Sep 04, 2024 at 02:44:26PM +0200, Frederic Weisbecker wrote:
>>>> I'm a bit scared about rule 1) because I know there are existing users of
>>>> nohz_full on multi-CPU domains... So I feel a bit trapped.
>>> As stated before, this is not a common use case.
>> Not sure and anyway it's not a forbidden usecase. But this is anyway outside
>> the scope of this patchset.
> Most crucially, it is a completely broken setup. It doesn't actually
> work well.
>
> Taking it away will force people to fix their broken. That's a good
> thing, no?
It will be hard to take it away without a good substitute. This is
exactly what I am trying to accomplish with the dynamic CPU isolation work.
>
>>> The isolcpus boot option is deprecated, as stated in kernel-parameters.txt.
>> We should undeprecate it, apparently it's still widely used. Perhaps by people
>> who can't afford to use cpusets/cgroups.
> What is the actual problem with using cpusets? At the very least the
> whole nohz_full thing needs to be moved into cpusets so it isn't a fixed
> boot time thing anymore.
Right.
>>> My plan is to deprecate nohz_full as well once we are able to make dynamic
>>> CPU isolation via cpuset works almost as good as isolcpus + nohz_full.
>> You can't really deprecate such a kernel boot option unfortunately. Believe me
>> I wish we could.
> Why not? As I said, the only thing that's kept it around, and worse,
> made it more popular again, is this nohz_full nonsense. That never
> should've used isolcpus, but that's not something we can do anything
> about now.
>
> Rigid, boot time only things are teh suck.
Declaring a feature as being deprecated is the first step in the process
of removing it. Users can still use a deprecated feature as the code is
still there. The actual removal will be at least several releases later
if not more.
Cheers,
Longman
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-04 13:04 ` Peter Zijlstra
2024-09-04 13:41 ` Waiman Long
@ 2024-09-04 13:44 ` Phil Auld
2024-09-04 14:02 ` Peter Zijlstra
2024-09-04 14:09 ` Waiman Long
2024-09-06 12:36 ` Frederic Weisbecker
2 siblings, 2 replies; 22+ messages in thread
From: Phil Auld @ 2024-09-04 13:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, Waiman Long, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, linux-kernel
Throwing my 2 cents in ...
On Wed, Sep 04, 2024 at 03:04:45PM +0200 Peter Zijlstra wrote:
> On Wed, Sep 04, 2024 at 02:44:26PM +0200, Frederic Weisbecker wrote:
> > Le Tue, Sep 03, 2024 at 09:23:53PM -0400, Waiman Long a écrit :
> > > > After discussing with Peter lately, the rules should be:
> > > >
> > > > 1) If a nohz_full CPU is part of a multi-CPU domain, then it should
> > > > be part of load balancing. Peter even says that nohz_full should be
> > > > forbidden in this case, because the tick plays a role in the
> > > > load balancing.
> > >
> > > My understand is that most users will use nohz_full together with isolcpus.
> > > So nohz_full CPUs are also isolated and not in a sched domain. There may
> > > still be user setting nohz_full without isolcpus though, but that should be
> > > relatively rare.
> >
> > Apparently there are users wanting to use isolation along with automatic
> > containers deployments such as kubernetes, which doesn't seem to work
> > well with isolcpus...
>
Definitely this ^^^
> I've been proposing to get rid of isolcpus for at least the last 15
> years or so. There just isn't a good reason to ever use it. We were
> close and then the whole NOHZ_FULL thing came along.
>
> You can create single CPU partitions using cpusets dynamically.
>
This is somewhat new though... Although you could turn off load balancing with
groups v1, in v2 you could not.
> > > Anyway, all these nohz_full/kernel_nose setting will only apply to CPUs in
> > > isolated cpuset partitions which will not be in a sched domain.
> > >
> > > >
> > > > 2) Otherwise, if CPU is not part of a domain or it is the only CPU of all its
> > > > domains, then it can be out of the load balancing machinery.
> > > I am aware that a single-cpu domain is the same as being isolated with no
> > > load balancing.
> >
> > By the way is it possible to have a single-cpu domain (sorry I'm a noob here)
> > or do such CPU always end up on a null domain?
>
> IIRC they always end up with the null domain; but its been a while. It
> simply doesn't make much sense to have a 1 cpu domain. The way the
> topology code works is by always building the full domain tree, and then
> throwing away all levels that do not contribute, and in the 1 cpu case,
> that would be all of them.
>
> Look for 'degenerate' in kernel/sched/topology.c.
>
> > > >
> > > > I'm a bit scared about rule 1) because I know there are existing users of
> > > > nohz_full on multi-CPU domains... So I feel a bit trapped.
> > >
> > > As stated before, this is not a common use case.
> >
> > Not sure and anyway it's not a forbidden usecase. But this is anyway outside
> > the scope of this patchset.
>
> Most crucially, it is a completely broken setup. It doesn't actually
> work well.
>
> Taking it away will force people to fix their broken. That's a good
> thing, no?
>
I wonder if people using nohz_full full w/o isolcpus is a bit of a historical
artifact. It used to be the case that you could use nohz and then turn off load
balancing from userspace using the domain flags. That allowed it to work w/o
isolcpus. When that ability was removed isolcpus often had to be used as a
replacement. Probably accounts for some of its increased use.
It takes a while for all of these changes to bubble up and get used by the
layers above. It's generally not download and build the new kernel and adjust
your configuration settings and scripts :)
> > > The isolcpus boot option is deprecated, as stated in kernel-parameters.txt.
> >
> > We should undeprecate it, apparently it's still widely used. Perhaps by people
> > who can't afford to use cpusets/cgroups.
>
> What is the actual problem with using cpusets? At the very least the
> whole nohz_full thing needs to be moved into cpusets so it isn't a fixed
> boot time thing anymore.
Until fairly recently you could not do the isolation with cgroups v2.
I agree with the direction Waiman et al. are heading. Getting rid of the
kernel commandline settings for this is a good thing.
>
> > > My plan is to deprecate nohz_full as well once we are able to make dynamic
> > > CPU isolation via cpuset works almost as good as isolcpus + nohz_full.
> >
> > You can't really deprecate such a kernel boot option unfortunately. Believe me
> > I wish we could.
>
> Why not? As I said, the only thing that's kept it around, and worse,
> made it more popular again, is this nohz_full nonsense. That never
> should've used isolcpus, but that's not something we can do anything
> about now.
Deprecate-and-not-remove should be fine. Then in 15 years we can revisit it
like isolcpus ;)
Cheers,
Phil
>
> Rigid, boot time only things are teh suck.
>
--
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-04 13:44 ` Phil Auld
@ 2024-09-04 14:02 ` Peter Zijlstra
2024-09-04 14:32 ` Phil Auld
2024-09-04 14:09 ` Waiman Long
1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2024-09-04 14:02 UTC (permalink / raw)
To: Phil Auld
Cc: Frederic Weisbecker, Waiman Long, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, linux-kernel
On Wed, Sep 04, 2024 at 09:44:27AM -0400, Phil Auld wrote:
> > You can create single CPU partitions using cpusets dynamically.
> >
>
> This is somewhat new though... Although you could turn off load balancing with
> groups v1, in v2 you could not.
I can't remember, but I think I stalled cpuset-v2 until this was
adressed -- that is, AFAIR any v2 can actually do this too. You create
many 1 cpu partitions. It is more cumbersome that v1, but it is very
much doable.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-04 13:44 ` Phil Auld
2024-09-04 14:02 ` Peter Zijlstra
@ 2024-09-04 14:09 ` Waiman Long
1 sibling, 0 replies; 22+ messages in thread
From: Waiman Long @ 2024-09-04 14:09 UTC (permalink / raw)
To: Phil Auld, Peter Zijlstra
Cc: Frederic Weisbecker, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
On 9/4/24 09:44, Phil Auld wrote:
> Throwing my 2 cents in ...
>
> On Wed, Sep 04, 2024 at 03:04:45PM +0200 Peter Zijlstra wrote:
>> On Wed, Sep 04, 2024 at 02:44:26PM +0200, Frederic Weisbecker wrote:
>>> Le Tue, Sep 03, 2024 at 09:23:53PM -0400, Waiman Long a écrit :
>>>>> After discussing with Peter lately, the rules should be:
>>>>>
>>>>> 1) If a nohz_full CPU is part of a multi-CPU domain, then it should
>>>>> be part of load balancing. Peter even says that nohz_full should be
>>>>> forbidden in this case, because the tick plays a role in the
>>>>> load balancing.
>>>> My understand is that most users will use nohz_full together with isolcpus.
>>>> So nohz_full CPUs are also isolated and not in a sched domain. There may
>>>> still be user setting nohz_full without isolcpus though, but that should be
>>>> relatively rare.
>>> Apparently there are users wanting to use isolation along with automatic
>>> containers deployments such as kubernetes, which doesn't seem to work
>>> well with isolcpus...
> Definitely this ^^^
>
>> I've been proposing to get rid of isolcpus for at least the last 15
>> years or so. There just isn't a good reason to ever use it. We were
>> close and then the whole NOHZ_FULL thing came along.
>>
>> You can create single CPU partitions using cpusets dynamically.
>>
> This is somewhat new though... Although you could turn off load balancing with
> groups v1, in v2 you could not.
That is not true for the newer kernels. It was true that turning off
load balancing isn't possible with the initial version of cpuset v2.
That capability was later added with the isolated partition feature in
cpuset v2.
>
>
>>>> Anyway, all these nohz_full/kernel_nose setting will only apply to CPUs in
>>>> isolated cpuset partitions which will not be in a sched domain.
>>>>
>>>>> 2) Otherwise, if CPU is not part of a domain or it is the only CPU of all its
>>>>> domains, then it can be out of the load balancing machinery.
>>>> I am aware that a single-cpu domain is the same as being isolated with no
>>>> load balancing.
>>> By the way is it possible to have a single-cpu domain (sorry I'm a noob here)
>>> or do such CPU always end up on a null domain?
>> IIRC they always end up with the null domain; but its been a while. It
>> simply doesn't make much sense to have a 1 cpu domain. The way the
>> topology code works is by always building the full domain tree, and then
>> throwing away all levels that do not contribute, and in the 1 cpu case,
>> that would be all of them.
>>
>> Look for 'degenerate' in kernel/sched/topology.c.
>>
>>>>> I'm a bit scared about rule 1) because I know there are existing users of
>>>>> nohz_full on multi-CPU domains... So I feel a bit trapped.
>>>> As stated before, this is not a common use case.
>>> Not sure and anyway it's not a forbidden usecase. But this is anyway outside
>>> the scope of this patchset.
>> Most crucially, it is a completely broken setup. It doesn't actually
>> work well.
>>
>> Taking it away will force people to fix their broken. That's a good
>> thing, no?
>>
> I wonder if people using nohz_full full w/o isolcpus is a bit of a historical
> artifact. It used to be the case that you could use nohz and then turn off load
> balancing from userspace using the domain flags. That allowed it to work w/o
> isolcpus. When that ability was removed isolcpus often had to be used as a
> replacement. Probably accounts for some of its increased use.
>
> It takes a while for all of these changes to bubble up and get used by the
> layers above. It's generally not download and build the new kernel and adjust
> your configuration settings and scripts :)
Yes, it will take a while for the users to use the new features
available and abandon the old ways.
>
>
>>>> The isolcpus boot option is deprecated, as stated in kernel-parameters.txt.
>>> We should undeprecate it, apparently it's still widely used. Perhaps by people
>>> who can't afford to use cpusets/cgroups.
>> What is the actual problem with using cpusets? At the very least the
>> whole nohz_full thing needs to be moved into cpusets so it isn't a fixed
>> boot time thing anymore.
> Until fairly recently you could not do the isolation with cgroups v2.
>
> I agree with the direction Waiman et al. are heading. Getting rid of the
> kernel commandline settings for this is a good thing.
>
>>>> My plan is to deprecate nohz_full as well once we are able to make dynamic
>>>> CPU isolation via cpuset works almost as good as isolcpus + nohz_full.
>>> You can't really deprecate such a kernel boot option unfortunately. Believe me
>>> I wish we could.
>> Why not? As I said, the only thing that's kept it around, and worse,
>> made it more popular again, is this nohz_full nonsense. That never
>> should've used isolcpus, but that's not something we can do anything
>> about now.
> Deprecate-and-not-remove should be fine. Then in 15 years we can revisit it
> like isolcpus ;)
Yes, it will take a while :-)
Cheers,
Longman
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-04 14:02 ` Peter Zijlstra
@ 2024-09-04 14:32 ` Phil Auld
0 siblings, 0 replies; 22+ messages in thread
From: Phil Auld @ 2024-09-04 14:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, Waiman Long, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, linux-kernel
On Wed, Sep 04, 2024 at 04:02:10PM +0200 Peter Zijlstra wrote:
> On Wed, Sep 04, 2024 at 09:44:27AM -0400, Phil Auld wrote:
>
> > > You can create single CPU partitions using cpusets dynamically.
> > >
> >
> > This is somewhat new though... Although you could turn off load balancing with
> > groups v1, in v2 you could not.
>
> I can't remember, but I think I stalled cpuset-v2 until this was
> adressed -- that is, AFAIR any v2 can actually do this too. You create
> many 1 cpu partitions. It is more cumbersome that v1, but it is very
> much doable.
>
>
v2 can do this _now_. It couldn't before. Or at least not in a way that was
usable until Waiman's partitioning patches I believe.
I did say that in the past tense :)
Cheers,
Phil
--
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs
2024-09-03 13:53 ` Waiman Long
@ 2024-09-04 14:54 ` Waiman Long
2024-09-06 12:53 ` Frederic Weisbecker
0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2024-09-04 14:54 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
On 9/3/24 09:53, Waiman Long wrote:
>
> On 9/3/24 09:12, Frederic Weisbecker wrote:
>> Le Sun, Aug 18, 2024 at 07:45:19PM -0400, Waiman Long a écrit :
>>> As the previous commit has enabled the setting of HK_TYPE_SCHED
>>> housekeeping CPUs in nohz_full setup, we can now use the more aptly
>>> named HK_TYPE_SCHED housekeeping CPUs instead of HK_TYPE_MISC.
>>>
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>> Can we instead merge HK_FLAG_TICK, HK_FLAG_WQ, HK_FLAG_TIMER,
>> HK_FLAG_RCU,
>> HK_FLAG_MISC and HK_FLAG_KTHREAD into a single
>> HK_FLAG_KERNEL_NOISE / HK_TYPE_KERNEL_NOISE ?
>
> Sure. I am open to new name as I am not good on that.
> HK_FLAG_KERNEL_NOISE does make sense.
I can't merge HK_FLAG_TICK just yet as it can be independently set by
using isolcpus which can set just HK_TYPE_TICK, HK_TYPE_DOMAIN or
HK_TYPE_MANAGED_IRQ. That is the reason why they are still separate.
Cheers,
Longman
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full
2024-09-04 13:04 ` Peter Zijlstra
2024-09-04 13:41 ` Waiman Long
2024-09-04 13:44 ` Phil Auld
@ 2024-09-06 12:36 ` Frederic Weisbecker
2 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2024-09-06 12:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
Le Wed, Sep 04, 2024 at 03:04:45PM +0200, Peter Zijlstra a écrit :
> On Wed, Sep 04, 2024 at 02:44:26PM +0200, Frederic Weisbecker wrote:
> > Le Tue, Sep 03, 2024 at 09:23:53PM -0400, Waiman Long a écrit :
> > > > After discussing with Peter lately, the rules should be:
> > > >
> > > > 1) If a nohz_full CPU is part of a multi-CPU domain, then it should
> > > > be part of load balancing. Peter even says that nohz_full should be
> > > > forbidden in this case, because the tick plays a role in the
> > > > load balancing.
> > >
> > > My understand is that most users will use nohz_full together with isolcpus.
> > > So nohz_full CPUs are also isolated and not in a sched domain. There may
> > > still be user setting nohz_full without isolcpus though, but that should be
> > > relatively rare.
> >
> > Apparently there are users wanting to use isolation along with automatic
> > containers deployments such as kubernetes, which doesn't seem to work
> > well with isolcpus...
>
> I've been proposing to get rid of isolcpus for at least the last 15
> years or so. There just isn't a good reason to ever use it. We were
> close and then the whole NOHZ_FULL thing came along.
>
> You can create single CPU partitions using cpusets dynamically.
I'm not sure we could have removed isolcpus= even back then.
It has always been widely used and we would have broke someone's box.
>
> > > Anyway, all these nohz_full/kernel_nose setting will only apply to CPUs in
> > > isolated cpuset partitions which will not be in a sched domain.
> > >
> > > >
> > > > 2) Otherwise, if CPU is not part of a domain or it is the only CPU of all its
> > > > domains, then it can be out of the load balancing machinery.
> > > I am aware that a single-cpu domain is the same as being isolated with no
> > > load balancing.
> >
> > By the way is it possible to have a single-cpu domain (sorry I'm a noob here)
> > or do such CPU always end up on a null domain?
>
> IIRC they always end up with the null domain; but its been a while. It
> simply doesn't make much sense to have a 1 cpu domain. The way the
> topology code works is by always building the full domain tree, and then
> throwing away all levels that do not contribute, and in the 1 cpu case,
> that would be all of them.
>
> Look for 'degenerate' in kernel/sched/topology.c.
Ok.
>
> > > >
> > > > I'm a bit scared about rule 1) because I know there are existing users of
> > > > nohz_full on multi-CPU domains... So I feel a bit trapped.
> > >
> > > As stated before, this is not a common use case.
> >
> > Not sure and anyway it's not a forbidden usecase. But this is anyway outside
> > the scope of this patchset.
>
> Most crucially, it is a completely broken setup. It doesn't actually
> work well.
>
> Taking it away will force people to fix their broken. That's a good
> thing, no?
I'm all for it but isn't the rule not to break userspace?
>
> > > The isolcpus boot option is deprecated, as stated in kernel-parameters.txt.
> >
> > We should undeprecate it, apparently it's still widely used. Perhaps by people
> > who can't afford to use cpusets/cgroups.
>
> What is the actual problem with using cpusets? At the very least the
> whole nohz_full thing needs to be moved into cpusets so it isn't a fixed
> boot time thing anymore.
Sure that's the plan.
>
> > > My plan is to deprecate nohz_full as well once we are able to make dynamic
> > > CPU isolation via cpuset works almost as good as isolcpus + nohz_full.
> >
> > You can't really deprecate such a kernel boot option unfortunately. Believe me
> > I wish we could.
>
> Why not? As I said, the only thing that's kept it around, and worse,
> made it more popular again, is this nohz_full nonsense. That never
> should've used isolcpus, but that's not something we can do anything
> about now.
>
> Rigid, boot time only things are teh suck.
I know...
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs
2024-09-04 14:54 ` Waiman Long
@ 2024-09-06 12:53 ` Frederic Weisbecker
2024-09-06 16:31 ` Waiman Long
0 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2024-09-06 12:53 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
Le Wed, Sep 04, 2024 at 10:54:15AM -0400, Waiman Long a écrit :
>
> On 9/3/24 09:53, Waiman Long wrote:
> >
> > On 9/3/24 09:12, Frederic Weisbecker wrote:
> > > Le Sun, Aug 18, 2024 at 07:45:19PM -0400, Waiman Long a écrit :
> > > > As the previous commit has enabled the setting of HK_TYPE_SCHED
> > > > housekeeping CPUs in nohz_full setup, we can now use the more aptly
> > > > named HK_TYPE_SCHED housekeeping CPUs instead of HK_TYPE_MISC.
> > > >
> > > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > Can we instead merge HK_FLAG_TICK, HK_FLAG_WQ, HK_FLAG_TIMER,
> > > HK_FLAG_RCU,
> > > HK_FLAG_MISC and HK_FLAG_KTHREAD into a single
> > > HK_FLAG_KERNEL_NOISE / HK_TYPE_KERNEL_NOISE ?
> >
> > Sure. I am open to new name as I am not good on that.
> > HK_FLAG_KERNEL_NOISE does make sense.
>
> I can't merge HK_FLAG_TICK just yet as it can be independently set by using
> isolcpus which can set just HK_TYPE_TICK, HK_TYPE_DOMAIN or
> HK_TYPE_MANAGED_IRQ. That is the reason why they are still separate.
Ah I think we really need to unify behaviours of isolcpus=nohz and nohz_full=
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5891e715f00d..9bc8fbf326cc 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -212,7 +212,8 @@ static int __init housekeeping_isolcpus_setup(char *str)
while (isalpha(*str)) {
if (!strncmp(str, "nohz,", 5)) {
str += 5;
- flags |= HK_FLAG_TICK;
+ flags |= HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
+ HK_FLAG_MISC | HK_FLAG_KTHREAD;
continue;
}
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs
2024-09-06 12:53 ` Frederic Weisbecker
@ 2024-09-06 16:31 ` Waiman Long
0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2024-09-06 16:31 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, linux-kernel
On 9/6/24 08:53, Frederic Weisbecker wrote:
> Le Wed, Sep 04, 2024 at 10:54:15AM -0400, Waiman Long a écrit :
>> On 9/3/24 09:53, Waiman Long wrote:
>>> On 9/3/24 09:12, Frederic Weisbecker wrote:
>>>> Le Sun, Aug 18, 2024 at 07:45:19PM -0400, Waiman Long a écrit :
>>>>> As the previous commit has enabled the setting of HK_TYPE_SCHED
>>>>> housekeeping CPUs in nohz_full setup, we can now use the more aptly
>>>>> named HK_TYPE_SCHED housekeeping CPUs instead of HK_TYPE_MISC.
>>>>>
>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> Can we instead merge HK_FLAG_TICK, HK_FLAG_WQ, HK_FLAG_TIMER,
>>>> HK_FLAG_RCU,
>>>> HK_FLAG_MISC and HK_FLAG_KTHREAD into a single
>>>> HK_FLAG_KERNEL_NOISE / HK_TYPE_KERNEL_NOISE ?
>>> Sure. I am open to new name as I am not good on that.
>>> HK_FLAG_KERNEL_NOISE does make sense.
>> I can't merge HK_FLAG_TICK just yet as it can be independently set by using
>> isolcpus which can set just HK_TYPE_TICK, HK_TYPE_DOMAIN or
>> HK_TYPE_MANAGED_IRQ. That is the reason why they are still separate.
> Ah I think we really need to unify behaviours of isolcpus=nohz and nohz_full=
>
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 5891e715f00d..9bc8fbf326cc 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -212,7 +212,8 @@ static int __init housekeeping_isolcpus_setup(char *str)
> while (isalpha(*str)) {
> if (!strncmp(str, "nohz,", 5)) {
> str += 5;
> - flags |= HK_FLAG_TICK;
> + flags |= HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
> + HK_FLAG_MISC | HK_FLAG_KTHREAD;
> continue;
> }
I am not against it, but it is a change in userspace behavior. How about
we introduce "isolcpus=nohz_full" option that essentially combines
isolcpus and nohz_full in one option? That can be a separate patch.
Cheers,
Longman
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2024-09-06 16:31 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-18 23:45 [PATCH 0/3] sched: Miscellaneous isolation related cleanups Waiman Long
2024-08-18 23:45 ` [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full Waiman Long
2024-09-03 13:10 ` Frederic Weisbecker
2024-09-03 13:24 ` Waiman Long
2024-09-03 21:32 ` Frederic Weisbecker
2024-09-04 1:23 ` Waiman Long
2024-09-04 12:44 ` Frederic Weisbecker
2024-09-04 13:04 ` Peter Zijlstra
2024-09-04 13:41 ` Waiman Long
2024-09-04 13:44 ` Phil Auld
2024-09-04 14:02 ` Peter Zijlstra
2024-09-04 14:32 ` Phil Auld
2024-09-04 14:09 ` Waiman Long
2024-09-06 12:36 ` Frederic Weisbecker
2024-08-18 23:45 ` [PATCH 2/3] sched/fair: Use HK_TYPE_SCHED housekeeping CPUs Waiman Long
2024-09-03 13:12 ` Frederic Weisbecker
2024-09-03 13:53 ` Waiman Long
2024-09-04 14:54 ` Waiman Long
2024-09-06 12:53 ` Frederic Weisbecker
2024-09-06 16:31 ` Waiman Long
2024-08-18 23:45 ` [PATCH 3/3] sched/isolation: Consolidate housekeeping cpumasks that are always identical Waiman Long
2024-08-23 18:23 ` [PATCH 0/3] sched: Miscellaneous isolation related cleanups Waiman Long
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox