* [PATCH -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
@ 2024-07-25 12:03 Zheng Zucheng
2024-07-25 14:05 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Zheng Zucheng @ 2024-07-25 12:03 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, vschneid, oleg
Cc: linux-kernel
In extreme test scenarios:
the 14th field, utime, in /proc/xx/stat is greater than sum_exec_runtime,
utime = 18446744073709518790 ns, rtime = 135989749728000 ns
In cputime_adjust(), stime becomes greater than rtime due to a
mul_u64_u64_div_u64() precision problem.
Before calling mul_u64_u64_div_u64():
stime = 175136586720000, rtime = 135989749728000, utime = 1416780000.
After calling mul_u64_u64_div_u64():
stime = 135989949653530
An unsigned underflow occurs because rtime is less than stime:
utime = rtime - stime = 135989749728000 - 135989949653530
= -199925530
= (u64)18446744073709518790
Trigger scenario:
1. The user task runs in kernel mode most of the time.
2. ARM64 architecture && CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y &&
TICK_CPU_ACCOUNTING=y
Fix the mul_u64_u64_div_u64() precision loss by resetting stime to rtime.
Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()")
Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
---
kernel/sched/cputime.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index aa48b2ec879d..365c74e95537 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -582,6 +582,8 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
}
stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
+ if (unlikely(stime > rtime))
+ stime = rtime;
update:
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-25 12:03 [PATCH -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime Zheng Zucheng @ 2024-07-25 14:05 ` Peter Zijlstra 2024-07-25 14:49 ` zhengzucheng 2024-07-26 2:32 ` [PATCH v2 " Zheng Zucheng 2024-09-02 1:56 ` [Question] Include isolated cpu to ensure that tasks are not scheduled to isolated cpu? zhengzucheng 2 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2024-07-25 14:05 UTC (permalink / raw) To: Zheng Zucheng Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, oleg, linux-kernel On Thu, Jul 25, 2024 at 12:03:15PM +0000, Zheng Zucheng wrote: > In extreme test scenarios: > the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime, > utime = 18446744073709518790 ns, rtime = 135989749728000 ns > > In cputime_adjust() process, stime is greater than rtime due to > mul_u64_u64_div_u64() precision problem. > before call mul_u64_u64_div_u64(), > stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. > after call mul_u64_u64_div_u64(), > stime = 135989949653530 > > unsigned reversion occurs because rtime is less than stime. > utime = rtime - stime = 135989749728000 - 135989949653530 > = -199925530 > = (u64)18446744073709518790 > > Trigger scenario: > 1. User task run in kernel mode most of time. > 2. 
The ARM64 architecture && CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y && > TICK_CPU_ACCOUNTING=y > > Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime > > Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()") > Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com> > --- > kernel/sched/cputime.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c > index aa48b2ec879d..365c74e95537 100644 > --- a/kernel/sched/cputime.c > +++ b/kernel/sched/cputime.c > @@ -582,6 +582,8 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev, > } > > stime = mul_u64_u64_div_u64(stime, rtime, stime + utime); > + if (unlikely(stime > rtime)) > + stime = rtime; But but but... for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y the code you're patching is not compiled! ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-25 14:05 ` Peter Zijlstra @ 2024-07-25 14:49 ` zhengzucheng 2024-07-25 15:14 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: zhengzucheng @ 2024-07-25 14:49 UTC (permalink / raw) To: Peter Zijlstra Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, oleg, linux-kernel Sorry, I made a mistake here. CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set. 在 2024/7/25 22:05, Peter Zijlstra 写道: > On Thu, Jul 25, 2024 at 12:03:15PM +0000, Zheng Zucheng wrote: >> In extreme test scenarios: >> the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime, >> utime = 18446744073709518790 ns, rtime = 135989749728000 ns >> >> In cputime_adjust() process, stime is greater than rtime due to >> mul_u64_u64_div_u64() precision problem. >> before call mul_u64_u64_div_u64(), >> stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. >> after call mul_u64_u64_div_u64(), >> stime = 135989949653530 >> >> unsigned reversion occurs because rtime is less than stime. >> utime = rtime - stime = 135989749728000 - 135989949653530 >> = -199925530 >> = (u64)18446744073709518790 >> >> Trigger scenario: >> 1. User task run in kernel mode most of time. >> 2. 
The ARM64 architecture && CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y && >> TICK_CPU_ACCOUNTING=y >> >> Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime >> >> Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()") >> Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com> >> --- >> kernel/sched/cputime.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c >> index aa48b2ec879d..365c74e95537 100644 >> --- a/kernel/sched/cputime.c >> +++ b/kernel/sched/cputime.c >> @@ -582,6 +582,8 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev, >> } >> >> stime = mul_u64_u64_div_u64(stime, rtime, stime + utime); >> + if (unlikely(stime > rtime)) >> + stime = rtime; > But but but... for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y the code you're > patching is not compiled! > > > . ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-25 14:49 ` zhengzucheng @ 2024-07-25 15:14 ` Peter Zijlstra 0 siblings, 0 replies; 19+ messages in thread From: Peter Zijlstra @ 2024-07-25 15:14 UTC (permalink / raw) To: zhengzucheng Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, oleg, linux-kernel On Thu, Jul 25, 2024 at 10:49:46PM +0800, zhengzucheng wrote: > Sorry, I made a mistake here. CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set. > > 在 2024/7/25 22:05, Peter Zijlstra 写道: > > On Thu, Jul 25, 2024 at 12:03:15PM +0000, Zheng Zucheng wrote: > > > In extreme test scenarios: > > > the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime, > > > utime = 18446744073709518790 ns, rtime = 135989749728000 ns > > > > > > In cputime_adjust() process, stime is greater than rtime due to > > > mul_u64_u64_div_u64() precision problem. > > > before call mul_u64_u64_div_u64(), > > > stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. > > > after call mul_u64_u64_div_u64(), > > > stime = 135989949653530 > > > > > > unsigned reversion occurs because rtime is less than stime. > > > utime = rtime - stime = 135989749728000 - 135989949653530 > > > = -199925530 > > > = (u64)18446744073709518790 > > > > > > Trigger scenario: > > > 1. User task run in kernel mode most of time. > > > 2. 
The ARM64 architecture && CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y && > > > TICK_CPU_ACCOUNTING=y > > > > > > Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime > > > > > > Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()") > > > Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com> > > > --- > > > kernel/sched/cputime.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c > > > index aa48b2ec879d..365c74e95537 100644 > > > --- a/kernel/sched/cputime.c > > > +++ b/kernel/sched/cputime.c > > > @@ -582,6 +582,8 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev, > > > } > > > stime = mul_u64_u64_div_u64(stime, rtime, stime + utime); > > > + if (unlikely(stime > rtime)) > > > + stime = rtime; Ooh,.. I see, this is because the generic fallback for mul_u64_u64_div_u64() is yuck :/ On x86_64 this is just two instructions and it does a native: u64*u64->u128 u128/u64->u64 And this should never happen. But in the generic case, we appoximate and urgh. So yeah, but then perhaps add a comment like: /* * Because mul_u64_u64_div_u64() can approximate on some * achitectures; enforce the constraint that: a*b/(b+c) <= a. */ if (unlikely(stime > rtime)) stime = rtime; Also, I would look into doing a native arm64 version, I'd be surprised if it could not do better than the generic variant. ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-25 12:03 [PATCH -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime Zheng Zucheng 2024-07-25 14:05 ` Peter Zijlstra @ 2024-07-26 2:32 ` Zheng Zucheng 2024-07-26 10:44 ` Oleg Nesterov 2024-07-29 10:34 ` [tip: sched/core] " tip-bot2 for Zheng Zucheng 2024-09-02 1:56 ` [Question] Include isolated cpu to ensure that tasks are not scheduled to isolated cpu? zhengzucheng 2 siblings, 2 replies; 19+ messages in thread From: Zheng Zucheng @ 2024-07-26 2:32 UTC (permalink / raw) To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, oleg Cc: linux-kernel In extreme test scenarios: the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime, utime = 18446744073709518790 ns, rtime = 135989749728000 ns In cputime_adjust() process, stime is greater than rtime due to mul_u64_u64_div_u64() precision problem. before call mul_u64_u64_div_u64(), stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. after call mul_u64_u64_div_u64(), stime = 135989949653530 unsigned reversion occurs because rtime is less than stime. utime = rtime - stime = 135989749728000 - 135989949653530 = -199925530 = (u64)18446744073709518790 Trigger condition: 1). User task run in kernel mode most of time 2). ARM64 architecture 3). 
TICK_CPU_ACCOUNTING=y CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime v2: - Add comment - Update trigger condition Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()") Cc: <stable@vger.kernel.org> Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com> --- kernel/sched/cputime.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index aa48b2ec879d..4feef0d4e449 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -582,6 +582,12 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev, } stime = mul_u64_u64_div_u64(stime, rtime, stime + utime); + /* + * Because mul_u64_u64_div_u64() can approximate on some + * achitectures; enforce the constraint that: a*b/(b+c) <= a. + */ + if (unlikely(stime > rtime)) + stime = rtime; update: /* -- 2.34.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2 -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-26 2:32 ` [PATCH v2 " Zheng Zucheng @ 2024-07-26 10:44 ` Oleg Nesterov 2024-07-26 13:04 ` Oleg Nesterov 2024-07-30 6:55 ` Uwe Kleine-König 2024-07-29 10:34 ` [tip: sched/core] " tip-bot2 for Zheng Zucheng 1 sibling, 2 replies; 19+ messages in thread From: Oleg Nesterov @ 2024-07-26 10:44 UTC (permalink / raw) To: Zheng Zucheng Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, linux-kernel On 07/26, Zheng Zucheng wrote: > > before call mul_u64_u64_div_u64(), > stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. So stime + utime == 175138003500000 > after call mul_u64_u64_div_u64(), > stime = 135989949653530 Hmm. On x86 mul_u64_u64_div_u64(175136586720000, 135989749728000, 175138003500000) returns 135989749728000 == rtime, see below. Nevermind... > --- a/kernel/sched/cputime.c > +++ b/kernel/sched/cputime.c > @@ -582,6 +582,12 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev, > } > > stime = mul_u64_u64_div_u64(stime, rtime, stime + utime); > + /* > + * Because mul_u64_u64_div_u64() can approximate on some > + * achitectures; enforce the constraint that: a*b/(b+c) <= a. > + */ > + if (unlikely(stime > rtime)) > + stime = rtime; Thanks, Acked-by: Oleg Nesterov <oleg@redhat.com> ------------------------------------------------------------------------------- But perhaps it makes sense to improve the accuracy of mul_u64_u64_div_u64() ? See the new() function in the code below. Say, with the numbers above I get $ ./test 175136586720000 135989749728000 175138003500000 old -> 135989749728000 e=1100089950.609375 new -> 135988649638050 e=0.609375 Oleg. 
------------------------------------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <assert.h> typedef unsigned long long u64; static inline int fls64(u64 x) { int bitpos = -1; /* * AMD64 says BSRQ won't clobber the dest reg if x==0; Intel64 says the * dest reg is undefined if x==0, but their CPU architect says its * value is written to set it to the same as before. */ asm("bsrq %1,%q0" : "+r" (bitpos) : "rm" (x)); return bitpos + 1; } static inline int ilog2(u64 n) { return fls64(n) - 1; } #define swap(a, b) \ do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0) static inline u64 div64_u64_rem(u64 dividend, u64 divisor, u64 *remainder) { *remainder = dividend % divisor; return dividend / divisor; } static inline u64 div64_u64(u64 dividend, u64 divisor) { return dividend / divisor; } //----------------------------------------------------------------------------- // current implementation of mul_u64_u64_div_u64 u64 old(u64 a, u64 b, u64 c) { u64 res = 0, div, rem; int shift; /* can a * b overflow ? */ if (ilog2(a) + ilog2(b) > 62) { /* * Note that the algorithm after the if block below might lose * some precision and the result is more exact for b > a. So * exchange a and b if a is bigger than b. * * For example with a = 43980465100800, b = 100000000, c = 1000000000 * the below calculation doesn't modify b at all because div == 0 * and then shift becomes 45 + 26 - 62 = 9 and so the result * becomes 4398035251080. However with a and b swapped the exact * result is calculated (i.e. 4398046510080). */ if (a > b) swap(a, b); /* * (b * a) / c is equal to * * (b / c) * a + * (b % c) * a / c * * if nothing overflows. Can the 1st multiplication * overflow? Yes, but we do not care: this can only * happen if the end result can't fit in u64 anyway. 
* * So the code below does * * res = (b / c) * a; * b = b % c; */ div = div64_u64_rem(b, c, &rem); res = div * a; b = rem; shift = ilog2(a) + ilog2(b) - 62; if (shift > 0) { /* drop precision */ b >>= shift; c >>= shift; if (!c) return res; } } return res + div64_u64(a * b, c); } u64 new(u64 a, u64 b, u64 c) { u64 res = 0, div, rem; /* can a * b overflow ? */ while (ilog2(a) + ilog2(b) > 62) { if (a > b) swap(b, a); if (b >= c) { /* * (b * a) / c is equal to * * (b / c) * a + * (b % c) * a / c * * if nothing overflows. Can the 1st multiplication * overflow? Yes, but we do not care: this can only * happen if the end result can't fit in u64 anyway. * * So the code below does * * res += (b / c) * a; * b = b % c; */ div = div64_u64_rem(b, c, &rem); res += div * a; b = rem; continue; } /* drop precision */ b >>= 1; c >>= 1; if (!c) return res; } return res + div64_u64(a * b, c); } int main(int argc, char **argv) { u64 a, b, c, ro, rn; double rd; assert(argc == 4); a = strtoull(argv[1], NULL, 0); b = strtoull(argv[2], NULL, 0); c = strtoull(argv[3], NULL, 0); rd = (((double)a) * b) / c; ro = old(a, b, c); rn = new(a, b, c); printf("old -> %lld\te=%f\n", ro, ro - rd); printf("new -> %lld\te=%f\n", rn, rn - rd); return 0; } ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-26 10:44 ` Oleg Nesterov @ 2024-07-26 13:04 ` Oleg Nesterov 2024-07-26 13:08 ` Peter Zijlstra 2024-07-30 6:55 ` Uwe Kleine-König 1 sibling, 1 reply; 19+ messages in thread From: Oleg Nesterov @ 2024-07-26 13:04 UTC (permalink / raw) To: Zheng Zucheng Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, linux-kernel On 07/26, Oleg Nesterov wrote: > > On 07/26, Zheng Zucheng wrote: > > > > before call mul_u64_u64_div_u64(), > > stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. > > So stime + utime == 175138003500000 > > > after call mul_u64_u64_div_u64(), > > stime = 135989949653530 > > Hmm. On x86 mul_u64_u64_div_u64(175136586720000, 135989749728000, 175138003500000) > returns 135989749728000 == rtime, see below. Seriously, can you re-check your numbers? it would be nice to understand why x86_64 differs... > But perhaps it makes sense to improve the accuracy of mul_u64_u64_div_u64() ? > See the new() function in the code below. Just in case, the usage of ilog2 can be improved, but this is minor. Oleg. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-26 13:04 ` Oleg Nesterov @ 2024-07-26 13:08 ` Peter Zijlstra 2024-07-26 13:30 ` Oleg Nesterov 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2024-07-26 13:08 UTC (permalink / raw) To: Oleg Nesterov Cc: Zheng Zucheng, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, linux-kernel On Fri, Jul 26, 2024 at 03:04:01PM +0200, Oleg Nesterov wrote: > On 07/26, Oleg Nesterov wrote: > > > > On 07/26, Zheng Zucheng wrote: > > > > > > before call mul_u64_u64_div_u64(), > > > stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. > > > > So stime + utime == 175138003500000 > > > > > after call mul_u64_u64_div_u64(), > > > stime = 135989949653530 > > > > Hmm. On x86 mul_u64_u64_div_u64(175136586720000, 135989749728000, 175138003500000) > > returns 135989749728000 == rtime, see below. > > Seriously, can you re-check your numbers? it would be nice to understand why > x86_64 differs... x86_64 has a custom mul_u64_u64_div_u64() implementation. > > But perhaps it makes sense to improve the accuracy of mul_u64_u64_div_u64() ? > > See the new() function in the code below. > > Just in case, the usage of ilog2 can be improved, but this is minor. I meant to go look at this, it seems to loop less than your improved version, but I'm chasing crashes atm. Perhaps it provides inspiration. https://codebrowser.dev/llvm/compiler-rt/lib/builtins/udivmodti4.c.html#__udivmodti4 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-26 13:08 ` Peter Zijlstra @ 2024-07-26 13:30 ` Oleg Nesterov 0 siblings, 0 replies; 19+ messages in thread From: Oleg Nesterov @ 2024-07-26 13:30 UTC (permalink / raw) To: Peter Zijlstra Cc: Zheng Zucheng, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, linux-kernel On 07/26, Peter Zijlstra wrote: > > On Fri, Jul 26, 2024 at 03:04:01PM +0200, Oleg Nesterov wrote: > > On 07/26, Oleg Nesterov wrote: > > > > > > On 07/26, Zheng Zucheng wrote: > > > > > > > > before call mul_u64_u64_div_u64(), > > > > stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. > > > > > > So stime + utime == 175138003500000 > > > > > > > after call mul_u64_u64_div_u64(), > > > > stime = 135989949653530 > > > > > > Hmm. On x86 mul_u64_u64_div_u64(175136586720000, 135989749728000, 175138003500000) > > > returns 135989749728000 == rtime, see below. > > > > Seriously, can you re-check your numbers? it would be nice to understand why > > x86_64 differs... > > x86_64 has a custom mul_u64_u64_div_u64() implementation. Yes sure, but my user-space test-case uses the "generic" version, > > > But perhaps it makes sense to improve the accuracy of mul_u64_u64_div_u64() ? > > > See the new() function in the code below. > > > > Just in case, the usage of ilog2 can be improved, but this is minor. > > I meant to go look at this, it seems to loop less than your improved > version, but I'm chasing crashes atm. Perhaps it provides inspiration. > > https://codebrowser.dev/llvm/compiler-rt/lib/builtins/udivmodti4.c.html#__udivmodti4 Thanks... I'll try to take a look tommorrow, but at first glance I will never understand this (clever) code ;) Oleg. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-26 10:44 ` Oleg Nesterov 2024-07-26 13:04 ` Oleg Nesterov @ 2024-07-30 6:55 ` Uwe Kleine-König 1 sibling, 0 replies; 19+ messages in thread From: Uwe Kleine-König @ 2024-07-30 6:55 UTC (permalink / raw) To: Oleg Nesterov Cc: Zheng Zucheng, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, linux-kernel, Andrew Morton, Nicolas Pitre [-- Attachment #1: Type: text/plain, Size: 1780 bytes --] Hello, On Fri, Jul 26, 2024 at 12:44:29PM +0200, Oleg Nesterov wrote: > On 07/26, Zheng Zucheng wrote: > > > > before call mul_u64_u64_div_u64(), > > stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. > > So stime + utime == 175138003500000 > > > after call mul_u64_u64_div_u64(), > > stime = 135989949653530 > > Hmm. On x86 mul_u64_u64_div_u64(175136586720000, 135989749728000, 175138003500000) > returns 135989749728000 == rtime, see below. > > Nevermind... > > > --- a/kernel/sched/cputime.c > > +++ b/kernel/sched/cputime.c > > @@ -582,6 +582,12 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev, > > } > > > > stime = mul_u64_u64_div_u64(stime, rtime, stime + utime); > > + /* > > + * Because mul_u64_u64_div_u64() can approximate on some > > + * achitectures; enforce the constraint that: a*b/(b+c) <= a. > > + */ > > + if (unlikely(stime > rtime)) > > + stime = rtime; > > Thanks, > > Acked-by: Oleg Nesterov <oleg@redhat.com> > > ------------------------------------------------------------------------------- > But perhaps it makes sense to improve the accuracy of mul_u64_u64_div_u64() ? Note there is a patch by Nicolas Pitre currently in mm-nonmm-unstable that makes mul_u64_u64_div_u64() precise. It was in next for a while as commit 3cc8bf1a81ef ("mul_u64_u64_div_u64: make it precise always") which might explain problems to reproduce the incorrect results. 
An obvious alternative to backporting this change to kernel/sched/cputime.c for stable is to backport Nicolas's patch instead. Andrew asked me to provide a justification to send Nicolas's patch for inclusion in the current devel cycle. So it might make it in before 6.11. Best regards Uwe [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* [tip: sched/core] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-26 2:32 ` [PATCH v2 " Zheng Zucheng 2024-07-26 10:44 ` Oleg Nesterov @ 2024-07-29 10:34 ` tip-bot2 for Zheng Zucheng 1 sibling, 0 replies; 19+ messages in thread From: tip-bot2 for Zheng Zucheng @ 2024-07-29 10:34 UTC (permalink / raw) To: linux-tip-commits Cc: Zheng Zucheng, Peter Zijlstra (Intel), stable, x86, linux-kernel The following commit has been merged into the sched/core branch of tip: Commit-ID: 77baa5bafcbe1b2a15ef9c37232c21279c95481c Gitweb: https://git.kernel.org/tip/77baa5bafcbe1b2a15ef9c37232c21279c95481c Author: Zheng Zucheng <zhengzucheng@huawei.com> AuthorDate: Fri, 26 Jul 2024 02:32:35 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Mon, 29 Jul 2024 12:22:32 +02:00 sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime In extreme test scenarios: the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime, utime = 18446744073709518790 ns, rtime = 135989749728000 ns In cputime_adjust() process, stime is greater than rtime due to mul_u64_u64_div_u64() precision problem. before call mul_u64_u64_div_u64(), stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. after call mul_u64_u64_div_u64(), stime = 135989949653530 unsigned reversion occurs because rtime is less than stime. utime = rtime - stime = 135989749728000 - 135989949653530 = -199925530 = (u64)18446744073709518790 Trigger condition: 1). User task run in kernel mode most of time 2). ARM64 architecture 3). 
TICK_CPU_ACCOUNTING=y CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()") Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20240726023235.217771-1-zhengzucheng@huawei.com --- kernel/sched/cputime.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index a5e0029..0bed0fa 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -582,6 +582,12 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev, } stime = mul_u64_u64_div_u64(stime, rtime, stime + utime); + /* + * Because mul_u64_u64_div_u64() can approximate on some + * achitectures; enforce the constraint that: a*b/(b+c) <= a. + */ + if (unlikely(stime > rtime)) + stime = rtime; update: /* ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Question] Include isolated cpu to ensure that tasks are not scheduled to isolated cpu? 2024-07-25 12:03 [PATCH -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime Zheng Zucheng 2024-07-25 14:05 ` Peter Zijlstra 2024-07-26 2:32 ` [PATCH v2 " Zheng Zucheng @ 2024-09-02 1:56 ` zhengzucheng 2024-09-02 3:00 ` Waiman Long 2 siblings, 1 reply; 19+ messages in thread From: zhengzucheng @ 2024-09-02 1:56 UTC (permalink / raw) To: peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, oleg, Frederic Weisbecker, mingo, peterx, tj, longman Cc: linux-kernel In a cpuset subsystem, cpuset.cpus contains isolated cpu and non-isolated cpu. Is there any way to ensure that the task runs only on the non-isolated cpus? eg: isolcpus=1, cpusete.cpus=0-7. It is found that some tasks are scheduled to cpu1. In addition, task run on isolated cpu cann't be scheduled to other cpu in the future. Thanks! ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Question] Include isolated cpu to ensure that tasks are not scheduled to isolated cpu? 2024-09-02 1:56 ` [Question] Include isolated cpu to ensure that tasks are not scheduled to isolated cpu? zhengzucheng @ 2024-09-02 3:00 ` Waiman Long 2024-09-13 4:03 ` [Question] sched:the load is unbalanced in the VM overcommitment scenario zhengzucheng 0 siblings, 1 reply; 19+ messages in thread From: Waiman Long @ 2024-09-02 3:00 UTC (permalink / raw) To: zhengzucheng, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, oleg, Frederic Weisbecker, mingo, peterx, tj Cc: linux-kernel On 9/1/24 21:56, zhengzucheng wrote: > In a cpuset subsystem, cpuset.cpus contains isolated cpu and > non-isolated cpu. > Is there any way to ensure that the task runs only on the non-isolated > cpus? > eg: > isolcpus=1, cpusete.cpus=0-7. It is found that some tasks are > scheduled to cpu1. > > In addition, task run on isolated cpu cann't be scheduled to other cpu > in the future. The best way is to avoid mixing isolated and scheduling CPUs in the same cpuset especially if you are using cgroup v1. If you are using cgroup v2, one way to avoid the use of isolated CPUs is to put all of them into an isolated partition. This will ensure that those isolated CPUs won't be used even if they are put into the cpuset.cpus of other cpusets accidentally Cheers, Longman ^ permalink raw reply [flat|nested] 19+ messages in thread
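Waiman's cgroup v2 suggestion can be sketched as a configuration fragment (paths and CPU numbers are illustrative; cpuset.cpus.partition requires a cgroup v2 mount and an exclusive cpuset):

```shell
# Put CPU 1 into a dedicated isolated partition so the scheduler
# never uses it for tasks in other cpusets, even if it accidentally
# appears in their cpuset.cpus.
mkdir /sys/fs/cgroup/isolated
echo 1 > /sys/fs/cgroup/isolated/cpuset.cpus
echo isolated > /sys/fs/cgroup/isolated/cpuset.cpus.partition

# Verify the partition took effect ("isolated", not "isolated invalid")
cat /sys/fs/cgroup/isolated/cpuset.cpus.partition
```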
* [Question] sched:the load is unbalanced in the VM overcommitment scenario 2024-09-02 3:00 ` Waiman Long @ 2024-09-13 4:03 ` zhengzucheng 2024-09-13 15:55 ` Vincent Guittot 2024-09-13 17:17 ` Waiman Long 0 siblings, 2 replies; 19+ messages in thread From: zhengzucheng @ 2024-09-13 4:03 UTC (permalink / raw) To: Waiman Long, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, oleg, Frederic Weisbecker, mingo, peterx, tj, tjcao980311, vincent.guittot, zhengzucheng Cc: linux-kernel In the VM overcommitment scenario, the overcommitment ratio is 1:2, 8 CPUs are overcommitted to 2 x 8u VMs, and 16 vCPUs are bound to 8 cpu. However, one VM obtains only 2 CPUs resources, the other VM has 6 CPUs. The host is configured with 80 CPUs in a sched domain and other CPUs are in the idle state. The root cause is that the load of the host is unbalanced, some vCPUs exclusively occupy CPU resources. when the CPU that triggers load balance calculates imbalance value, env->imbalance = 0 is calculated because of local->avg_load > sds->avg_load. As a result, the load balance fails. The processing logic: https://github.com/torvalds/linux/commit/91dcf1e8068e9a8823e419a7a34ff4341275fb70 It's normal from kernel load balance, but it's not reasonable from the perspective of VM users. In cgroup v1, set cpuset.sched_load_balance=0 to modify the schedule domain to fix it. Is there any other method to fix this problem? thanks. 
Abstracted reproduction case: 1.environment information: [root@localhost ~]# cat /proc/schedstat cpu0 domain0 00000000,00000000,00010000,00000000,00000001 domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff cpu1 domain0 00000000,00000000,00020000,00000000,00000002 domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff cpu2 domain0 00000000,00000000,00040000,00000000,00000004 domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff cpu3 domain0 00000000,00000000,00080000,00000000,00000008 domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 2.test case: vcpu.c #include <stdio.h> #include <unistd.h> int main() { sleep(20); while (1); return 0; } gcc vcpu.c -o vcpu ----------------------------------------------------------------- test.sh #!/bin/bash #vcpu1 mkdir /sys/fs/cgroup/cpuset/vcpu_1 echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.cpus echo 0 > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.mems for i in {1..8} do ./vcpu & pid=$! sleep 1 echo $pid > /sys/fs/cgroup/cpuset/vcpu_1/tasks done #vcpu2 mkdir /sys/fs/cgroup/cpuset/vcpu_2 echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.cpus echo 0 > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.mems for i in {1..8} do ./vcpu & pid=$! 
sleep 1 echo $pid > /sys/fs/cgroup/cpuset/vcpu_2/tasks done ------------------------------------------------------------------ [root@localhost ~]# ./test.sh [root@localhost ~]# top -d 1 -c -p $(pgrep -d',' -f vcpu) 14591 root 20 0 2448 1012 928 R 100.0 0.0 13:10.73 ./vcpu 14582 root 20 0 2448 1012 928 R 100.0 0.0 13:12.71 ./vcpu 14606 root 20 0 2448 872 784 R 100.0 0.0 13:09.72 ./vcpu 14620 root 20 0 2448 916 832 R 100.0 0.0 13:07.72 ./vcpu 14622 root 20 0 2448 920 836 R 100.0 0.0 13:06.72 ./vcpu 14629 root 20 0 2448 920 832 R 100.0 0.0 13:05.72 ./vcpu 14643 root 20 0 2448 924 836 R 21.0 0.0 2:37.13 ./vcpu 14645 root 20 0 2448 868 784 R 21.0 0.0 2:36.51 ./vcpu 14589 root 20 0 2448 900 816 R 20.0 0.0 2:45.16 ./vcpu 14608 root 20 0 2448 956 872 R 20.0 0.0 2:42.24 ./vcpu 14632 root 20 0 2448 872 788 R 20.0 0.0 2:38.08 ./vcpu 14638 root 20 0 2448 924 840 R 20.0 0.0 2:37.48 ./vcpu 14652 root 20 0 2448 928 844 R 20.0 0.0 2:36.42 ./vcpu 14654 root 20 0 2448 924 840 R 20.0 0.0 2:36.14 ./vcpu 14663 root 20 0 2448 900 816 R 20.0 0.0 2:35.38 ./vcpu 14669 root 20 0 2448 868 784 R 20.0 0.0 2:35.70 ./vcpu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Question] sched:the load is unbalanced in the VM overcommitment scenario
  2024-09-13  4:03 ` [Question] sched:the load is unbalanced in the VM overcommitment scenario zhengzucheng
@ 2024-09-13 15:55 ` Vincent Guittot
  2024-09-14  7:03   ` zhengzucheng
  2024-09-13 17:17 ` Waiman Long
  1 sibling, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2024-09-13 15:55 UTC (permalink / raw)
To: zhengzucheng
Cc: Waiman Long, peterz, juri.lelli, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, oleg, Frederic Weisbecker, mingo,
	peterx, tj, tjcao980311, linux-kernel

On Fri, 13 Sept 2024 at 06:03, zhengzucheng <zhengzucheng@huawei.com> wrote:
>
> In the VM overcommitment scenario, the overcommitment ratio is 1:2, 8
> CPUs are overcommitted to 2 x 8u VMs,
> and 16 vCPUs are bound to 8 cpu. However, one VM obtains only 2 CPUs
> resources, the other VM has 6 CPUs.
> The host is configured with 80 CPUs in a sched domain and other CPUs are
> in the idle state.
> The root cause is that the load of the host is unbalanced, some vCPUs
> exclusively occupy CPU resources.
> when the CPU that triggers load balance calculates imbalance value,
> env->imbalance = 0 is calculated because of
> local->avg_load > sds->avg_load. As a result, the load balance fails.
> The processing logic:
> https://github.com/torvalds/linux/commit/91dcf1e8068e9a8823e419a7a34ff4341275fb70
>
> It's normal from kernel load balance, but it's not reasonable from the
> perspective of VM users.
> In cgroup v1, set cpuset.sched_load_balance=0 to modify the schedule
> domain to fix it.
> Is there any other method to fix this problem? thanks.

I'm not sure how to understand your setup and why the load balance is
not balancing correctly 16 vCPU between the 8 CPUs.

From your test case description below, you have 8 always running
threads in cgroup A and 8 always running threads in cgroup B and the 2
cgroups have only 8 CPUs among 80. This should not be a problem for
load balance. I tried something similar although not exactly the same
with cgroupv2 and rt-app and I don't have noticeable imbalance

Do you have more details that you can share about your system ?

Which kernel version are you using ? Which arch ?

> Abstracted reproduction case:
> 1.environment information:
>
> [root@localhost ~]# cat /proc/schedstat
>
> cpu0
> domain0 00000000,00000000,00010000,00000000,00000001
> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> cpu1
> domain0 00000000,00000000,00020000,00000000,00000002
> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> cpu2
> domain0 00000000,00000000,00040000,00000000,00000004
> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> cpu3
> domain0 00000000,00000000,00080000,00000000,00000008
> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff

Is it correct to assume that domain0 is SMT, domain1 MC and domain2 PKG ?
and cpu80-83 are in the other group of PKG ? and LLC is at domain1 level ?

> 2.test case:
>
> vcpu.c
> #include <stdio.h>
> #include <unistd.h>
>
> int main()
> {
>	sleep(20);
>	while (1);
>	return 0;
> }
>
> gcc vcpu.c -o vcpu
> -----------------------------------------------------------------
> test.sh
>
> #!/bin/bash
>
> #vcpu1
> mkdir /sys/fs/cgroup/cpuset/vcpu_1
> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.cpus
> echo 0 > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.mems
> for i in {1..8}
> do
>	./vcpu &
>	pid=$!
>	sleep 1
>	echo $pid > /sys/fs/cgroup/cpuset/vcpu_1/tasks
> done
>
> #vcpu2
> mkdir /sys/fs/cgroup/cpuset/vcpu_2
> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.cpus
> echo 0 > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.mems
> for i in {1..8}
> do
>	./vcpu &
>	pid=$!
>	sleep 1
>	echo $pid > /sys/fs/cgroup/cpuset/vcpu_2/tasks
> done
> ------------------------------------------------------------------
> [root@localhost ~]# ./test.sh
>
> [root@localhost ~]# top -d 1 -c -p $(pgrep -d',' -f vcpu)
>
> 14591 root 20 0 2448 1012 928 R 100.0 0.0 13:10.73 ./vcpu
> 14582 root 20 0 2448 1012 928 R 100.0 0.0 13:12.71 ./vcpu
> 14606 root 20 0 2448  872 784 R 100.0 0.0 13:09.72 ./vcpu
> 14620 root 20 0 2448  916 832 R 100.0 0.0 13:07.72 ./vcpu
> 14622 root 20 0 2448  920 836 R 100.0 0.0 13:06.72 ./vcpu
> 14629 root 20 0 2448  920 832 R 100.0 0.0 13:05.72 ./vcpu
> 14643 root 20 0 2448  924 836 R  21.0 0.0  2:37.13 ./vcpu
> 14645 root 20 0 2448  868 784 R  21.0 0.0  2:36.51 ./vcpu
> 14589 root 20 0 2448  900 816 R  20.0 0.0  2:45.16 ./vcpu
> 14608 root 20 0 2448  956 872 R  20.0 0.0  2:42.24 ./vcpu
> 14632 root 20 0 2448  872 788 R  20.0 0.0  2:38.08 ./vcpu
> 14638 root 20 0 2448  924 840 R  20.0 0.0  2:37.48 ./vcpu
> 14652 root 20 0 2448  928 844 R  20.0 0.0  2:36.42 ./vcpu
> 14654 root 20 0 2448  924 840 R  20.0 0.0  2:36.14 ./vcpu
> 14663 root 20 0 2448  900 816 R  20.0 0.0  2:35.38 ./vcpu
> 14669 root 20 0 2448  868 784 R  20.0 0.0  2:35.70 ./vcpu
* Re: [Question] sched:the load is unbalanced in the VM overcommitment scenario
  2024-09-13 15:55 ` Vincent Guittot
@ 2024-09-14  7:03   ` zhengzucheng
  2024-09-17  6:19     ` Vincent Guittot
  0 siblings, 1 reply; 19+ messages in thread
From: zhengzucheng @ 2024-09-14  7:03 UTC (permalink / raw)
To: Vincent Guittot
Cc: Waiman Long, peterz, juri.lelli, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, oleg, Frederic Weisbecker, mingo,
	peterx, tj, tjcao980311, linux-kernel

On 2024/9/13 23:55, Vincent Guittot wrote:
> On Fri, 13 Sept 2024 at 06:03, zhengzucheng <zhengzucheng@huawei.com> wrote:
>> In the VM overcommitment scenario, the overcommitment ratio is 1:2, 8
>> CPUs are overcommitted to 2 x 8u VMs,
>> and 16 vCPUs are bound to 8 cpu. However, one VM obtains only 2 CPUs
>> resources, the other VM has 6 CPUs.
>> The host is configured with 80 CPUs in a sched domain and other CPUs are
>> in the idle state.
>> The root cause is that the load of the host is unbalanced, some vCPUs
>> exclusively occupy CPU resources.
>> when the CPU that triggers load balance calculates imbalance value,
>> env->imbalance = 0 is calculated because of
>> local->avg_load > sds->avg_load. As a result, the load balance fails.
>> The processing logic:
>> https://github.com/torvalds/linux/commit/91dcf1e8068e9a8823e419a7a34ff4341275fb70
>>
>> It's normal from kernel load balance, but it's not reasonable from the
>> perspective of VM users.
>> In cgroup v1, set cpuset.sched_load_balance=0 to modify the schedule
>> domain to fix it.
>> Is there any other method to fix this problem? thanks.
> I'm not sure how to understand your setup and why the load balance is
> not balancing correctly 16 vCPU between the 8 CPUs.
>
> From your test case description below, you have 8 always running
> threads in cgroup A and 8 always running threads in cgroup B and the 2
> cgroups have only 8 CPUs among 80. This should not be a problem for
> load balance. I tried something similar although not exactly the same
> with cgroupv2 and rt-app and I don't have noticeable imbalance
>
> Do you have more details that you can share about your system ?
>
> Which kernel version are you using ? Which arch ?

kernel version: 6.11.0-rc7
arch: x86_64, with cgroup v1

>> Abstracted reproduction case:
>> 1.environment information:
>>
>> [root@localhost ~]# cat /proc/schedstat
>>
>> cpu0
>> domain0 00000000,00000000,00010000,00000000,00000001
>> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
>> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>> cpu1
>> domain0 00000000,00000000,00020000,00000000,00000002
>> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
>> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>> cpu2
>> domain0 00000000,00000000,00040000,00000000,00000004
>> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
>> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>> cpu3
>> domain0 00000000,00000000,00080000,00000000,00000008
>> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
>> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> Is it correct to assume that domain0 is SMT, domain1 MC and domain2 PKG ?
> and cpu80-83 are in the other group of PKG ? and LLC is at domain1 level ?

domain0 is SMT and domain1 is MC
thread_siblings_list: 0,80. 1,81. 2,82. 3,83
LLC is at domain1 level

>> 2.test case:
>>
>> vcpu.c
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> int main()
>> {
>>	sleep(20);
>>	while (1);
>>	return 0;
>> }
>>
>> gcc vcpu.c -o vcpu
>> -----------------------------------------------------------------
>> test.sh
>>
>> #!/bin/bash
>>
>> #vcpu1
>> mkdir /sys/fs/cgroup/cpuset/vcpu_1
>> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.cpus
>> echo 0 > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.mems
>> for i in {1..8}
>> do
>>	./vcpu &
>>	pid=$!
>>	sleep 1
>>	echo $pid > /sys/fs/cgroup/cpuset/vcpu_1/tasks
>> done
>>
>> #vcpu2
>> mkdir /sys/fs/cgroup/cpuset/vcpu_2
>> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.cpus
>> echo 0 > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.mems
>> for i in {1..8}
>> do
>>	./vcpu &
>>	pid=$!
>>	sleep 1
>>	echo $pid > /sys/fs/cgroup/cpuset/vcpu_2/tasks
>> done
>> ------------------------------------------------------------------
>> [root@localhost ~]# ./test.sh
>>
>> [root@localhost ~]# top -d 1 -c -p $(pgrep -d',' -f vcpu)
>>
>> 14591 root 20 0 2448 1012 928 R 100.0 0.0 13:10.73 ./vcpu
>> 14582 root 20 0 2448 1012 928 R 100.0 0.0 13:12.71 ./vcpu
>> 14606 root 20 0 2448  872 784 R 100.0 0.0 13:09.72 ./vcpu
>> 14620 root 20 0 2448  916 832 R 100.0 0.0 13:07.72 ./vcpu
>> 14622 root 20 0 2448  920 836 R 100.0 0.0 13:06.72 ./vcpu
>> 14629 root 20 0 2448  920 832 R 100.0 0.0 13:05.72 ./vcpu
>> 14643 root 20 0 2448  924 836 R  21.0 0.0  2:37.13 ./vcpu
>> 14645 root 20 0 2448  868 784 R  21.0 0.0  2:36.51 ./vcpu
>> 14589 root 20 0 2448  900 816 R  20.0 0.0  2:45.16 ./vcpu
>> 14608 root 20 0 2448  956 872 R  20.0 0.0  2:42.24 ./vcpu
>> 14632 root 20 0 2448  872 788 R  20.0 0.0  2:38.08 ./vcpu
>> 14638 root 20 0 2448  924 840 R  20.0 0.0  2:37.48 ./vcpu
>> 14652 root 20 0 2448  928 844 R  20.0 0.0  2:36.42 ./vcpu
>> 14654 root 20 0 2448  924 840 R  20.0 0.0  2:36.14 ./vcpu
>> 14663 root 20 0 2448  900 816 R  20.0 0.0  2:35.38 ./vcpu
>> 14669 root 20 0 2448  868 784 R  20.0 0.0  2:35.70 ./vcpu
* Re: [Question] sched:the load is unbalanced in the VM overcommitment scenario
  2024-09-14  7:03 ` zhengzucheng
@ 2024-09-17  6:19   ` Vincent Guittot
  0 siblings, 0 replies; 19+ messages in thread
From: Vincent Guittot @ 2024-09-17  6:19 UTC (permalink / raw)
To: zhengzucheng
Cc: Waiman Long, peterz, juri.lelli, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, oleg, Frederic Weisbecker, mingo,
	peterx, tj, tjcao980311, linux-kernel

On Sat, 14 Sept 2024 at 09:04, zhengzucheng <zhengzucheng@huawei.com> wrote:
>
> On 2024/9/13 23:55, Vincent Guittot wrote:
> > On Fri, 13 Sept 2024 at 06:03, zhengzucheng <zhengzucheng@huawei.com> wrote:
> >> In the VM overcommitment scenario, the overcommitment ratio is 1:2, 8
> >> CPUs are overcommitted to 2 x 8u VMs,
> >> and 16 vCPUs are bound to 8 cpu. However, one VM obtains only 2 CPUs
> >> resources, the other VM has 6 CPUs.
> >> The host is configured with 80 CPUs in a sched domain and other CPUs are
> >> in the idle state.
> >> The root cause is that the load of the host is unbalanced, some vCPUs
> >> exclusively occupy CPU resources.
> >> when the CPU that triggers load balance calculates imbalance value,
> >> env->imbalance = 0 is calculated because of
> >> local->avg_load > sds->avg_load. As a result, the load balance fails.
> >> The processing logic:
> >> https://github.com/torvalds/linux/commit/91dcf1e8068e9a8823e419a7a34ff4341275fb70
> >>
> >> It's normal from kernel load balance, but it's not reasonable from the
> >> perspective of VM users.
> >> In cgroup v1, set cpuset.sched_load_balance=0 to modify the schedule
> >> domain to fix it.
> >> Is there any other method to fix this problem? thanks.
> > I'm not sure how to understand your setup and why the load balance is
> > not balancing correctly 16 vCPU between the 8 CPUs.
> >
> > From your test case description below, you have 8 always running
> > threads in cgroup A and 8 always running threads in cgroup B and the 2
> > cgroups have only 8 CPUs among 80. This should not be a problem for
> > load balance. I tried something similar although not exactly the same
> > with cgroupv2 and rt-app and I don't have noticeable imbalance
> >
> > Do you have more details that you can share about your system ?
> >
> > Which kernel version are you using ? Which arch ?
>
> kernel version: 6.11.0-rc7
> arch: x86_64, with cgroup v1

okay

> >> Abstracted reproduction case:
> >> 1.environment information:
> >>
> >> [root@localhost ~]# cat /proc/schedstat
> >>
> >> cpu0
> >> domain0 00000000,00000000,00010000,00000000,00000001
> >> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> >> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> >> cpu1
> >> domain0 00000000,00000000,00020000,00000000,00000002
> >> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> >> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> >> cpu2
> >> domain0 00000000,00000000,00040000,00000000,00000004
> >> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> >> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> >> cpu3
> >> domain0 00000000,00000000,00080000,00000000,00000008
> >> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> >> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> > Is it correct to assume that domain0 is SMT, domain1 MC and domain2 PKG ?
> > and cpu80-83 are in the other group of PKG ? and LLC is at domain1 level ?
>
> domain0 is SMT and domain1 is MC
> thread_siblings_list: 0,80. 1,81. 2,82. 3,83

Yeah, I should have read more carefully the domain0 cpumask

> LLC is at domain1 level
>
> >> 2.test case:
> >>
> >> vcpu.c
> >> #include <stdio.h>
> >> #include <unistd.h>
> >>
> >> int main()
> >> {
> >>	sleep(20);
> >>	while (1);
> >>	return 0;
> >> }
> >>
> >> gcc vcpu.c -o vcpu
> >> -----------------------------------------------------------------
> >> test.sh
> >>
> >> #!/bin/bash
> >>
> >> #vcpu1
> >> mkdir /sys/fs/cgroup/cpuset/vcpu_1
> >> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.cpus
> >> echo 0 > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.mems
> >> for i in {1..8}
> >> do
> >>	./vcpu &
> >>	pid=$!
> >>	sleep 1
> >>	echo $pid > /sys/fs/cgroup/cpuset/vcpu_1/tasks
> >> done
> >>
> >> #vcpu2
> >> mkdir /sys/fs/cgroup/cpuset/vcpu_2
> >> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.cpus
> >> echo 0 > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.mems
> >> for i in {1..8}
> >> do
> >>	./vcpu &
> >>	pid=$!
> >>	sleep 1
> >>	echo $pid > /sys/fs/cgroup/cpuset/vcpu_2/tasks
> >> done
> >> ------------------------------------------------------------------
> >> [root@localhost ~]# ./test.sh
> >>
> >> [root@localhost ~]# top -d 1 -c -p $(pgrep -d',' -f vcpu)
> >>
> >> 14591 root 20 0 2448 1012 928 R 100.0 0.0 13:10.73 ./vcpu
> >> 14582 root 20 0 2448 1012 928 R 100.0 0.0 13:12.71 ./vcpu
> >> 14606 root 20 0 2448  872 784 R 100.0 0.0 13:09.72 ./vcpu
> >> 14620 root 20 0 2448  916 832 R 100.0 0.0 13:07.72 ./vcpu
> >> 14622 root 20 0 2448  920 836 R 100.0 0.0 13:06.72 ./vcpu
> >> 14629 root 20 0 2448  920 832 R 100.0 0.0 13:05.72 ./vcpu
> >> 14643 root 20 0 2448  924 836 R  21.0 0.0  2:37.13 ./vcpu
> >> 14645 root 20 0 2448  868 784 R  21.0 0.0  2:36.51 ./vcpu
> >> 14589 root 20 0 2448  900 816 R  20.0 0.0  2:45.16 ./vcpu
> >> 14608 root 20 0 2448  956 872 R  20.0 0.0  2:42.24 ./vcpu
> >> 14632 root 20 0 2448  872 788 R  20.0 0.0  2:38.08 ./vcpu
> >> 14638 root 20 0 2448  924 840 R  20.0 0.0  2:37.48 ./vcpu
> >> 14652 root 20 0 2448  928 844 R  20.0 0.0  2:36.42 ./vcpu
> >> 14654 root 20 0 2448  924 840 R  20.0 0.0  2:36.14 ./vcpu
> >> 14663 root 20 0 2448  900 816 R  20.0 0.0  2:35.38 ./vcpu
> >> 14669 root 20 0 2448  868 784 R  20.0 0.0  2:35.70 ./vcpu

So I finally understood your situation. The limited cpuset screws up
the avg load of the system for domain1. The group_imbalanced state is
there to try to fix an imbalanced situation related to tasks that are
pinned to a subset of the CPUs of the sched domain, but this can't
cover all cases.
* Re: [Question] sched:the load is unbalanced in the VM overcommitment scenario
  2024-09-13  4:03 ` [Question] sched:the load is unbalanced in the VM overcommitment scenario zhengzucheng
  2024-09-13 15:55 ` Vincent Guittot
@ 2024-09-13 17:17 ` Waiman Long
  2024-09-14  2:15   ` zhengzucheng
  1 sibling, 1 reply; 19+ messages in thread
From: Waiman Long @ 2024-09-13 17:17 UTC (permalink / raw)
To: zhengzucheng, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, oleg,
	Frederic Weisbecker, mingo, peterx, tj, tjcao980311
Cc: linux-kernel

On 9/13/24 00:03, zhengzucheng wrote:
> In the VM overcommitment scenario, the overcommitment ratio is 1:2, 8
> CPUs are overcommitted to 2 x 8u VMs,
> and 16 vCPUs are bound to 8 cpu. However, one VM obtains only 2 CPUs
> resources, the other VM has 6 CPUs.
> The host is configured with 80 CPUs in a sched domain and other CPUs
> are in the idle state.
> The root cause is that the load of the host is unbalanced, some vCPUs
> exclusively occupy CPU resources.
> when the CPU that triggers load balance calculates imbalance value,
> env->imbalance = 0 is calculated because of
> local->avg_load > sds->avg_load. As a result, the load balance fails.
> The processing logic:
> https://github.com/torvalds/linux/commit/91dcf1e8068e9a8823e419a7a34ff4341275fb70
>
> It's normal from kernel load balance, but it's not reasonable from the
> perspective of VM users.
> In cgroup v1, set cpuset.sched_load_balance=0 to modify the schedule
> domain to fix it.
> Is there any other method to fix this problem? thanks.
>
> Abstracted reproduction case:
> 1.environment information:
>
> [root@localhost ~]# cat /proc/schedstat
>
> cpu0
> domain0 00000000,00000000,00010000,00000000,00000001
> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> cpu1
> domain0 00000000,00000000,00020000,00000000,00000002
> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> cpu2
> domain0 00000000,00000000,00040000,00000000,00000004
> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> cpu3
> domain0 00000000,00000000,00080000,00000000,00000008
> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>
> 2.test case:
>
> vcpu.c
> #include <stdio.h>
> #include <unistd.h>
>
> int main()
> {
>	sleep(20);
>	while (1);
>	return 0;
> }
>
> gcc vcpu.c -o vcpu
> -----------------------------------------------------------------
> test.sh
>
> #!/bin/bash
>
> #vcpu1
> mkdir /sys/fs/cgroup/cpuset/vcpu_1
> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.cpus
> echo 0 > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.mems
> for i in {1..8}
> do
>	./vcpu &
>	pid=$!
>	sleep 1
>	echo $pid > /sys/fs/cgroup/cpuset/vcpu_1/tasks
> done
>
> #vcpu2
> mkdir /sys/fs/cgroup/cpuset/vcpu_2
> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.cpus
> echo 0 > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.mems
> for i in {1..8}
> do
>	./vcpu &
>	pid=$!
>	sleep 1
>	echo $pid > /sys/fs/cgroup/cpuset/vcpu_2/tasks
> done
> ------------------------------------------------------------------
> [root@localhost ~]# ./test.sh
>
> [root@localhost ~]# top -d 1 -c -p $(pgrep -d',' -f vcpu)
>
> 14591 root 20 0 2448 1012 928 R 100.0 0.0 13:10.73 ./vcpu
> 14582 root 20 0 2448 1012 928 R 100.0 0.0 13:12.71 ./vcpu
> 14606 root 20 0 2448  872 784 R 100.0 0.0 13:09.72 ./vcpu
> 14620 root 20 0 2448  916 832 R 100.0 0.0 13:07.72 ./vcpu
> 14622 root 20 0 2448  920 836 R 100.0 0.0 13:06.72 ./vcpu
> 14629 root 20 0 2448  920 832 R 100.0 0.0 13:05.72 ./vcpu
> 14643 root 20 0 2448  924 836 R  21.0 0.0  2:37.13 ./vcpu
> 14645 root 20 0 2448  868 784 R  21.0 0.0  2:36.51 ./vcpu
> 14589 root 20 0 2448  900 816 R  20.0 0.0  2:45.16 ./vcpu
> 14608 root 20 0 2448  956 872 R  20.0 0.0  2:42.24 ./vcpu
> 14632 root 20 0 2448  872 788 R  20.0 0.0  2:38.08 ./vcpu
> 14638 root 20 0 2448  924 840 R  20.0 0.0  2:37.48 ./vcpu
> 14652 root 20 0 2448  928 844 R  20.0 0.0  2:36.42 ./vcpu
> 14654 root 20 0 2448  924 840 R  20.0 0.0  2:36.14 ./vcpu
> 14663 root 20 0 2448  900 816 R  20.0 0.0  2:35.38 ./vcpu
> 14669 root 20 0 2448  868 784 R  20.0 0.0  2:35.70 ./vcpu

Your script creates two cpusets with the same set of CPUs. The
scheduling aspect of the tasks, however, is not controlled by cpuset;
it is controlled by the cpu cgroup. I suppose that all these tasks are
in the same cpu cgroup. It is possible that the commit you mentioned
might have caused some unfairness in allocating CPU time to different
processes within the same cpu cgroup. Maybe you can try to put them
into separate cpu cgroups with equal weight to see if that can improve
the scheduling fairness?

BTW, you don't actually need to use 2 different cpusets if they all get
the same set of CPUs and memory nodes. Also, setting
cpuset.sched_load_balance=0 may not actually get what you want unless
all the cpusets that use those CPUs have cpuset.sched_load_balance set
to 0, including the root cgroup. Turning off this flag may disable load
balancing, but it may not guarantee fairness, depending on which CPUs
are being used by those tasks when they start, unless you explicitly
assign the CPUs to them when starting these tasks.

Cheers,
Longman
* Re: [Question] sched:the load is unbalanced in the VM overcommitment scenario
  2024-09-13 17:17 ` Waiman Long
@ 2024-09-14  2:15   ` zhengzucheng
  0 siblings, 0 replies; 19+ messages in thread
From: zhengzucheng @ 2024-09-14  2:15 UTC (permalink / raw)
To: Waiman Long, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, oleg,
	Frederic Weisbecker, mingo, peterx, tj, tjcao980311
Cc: linux-kernel

On 2024/9/14 1:17, Waiman Long wrote:
> you don't actually need to use 2 different cpusets if they all get the
> same set of CPUs and memory nodes

Yes, you're right. The purpose of setting up two different cpusets is to
simulate the scenario of two VMs; each cpuset is a VM.
For example, the VM configuration is as follows:

<domain type='kvm' id='12676'>
  <name>master</name>
  <vcpu placement='static' cpuset='0-3,80-83'>8</vcpu>
  <iothreads>1</iothreads>
  <iothreadids>
    <iothread id='1'/>
  </iothreadids>
  <cputune>
    <vcpupin vcpu='0' cpuset='0-3,80-83'/>
    <vcpupin vcpu='1' cpuset='0-3,80-83'/>
    <vcpupin vcpu='2' cpuset='0-3,80-83'/>
    <vcpupin vcpu='3' cpuset='0-3,80-83'/>
    <vcpupin vcpu='4' cpuset='0-3,80-83'/>
    <vcpupin vcpu='5' cpuset='0-3,80-83'/>
    <vcpupin vcpu='6' cpuset='0-3,80-83'/>
    <vcpupin vcpu='7' cpuset='0-3,80-83'/>
    <emulatorpin cpuset='0-79'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
end of thread, other threads: [~2024-09-17  6:19 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-25 12:03 [PATCH -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime Zheng Zucheng
2024-07-25 14:05 ` Peter Zijlstra
2024-07-25 14:49   ` zhengzucheng
2024-07-25 15:14     ` Peter Zijlstra
2024-07-26  2:32 ` [PATCH v2 " Zheng Zucheng
2024-07-26 10:44   ` Oleg Nesterov
2024-07-26 13:04     ` Oleg Nesterov
2024-07-26 13:08       ` Peter Zijlstra
2024-07-26 13:30         ` Oleg Nesterov
2024-07-30  6:55       ` Uwe Kleine-König
2024-07-29 10:34   ` [tip: sched/core] " tip-bot2 for Zheng Zucheng
2024-09-02  1:56 ` [Question] Include isolated cpu to ensure that tasks are not scheduled to isolated cpu? zhengzucheng
2024-09-02  3:00   ` Waiman Long
2024-09-13  4:03 ` [Question] sched:the load is unbalanced in the VM overcommitment scenario zhengzucheng
2024-09-13 15:55   ` Vincent Guittot
2024-09-14  7:03     ` zhengzucheng
2024-09-17  6:19       ` Vincent Guittot
2024-09-13 17:17   ` Waiman Long
2024-09-14  2:15     ` zhengzucheng