* [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
@ 2022-08-10 22:33 Libo Chen
From: Libo Chen @ 2022-08-10 22:33 UTC (permalink / raw)
To: mingo, peterz, vincent.guittot, juri.lelli, dietmar.eggemann,
mgorman
Cc: linux-kernel
There are scenarios where non-affine wakeups are incorrectly counted as
affine wakeups by schedstats.

When wake_affine_idle() returns prev_cpu, which doesn't equal
nr_cpumask_bits, it will slip through the check target == nr_cpumask_bits
in wake_affine() and be counted as if target == this_cpu in schedstats.

Replace target == nr_cpumask_bits with target != this_cpu to make sure
affine wakeups are accurately tallied.
Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Libo Chen <libo.chen@oracle.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da388657d5ac..b179da4f8105 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
 
 	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
-	if (target == nr_cpumask_bits)
+	if (target != this_cpu)
 		return prev_cpu;
 
 	schedstat_inc(sd->ttwu_move_affine);
--
2.31.1
^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: Peter Zijlstra @ 2022-08-15 11:01 UTC (permalink / raw)
To: Libo Chen
Cc: mingo, vincent.guittot, juri.lelli, dietmar.eggemann, mgorman,
    linux-kernel

On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
> There are scenarios where non-affine wakeups are incorrectly counted as
> affine wakeups by schedstats.
>
> When wake_affine_idle() returns prev_cpu which doesn't equal to
> nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
> in wake_affine() and be counted as if target == this_cpu in schedstats.
>
> Replace target == nr_cpumask_bits with target != this_cpu to make sure
> affine wakeups are accurately tallied.
>
> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
> Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Signed-off-by: Libo Chen <libo.chen@oracle.com>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index da388657d5ac..b179da4f8105 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>  		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
>
>  	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
> -	if (target == nr_cpumask_bits)
> +	if (target != this_cpu)
>  		return prev_cpu;
>
>  	schedstat_inc(sd->ttwu_move_affine);

This not only changes the accounting but also the placement, no?
* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: Libo Chen @ 2022-08-15 19:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, vincent.guittot, juri.lelli, dietmar.eggemann, mgorman,
    linux-kernel, Daniel Jordan

On 8/15/22 04:01, Peter Zijlstra wrote:
> On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
>> [...]
>>  	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
>> -	if (target == nr_cpumask_bits)
>> +	if (target != this_cpu)
>>  		return prev_cpu;
>>
>>  	schedstat_inc(sd->ttwu_move_affine);
> This not only changes the accounting but also the placement, no?

No, it should only change the accounting. wake_affine() still returns
prev_cpu if target equals prev_cpu or nr_cpumask_bits, the same
behavior as before.

Libo
* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: Vincent Guittot @ 2022-08-17 13:20 UTC (permalink / raw)
To: Libo Chen
Cc: Peter Zijlstra, mingo, juri.lelli, dietmar.eggemann, mgorman,
    linux-kernel, Daniel Jordan

On Mon, 15 Aug 2022 at 21:19, Libo Chen <libo.chen@oracle.com> wrote:
> On 8/15/22 04:01, Peter Zijlstra wrote:
> > [...]
> > This not only changes the accounting but also the placement, no?
> No, it should only change the accounting. wake_affine() still returns
> prev_cpu if target equals to prev_cpu or nr_cpumask_bits, the same
> behavior as before.

Looks good to me

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> Libo
* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: Libo Chen @ 2023-01-09 22:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, vincent.guittot, juri.lelli, dietmar.eggemann, mgorman,
    linux-kernel, Daniel Jordan

Hi Peter,

A gentle ping~ Vincent has signed it off. Let me know what else I should
do for this patch.

Libo

On 8/15/22 12:19 PM, Libo Chen wrote:
> On 8/15/22 04:01, Peter Zijlstra wrote:
>> [...]
>> This not only changes the accounting but also the placement, no?
> No, it should only change the accounting. wake_affine() still returns
> prev_cpu if target equals to prev_cpu or nr_cpumask_bits, the same
> behavior as before.
>
> Libo
* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: Libo Chen @ 2023-03-09 3:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo@redhat.com, vincent.guittot@linaro.org, juri.lelli@redhat.com,
    dietmar.eggemann@arm.com, mgorman@suse.de,
    linux-kernel@vger.kernel.org, Daniel Jordan

> On Jan 9, 2023, at 2:00 PM, Libo Chen <libo.chen@oracle.com> wrote:
>
> Hi Peter,
>
> A gentle ping~ Vincent has signed it off. Let me know what else I
> should do for this patch.
>
> [...]

Hi Peter,

A second ping in case you missed the first one, what else should I do to
get this fix in?

Libo
* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: Gautham R. Shenoy @ 2022-08-25 7:30 UTC (permalink / raw)
To: Libo Chen
Cc: mingo, peterz, vincent.guittot, juri.lelli, dietmar.eggemann,
    mgorman, linux-kernel

On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
> [...]
>  	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
> -	if (target == nr_cpumask_bits)
> +	if (target != this_cpu)
>  		return prev_cpu;

This seems to be the right thing to do. However,..

if this_cpu and prev_cpu were in the same LLC and we pick prev_cpu,
technically is it still not an affine wakeup?

>
>  	schedstat_inc(sd->ttwu_move_affine);
> --
> 2.31.1

--
Thanks and Regards
gautham.
* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: Libo Chen @ 2022-08-25 9:13 UTC (permalink / raw)
To: Gautham R. Shenoy
Cc: mingo, peterz, vincent.guittot, juri.lelli, dietmar.eggemann,
    mgorman, linux-kernel, Daniel Jordan

On 8/25/22 00:30, Gautham R. Shenoy wrote:
> On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
>> [...]
>>  	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
>> -	if (target == nr_cpumask_bits)
>> +	if (target != this_cpu)
>>  		return prev_cpu;
>
> This seems to be the right thing to do. However,..
>
> if this_cpu and prev_cpu were in the same LLC and we pick prev_cpu,
> technically is it still not an affine wakeup?

I think schedstats like ttwu_move_affine/ttwu_wake_remote are defined
within a sched domain, so if the waking CPU and the previous CPU are in
the same MC domain, then picking the previous CPU is a remote wakeup
within that MC. If the two candidate CPUs are from two different NUMA
nodes, then picking the waking CPU is an affine wakeup within that NUMA
domain. Correct me if I am wrong, but this definition is consistent
across all levels of sched domains.

But I do understand that when two candidate CPUs are within an LLC:

a) all the fast-path wakeups should be affine wakeups if your definition
   of an affine wakeup is a wakeup to the same LLC as the waker.

b) select_idle_sibling() may pick a CPU in that LLC other than the two
   candidate CPUs, which makes the affine/remote stats here useless even
   if we are consistent with the ttwu_move_affine/ttwu_wake_remote
   definition.

I personally think it's just too much trouble to add additional code in
the kernel to, let's say, treat all wakeups within an LLC as
ttwu_move_affine. It's a lot easier to do that when you process
schedstats data, whether you want to treat all wakeups in LLC domains as
affine wakeups or ignore ttwu_move_affine/ttwu_wake_remote stats from
LLC domains.
* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: Gautham R. Shenoy @ 2023-03-30 10:39 UTC (permalink / raw)
To: Libo Chen
Cc: mingo, peterz, vincent.guittot, juri.lelli, dietmar.eggemann,
    mgorman, linux-kernel, Daniel Jordan

Hello Libo,

Sorry, looks like this message got buried under the pile.

On Thu, Aug 25, 2022 at 02:13:17AM -0700, Libo Chen wrote:
> On 8/25/22 00:30, Gautham R. Shenoy wrote:
> > [...]
> > if this_cpu and prev_cpu were in the same LLC and we pick prev_cpu,
> > technically is it still not an affine wakeup?
>
> I think schedstats like ttwu_move_affine/ttwu_wake_remote are defined
> within a sched domain, so if the waking CPU and the previous CPU are in
> the same MC domain, then picking the previous CPU is a remote wakeup
> within that MC. [...]

Yes, the definition of ttwu_wake_remote in the lowest sched-domain
containing both prev_cpu and this_cpu is target_cpu != this_cpu. This is
fairly unambiguous.

From the code, the definition of a ttwu_move_affine is to capture an
_attempt_ to choose this_cpu as the target_cpu in the lowest
sched-domain containing both prev_cpu and this_cpu. It is merely an
attempt since the actual target CPU is selected by SIS and could be any
idle CPU in the LLC of the prev/this_cpu.

ttwu_move_affine makes sense for sched-domains higher than the LLC
domain since we move the task to the LLC of this_cpu away from the LLC
of the prev_cpu (this is possible on AMD processors which contain
multiple LLC domains within a NUMA node).

Having given it some more thought, I am not sure how to interpret this
metric for the LLC domain and lower ones, since the eventual target CPU
may not even be "closer" to this_cpu.

> But I do understand that when two candidate CPUs are within an LLC,
> [...]

Fair enough.

> I personally think it's just too much trouble to add additional code in
> the kernel to, let's say, treat all wakeups within an LLC as
> ttwu_move_affine. [...]

I agree. I think that having your fix is the right thing, since
currently the move_affine data in schedstats isn't accurate when
wake_affine_idle() or wake_affine_weight() picks the prev_cpu,
especially when prev_cpu and this_cpu are in sched-domains higher than
the LLC. Thus today we overcount affine wakeups, which is incorrect.

So,

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>

--
Thanks and Regards
gautham.
* [tip: sched/core] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: tip-bot2 for Libo Chen @ 2023-04-06 10:05 UTC (permalink / raw)
To: linux-tip-commits
Cc: Daniel Jordan, Libo Chen, Peter Zijlstra (Intel),
    Gautham R. Shenoy, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     39afe5d6fc59237ff7738bf3ede5a8856822d59d
Gitweb:        https://git.kernel.org/tip/39afe5d6fc59237ff7738bf3ede5a8856822d59d
Author:        Libo Chen <libo.chen@oracle.com>
AuthorDate:    Wed, 10 Aug 2022 15:33:13 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 05 Apr 2023 09:58:48 +02:00

sched/fair: Fix inaccurate tally of ttwu_move_affine

There are scenarios where non-affine wakeups are incorrectly counted as
affine wakeups by schedstats.

When wake_affine_idle() returns prev_cpu which doesn't equal to
nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
in wake_affine() and be counted as if target == this_cpu in schedstats.

Replace target == nr_cpumask_bits with target != this_cpu to make sure
affine wakeups are accurately tallied.

Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20220810223313.386614-1-libo.chen@oracle.com
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc358dc..f5da01a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6582,7 +6582,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
 
 	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
-	if (target == nr_cpumask_bits)
+	if (target != this_cpu)
 		return prev_cpu;
 
 	schedstat_inc(sd->ttwu_move_affine);
end of thread, other threads: [~2023-04-06 10:06 UTC | newest]

Thread overview: 10+ messages (links jump to the message on this page):

2022-08-10 22:33 [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine Libo Chen
2022-08-15 11:01 ` Peter Zijlstra
2022-08-15 19:19   ` Libo Chen
2022-08-17 13:20     ` Vincent Guittot
2023-01-09 22:00     ` Libo Chen
2023-03-09  3:17       ` Libo Chen
2022-08-25  7:30 ` Gautham R. Shenoy
2022-08-25  9:13   ` Libo Chen
2023-03-30 10:39     ` Gautham R. Shenoy
2023-04-06 10:05 ` [tip: sched/core] sched/fair: Fix inaccurate tally of ttwu_move_affine tip-bot2 for Libo Chen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox