schedstat false counting of domain load_balance() tried to move one or more tasks failed

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* schedstat false counting of domain load_balance() tried to move one or more tasks failed
@ 2022-07-13  1:52 Steven Rostedt
  2022-07-18  9:51 ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2022-07-13  1:52 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Linus Torvalds

I've been tasked to analyze the /proc/schedstat file to determine
appropriate metrics to look after in production. So I'm looking at both the
documentation and the code that generates it.

From the documentation at https://docs.kernel.org/scheduler/sched-stats.html

(and Documentation/scheduler/sched-stats.rst for those of you that are
allergic to html)

   Domain statistics
   -----------------
   One of these is produced per domain for each cpu described. (Note that if
   CONFIG_SMP is not defined, *no* domains are utilized and these lines
   will not appear in the output.)

   domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

   The first field is a bit mask indicating what cpus this domain operates over.

   The next 24 are a variety of load_balance() statistics in grouped into types
   of idleness (idle, busy, and newly idle):

       1)  # of times in this domain load_balance() was called when the
           cpu was idle
       2)  # of times in this domain load_balance() checked but found
           the load did not require balancing when the cpu was idle
       3)  # of times in this domain load_balance() tried to move one or
           more tasks and failed, when the cpu was idle

I was looking at this #3 (which is also #11 and #19 for CPU_BUSY and
CPU_NEW_IDLE respectively). It states that it gets incremented when one or
more tasks were tried to be moved but failed. I found this is not always
the case.

We have:

	ld_moved = 0;
	/* Clear this flag as soon as we find a pullable task */
	env.flags |= LBF_ALL_PINNED;
	if (busiest->nr_running > 1) {
[..]
	}

	if (!ld_moved) {
		schedstat_inc(sd->lb_failed[idle]);

Where the lb_failed[] is that counter. I added the following code:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 77b2048a9326..4835ea4d9d01 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9865,6 +9865,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	struct rq *busiest;
 	struct rq_flags rf;
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(load_balance_mask);
+	bool redo = false;
 
 	struct lb_env env = {
 		.sd		= sd,
@@ -10012,11 +10013,13 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 			if (!cpumask_subset(cpus, env.dst_grpmask)) {
 				env.loop = 0;
 				env.loop_break = sched_nr_migrate_break;
+				redo = true;
 				goto redo;
 			}
 			goto out_all_pinned;
 		}
-	}
+	} else if (!redo)
+		trace_printk("Did not try to move! %d\n", idle);
 
 	if (!ld_moved) {
 		schedstat_inc(sd->lb_failed[idle]);


And sure enough that triggers on CPU_IDLE and CPU_NEW_IDLE calls (I haven't
seen it for CPU_BUSY yet, but I didn't try).

Thus, if we get to that check for (busiest->nr_running > 1) and fail, then
we will increment that counter incorrectly.

Do we care? Should it be fixed? Should it be documented?

Thoughts?

-- Steve


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: schedstat false counting of domain load_balance() tried to move one or more tasks failed
  2022-07-13  1:52 schedstat false counting of domain load_balance() tried to move one or more tasks failed Steven Rostedt
@ 2022-07-18  9:51 ` Peter Zijlstra
  2022-07-18 16:42   ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2022-07-18  9:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Linus Torvalds

On Tue, Jul 12, 2022 at 09:52:59PM -0400, Steven Rostedt wrote:
> I've been tasked to analyze the /proc/schedstat file to determine
> appropriate metrics to look after in production. So I'm looking at both the
> documentation and the code that generates it.
> 
> From the documentation at https://docs.kernel.org/scheduler/sched-stats.html
> 
> (and Documentation/scheduler/sched-stats.rst for those of you that are
> allergic to html)

I'm allergic to both, it's plain text or bust.

>        3)  # of times in this domain load_balance() tried to move one or
>            more tasks and failed, when the cpu was idle

> Thus, if we get to that check for (busiest->nr_running > 1) and fail, then
> we will increment that counter incorrectly.
> 
> Do we care? Should it be fixed? Should it be documented?

*shrug*, I suppose we can fix. People using this stuff are the sort that
are likely to read documentation instead of code.

At the same time; I suspect it's been 'broken' like forever, so who
knows what people are actually expecting today.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: schedstat false counting of domain load_balance() tried to move one or more tasks failed
  2022-07-18  9:51 ` Peter Zijlstra
@ 2022-07-18 16:42   ` Steven Rostedt
  2022-07-22  3:51     ` Abel Wu
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2022-07-18 16:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Linus Torvalds

On Mon, 18 Jul 2022 11:51:26 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > Do we care? Should it be fixed? Should it be documented?  
> 
> *shrug*, I suppose we can fix. People using this stuff are the sort that
> are likely to read documentation instead of code.

Yep.

> 
> At the same time; I suspect it's been 'broken' like forever, so who
> knows what people are actually expecting today.

As you stated, it's used by people that read documentation more than the
code. My expectation is that they are making wrong decisions because what
they expect those numbers to mean are not what is actually happening.

I think it's better to make the functionality match the documentation, and
if people complain, we can ask them what exactly they expected and why. And
perhaps they might be complaining that a benchmark isn't behaving as
expected because they were interpreting the results incorrectly.

I'll go write up a fix.

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: schedstat false counting of domain load_balance() tried to move one or more tasks failed
  2022-07-18 16:42   ` Steven Rostedt
@ 2022-07-22  3:51     ` Abel Wu
  0 siblings, 0 replies; 4+ messages in thread
From: Abel Wu @ 2022-07-22  3:51 UTC (permalink / raw)
  To: Steven Rostedt, Peter Zijlstra
  Cc: LKML, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Linus Torvalds


On 7/19/22 12:42 AM, Steven Rostedt Wrote:
> On Mon, 18 Jul 2022 11:51:26 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>>> Do we care? Should it be fixed? Should it be documented?
>>
>> *shrug*, I suppose we can fix. People using this stuff are the sort that
>> are likely to read documentation instead of code.
> 
> Yep.
> 
>>
>> At the same time; I suspect it's been 'broken' like forever, so who
>> knows what people are actually expecting today.
> 
> As you stated, it's used by people that read documentation more than the
> code. My expectation is that they are making wrong decisions because what
> they expect those numbers to mean are not what is actually happening.
> 
> I think it's better to make the functionality match the documentation, and
> if people complain, we can ask them what exactly they expected and why. And
> perhaps they might be complaining that a benchmark isn't behaving as
> expected because they were interpreting the results incorrectly.
> 
> I'll go write up a fix.

Hi Steven, Peter,

There are also some other fields broken I think, such as the imbalance
calculation. It is changed after Vincent's reworking load_balance().
Just for the record if you are planning on a fix. :)

Best regards,
Abel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-07-22  3:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-13  1:52 schedstat false counting of domain load_balance() tried to move one or more tasks failed Steven Rostedt
2022-07-18  9:51 ` Peter Zijlstra
2022-07-18 16:42   ` Steven Rostedt
2022-07-22  3:51     ` Abel Wu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox