Re: [peterz-queue:sched/eevdf] [sched/fair] 23669fce72: aim7.jobs-per-min -18.6% regression


From: Peter Zijlstra <peterz@infradead.org>
To: Chen Yu <yu.c.chen@intel.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com,
	Oliver Sang <oliver.sang@intel.com>,
	Chen Yu <yu.chen.surf@gmail.com>, Ingo Molnar <mingo@kernel.org>
Subject: Re: [peterz-queue:sched/eevdf] [sched/fair] 23669fce72: aim7.jobs-per-min -18.6% regression
Date: Sun, 26 Mar 2023 13:00:24 +0200
Message-ID: <20230326110024.GA2990748@hirez.programming.kicks-ass.net>
In-Reply-To: <ZBxEuboumEifedjG@chenyu5-mobl1>

On Thu, Mar 23, 2023 at 08:23:21PM +0800, Chen Yu wrote:

>     stress-ng (throughput, higher is better)
>     ==============================================================================
>     case                    nr_instance            baseline(std%)  compare%(std%)

>     futex                   25%                      1.00 (<2%)    -3.2 (<2%)
>     futex                   50%                      1.00 (3%)     -19.9 (5%)
>     futex                   75%                      1.00 (6%)     -19.1 (2%)
>     futex                   100%                     1.00 (16%)    -30.5 (10%)
>     futex                   125%                     1.00 (25%)    -39.3 (11%)
>     futex                   150%                     1.00 (20%)    -27.2 (17%)
>     futex                   175%                     1.00 (<2%)    -18.6 (<2%)
>     futex                   200%                     1.00 (<2%)    -47.5 (<2%)

>     It seems that as the load increases, there is a regression in the "switch" and
>     "futex" cases. In the futex case, the regression seems to be caused by fewer context
>     switches. The stress-ng futex stressor creates a lot of 1:1 futex_wait/futex_wake pairs,
>     and it seems that with the patch applied there are more wakeup attempts, but fewer
>     successful wakeups. It is possible that the wakers are stacked on one CPU, which
>     delays the wakeups.
> 
>     For example, more wakeup attempts:
> 
>     49.27 ±  4%     +13.4       62.63        perf-profile.calltrace.cycles-pp.futex_wake.do_futex
> 
>     However, fewer successful wakeups (context switches):
> 
>     852533 ± 18%        -35.0%     553996 ±  9%  sched_debug.cpu.nr_switches.avg
>     1.01e+08 ± 24%      -36.2%   64471512 ±  9%  stress-ng.time.involuntary_context_switches
>     1.271e+08 ± 15%     -34.0%   83868905 ±  8%  stress-ng.time.voluntary_context_switches
> 
>     BTW, I thought this was a use case for short-task wakeup placement. Waking
>     up the short task on the current CPU when the system is overloaded might mitigate
>     this issue.

There are only a few hundred migrations in this workload at 100%, so
placement is not an issue (nor should it be at that point).

What does seem to be the issue is the sleeper bonus. The way this benchmark
is constructed (see stress-futex.c) is:

parent:

	do {
		futex_wake();
	} while (keep_stressing());

child:

	do {
		futex_wait();
		inc_counter();
	} while (keep_stressing());

That is, the parent is always running, while the child keeps blocking.
Consider the parent wanting 100% of the CPU and the child 50%; a truly fair
scheduler will then split the runtime 67% vs 33% -- this is what EEVDF does
now. And since the child gets less runtime, the counter increases more
slowly and the benchmark score drops.
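A minimal, runnable C sketch of the wake/wait pair above, for poking at the behaviour outside stress-ng. It is not stress-futex.c: it uses two threads instead of processes, calls futex(2) through syscall(2), gives FUTEX_WAIT a 1 ms timeout (an assumption, so a missed wakeup cannot hang it), and counts every return from futex_wait(), whether woken or timed out. The names run_futex_pair(), futex_word, keep_going and counter are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names; a sketch of the pattern, not stress-ng's code. */
static uint32_t futex_word;         /* never changes: the waiter sleeps until woken */
static atomic_bool keep_going;
static atomic_ulong counter;        /* stands in for inc_counter() */

static long futex_wait(uint32_t *uaddr, uint32_t val, const struct timespec *ts)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAIT, val, ts, NULL, 0);
}

static long futex_wake(uint32_t *uaddr, int nr)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAKE, nr, NULL, NULL, 0);
}

/* Child: blocks in futex_wait(), then bumps the counter. */
static void *child(void *arg)
{
	/* 1 ms timeout (assumption) so a lost wakeup cannot hang the sketch */
	const struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };

	(void)arg;
	do {
		futex_wait(&futex_word, 0, &ts);
		atomic_fetch_add(&counter, 1);
	} while (atomic_load(&keep_going));
	return NULL;
}

static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Parent: never blocks, just keeps issuing wakeups for 'millis' ms.
 * Returns how many times the child came back from futex_wait(). */
unsigned long run_futex_pair(long millis)
{
	struct timespec start;
	pthread_t tid;

	atomic_store(&counter, 0);
	atomic_store(&keep_going, true);
	if (pthread_create(&tid, NULL, child, NULL) != 0)
		return 0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		futex_wake(&futex_word, 1);
	} while (elapsed_ms(&start) < millis);

	atomic_store(&keep_going, false);
	futex_wake(&futex_word, 1);     /* release the child if it is asleep */
	pthread_join(tid, NULL);
	return atomic_load(&counter);
}
```

With both threads confined to one CPU (e.g. with taskset), the returned count roughly tracks how much runtime the blocking child manages to get, which is the effect described above.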

CFS has a sleeper bonus, which gives (short) blocking tasks a small
advantage, making it 50% vs 50%. If you compute the drop from 50% to
33%, you get -33%, and that's exactly the drop you see around the
100% case.
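The -33% is just the relative change in the child's share of the CPU, assuming (as this sketch does) that the counter advances in proportion to that share. share_change_pct() is a hypothetical helper:

```c
#include <assert.h>

/* Relative change, in percent, when the child's CPU share goes
 * from 'before' (CFS with sleeper bonus) to 'after' (strictly fair). */
double share_change_pct(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

share_change_pct(1.0 / 2.0, 1.0 / 3.0) comes out at about -33.3.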

tip/sched/urgent:

root@ivb-ep:~# stress-ng --futex 40 -t 5 --metrics 2>&1 | awk '{ if ($4 == "futex") print $0 }'
stress-ng: info:  [1825] futex           9834434      5.00      5.56    193.94   1966762.43       49294.78        99.74          2288

sched/eevdf + place_bonus (based on tip/sched/urgent -- will push out
later today)

root@ivb-ep:~# echo NO_PLACE_BONUS > /debug/sched/features ; stress-ng --futex 40 -t 5 --metrics 2>&1 | awk '{ if ($4 == "futex") print $0 }'
stress-ng: info:  [2373] futex           6541589      5.00      4.28    194.83   1308211.07       32854.97        99.54          2288
root@ivb-ep:~# echo PLACE_BONUS > /debug/sched/features ; stress-ng --futex 40 -t 5 --metrics 2>&1 | awk '{ if ($4 == "futex") print $0 }'
stress-ng: info:  [2537] futex           9745715      5.00      5.38    194.55   1948945.01       48745.49        99.96          2288
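The same arithmetic applied to the bogo-ops column (the first number after "futex") of the three runs above. The constants are copied from those lines; the helper names are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* bogo-ops from the three 5 s, 40-instance runs above */
static const double baseline_ops = 9834434.0; /* tip/sched/urgent */
static const double no_bonus_ops = 6541589.0; /* sched/eevdf, NO_PLACE_BONUS */
static const double bonus_ops    = 9745715.0; /* sched/eevdf, PLACE_BONUS */

/* Relative change of 'run' against 'base', in percent. */
double ops_change_pct(double base, double run)
{
	return (run - base) / base * 100.0;
}

void print_futex_deltas(void)
{
	printf("NO_PLACE_BONUS: %+.1f%%\n", ops_change_pct(baseline_ops, no_bonus_ops));
	printf("PLACE_BONUS:    %+.1f%%\n", ops_change_pct(baseline_ops, bonus_ops));
}
```

print_futex_deltas() prints -33.5% for NO_PLACE_BONUS and -0.9% for PLACE_BONUS, i.e. the bonus recovers essentially all of the loss.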


The whole sleeper bonus is fairly controversial, but it was needed in
CFS to make some 'starvation' cases go away -- lag-based placement
cures those too. And given the whole (recent) trainwreck with:

  829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
  a53ce18cacb4 ("sched/fair: Sanitize vruntime of entity being migrated")

I'm happy to delete all that. Still, let me think a little, perhaps I
can come up with something slightly less horrible than all that which we
can default-disable for now...
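For reference, a simplified model of what the CFS sleeper bonus does at wakeup, loosely following place_entity() as it stood before the commits above: a waking task is placed no earlier than min_vruntime minus one scheduling latency (halved with GENTLE_FAIR_SLEEPERS), and is never moved backwards. This sketch ignores idle entities, START_DEBIT for new tasks and the sanitizing check added by 829c1651e9c4; place_on_wakeup() and the example values are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Larger of two vruntimes, compared via the signed difference
 * so that u64 wrap-around is handled the way CFS does it. */
static uint64_t max_vruntime(uint64_t max_vr, uint64_t vr)
{
	int64_t delta = (int64_t)(vr - max_vr);

	if (delta > 0)
		max_vr = vr;
	return max_vr;
}

/* Hypothetical helper: vruntime a task gets when it wakes up. */
uint64_t place_on_wakeup(uint64_t se_vruntime, uint64_t min_vruntime,
			 uint64_t latency_ns, bool gentle_fair_sleepers)
{
	uint64_t thresh = latency_ns;

	if (gentle_fair_sleepers)
		thresh >>= 1;           /* halve the sleeper credit */

	/* A long sleeper is pulled up to min_vruntime - thresh, i.e. it
	 * gets up to 'thresh' of credit: the sleeper bonus. A short
	 * sleeper keeps its own, larger vruntime: no gain from placement. */
	return max_vruntime(se_vruntime, min_vruntime - thresh);
}
```

With min_vruntime at 100 ms and a 6 ms latency, a long sleeper lands at 97 ms (94 ms without GENTLE_FAIR_SLEEPERS), i.e. slightly ahead of the always-running parent, which is the small advantage that turns 67%/33% into roughly 50%/50%.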

Thread overview: 13+ messages
2023-03-20  7:46 [peterz-queue:sched/eevdf] [sched/fair] 23669fce72: aim7.jobs-per-min -18.6% regression kernel test robot
2023-03-20  7:58 ` Peter Zijlstra
2023-03-21  7:46   ` Oliver Sang
2023-03-21  8:04     ` Chen Yu
2023-03-21  9:03       ` Peter Zijlstra
2023-03-23 12:23         ` Chen Yu
2023-03-23 15:30           ` Peter Zijlstra
2023-03-26 11:00           ` Peter Zijlstra [this message]
2023-03-26 13:38             ` Peter Zijlstra
2023-03-27 13:39               ` Chen Yu
2023-03-27 15:18                 ` Peter Zijlstra
2023-03-27 13:51             ` Chen Yu
2023-03-27 15:30               ` Peter Zijlstra
