* [PATCH v2 0/3] sched/fair: NOHZ cleanups and misfit improvement
@ 2019-02-11 17:59 Valentin Schneider
From: Valentin Schneider @ 2019-02-11 17:59 UTC
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, morten.rasmussen,
	Dietmar.Eggemann

Commit

  5fbdfae5221a ("sched/fair: Kick nohz balance if rq->misfit_task_load")

added a trigger for nohz kicks, which is required to offload misfit tasks
from LITTLE to big CPUs. However, those kicks can be issued a lot more
frequently than strictly needed.

This patch-set tunes down unneeded nohz kicks.

- Patch 1 adds some more comments to nohz_balancer_kick()
- Patches [2-3] tweak the nohz kick conditions for asymmetric systems

* Changes since v1

  - Patches 1-3 from v1 are in tip/sched/core and thus not included
    tip HEAD is 1b5500d73466 ("sched/fair: Remove unused 'sd' parameter from select_idle_smt()")
  - Patch 1 from v2 is new (Peter)
  - Patch 3 from v2 (5 from v1) now shuffles conditions to avoid a goto (Peter)

* nohz_balancer_kick() shuffling impact

  The ASYM_PACKING loop used to be towards the end of nohz_balancer_kick(),
  and the LLC condition was higher up. Since the LLC condition is very
  often true, we probably were avoiding the loop most of the time on systems
  that use ASYM_PACKING. However, I don't have one at hand and I'm not sure
  hacking up a kernel to enable ASYM_PACKING on a system that doesn't need it
  would be truly relevant.

  I ran 20 iterations of

    'hackbench -g 1 -l 100000'

  on a 2-socket Xeon E5 (40 logical cores, no ASYM_PACKING) but the difference
  (hackbench duration & nohz_balancer_kick() FTrace profiling) is within the
  noise.

--------------------------------------------------------------------------------
* Testing
** kick_ilb() hits
  This series causes a large reduction in calls to kick_ilb() (and thus
  subsequent rescheduling interrupts & useless nohz balance calls) in most
  scenarios.

  The "best case" one is running NR_BIG_CPUS big tasks, which I tested with
  4 50% periodic tasks running for 5 seconds on my HiKey960 (4x4 big.LITTLE):

  | CPU | hits (baseline) | hits (patchset) |
  |-----+-----------------+-----------------|
  |   0 |              31 |              41 |
  |   1 |              21 |               3 |
  |   2 |              35 |               2 |
  |   3 |               9 |               4 |
  |-----+-----------------+-----------------|
  |   4 |             170 |               4 |
  |   5 |             573 |               4 |
  |   6 |             544 |               4 |
  |   7 |             579 |               4 |


  A less ideal scenario with NR_CPUS-1 big tasks still shows some
  improvements (7 100% tasks running for 5 seconds on my HiKey960):

  | CPU | hits (baseline) | hits (patchset) |
  |-----+-----------------+-----------------|
  |   0 |              14 |             122 |
  |   1 |              47 |             162 |
  |   2 |              11 |             156 |
  |   3 |               9 |               3 |
  |-----+-----------------+-----------------|
  |   4 |              53 |               6 |
  |   5 |             276 |              13 |
  |   6 |             312 |               7 |
  |   7 |             250 |              11 |
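
  As a rough cross-check of the two tables above, the per-cluster totals can
  be summed with a small C sketch. The array names and the sum_hits() helper
  are illustrative only and not part of the series; the values are copied
  from the tables, with CPUs [0-3] being the LITTLEs and [4-7] the bigs.

```c
#include <stddef.h>

/* kick_ilb() hits per CPU, copied from the tables (index == CPU number). */
static const int big_tasks_base[8]  = { 31, 21, 35, 9, 170, 573, 544, 579 };
static const int big_tasks_patch[8] = { 41, 3, 2, 4, 4, 4, 4, 4 };
static const int nr_cpus_m1_base[8]  = { 14, 47, 11, 9, 53, 276, 312, 250 };
static const int nr_cpus_m1_patch[8] = { 122, 162, 156, 3, 6, 13, 7, 11 };

/* Sum the hits of CPUs [first, last], both bounds inclusive. */
static int sum_hits(const int *hits, size_t first, size_t last)
{
	int total = 0;

	for (size_t cpu = first; cpu <= last; cpu++)
		total += hits[cpu];

	return total;
}
```

  Summed over all CPUs, the first table goes from 1962 hits to 66 and the
  second from 972 to 480. In the second table the LITTLEs go from 81 to 443
  while the bigs drop from 891 to 37.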

  I was surprised to see such an increase in calls to kick_ilb() from LITTLE
  CPUs ([0-3]), but after a bit of investigation it turns out that the big
  CPUs would always run nohz_balancer_kick() a jiffy before the LITTLEs, so
  the LITTLEs would always bail out because nohz.next_balance had just been
  updated before they called nohz_balancer_kick(). IOW,

      time_before(now, nohz.next_balance)

  would always be true on CPUs [0-3] during my workload. Quieting the kicks
  issued by the big CPUs allowed the LITTLEs to execute nohz_balancer_kick()
  past that condition, explaining the higher number of kicks issued from LITTLE
  CPUs.
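
  The ordering effect described above can be reproduced with a userspace toy
  model. This is only a sketch under loud assumptions: toy_time_before()
  mimics the wrap-safe comparison done by the kernel's time_before(), and
  everything else (the toy_* names, the fixed TOY_BALANCE_INTERVAL, the idea
  that a successful kick immediately pushes next_balance forward, and the
  LITTLE always wanting to kick) is hypothetical and much simpler than the
  real nohz_balancer_kick().

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's wrap-safe time_before(a, b). */
static bool toy_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical: jiffies between two nohz balance passes. */
#define TOY_BALANCE_INTERVAL 4UL

struct toy_nohz {
	unsigned long next_balance;
};

/*
 * Returns true if the caller gets past the next_balance check and kicks.
 * Assumption: a kick results in next_balance being pushed forward at once.
 */
static bool toy_balancer_kick(struct toy_nohz *nohz, unsigned long now,
			      bool wants_kick)
{
	if (toy_time_before(now, nohz->next_balance))
		return false;	/* bail out early, as the LITTLEs did */

	if (!wants_kick)
		return false;

	nohz->next_balance = now + TOY_BALANCE_INTERVAL;
	return true;
}

/*
 * In each period the big CPU runs first and the LITTLE one jiffy later.
 * Returns how many kicks the LITTLE managed to issue.
 */
static int toy_count_little_kicks(bool big_wants_kick, int periods)
{
	struct toy_nohz nohz = { .next_balance = 0 };
	unsigned long now = 0;
	int little_kicks = 0;

	for (int i = 0; i < periods; i++) {
		(void)toy_balancer_kick(&nohz, now, big_wants_kick);
		if (toy_balancer_kick(&nohz, now + 1, true))
			little_kicks++;
		now += TOY_BALANCE_INTERVAL;
	}

	return little_kicks;
}
```

  In this model, toy_count_little_kicks(true, 10) returns 0 because the big
  CPU always moves next_balance just ahead of the LITTLE, whereas
  toy_count_little_kicks(false, 10) returns 10 once the big CPU stops
  kicking.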
  
** misfit behaviour
  For good measure I also ran the usual misfit tests [1] which showed no
  particular change.

[1]: https://github.com/ARM-software/lisa/blob/next/lisa/tests/kernel/scheduler/misfit.py

Valentin Schneider (3):
  sched/fair: Comment some nohz_balancer_kick() kick conditions
  sched/fair: Tune down misfit nohz kicks
  sched/fair: Skip LLC nohz logic for asymmetric systems

 kernel/sched/fair.c | 84 +++++++++++++++++++++++++++++++++------------
 1 file changed, 63 insertions(+), 21 deletions(-)

--
2.20.1


