stable.vger.kernel.org archive mirror
* [PATCH 00/26] Performance-related backports for 4.12.2
@ 2017-07-20 21:21 Mel Gorman
From: Mel Gorman @ 2017-07-20 21:21 UTC
  To: Linux-Stable; +Cc: Mel Gorman

This is a second round of performance-related backports, picked from
low-hanging fruit in the 4.13 merge window and based on 4.12.2.

As before, these have only been tested on 4.12-stable.  While they may
merge against older kernels, I have no data on how they behave there and
cannot guarantee it's a good idea, so I don't recommend it.  There will
also be some major conflicts that are not trivial to resolve.

For most of the tests I conducted, the impact is marginal but the first
two sets of patches are important for large machines and for uses of
nohz_full. The load balancing patch is fairly specific but measurable.
The removal of unnecessary IRQ disabling/enabling is borderline in terms of
performance but the patches are trivial and avoiding unnecessary expensive
operations is always a plus.

Patches 1-17 resolve a number of topology problems in the scheduler that
	primarily impact NUMA machines with a ring topology. There are
	more patches in there than necessary but one adds very helpful
	comments on understanding how it works and a few bring the naming of
	functions in line with 4.13 which makes it a bit easier to follow.
	Others shuffle comments around and restructure the code which could
	have been avoided but then the backported patches would not look
	like their upstream equivalent.  While some of the extra patches are
	outside the scope of -stable, including them removes the delta when
	comparing the 4.12-stable and 4.13 scheduler. I can drop them if
	necessary.

	Performance impact on UMA and fully-connected machines is marginal
	with minor gains/losses across multiple machines that are mostly
	within the noise but other reports indicate that the impact on
	ring topologies is substantial. In particular, the full machine
	will be properly utilised instead of saturating a subset of nodes
	for workloads with lots of threads or processes.
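
As a rough illustration of why ring topologies are the awkward case, the
sketch below builds a distance table for a hypothetical ring of nodes and
shows that the set of nodes within a given distance differs from node to
node while still sharing members. The function names and the 10/20/30
distance values are assumptions made up for the example; this is a toy
model, not the kernel's sched-domain construction.

```python
def ring_distances(num_nodes, local=10, hop=10):
    """Build a NUMA-style distance table for a ring of num_nodes nodes.

    A node is `local` away from itself, plus `hop` for every ring hop.
    The values are illustrative and not taken from any real machine.
    """
    table = []
    for a in range(num_nodes):
        row = []
        for b in range(num_nodes):
            hops = min((a - b) % num_nodes, (b - a) % num_nodes)
            row.append(local + hop * hops)
        table.append(row)
    return table


def spans_at_level(table, level):
    """For each node, return the set of nodes no further than `level`."""
    return [frozenset(b for b, d in enumerate(row) if d <= level)
            for row in table]


def has_overlap(spans):
    """True if two nodes see different spans that still share members."""
    return any(a != b and a & b for a in spans for b in spans)


ring = ring_distances(4)
spans = spans_at_level(ring, 20)   # each node plus its direct neighbours
```

On a fully-connected table every node sees the same span at each level, so
has_overlap() is false; on the ring, neighbouring nodes see different but
intersecting spans.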

Patches 18-22 are more about accounting than performance. The bug is with
	workloads running on nohz_full+isolcpus configurations. If 2 or more
	processes running on an isolated CPU are 100% userspace bound and
	normal processes are running on other CPUs, then the isolated
	processes report a mix of userspace and system CPU usage.  It can
	be up to 100% system CPU usage even though in reality there is no
	time being spent in the kernel. This misaccounting is confusing
	when analysing workloads.

	For normal workloads, there is no measurable difference.
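
One way to look at the user/system split on a given CPU is to compare two
samples of its line in /proc/stat. The sketch below is only that: the
helper names are hypothetical, the sample text is invented for the
example, and it is not how the numbers in this series were gathered.

```python
def parse_cpu_line(stat_text, cpu):
    """Return the tick counters (user, nice, system, idle, ...) for e.g. 'cpu3'."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == cpu:
            return [int(v) for v in fields[1:]]
    raise KeyError(cpu)


def user_system_share(before, after, cpu):
    """Percentage of elapsed ticks charged to user and to system time."""
    b = parse_cpu_line(before, cpu)
    a = parse_cpu_line(after, cpu)
    delta = [y - x for x, y in zip(b, a)]
    total = sum(delta[:8])          # first eight fields: user through steal
    if total == 0:
        return 0.0, 0.0
    user = delta[0] + delta[1]      # user + nice
    system = delta[2]
    return 100.0 * user / total, 100.0 * system / total


# Invented samples in /proc/stat layout, for illustration only.
sample_before = "cpu3 1000 0 200 5000 0 0 0 0 0 0"
sample_after = "cpu3 1600 0 600 5000 0 0 0 0 0 0"
share = user_system_share(sample_before, sample_after, "cpu3")
```

On a live system the two samples would come from reading /proc/stat twice
with a delay in between. A purely userspace-bound task on an isolated CPU
should show a system share close to zero.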

Patch 23 fixes a scheduler load balancing issue where an imbalanced domain
	is considered balanced when some tasks are pinned for affinity.
	Again, for many workloads the impact is marginal but it was a small
	boost (1-2%, barely outside noise) for a specjbb configuration
	that pinned JVMs. It may be coincidence but the patch is
	straightforward.
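
For anyone wanting to set up a pinned configuration by hand, a minimal
sketch using Python's os.sched_setaffinity() (Linux only) follows. The
run_pinned() helper is a hypothetical name; the specjbb setup mentioned
above was not necessarily pinned this way.

```python
import os


def run_pinned(cpu, func):
    """Run func() with the caller restricted to one CPU, then restore."""
    original = os.sched_getaffinity(0)    # 0 means the caller itself
    os.sched_setaffinity(0, {cpu})
    try:
        return func()
    finally:
        os.sched_setaffinity(0, original)  # always undo the pinning


# Pick a CPU the caller is already allowed to run on.
allowed_before = os.sched_getaffinity(0)
first_cpu = min(allowed_before)
seen_while_pinned = run_pinned(first_cpu, lambda: os.sched_getaffinity(0))
```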

Patches 24-25 avoid unnecessary IRQ disable/enable while updating writeback
	stats. In many cases this will not be noticeable because it happens
	out-of-band and the cost of stats updates is often negligible
	compared to the overall cost of writeback. However, unnecessary
	IRQ disabling is never a good thing and it may be noticeable during
	writeback to ultra-fast storage.
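
The stats in question are per-CPU counters updated in batches. As a rough
model of that idea (a toy in Python, not the kernel's percpu_counter
code, and it does not model the IRQ handling these patches change), each
CPU accumulates a local delta and only folds it into the shared total once
the delta reaches the batch size. The class and method names are
assumptions for the example.

```python
class BatchedCounter:
    """Toy model of a per-CPU counter with batched updates."""

    def __init__(self, num_cpus, batch):
        self.count = 0                  # shared total, expensive to touch
        self.local = [0] * num_cpus     # cheap per-CPU pending deltas
        self.batch = batch

    def add(self, cpu, amount):
        pending = self.local[cpu] + amount
        if abs(pending) >= self.batch:
            # Delta grew large enough: fold it into the shared total.
            self.count += pending
            self.local[cpu] = 0
        else:
            self.local[cpu] = pending

    def read_approx(self):
        """Fast read that ignores deltas still pending on each CPU."""
        return self.count

    def read_exact(self):
        """Slow read that also sums every per-CPU delta."""
        return self.count + sum(self.local)


counter = BatchedCounter(num_cpus=2, batch=4)
for _ in range(4):
    counter.add(0, 1)       # fourth add reaches the batch and is folded
counter.add(1, -1)          # stays pending on CPU 1
```

The approximate read can be off by up to the batch size per CPU, which is
the trade-off for keeping most updates local.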

Patch 26 avoids an IRQ disable/enable in the fork path. It's noticeable
	on fork-intensive workloads, with a 1-3% boost on hackbench for
	example, which is just outside the noise.
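
A crude way to exercise the fork path is to time a loop of short-lived
children. The sketch below is not hackbench and is far too simple to
resolve a 1-3% difference without many repetitions; time_forks() is a
hypothetical helper for illustration.

```python
import os
import time


def time_forks(count):
    """Fork `count` short-lived children and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)         # child: exit immediately, skip cleanup
        os.waitpid(pid, 0)      # parent: reap the child before continuing
    return time.perf_counter() - start


elapsed = time_forks(5)
```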

-- 
2.13.1

Thread overview: 31+ messages
2017-07-20 21:21 [PATCH 00/26] Performance-related backports for 4.12.2 Mel Gorman
2017-07-20 21:21 ` [PATCH 01/26] sched/topology: Refactor function build_overlap_sched_groups() Mel Gorman
2017-07-20 21:21 ` [PATCH 02/26] sched/topology: Fix building of overlapping sched-groups Mel Gorman
2017-07-20 21:21 ` [PATCH 03/26] sched/topology: Simplify build_overlap_sched_groups() Mel Gorman
2017-07-20 21:21 ` [PATCH 04/26] sched/debug: Print the scheduler topology group mask Mel Gorman
2017-07-20 21:21 ` [PATCH 05/26] sched/topology: Verify the first group matches the child domain Mel Gorman
2017-07-20 21:21 ` [PATCH 06/26] sched/topology: Optimize build_group_mask() Mel Gorman
2017-07-20 21:21 ` [PATCH 07/26] sched/topology: Move comment about asymmetric node setups Mel Gorman
2017-07-20 21:21 ` [PATCH 08/26] sched/topology: Remove FORCE_SD_OVERLAP Mel Gorman
2017-07-20 21:21 ` [PATCH 09/26] sched/topology: Fix overlapping sched_group_mask Mel Gorman
2017-07-20 21:21 ` [PATCH 10/26] sched/topology: Small cleanup Mel Gorman
2017-07-20 21:21 ` [PATCH 11/26] sched/topology: Add sched_group_capacity debugging Mel Gorman
2017-07-20 21:21 ` [PATCH 12/26] sched/topology: Fix overlapping sched_group_capacity Mel Gorman
2017-07-20 21:21 ` [PATCH 13/26] sched/topology: Add a few comments Mel Gorman
2017-07-20 21:21 ` [PATCH 14/26] sched/topology: Rewrite get_group() Mel Gorman
2017-07-20 21:21 ` [PATCH 15/26] sched/topology: Simplify sched_group_mask() usage Mel Gorman
2017-07-20 21:21 ` [PATCH 16/26] sched/topology: Rename sched_group_mask() Mel Gorman
2017-07-20 21:21 ` [PATCH 17/26] sched/topology: Rename sched_group_cpus() Mel Gorman
2017-07-20 21:21 ` [PATCH 18/26] vtime, sched/cputime: Remove vtime_account_user() Mel Gorman
2017-07-20 21:21 ` [PATCH 19/26] sched/cputime: Always set tsk->vtime_snap_whence after accounting vtime Mel Gorman
2017-07-20 21:21 ` [PATCH 20/26] sched/cputime: Rename vtime fields Mel Gorman
2017-07-20 21:21 ` [PATCH 21/26] sched/cputime: Move the vtime task fields to their own struct Mel Gorman
2017-07-20 21:21 ` [PATCH 22/26] sched/cputime: Accumulate vtime on top of nsec clocksource Mel Gorman
2017-07-20 21:21 ` [PATCH 23/26] sched/fair: Fix load_balance() affinity redo path Mel Gorman
2017-07-20 21:21 ` [PATCH 24/26] percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch Mel Gorman
2017-07-20 21:21 ` [PATCH 25/26] writeback: rework wb_[dec|inc]_stat family of functions Mel Gorman
2017-07-20 21:21 ` [PATCH 26/26] kernel/fork.c: virtually mapped stacks: do not disable interrupts Mel Gorman
2017-07-24 16:44 ` [PATCH 00/26] Performance-related backports for 4.12.2 Mel Gorman
2017-07-24 23:29   ` Greg KH
2017-07-25  8:14     ` Mel Gorman
2017-07-25 15:21       ` Greg KH
