public inbox for stable@vger.kernel.org
* [PATCH 0/9] Performance-related backports for 4.12
@ 2017-07-10 12:37 Mel Gorman
From: Mel Gorman @ 2017-07-10 12:37 UTC (permalink / raw)
  To: Linux-Stable; +Cc: Mel Gorman

The 4.12 release was large but there were a number of important
performance-related patches that are relatively low-hanging fruit. There
are other patches but data is still being collected. This collection has
only been tested on 4.12 and while the patches may merge against older
kernels, I have no data on how they behave there and cannot guarantee it's
a good idea so I don't recommend it.

Patch 1 is an x86 microoptimisation for processors with ERMS. The improvement
	is marginal with effects often within the noise but it's a small
	boost on syscall-intensive workloads that move a lot of data
	to userspace.

Patches 2-3 rework select_idle_cpu, particularly around idle scanning to
	use a limited scan instead of a complete cut-off. The boost for
	hackbench is variable with an old machine with limited CPUs only
	getting a 3-4% boost while a larger 2-socket machine with 48 cores
	saw a 7-20% boost for low thread counts and no difference when
	the machine was saturated. Other workloads that are not as
	wakeup intensive barely notice which is to be expected.
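
For readers unfamiliar with the idea, the "limited scan" can be modelled
in userspace roughly as below. This is only a sketch and not the kernel
code: scan_idle_limited(), the idle[] array and the limit parameter are
hypothetical names, and how the kernel actually picks the scan limit is
not modelled here. The scan starts at a given CPU, wraps around, and gives
up after a bounded number of CPUs instead of not scanning at all.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code.
 *
 * Look at no more than `limit` CPUs, starting at `start` and wrapping
 * around at `nr_cpus`. Return the first idle CPU found, or -1 if the
 * scan budget runs out before an idle CPU is seen.
 */
int scan_idle_limited(const bool *idle, int nr_cpus, int start, int limit)
{
	assert(nr_cpus > 0 && start >= 0 && start < nr_cpus);

	/* Never visit the same CPU twice. */
	if (limit > nr_cpus)
		limit = nr_cpus;

	for (int i = 0; i < limit; i++) {
		int cpu = (start + i) % nr_cpus;

		if (idle[cpu])
			return cpu;
	}

	return -1;
}
```

With 8 CPUs where only CPU 2 is idle, a scan starting at CPU 6 with a
limit of 4 visits CPUs 6, 7, 0 and 1 and gives up, while a limit of 5
reaches CPU 2.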

Patch 4 addresses a soft lockup that was detected on a memory-intensive
	workload with large numbers of threads and NUMA balancing
	enabled. While I personally cannot verify the fix as the
	workload in question is not available, I know it was confirmed
	to work by a user.

Patches 5-9 address a number of problems with automatic NUMA balancing.
	While the patch author said that there was a big boost on specjbb and
	NAS, this was on a 4-socket machine in a ring topology and I don't
	have access to a similar machine. However, on a 2-socket machine,
	there was a 5% boost to specjbb 2005 when running a single JVM
	and a 1-2% boost when using multiple JVMs. There was little or no
	difference to NAS on the same machine but this may be due to the
	fact it's a 2-socket machine and a relatively short-lived workload.
	It's also known to boost hackbench on some machines by roughly 20%.

-- 
2.13.1


Thread overview: 12+ messages
2017-07-10 12:37 [PATCH 0/9] Performance-related backports for 4.12 Mel Gorman
2017-07-10 12:37 ` [PATCH 1/9] x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings Mel Gorman
2017-07-10 12:37 ` [PATCH 2/9] sched/fair, cpumask: Export for_each_cpu_wrap() Mel Gorman
2017-07-10 12:37 ` [PATCH 3/9] sched/core: Implement new approach to scale select_idle_cpu() Mel Gorman
2017-07-10 12:37 ` [PATCH 4/9] sched/numa: Use down_read_trylock() for the mmap_sem Mel Gorman
2017-07-10 12:37 ` [PATCH 5/9] sched/numa: Override part of migrate_degrades_locality() when idle balancing Mel Gorman
2017-07-10 12:37 ` [PATCH 6/9] sched/fair: Simplify wake_affine() for the single socket case Mel Gorman
2017-07-10 12:37 ` [PATCH 7/9] sched/numa: Implement NUMA node level wake_affine() Mel Gorman
2017-07-10 12:37 ` [PATCH 8/9] sched/fair: Remove effective_load() Mel Gorman
2017-07-10 12:37 ` [PATCH 9/9] sched/numa: Hide numa_wake_affine() from UP build Mel Gorman
2017-07-10 15:25 ` [PATCH 0/9] Performance-related backports for 4.12 Greg KH
2017-07-10 15:32   ` Mel Gorman
