public inbox for linux-kernel@vger.kernel.org
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org,
	stable-review@kernel.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nick Piggin <npiggin@suse.de>, Eric Whitney <eric.whitney@hp.com>,
	"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Subject: [patch 33/30] mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()
Date: Fri, 2 Oct 2009 10:21:57 -0700
Message-ID: <20091002172157.GB12576@kroah.com>
In-Reply-To: <20091001233504.GA17709@kroah.com>


2.6.30-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

commit 252c5f94d944487e9f50ece7942b0fbf659c5c31 upstream.

We noticed very erratic behavior [throughput] with the AIM7 shared
workload running on recent distro [SLES11] and mainline kernels on an
8-socket, 32-core, 256GB x86_64 platform.  On the SLES11 kernel
[2.6.27.19+] with Barcelona processors, as we increased the load [10s of
thousands of tasks], the throughput would vary between two "plateaus"--one
at ~65K jobs per minute and one at ~130K jpm.  The simple patch below
causes the results to smooth out at the ~130K plateau.

But wait, there's more:

We do not see this behavior on smaller platforms--e.g., 4 socket/8 core.
This could be the result of the larger number of cpus on the larger
platform--a scalability issue--or it could be the result of the larger
number of interconnect "hops" between some nodes in this platform and how
the tasks for a given load end up distributed over the nodes' cpus and
memories--a stochastic NUMA effect.

The variability in the results is less pronounced [on the same platform]
with Shanghai processors and with mainline kernels.  With 2.6.31-rc6 on
Shanghai processors and 288 file systems on 288 fibre attached storage
volumes, the curves [jpm vs load] are both quite flat with the patched
kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K
jpm] than the unpatched kernel.

Profiling indicated that the "slow" runs were incurring high[er]
contention on an anon_vma lock in vma_adjust(), apparently called from the
brk() system call [via sbrk()].

The patch:

A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the
anon_vma lock when we're only adjusting the end of a vma, as is the case
for brk().  The comment questions whether it's worthwhile to optimize for
this case.  Apparently, on the newer, larger x86_64 platforms, with
interesting NUMA topologies, it is worthwhile--especially considering
that the patch [if correct!] is quite simple.

We can detect this condition--no overlap with next vma--by noting a NULL
"importer".  The anon_vma pointer will also be NULL in this case, so we can
simply skip loading vma->anon_vma and thereby avoid taking the lock.

However, we DO need to take the anon_vma lock when we're inserting a vma
['insert' non-NULL] even when we have no overlap [NULL "importer"], so we
need to check for 'insert', as well.  And Hugh points out that we should
also take it when adjusting vm_start (so that rmap.c can rely upon
vma_address() while it holds the anon_vma lock).

akpm: Zhang Yanmin reports a 150% throughput improvement with aim7, so it
might be -stable material even though this isn't a regression: "this
issue is not clear on dual socket Nehalem machine (2*4*2 cpu), but is
severe on large machine (4*8*2 cpu)"

[hugh.dickins@tiscali.co.uk: test vma start too]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Eric Whitney <eric.whitney@hp.com>
Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/mmap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -572,9 +572,9 @@ again:			remove_next = 1 + (end > next->
 
 	/*
 	 * When changing only vma->vm_end, we don't really need
-	 * anon_vma lock: but is that case worth optimizing out?
+	 * anon_vma lock.
 	 */
-	if (vma->anon_vma)
+	if (vma->anon_vma && (insert || importer || start != vma->vm_start))
 		anon_vma = vma->anon_vma;
 	if (anon_vma) {
 		spin_lock(&anon_vma->lock);
