From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Rik van Riel <riel@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	lee.shermerhorn@hp.com
Subject: Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock
Date: Wed, 19 Dec 2007 10:52:09 -0500
Message-ID: <1198079529.5333.12.camel@localhost>
In-Reply-To: <200712191148.06506.nickpiggin@yahoo.com.au>

On Wed, 2007-12-19 at 11:48 +1100, Nick Piggin wrote:
> On Wednesday 19 December 2007 08:15, Rik van Riel wrote:
> > I have seen soft cpu lockups in page_referenced_file() due to
> > contention on i_mmap_lock() for different pages.  Making the
> > i_mmap_lock a reader/writer lock should increase parallelism
> > in vmscan for file back pages mapped into many address spaces.
> >
> > Read lock the i_mmap_lock for all usage except:
> >
> > 1) mmap/munmap:  linking vma into i_mmap prio_tree or removing
> > 2) unmap_mapping_range:   protecting vm_truncate_count
> >
> > rmap:  try_to_unmap_file() required new cond_resched_rwlock().
> > To reduce code duplication, I recast cond_resched_lock() as a
> > [static inline] wrapper around reworked cond_sched_lock() =>
> > __cond_resched_lock(void *lock, int type).
> > New cond_resched_rwlock() implemented as another wrapper.
> 
> Reader/writer locks really suck in terms of fairness and starvation,
> especially when the read-side is common and frequent. (also, single
> threaded performance of the read-side is worse).
> 
> I know Lee saw some big latencies on the anon_vma list lock when
> running (IIRC) a large benchmark... but are there more realistic
> situations where this is a problem?

Yes, we see the stall on the anon_vma lock most frequently running the
AIM benchmark with several tens of thousands of processes--all forked
from the same parent.  If we push the system into reclaim, all cpus end
up spinning on the lock in one of the anon_vma's shared by all the
tasks.  Quite easy to reproduce.  I have also seen this running stress
tests to force reclaim under Dave Anderson's "usex" exerciser--e.g.,
testing the split LRU and noreclaim patches--even with the reader-writer
lock patch. 

I've seen the lockups on the i_mmap_lock running Oracle workloads on our
large servers.  This is running an OLTP workload with only a thousand or
so "clients" all running the same application image.   Again, when the
system attempts to reclaim we end up spinning on the i_mmap_lock of one
of the files [possibly the shared global shmem segment] shared by all
the applications.  I also see it with the usex stress load--also, with
and without this patch.  I think this is a more probable
scenario--thousands of processes sharing a single file, such as
libc.so--than thousands of processes all descended from a single
ancestor w/o exec'ing.

I keep these patches up to date for testing.  I don't have conclusive
evidence whether they alleviate or exacerbate the problem nor by how
much.  

Lee


