From: Rik van Riel <riel@redhat.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Christoph Lameter <cl@linux.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH -v3] take all anon_vma locks in anon_vma_lock
Date: Wed, 28 Apr 2010 22:10:13 -0400 [thread overview]
Message-ID: <4BD8EA85.2000209@redhat.com> (raw)
In-Reply-To: <y2s28c262361004281728we31e3b9fsd2427aacdc76a9e7@mail.gmail.com>
On 04/28/2010 08:28 PM, Minchan Kim wrote:
> On Thu, Apr 29, 2010 at 5:57 AM, Rik van Riel<riel@redhat.com> wrote:
>> Take all the locks for all the anon_vmas in anon_vma_lock, this properly
>> excludes migration and the transparent hugepage code from VMA changes done
>> by mmap/munmap/mprotect/expand_stack/etc...
>>
>> Unfortunately, this requires adding a new lock (mm->anon_vma_chain_lock),
>> otherwise we have an unavoidable lock ordering conflict. This changes the
>> locking rules for the "same_vma" list to be either mm->mmap_sem for write,
>> or mm->mmap_sem for read plus the new mm->anon_vma_chain lock. This limits
>> the place where the new lock is taken to 2 locations - anon_vma_prepare and
>> expand_downwards.
>>
>> Document the locking rules for the same_vma list in the anon_vma_chain and
>> remove the anon_vma_lock call from expand_upwards, which does not need it.
>>
>> Signed-off-by: Rik van Riel<riel@redhat.com>
>
> This patch makes things simple. So I like this.
> Actually, I wanted this all-at-once locks approach.
> But I was worried about that how the patch affects AIM 7 workload
> which is cause of anon_vma_chain about scalability by Rik.
> But now Rik himself is sending the patch. So I assume the patch
> couldn't decrease scalability of the workload heavily.
The thing is, the number of anon_vmas attached to a VMA is
small (depth of the tree, so for apache or aim the typical
depth is 2). This N is between 1 and 3.
The problem we had originally is the _width_ of the tree,
where every sibling process was attached to the same anon_vma
and the rmap code had to walk the page tables of all the
processes, for every privately owned page in each child process.
For large server workloads, this N is between a few hundred and
a few thousand.
What matters most at this point is correctness - we need to be
able to exclude rmap walks when messing with a VMA in any way
that breaks lookups, because rmap walks for page migration and
hugepage conversion have to be 100% reliable.
That is not a constraint I had in mind with the original
anon_vma changes, so the code needs to be fixed up now...
I suspect that taking one or two extra spinlocks in the code
paths changed by this patch (mmap/munmap/...) is going to make
a difference at all, since all of those paths are pretty
infrequently taken.
--
All rights reversed
WARNING: multiple messages have this Message-ID (diff)
From: Rik van Riel <riel@redhat.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Christoph Lameter <cl@linux.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH -v3] take all anon_vma locks in anon_vma_lock
Date: Wed, 28 Apr 2010 22:10:13 -0400 [thread overview]
Message-ID: <4BD8EA85.2000209@redhat.com> (raw)
In-Reply-To: <y2s28c262361004281728we31e3b9fsd2427aacdc76a9e7@mail.gmail.com>
On 04/28/2010 08:28 PM, Minchan Kim wrote:
> On Thu, Apr 29, 2010 at 5:57 AM, Rik van Riel<riel@redhat.com> wrote:
>> Take all the locks for all the anon_vmas in anon_vma_lock, this properly
>> excludes migration and the transparent hugepage code from VMA changes done
>> by mmap/munmap/mprotect/expand_stack/etc...
>>
>> Unfortunately, this requires adding a new lock (mm->anon_vma_chain_lock),
>> otherwise we have an unavoidable lock ordering conflict. This changes the
>> locking rules for the "same_vma" list to be either mm->mmap_sem for write,
>> or mm->mmap_sem for read plus the new mm->anon_vma_chain lock. This limits
>> the place where the new lock is taken to 2 locations - anon_vma_prepare and
>> expand_downwards.
>>
>> Document the locking rules for the same_vma list in the anon_vma_chain and
>> remove the anon_vma_lock call from expand_upwards, which does not need it.
>>
>> Signed-off-by: Rik van Riel<riel@redhat.com>
>
> This patch makes things simple. So I like this.
> Actually, I wanted this all-at-once locks approach.
> But I was worried about that how the patch affects AIM 7 workload
> which is cause of anon_vma_chain about scalability by Rik.
> But now Rik himself is sending the patch. So I assume the patch
> couldn't decrease scalability of the workload heavily.
The thing is, the number of anon_vmas attached to a VMA is
small (depth of the tree, so for apache or aim the typical
depth is 2). This N is between 1 and 3.
The problem we had originally is the _width_ of the tree,
where every sibling process was attached to the same anon_vma
and the rmap code had to walk the page tables of all the
processes, for every privately owned page in each child process.
For large server workloads, this N is between a few hundred and
a few thousand.
What matters most at this point is correctness - we need to be
able to exclude rmap walks when messing with a VMA in any way
that breaks lookups, because rmap walks for page migration and
hugepage conversion have to be 100% reliable.
That is not a constraint I had in mind with the original
anon_vma changes, so the code needs to be fixed up now...
I suspect that taking one or two extra spinlocks in the code
paths changed by this patch (mmap/munmap/...) is going to make
a difference at all, since all of those paths are pretty
infrequently taken.
--
All rights reversed
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-04-29 2:10 UTC|newest]
Thread overview: 132+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-27 21:30 [PATCH 0/3] Fix migration races in rmap_walk() V2 Mel Gorman
2010-04-27 21:30 ` Mel Gorman
2010-04-27 21:30 ` [PATCH 1/3] mm,migration: During fork(), wait for migration to end if migration PTE is encountered Mel Gorman
2010-04-27 21:30 ` Mel Gorman
2010-04-27 22:22 ` Andrea Arcangeli
2010-04-27 22:22 ` Andrea Arcangeli
2010-04-27 23:52 ` KAMEZAWA Hiroyuki
2010-04-27 23:52 ` KAMEZAWA Hiroyuki
2010-04-28 0:18 ` Andrea Arcangeli
2010-04-28 0:18 ` Andrea Arcangeli
2010-04-28 0:19 ` Andrea Arcangeli
2010-04-28 0:19 ` Andrea Arcangeli
2010-04-28 0:28 ` KAMEZAWA Hiroyuki
2010-04-28 0:28 ` KAMEZAWA Hiroyuki
2010-04-28 0:59 ` Andrea Arcangeli
2010-04-28 0:59 ` Andrea Arcangeli
2010-04-28 8:24 ` Mel Gorman
2010-04-28 8:24 ` Mel Gorman
2010-04-27 21:30 ` [PATCH 2/3] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information Mel Gorman
2010-04-27 21:30 ` Mel Gorman
2010-04-27 23:10 ` Andrea Arcangeli
2010-04-27 23:10 ` Andrea Arcangeli
2010-04-28 9:15 ` Mel Gorman
2010-04-28 9:15 ` Mel Gorman
2010-04-28 15:35 ` Andrea Arcangeli
2010-04-28 15:35 ` Andrea Arcangeli
2010-04-28 15:39 ` Andrea Arcangeli
2010-04-28 15:39 ` Andrea Arcangeli
2010-04-28 15:55 ` Mel Gorman
2010-04-28 15:55 ` Mel Gorman
2010-04-28 16:23 ` Andrea Arcangeli
2010-04-28 16:23 ` Andrea Arcangeli
2010-04-28 17:34 ` Mel Gorman
2010-04-28 17:34 ` Mel Gorman
2010-04-28 17:58 ` Andrea Arcangeli
2010-04-28 17:58 ` Andrea Arcangeli
2010-04-28 17:47 ` [RFC PATCH] take all anon_vma locks in anon_vma_lock Rik van Riel
2010-04-28 17:47 ` Rik van Riel
2010-04-28 18:03 ` Andrea Arcangeli
2010-04-28 18:03 ` Andrea Arcangeli
2010-04-28 18:09 ` Rik van Riel
2010-04-28 18:09 ` Rik van Riel
2010-04-28 18:25 ` [RFC PATCH -v2] " Rik van Riel
2010-04-28 18:25 ` Rik van Riel
2010-04-28 19:07 ` Mel Gorman
2010-04-28 19:07 ` Mel Gorman
2010-04-28 20:17 ` [RFC PATCH -v3] " Rik van Riel
2010-04-28 20:17 ` Rik van Riel
2010-04-28 20:57 ` Rik van Riel
2010-04-28 20:57 ` Rik van Riel
2010-04-29 0:28 ` Minchan Kim
2010-04-29 0:28 ` Minchan Kim
2010-04-29 2:10 ` Rik van Riel [this message]
2010-04-29 2:10 ` Rik van Riel
2010-04-29 2:55 ` Minchan Kim
2010-04-29 2:55 ` Minchan Kim
2010-04-29 6:42 ` Minchan Kim
2010-04-29 6:42 ` Minchan Kim
2010-04-29 15:39 ` Rik van Riel
2010-04-29 15:39 ` Rik van Riel
2010-04-29 7:37 ` Mel Gorman
2010-04-29 7:37 ` Mel Gorman
2010-04-29 8:15 ` Mel Gorman
2010-04-29 8:15 ` Mel Gorman
2010-04-29 8:32 ` Minchan Kim
2010-04-29 8:32 ` Minchan Kim
2010-04-29 8:44 ` Mel Gorman
2010-04-29 8:44 ` Mel Gorman
2010-04-27 21:30 ` [PATCH 3/3] mm,migration: Remove straggling migration PTEs when page tables are being moved after the VMA has already moved Mel Gorman
2010-04-27 21:30 ` Mel Gorman
2010-04-27 22:30 ` Andrea Arcangeli
2010-04-27 22:30 ` Andrea Arcangeli
2010-04-27 22:58 ` Andrea Arcangeli
2010-04-27 22:58 ` Andrea Arcangeli
2010-04-28 0:39 ` KAMEZAWA Hiroyuki
2010-04-28 0:39 ` KAMEZAWA Hiroyuki
2010-04-28 1:05 ` Andrea Arcangeli
2010-04-28 1:05 ` Andrea Arcangeli
2010-04-28 1:09 ` Andrea Arcangeli
2010-04-28 1:09 ` Andrea Arcangeli
2010-04-28 1:18 ` KAMEZAWA Hiroyuki
2010-04-28 1:18 ` KAMEZAWA Hiroyuki
2010-04-28 1:36 ` Andrea Arcangeli
2010-04-28 1:36 ` Andrea Arcangeli
2010-04-28 1:29 ` KAMEZAWA Hiroyuki
2010-04-28 1:29 ` KAMEZAWA Hiroyuki
2010-04-28 1:44 ` Andrea Arcangeli
2010-04-28 1:44 ` Andrea Arcangeli
2010-04-28 2:12 ` KAMEZAWA Hiroyuki
2010-04-28 2:12 ` KAMEZAWA Hiroyuki
2010-04-28 2:42 ` Andrea Arcangeli
2010-04-28 2:42 ` Andrea Arcangeli
2010-04-28 2:49 ` KAMEZAWA Hiroyuki
2010-04-28 2:49 ` KAMEZAWA Hiroyuki
2010-04-28 7:28 ` KAMEZAWA Hiroyuki
2010-04-28 7:28 ` KAMEZAWA Hiroyuki
2010-04-28 10:48 ` Mel Gorman
2010-04-28 10:48 ` Mel Gorman
2010-04-28 0:03 ` KAMEZAWA Hiroyuki
2010-04-28 0:03 ` KAMEZAWA Hiroyuki
2010-04-28 0:08 ` Andrea Arcangeli
2010-04-28 0:08 ` Andrea Arcangeli
2010-04-28 0:36 ` KAMEZAWA Hiroyuki
2010-04-28 0:36 ` KAMEZAWA Hiroyuki
2010-04-28 8:30 ` KAMEZAWA Hiroyuki
2010-04-28 8:30 ` KAMEZAWA Hiroyuki
2010-04-28 14:46 ` Andrea Arcangeli
2010-04-28 14:46 ` Andrea Arcangeli
2010-04-27 22:27 ` [PATCH 0/3] Fix migration races in rmap_walk() V2 Christoph Lameter
2010-04-27 22:27 ` Christoph Lameter
2010-04-27 22:32 ` Andrea Arcangeli
2010-04-27 22:32 ` Andrea Arcangeli
2010-04-28 0:13 ` KAMEZAWA Hiroyuki
2010-04-28 0:13 ` KAMEZAWA Hiroyuki
2010-04-28 0:20 ` Andrea Arcangeli
2010-04-28 0:20 ` Andrea Arcangeli
2010-04-28 14:23 ` Mel Gorman
2010-04-28 14:23 ` Mel Gorman
2010-04-28 14:57 ` Mel Gorman
2010-04-28 14:57 ` Mel Gorman
2010-04-28 15:16 ` Andrea Arcangeli
2010-04-28 15:16 ` Andrea Arcangeli
2010-04-28 15:23 ` Mel Gorman
2010-04-28 15:23 ` Mel Gorman
2010-04-28 15:45 ` Andrea Arcangeli
2010-04-28 15:45 ` Andrea Arcangeli
2010-04-28 20:40 ` Andrea Arcangeli
2010-04-28 20:40 ` Andrea Arcangeli
2010-04-28 21:05 ` Andrea Arcangeli
2010-04-28 21:05 ` Andrea Arcangeli
2010-04-28 9:17 ` Mel Gorman
2010-04-28 9:17 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4BD8EA85.2000209@redhat.com \
--to=riel@redhat.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=minchan.kim@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.