From: Mel Gorman <mgorman@suse.de>
To: Simon Jeons <simon.jeons@gmail.com>
Cc: Michel Lespinasse <walken@google.com>,
Zhouping Liu <zliu@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Johannes Weiner <jweiner@redhat.com>,
hughd@google.com, Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [PATCH] mm: thp: Acquire the anon_vma rwsem for lock during split
Date: Mon, 7 Jan 2013 15:09:35 +0000 [thread overview]
Message-ID: <20130107150935.GK3885@suse.de> (raw)
In-Reply-To: <1357388687.9001.3.camel@kernel.cn.ibm.com>
On Sat, Jan 05, 2013 at 06:24:47AM -0600, Simon Jeons wrote:
> On Fri, 2013-01-04 at 17:32 -0800, Michel Lespinasse wrote:
> > On Fri, Jan 4, 2013 at 6:08 AM, Mel Gorman <mgorman@suse.de> wrote:
> > > Despite the reason for these commits, NUMA balancing is not the direct
> > > source of the problem. split_huge_page() expected the anon_vma lock to be
> > > exclusive to serialise the whole split operation. Ordinarily it is expected
> > > that the anon_vma lock would only be required when updating the avcs but
> > > THP also uses it. The locking requirements for THP are complex and there
> > > is some overlap but broadly speaking they include the following
> > >
> > > 1. mmap_sem for read or write prevents THPs being created underneath
> > > 2. anon_vma is taken for write if collapsing a huge page
> > > 3. mm->page_table_lock should be taken when checking if pmd_trans_huge as
> > > split_huge_page can run in parallel
> > > 4. wait_split_huge_page uses anon_vma taken for write mode to serialise
> > > against other THP operations
> > > 5. compound_lock is used to serialise between
> > > __split_huge_page_refcount() and gup
> > >
> > > split_huge_page takes anon_vma for read but that does not serialise against
> > > parallel split_huge_page operations on the same page (rule 2). One process
> > > could be modifying the ref counts while the other modifies the page tables
> > > leading to counters not being reliable. This patch takes the anon_vma
> > > lock for write to serialise against parallel split_huge_page and parallel
> > > collapse operations as it is the most fine-grained lock available that
> > > protects against both.
> >
> > Your comment about this being the most fine-grained lock made me
> > think, couldn't we use lock_page() on the THP page here ?
> >
> > Now I don't necessarily want to push you that direction, because I
> > haven't fully thought it trough and because what you propose brings us
> > closer to what happened before anon_vma became an rwlock, which is
> > more obviously safe. But I felt I should still mention it, since we're
> > really only trying to protect from concurrent operations on the same
> > THP page, so locking at just that granularity would seem desirable.
>
> Why you said that anon_vma lock who will protect page associated to a
> list of vmas is fine-grained then page lock who just protect one page?
>
We did not say it was fine-grained, we said it was the most fine-grained
lock available. It's a coarse lock but using the page lock would be
problematic.
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: Simon Jeons <simon.jeons@gmail.com>
Cc: Michel Lespinasse <walken@google.com>,
Zhouping Liu <zliu@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Johannes Weiner <jweiner@redhat.com>,
hughd@google.com, Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [PATCH] mm: thp: Acquire the anon_vma rwsem for lock during split
Date: Mon, 7 Jan 2013 15:09:35 +0000 [thread overview]
Message-ID: <20130107150935.GK3885@suse.de> (raw)
In-Reply-To: <1357388687.9001.3.camel@kernel.cn.ibm.com>
On Sat, Jan 05, 2013 at 06:24:47AM -0600, Simon Jeons wrote:
> On Fri, 2013-01-04 at 17:32 -0800, Michel Lespinasse wrote:
> > On Fri, Jan 4, 2013 at 6:08 AM, Mel Gorman <mgorman@suse.de> wrote:
> > > Despite the reason for these commits, NUMA balancing is not the direct
> > > source of the problem. split_huge_page() expected the anon_vma lock to be
> > > exclusive to serialise the whole split operation. Ordinarily it is expected
> > > that the anon_vma lock would only be required when updating the avcs but
> > > THP also uses it. The locking requirements for THP are complex and there
> > > is some overlap but broadly speaking they include the following
> > >
> > > 1. mmap_sem for read or write prevents THPs being created underneath
> > > 2. anon_vma is taken for write if collapsing a huge page
> > > 3. mm->page_table_lock should be taken when checking if pmd_trans_huge as
> > > split_huge_page can run in parallel
> > > 4. wait_split_huge_page uses anon_vma taken for write mode to serialise
> > > against other THP operations
> > > 5. compound_lock is used to serialise between
> > > __split_huge_page_refcount() and gup
> > >
> > > split_huge_page takes anon_vma for read but that does not serialise against
> > > parallel split_huge_page operations on the same page (rule 2). One process
> > > could be modifying the ref counts while the other modifies the page tables
> > > leading to counters not being reliable. This patch takes the anon_vma
> > > lock for write to serialise against parallel split_huge_page and parallel
> > > collapse operations as it is the most fine-grained lock available that
> > > protects against both.
> >
> > Your comment about this being the most fine-grained lock made me
> > think, couldn't we use lock_page() on the THP page here ?
> >
> > Now I don't necessarily want to push you that direction, because I
> > haven't fully thought it trough and because what you propose brings us
> > closer to what happened before anon_vma became an rwlock, which is
> > more obviously safe. But I felt I should still mention it, since we're
> > really only trying to protect from concurrent operations on the same
> > THP page, so locking at just that granularity would seem desirable.
>
> Why you said that anon_vma lock who will protect page associated to a
> list of vmas is fine-grained then page lock who just protect one page?
>
We did not say it was fine-grained, we said it was the most fine-grained
lock available. It's a coarse lock but using the page lock would be
problematic.
--
Mel Gorman
SUSE Labs
next prev parent reply other threads:[~2013-01-07 15:09 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1621091901.34838094.1356409676820.JavaMail.root@redhat.com>
2012-12-25 4:38 ` kernel BUG at mm/huge_memory.c:1798! Zhouping Liu
2012-12-25 4:38 ` Zhouping Liu
2012-12-25 12:05 ` Hillf Danton
2012-12-25 12:05 ` Hillf Danton
2012-12-26 2:55 ` Zhouping Liu
2012-12-26 2:55 ` Zhouping Liu
2012-12-27 0:31 ` Alexander Beregalov
2012-12-27 0:31 ` Alexander Beregalov
2012-12-27 12:12 ` Hillf Danton
2012-12-27 12:12 ` Hillf Danton
2012-12-27 16:08 ` Alex Xu
2012-12-27 16:08 ` Alex Xu
2012-12-29 7:22 ` Hillf Danton
2012-12-29 7:22 ` Hillf Danton
2012-12-26 11:22 ` BUG: unable to handle kernel NULL pointer dereference at 0000000000000500 Zhouping Liu
2012-12-26 12:01 ` Hillf Danton
2012-12-26 12:01 ` Hillf Danton
2012-12-26 13:24 ` Zhouping Liu
2012-12-26 13:24 ` Zhouping Liu
2012-12-26 15:14 ` Hillf Danton
2012-12-26 15:14 ` Hillf Danton
2012-12-26 14:59 ` Zlatko Calusic
2012-12-26 14:59 ` Zlatko Calusic
2012-12-27 14:58 ` Zlatko Calusic
2012-12-27 14:58 ` Zlatko Calusic
2012-12-27 23:55 ` David R. Piegdon
2012-12-28 0:09 ` Zlatko Calusic
2012-12-28 0:09 ` Zlatko Calusic
2012-12-28 2:45 ` Zhouping Liu
2012-12-28 2:45 ` Zhouping Liu
2012-12-28 2:48 ` Zhouping Liu
2012-12-28 2:48 ` Zhouping Liu
2012-12-28 9:01 ` Zhouping Liu
2012-12-28 9:01 ` Zhouping Liu
2012-12-28 13:43 ` Zlatko Calusic
2012-12-28 13:43 ` Zlatko Calusic
2012-12-28 12:57 ` Zlatko Calusic
2012-12-28 12:57 ` Zlatko Calusic
2013-01-03 17:57 ` kernel BUG at mm/huge_memory.c:1798! Mel Gorman
2013-01-03 17:57 ` Mel Gorman
2013-01-04 14:08 ` [PATCH] mm: thp: Acquire the anon_vma rwsem for lock during split Mel Gorman
2013-01-04 14:08 ` Mel Gorman
2013-01-04 21:28 ` Hugh Dickins
2013-01-04 21:28 ` Hugh Dickins
2013-01-07 14:36 ` Mel Gorman
2013-01-07 14:36 ` Mel Gorman
2013-01-07 14:39 ` [PATCH] mm: thp: Acquire the anon_vma rwsem for write " Mel Gorman
2013-01-07 14:39 ` Mel Gorman
2013-01-05 1:32 ` [PATCH] mm: thp: Acquire the anon_vma rwsem for lock " Michel Lespinasse
2013-01-05 1:32 ` Michel Lespinasse
2013-01-05 12:24 ` Simon Jeons
2013-01-05 12:24 ` Simon Jeons
2013-01-07 15:09 ` Mel Gorman [this message]
2013-01-07 15:09 ` Mel Gorman
2013-01-07 15:08 ` Mel Gorman
2013-01-07 15:08 ` Mel Gorman
2013-01-05 5:51 ` Zhouping Liu
2013-01-05 5:51 ` Zhouping Liu
2013-01-07 14:38 ` Mel Gorman
2013-01-07 14:38 ` Mel Gorman
2013-01-05 12:21 ` Simon Jeons
2013-01-05 12:21 ` Simon Jeons
2013-01-04 16:58 ` kernel BUG at mm/huge_memory.c:1798! Zhouping Liu
2013-01-04 16:58 ` Zhouping Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130107150935.GK3885@suse.de \
--to=mgorman@suse.de \
--cc=aarcange@redhat.com \
--cc=hughd@google.com \
--cc=jweiner@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=riel@redhat.com \
--cc=simon.jeons@gmail.com \
--cc=walken@google.com \
--cc=zliu@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.