All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 2/3] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
Date: Wed, 28 Apr 2010 16:55:58 +0100	[thread overview]
Message-ID: <20100428155558.GI15815@csn.ul.ie> (raw)
In-Reply-To: <20100428153525.GR510@random.random>

On Wed, Apr 28, 2010 at 05:35:25PM +0200, Andrea Arcangeli wrote:
> On Wed, Apr 28, 2010 at 10:15:55AM +0100, Mel Gorman wrote:
> > It became unconditional because I wasn't sure of the optimisation versus the
> > new anon_vma changes (doesn't matter, should have been safe). At the time
> 
> Changeset 287d97ac032136724143cde8d5964b414d562ee3 is meant to explain
> the removal of the lock but I don't get it from the comments. Or at
> least I don't get from that comment why we can't resurrect the plain
> old deleted code that looked fine to me.

Frankly, I don't understand why it was safe to drop the lock either.
Maybe it was a mistake but I still haven't convinced myself I fully
understand the subtleties of the anon_vma changes.

> Like there is no reason to
> take the lock if start == vma->vm_start.
> 
> > So, the VMA list does not appear to be messed up but there still needs
> > to be protection against modification of VMA details that are already on
> > the list. For that, the seq counter would have been enough and
> > lighter-weight than acquiring the anon_vma->lock every time in
> > vma_adjust().
> > 
> > I'll drop this patch again as the execve race looks the most important.
> 
> You mean you're dropping patch 2 too?

Temporarily at least until I figured out if execve was the only problem. The
locking in vma_adjust didn't look like the prime reason for the crash
but the lack of locking there is still very odd.

> I agree dropping patch 1 but
> to me the having to take all the anon_vma locks for every
> vma->anon_vma->lock that we walk seems a must, otherwise
> expand_downwards and vma_adjust won't be ok, plus we need to re-add
> the anon_vma lock to vma_adjust, it can't be safe to alter vm_pgoff
> and vm_start outside of the anon_vma->lock. Or I am mistaken?
> 

No, you're not. If nothing else, vma_address can return the wrong value because
the VMAs vm_start and vm_pgoff were in the process of being updated but not
fully updated. It's hard to see how vma_address would return the wrong value
and miss a migration PTE as a result but it's a possibility.  It's probably
a lot more important for transparent hugepage support.

> Patch 2 wouldn't help the swapops crash we reproduced because at that
> point the anon_vma of the stack is the local one, it's just after
> execve.
> 
> vma_adjust and expand_downards would alter vm_pgoff and vm_start while
> taking only the vma->anon_vma->lock where the vma->anon_vma is the
> _local_ one of the vma. 

True, although in the case of expand_downwards, it's highly unlikely that
there is also a migration PTE to be cleaned up. It's hard to see how a
migration PTE would be left behind in that case but it still looks wrong to
be updating the VMA fields without locking.

> But a vma in mainline can be indexed in
> infinite anon_vmas, so to prevent breaking migration
> vma_adjust/expand_downards the rmap_walk would need to take _all_
> anon_vma->locks for every anon_vma that the vma is indexed into.

I felt this would be too heavy in the common case which is why I made
rmap_walk() do the try-lock-or-back-off instead because rmap_walk is typically
in far less critical paths.

> Or
> alternatively like you implemented rmap_walk would need to check if
> the vma we found in the rmap_walk is different from the original
> anon_vma and to take the vma->anon_vma->lock (so taking the
> anon_vma->lock of the local anon_vma of every vma) before it can
> actually read the vma->vm_pgoff/vm_start inside vma_address.
> 

To be absolutly sure, yes this is required. I don't think we've been hitting
this exact problem in these tests but it still is a better plan than adjusting
VMA details without locks.

> If the above is right it also means the new anon-vma changes also break
> the whole locking of transparent hugepage, see wait_split_huge_page,
> it does a spin_unlock_wait(&anon_vma->lock) thinking that waiting the
> "local" anon-vma is enough, when in fact the hugepage may be shared
> and belonging to the parent parent_vma->anon_vma and not to the local
> one of the last child that is waiting on the wrong lock. So I may have
> to rewrite this part of the thp locking to solve this. And for me it's
> not enough to just taking more locks inside the rmap walks inside
> split_huge_page as I used the anon_vma lock outside too.
> 

No fun. That potentially could be a lot of locks to take to split the
page.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mel@csn.ul.ie>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 2/3] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
Date: Wed, 28 Apr 2010 16:55:58 +0100	[thread overview]
Message-ID: <20100428155558.GI15815@csn.ul.ie> (raw)
In-Reply-To: <20100428153525.GR510@random.random>

On Wed, Apr 28, 2010 at 05:35:25PM +0200, Andrea Arcangeli wrote:
> On Wed, Apr 28, 2010 at 10:15:55AM +0100, Mel Gorman wrote:
> > It became unconditional because I wasn't sure of the optimisation versus the
> > new anon_vma changes (doesn't matter, should have been safe). At the time
> 
> Changeset 287d97ac032136724143cde8d5964b414d562ee3 is meant to explain
> the removal of the lock but I don't get it from the comments. Or at
> least I don't get from that comment why we can't resurrect the plain
> old deleted code that looked fine to me.

Frankly, I don't understand why it was safe to drop the lock either.
Maybe it was a mistake but I still haven't convinced myself I fully
understand the subtleties of the anon_vma changes.

> Like there is no reason to
> take the lock if start == vma->vm_start.
> 
> > So, the VMA list does not appear to be messed up but there still needs
> > to be protection against modification of VMA details that are already on
> > the list. For that, the seq counter would have been enough and
> > lighter-weight than acquiring the anon_vma->lock every time in
> > vma_adjust().
> > 
> > I'll drop this patch again as the execve race looks the most important.
> 
> You mean you're dropping patch 2 too?

Temporarily at least until I figured out if execve was the only problem. The
locking in vma_adjust didn't look like the prime reason for the crash
but the lack of locking there is still very odd.

> I agree dropping patch 1 but
> to me the having to take all the anon_vma locks for every
> vma->anon_vma->lock that we walk seems a must, otherwise
> expand_downwards and vma_adjust won't be ok, plus we need to re-add
> the anon_vma lock to vma_adjust, it can't be safe to alter vm_pgoff
> and vm_start outside of the anon_vma->lock. Or I am mistaken?
> 

No, you're not. If nothing else, vma_address can return the wrong value because
the VMAs vm_start and vm_pgoff were in the process of being updated but not
fully updated. It's hard to see how vma_address would return the wrong value
and miss a migration PTE as a result but it's a possibility.  It's probably
a lot more important for transparent hugepage support.

> Patch 2 wouldn't help the swapops crash we reproduced because at that
> point the anon_vma of the stack is the local one, it's just after
> execve.
> 
> vma_adjust and expand_downards would alter vm_pgoff and vm_start while
> taking only the vma->anon_vma->lock where the vma->anon_vma is the
> _local_ one of the vma. 

True, although in the case of expand_downwards, it's highly unlikely that
there is also a migration PTE to be cleaned up. It's hard to see how a
migration PTE would be left behind in that case but it still looks wrong to
be updating the VMA fields without locking.

> But a vma in mainline can be indexed in
> infinite anon_vmas, so to prevent breaking migration
> vma_adjust/expand_downards the rmap_walk would need to take _all_
> anon_vma->locks for every anon_vma that the vma is indexed into.

I felt this would be too heavy in the common case which is why I made
rmap_walk() do the try-lock-or-back-off instead because rmap_walk is typically
in far less critical paths.

> Or
> alternatively like you implemented rmap_walk would need to check if
> the vma we found in the rmap_walk is different from the original
> anon_vma and to take the vma->anon_vma->lock (so taking the
> anon_vma->lock of the local anon_vma of every vma) before it can
> actually read the vma->vm_pgoff/vm_start inside vma_address.
> 

To be absolutly sure, yes this is required. I don't think we've been hitting
this exact problem in these tests but it still is a better plan than adjusting
VMA details without locks.

> If the above is right it also means the new anon-vma changes also break
> the whole locking of transparent hugepage, see wait_split_huge_page,
> it does a spin_unlock_wait(&anon_vma->lock) thinking that waiting the
> "local" anon-vma is enough, when in fact the hugepage may be shared
> and belonging to the parent parent_vma->anon_vma and not to the local
> one of the last child that is waiting on the wrong lock. So I may have
> to rewrite this part of the thp locking to solve this. And for me it's
> not enough to just taking more locks inside the rmap walks inside
> split_huge_page as I used the anon_vma lock outside too.
> 

No fun. That potentially could be a lot of locks to take to split the
page.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2010-04-28 15:56 UTC|newest]

Thread overview: 132+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-27 21:30 [PATCH 0/3] Fix migration races in rmap_walk() V2 Mel Gorman
2010-04-27 21:30 ` Mel Gorman
2010-04-27 21:30 ` [PATCH 1/3] mm,migration: During fork(), wait for migration to end if migration PTE is encountered Mel Gorman
2010-04-27 21:30   ` Mel Gorman
2010-04-27 22:22   ` Andrea Arcangeli
2010-04-27 22:22     ` Andrea Arcangeli
2010-04-27 23:52     ` KAMEZAWA Hiroyuki
2010-04-27 23:52       ` KAMEZAWA Hiroyuki
2010-04-28  0:18       ` Andrea Arcangeli
2010-04-28  0:18         ` Andrea Arcangeli
2010-04-28  0:19         ` Andrea Arcangeli
2010-04-28  0:19           ` Andrea Arcangeli
2010-04-28  0:28           ` KAMEZAWA Hiroyuki
2010-04-28  0:28             ` KAMEZAWA Hiroyuki
2010-04-28  0:59             ` Andrea Arcangeli
2010-04-28  0:59               ` Andrea Arcangeli
2010-04-28  8:24       ` Mel Gorman
2010-04-28  8:24         ` Mel Gorman
2010-04-27 21:30 ` [PATCH 2/3] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information Mel Gorman
2010-04-27 21:30   ` Mel Gorman
2010-04-27 23:10   ` Andrea Arcangeli
2010-04-27 23:10     ` Andrea Arcangeli
2010-04-28  9:15     ` Mel Gorman
2010-04-28  9:15       ` Mel Gorman
2010-04-28 15:35       ` Andrea Arcangeli
2010-04-28 15:35         ` Andrea Arcangeli
2010-04-28 15:39         ` Andrea Arcangeli
2010-04-28 15:39           ` Andrea Arcangeli
2010-04-28 15:55         ` Mel Gorman [this message]
2010-04-28 15:55           ` Mel Gorman
2010-04-28 16:23           ` Andrea Arcangeli
2010-04-28 16:23             ` Andrea Arcangeli
2010-04-28 17:34             ` Mel Gorman
2010-04-28 17:34               ` Mel Gorman
2010-04-28 17:58               ` Andrea Arcangeli
2010-04-28 17:58                 ` Andrea Arcangeli
2010-04-28 17:47             ` [RFC PATCH] take all anon_vma locks in anon_vma_lock Rik van Riel
2010-04-28 17:47               ` Rik van Riel
2010-04-28 18:03               ` Andrea Arcangeli
2010-04-28 18:03                 ` Andrea Arcangeli
2010-04-28 18:09                 ` Rik van Riel
2010-04-28 18:09                   ` Rik van Riel
2010-04-28 18:25               ` [RFC PATCH -v2] " Rik van Riel
2010-04-28 18:25                 ` Rik van Riel
2010-04-28 19:07                 ` Mel Gorman
2010-04-28 19:07                   ` Mel Gorman
2010-04-28 20:17                 ` [RFC PATCH -v3] " Rik van Riel
2010-04-28 20:17                   ` Rik van Riel
2010-04-28 20:57                   ` Rik van Riel
2010-04-28 20:57                     ` Rik van Riel
2010-04-29  0:28                     ` Minchan Kim
2010-04-29  0:28                       ` Minchan Kim
2010-04-29  2:10                       ` Rik van Riel
2010-04-29  2:10                         ` Rik van Riel
2010-04-29  2:55                         ` Minchan Kim
2010-04-29  2:55                           ` Minchan Kim
2010-04-29  6:42                           ` Minchan Kim
2010-04-29  6:42                             ` Minchan Kim
2010-04-29 15:39                           ` Rik van Riel
2010-04-29 15:39                             ` Rik van Riel
2010-04-29  7:37                       ` Mel Gorman
2010-04-29  7:37                         ` Mel Gorman
2010-04-29  8:15                     ` Mel Gorman
2010-04-29  8:15                       ` Mel Gorman
2010-04-29  8:32                       ` Minchan Kim
2010-04-29  8:32                         ` Minchan Kim
2010-04-29  8:44                         ` Mel Gorman
2010-04-29  8:44                           ` Mel Gorman
2010-04-27 21:30 ` [PATCH 3/3] mm,migration: Remove straggling migration PTEs when page tables are being moved after the VMA has already moved Mel Gorman
2010-04-27 21:30   ` Mel Gorman
2010-04-27 22:30   ` Andrea Arcangeli
2010-04-27 22:30     ` Andrea Arcangeli
2010-04-27 22:58     ` Andrea Arcangeli
2010-04-27 22:58       ` Andrea Arcangeli
2010-04-28  0:39       ` KAMEZAWA Hiroyuki
2010-04-28  0:39         ` KAMEZAWA Hiroyuki
2010-04-28  1:05         ` Andrea Arcangeli
2010-04-28  1:05           ` Andrea Arcangeli
2010-04-28  1:09           ` Andrea Arcangeli
2010-04-28  1:09             ` Andrea Arcangeli
2010-04-28  1:18           ` KAMEZAWA Hiroyuki
2010-04-28  1:18             ` KAMEZAWA Hiroyuki
2010-04-28  1:36             ` Andrea Arcangeli
2010-04-28  1:36               ` Andrea Arcangeli
2010-04-28  1:29       ` KAMEZAWA Hiroyuki
2010-04-28  1:29         ` KAMEZAWA Hiroyuki
2010-04-28  1:44         ` Andrea Arcangeli
2010-04-28  1:44           ` Andrea Arcangeli
2010-04-28  2:12           ` KAMEZAWA Hiroyuki
2010-04-28  2:12             ` KAMEZAWA Hiroyuki
2010-04-28  2:42             ` Andrea Arcangeli
2010-04-28  2:42               ` Andrea Arcangeli
2010-04-28  2:49               ` KAMEZAWA Hiroyuki
2010-04-28  2:49                 ` KAMEZAWA Hiroyuki
2010-04-28  7:28                 ` KAMEZAWA Hiroyuki
2010-04-28  7:28                   ` KAMEZAWA Hiroyuki
2010-04-28 10:48                   ` Mel Gorman
2010-04-28 10:48                     ` Mel Gorman
2010-04-28  0:03   ` KAMEZAWA Hiroyuki
2010-04-28  0:03     ` KAMEZAWA Hiroyuki
2010-04-28  0:08     ` Andrea Arcangeli
2010-04-28  0:08       ` Andrea Arcangeli
2010-04-28  0:36       ` KAMEZAWA Hiroyuki
2010-04-28  0:36         ` KAMEZAWA Hiroyuki
2010-04-28  8:30   ` KAMEZAWA Hiroyuki
2010-04-28  8:30     ` KAMEZAWA Hiroyuki
2010-04-28 14:46     ` Andrea Arcangeli
2010-04-28 14:46       ` Andrea Arcangeli
2010-04-27 22:27 ` [PATCH 0/3] Fix migration races in rmap_walk() V2 Christoph Lameter
2010-04-27 22:27   ` Christoph Lameter
2010-04-27 22:32   ` Andrea Arcangeli
2010-04-27 22:32     ` Andrea Arcangeli
2010-04-28  0:13     ` KAMEZAWA Hiroyuki
2010-04-28  0:13       ` KAMEZAWA Hiroyuki
2010-04-28  0:20       ` Andrea Arcangeli
2010-04-28  0:20         ` Andrea Arcangeli
2010-04-28 14:23         ` Mel Gorman
2010-04-28 14:23           ` Mel Gorman
2010-04-28 14:57           ` Mel Gorman
2010-04-28 14:57             ` Mel Gorman
2010-04-28 15:16             ` Andrea Arcangeli
2010-04-28 15:16               ` Andrea Arcangeli
2010-04-28 15:23               ` Mel Gorman
2010-04-28 15:23                 ` Mel Gorman
2010-04-28 15:45                 ` Andrea Arcangeli
2010-04-28 15:45                   ` Andrea Arcangeli
2010-04-28 20:40                   ` Andrea Arcangeli
2010-04-28 20:40                     ` Andrea Arcangeli
2010-04-28 21:05                     ` Andrea Arcangeli
2010-04-28 21:05                       ` Andrea Arcangeli
2010-04-28  9:17     ` Mel Gorman
2010-04-28  9:17       ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100428155558.GI15815@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.