From: Nai Xia <nai.xia@gmail.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>, Mel Gorman <mgorman@suse.de>,
Pawel Sikora <pluto@agmk.net>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, jpiszcz@lucidpixels.com, arekm@pld-linux.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mremap: enforce rmap src/dst vma ordering in case of vma_merge succeeding in copy_vma
Date: Sat, 5 Nov 2011 10:00:52 +0800 [thread overview]
Message-ID: <CAPQyPG5Y1e2dac38OLwZAinWb6xpPMWCya2vTaWLPi9+vp1JXQ@mail.gmail.com> (raw)
In-Reply-To: <20111105013317.GU18879@redhat.com>
On Sat, Nov 5, 2011 at 9:33 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Sat, Nov 05, 2011 at 08:21:03AM +0800, Nai Xia wrote:
>> copy_vma() ---> rmap_walk() scan dst VMA --> move_page_tables() moves src to dst
>> ---> rmap_walk() scan src VMA. :D
>
> Hmm yes. I think I got in the wrong track because I focused too much
> on that line you started talking about, the *vmap = new_vma, you said
> I had to reorder stuff there too, and that didn't make sense.
Oh, I think you misunderstood me in that. I was just saying:
if (*vmap = new_vma), then _NO_ PTEs need to be moved afterwards,
because vma has not yet been faulted at all. Otherwise, it breaks the
page->index semantics in the way I explained in my reply to Hugh.
So nothing need to be added there, but the reason is because
the above reasoning, not the same PTL locking...
And for this case alone, I think the proper solving place
should be outside move_vma() but inside do_mremap()
by only vma_adjust() and vma_merge() like stuff.
Because really it does not involve move_page_tables().
>
> The reason it doesn't make sense is that it can't be ok to reorder
> stuff when *vmap = new_vma (i.e. new_vma = old_vma). So if I didn't
> need to reorder in that case I thought I could extrapolate it was
> always ok.
>
> But the opposite is true: that case can't be solved.
>
> Can it really happen that vma_merge will pack (prev_vma, new_range,
> old_vma) together in a single vma? (i.e. prev_vma extended to
> old_vma->vm_end)
>
> Even if there's no prev_vma in the picture (but that's the extreme
> case) it cannot be safe: i.e. a (new_range, old_vma) or (old_vma,
> new_range).
>
> 1 single "vma" for src and dst virtual ranges, means 1 single
> vma->vm_pgoff. But we've two virtual addresses and two ptes. So the
> same page->index can't work for both if the vma->vm_pgoff is the
> same.
>
> So regardless of the ordering here we're dealing with something more
> fundamental.
>
> If rmap_walk runs immediately after vma_merge completes and releases
> the anon_vma_lock, it won't find any pte in the vma anymore. No matter
> the order.
>
> I thought at this before and I didn't mention it but at the light of
> the above issue I start to think this is the only possible correct
> solution to the problem. We should just never call vma_merge before
> move_page_tables. And do the merge by hand later after mremap is
> complete.
>
> The only safe way to do it is to have _two_ different vmas, with two
> different ->vm_pgoff. Then it will work. And by always creating a new
> vma we'll always have it queued at the end, and it'll be safe for the
> same reasons fork is safe.
>
> Always allocate a new vma, and then after the whole vma copy is
> complete, look if we can merge and free some vma. After the fact, so
> it means we can't use vma_merge anymore. vma_merge assumes the
> new_range is "virtual" and no vma is mapped there I think. Anyway
> that's an implementation issue. In some unlikely case we'll allocate 1
> more vma than before, and we'll free it once mremap is finished, but
> that's small problem compared to solving this once and for all.
>
> And that will fix it without ordering games and it'll fix the *vmap=
> new_vma case too. That case really tripped on me as I was assuming
> *that* was correct.
Yes. "Allocating a new vma, copy first and merge later " seems
another solution without the tricky reordering. But you know,
I now share some of Hugh's feeling that maybe we are too
desperate using racing in places where locks are simpler
and guaranteed to be safe.
But I think Mel indicated that anon_vma_locking might be
harmful to JVM SMP performance.
How severe you expect this to be, Mel ?
Thanks
Nai
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-11-05 2:00 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <201110122012.33767.pluto@agmk.net>
[not found] ` <alpine.LSU.2.00.1110131547550.1346@sister.anvils>
2011-10-13 23:30 ` kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110 Hugh Dickins
2011-10-16 16:11 ` Christoph Hellwig
2011-10-16 23:54 ` Andrea Arcangeli
2011-10-17 18:51 ` Hugh Dickins
2011-10-17 22:05 ` Andrea Arcangeli
2011-10-19 7:43 ` Mel Gorman
2011-10-19 13:39 ` Linus Torvalds
2011-10-19 19:42 ` Hugh Dickins
2011-10-20 6:30 ` Paweł Sikora
2011-10-20 6:51 ` Linus Torvalds
2011-10-21 6:54 ` Nai Xia
2011-10-21 7:35 ` Pawel Sikora
2011-10-20 12:51 ` Nai Xia
2011-10-20 18:36 ` Hugh Dickins
2011-10-21 6:22 ` Nai Xia
2011-10-21 8:07 ` Pawel Sikora
2011-10-21 9:07 ` Nai Xia
2011-10-21 21:36 ` Paweł Sikora
2011-10-22 6:21 ` Nai Xia
2011-10-22 16:42 ` Paweł Sikora
2011-10-20 9:11 ` Nai Xia
2011-10-21 15:56 ` Mel Gorman
2011-10-21 17:21 ` Nai Xia
2011-10-21 17:41 ` Andrea Arcangeli
2011-10-21 22:50 ` Andrea Arcangeli
2011-10-22 5:52 ` Nai Xia
2011-10-31 17:14 ` Andrea Arcangeli
2011-10-31 17:27 ` [PATCH] mremap: enforce rmap src/dst vma ordering in case of vma_merge succeeding in copy_vma Andrea Arcangeli
2011-11-01 12:07 ` Mel Gorman
2011-11-01 14:35 ` Nai Xia
2011-11-04 7:31 ` Hugh Dickins
2011-11-04 14:34 ` Nai Xia
2011-11-04 15:59 ` Pawel Sikora
2011-11-05 2:21 ` Nai Xia
2011-11-04 19:16 ` Hugh Dickins
2011-11-04 20:54 ` Andrea Arcangeli
2011-11-05 0:09 ` Nai Xia
2011-11-05 2:21 ` Hugh Dickins
2011-11-05 3:07 ` Andrea Arcangeli
2011-11-05 17:06 ` Andrea Arcangeli
2011-12-08 3:24 ` David Rientjes
2011-12-08 12:42 ` Andrea Arcangeli
2011-12-09 0:08 ` Andrew Morton
2011-12-09 1:55 ` Andrea Arcangeli
2011-11-04 23:56 ` Andrea Arcangeli
2011-11-05 0:21 ` Nai Xia
2011-11-05 0:59 ` Nai Xia
2011-11-05 1:33 ` Andrea Arcangeli
2011-11-05 2:00 ` Nai Xia [this message]
2011-11-07 13:14 ` Mel Gorman
2011-11-07 15:42 ` Andrea Arcangeli
2011-11-07 16:28 ` Mel Gorman
2011-11-09 1:25 ` Andrea Arcangeli
2011-11-11 9:14 ` Nai Xia
2011-11-16 14:00 ` Andrea Arcangeli
2011-11-17 0:16 ` Hugh Dickins
2011-11-17 2:49 ` Nai Xia
2011-11-17 6:21 ` Nai Xia
2011-11-17 18:42 ` Andrea Arcangeli
2011-11-18 1:42 ` Nai Xia
2011-11-18 2:17 ` Andrea Arcangeli
2011-11-19 9:15 ` Nai Xia
2011-10-22 5:07 ` kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110 Nai Xia
2011-10-31 16:34 ` Andrea Arcangeli
2011-10-16 22:37 ` Linus Torvalds
2011-10-17 3:02 ` Hugh Dickins
2011-10-17 3:09 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAPQyPG5Y1e2dac38OLwZAinWb6xpPMWCya2vTaWLPi9+vp1JXQ@mail.gmail.com \
--to=nai.xia@gmail.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=arekm@pld-linux.org \
--cc=hughd@google.com \
--cc=jpiszcz@lucidpixels.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=pluto@agmk.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).