From: Andrea Arcangeli <aarcange@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Rik van Riel <riel@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
Jan Vorlicek <janvorli@microsoft.com>,
Aditya Mandaleeka <adityam@microsoft.com>
Subject: Re: [PATCH 1/2] mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk
Date: Fri, 23 Sep 2016 21:18:40 +0200 [thread overview]
Message-ID: <20160923191840.GK3485@redhat.com> (raw)
In-Reply-To: <alpine.LSU.2.11.1609220224230.12486@eggly.anvils>
Hello Hugh,
On Thu, Sep 22, 2016 at 03:36:36AM -0700, Hugh Dickins wrote:
> I suppose: this one seems overblown to me, and risks more change
> (as the CONFIG_DEBUG_VM_RB=y crashes showed).
When DEBUG_VM_RB=n there was no bug that I know of. So I don't think
the fact there was a false positive in the validation code that didn't
immediately cope with the new changes, should be a major concern.
> But I've come back to it several times, not found any incorrectness,
> and was just about ready to Ack it (once the VM_RB fix is folded in,
> though I've not studied that yet): when I noticed that what I'd liked
> least about this one, looks unnecessary too - see below.
The reason the VM_RB=y incremental fix to the validation code is not a
few liner, is to micro-optimize it. I call directly
__vma_unlink_common to be sure the additional parameter is eliminated
at build time if CONFIG_DEBUG_VM_RB=n, and it never risks to go
through the stack in production.
> At the bottom I've appended my corrected version of Andrea's
> earlier patches for comparison: maybe better for stable?
I think it's perfectly suitable for -stable, if there is urgency to
merge it in -stable. OTOH with regard to urgency, this isn't
exploitable and the bug was there for 10+ years?
> > +static inline void __vma_unlink_prev(struct mm_struct *mm,
> > + struct vm_area_struct *vma,
> > + struct vm_area_struct *prev)
> > +{
> > + __vma_unlink_common(mm, vma, prev, true);
> > +}
> > +
> > +static inline void __vma_unlink(struct mm_struct *mm,
> > + struct vm_area_struct *vma)
> > +{
> > + __vma_unlink_common(mm, vma, NULL, false);
> > +}
> > +
>
> Umm, how many functions do we need to unlink a vma?
> Perhaps I'm missing some essential, but what's wrong with a single
> __vma_unlink(mm, vma)? (Could omit mm, but probably better with it.)
Of course that would work, I did that initially. I only had
__vma_unlink and I just removed the "prev" parameter from it.
> The existing __vma_unlink(mm, vma, prev) dates, of course, from
> long before Linus added vma->vm_prev in 2.6.36. It doesn't really
> need its prev arg nowadays, and I wonder if that misled you into
> all this prev and has_prev stuff?
After removing "prev" from __vma_unlink I reintroduced
__vma_unlink_prev as a microoptimization for remove_next = 1/2
cases.
In those two cases we have already "prev" and it's guaranteed not
null. So by keeping the _common version __always_inline the parameters
of the _common disappears in the assembly and in turn the
__vma_unlink_prev is a bit faster.
Perhaps it's not worth to do these kind of microoptimizations? The
only reason I reintroduced a version of __vma_unlink_prev that gets
prev not NULL as parameter was explicitly to microoptimize with
__always_inline.
> (Yes, of course it needs to handle the NULL vma->vm_prev mm->mmap
> case, but that doesn't need these three functions.)
>
> But I see this area gets touched again in yesterday's 3/4 to fix
> the VM_RB issue. I haven't tried applying that patch on top to
> see what the result looks like, but I hope simpler than this.
Right, to handle the case of DEBUG_VM_RB=y I need to pass a different
"ignore" parameter in remove_next == 3, so it's even more worth to
microoptimize now that I'm forced to have a different kind of call
anyway, and I can't just call __vma_unlink(next).
Once the two patches are folded, __vma_unlink is renamed to
__vma_unlink_prev that is a more accurate name anyway I think, given
the parameters and that assumption it does on prev being not NULL.
> > + if (remove_next != 3)
> > + __vma_unlink_prev(mm, next, vma);
> > + else
> > + /* vma is not before next if they've been swapped */
> > + __vma_unlink(mm, next);
>
> And if the VM_RB issue doesn't complicate it, this would just amount to
> __vma_unlink(mm, next);
> without any remove_next 3 variation.
Yes, and VM_RB complicates it.
> > + if (remove_next != 3) {
>
> if (vma == orig_vma), and you won't need the remove_next 3 state at all.
I think that would be less readable. I don't want to risk to mistake
case 1/2/3. I could use an enum and REMOVE_NEXT, REMOVE_NEXT_NEXT,
REMOVE_PREV, or I could use -1 instead of 3 to show it's removing prev
if you wish, but I would prefer not to use vma == orig_vma to detect
remove_next != 3. It can't improve performance either.
orig_vma is purely for trans_huge split when the vma->vm_start/end
(and next->vm_start if adjust_next) boundary changes.
The only point of orig_vma is to replace this statement: "remove_next
!= 3 : vma : next". I wouldn't mix up the detection of case 1/2/3 with
that micro-optimization.
> Here's my fixup of Andrea's earlier version, not swapping vma and next
> as the above does, but applying properties of next to vma as before.
> Maybe this version should go in first, so that it's available as an
> easier and safer candidate for stable backports: whatever akpm prefers.
I think this is a more conservative and in turn safer approach for
urgent -stable or for urgent backports. Performance-wise I doubt any
difference is measurable.
For the longer term upstream, I think removing the oddness factor from
case 8 for good is better, as we may get bitten by it again and it's
also quite counter intuitive for the callers of vma_merge to receive a
vma that isn't already fully in sync with all the parameters passed to
vma_merge. And having to overwrite the "different" bits by hand. I
feel the oddness in case 8 should be dropped for good and it's not
much more complicated to do so (especially if we ignore the
__vma_unlink details which are a fully self contained problem and they
cannot add up to the complexity of vma_merge/vma_adjust).
I successfully tested your fix with the testcase that exercises the
race and reviewed your fix and it's certainly correct too to solve the
race against rmap_walks that access vma_page_prot/vm_flags. The fix in
-mm however is solving the race condition for all fields, if any
rmap_walk accessed more than those two fields, and without having to
copy them off.
Tested-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Overall I'm fine either ways, but I had to elaborate my preference :).
Thanks,
Andrea
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-09-23 19:18 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-15 17:41 [PATCH 0/2] vma_merge vs rmap_walk SMP race condition fix Andrea Arcangeli
2016-09-15 17:41 ` [PATCH 1/2] mm: vm_page_prot: update with WRITE_ONCE/READ_ONCE Andrea Arcangeli
2016-09-15 18:27 ` Rik van Riel
2016-09-15 17:41 ` [PATCH 2/2] mm: vma_merge: fix race vm_page_prot race condition against rmap_walk Andrea Arcangeli
2016-09-15 18:28 ` Rik van Riel
2016-09-16 18:42 ` Hugh Dickins
2016-09-16 20:54 ` Andrea Arcangeli
2016-09-17 16:05 ` [PATCH 0/1] mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk v2 Andrea Arcangeli
2016-09-17 16:05 ` [PATCH 1/1] mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk Andrea Arcangeli
2016-09-18 0:36 ` Andrea Arcangeli
2016-09-19 18:25 ` [PATCH 1/2] " Andrea Arcangeli
2016-09-19 18:25 ` [PATCH 2/2] mm: vma_adjust: remove superfluous check for next not NULL Andrea Arcangeli
2016-09-22 10:36 ` [PATCH 1/2] mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk Hugh Dickins
2016-09-23 19:18 ` Andrea Arcangeli [this message]
2016-09-23 20:25 ` Hugh Dickins
2016-09-28 5:09 ` [mm] 2129957506: kernel BUG at mm/mmap.c:329! kernel test robot
2016-09-28 5:09 ` [lkp] " kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160923191840.GK3485@redhat.com \
--to=aarcange@redhat.com \
--cc=adityam@microsoft.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=janvorli@microsoft.com \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.