From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Sasha Levin <sasha.levin@oracle.com>,
Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>,
Dmitry Vyukov <dvyukov@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Multiple potential races on vma->vm_flags
Date: Mon, 7 Sep 2015 14:40:48 +0300 [thread overview]
Message-ID: <20150907114048.GA5016@node.dhcp.inet.fi> (raw)
In-Reply-To: <55EC9221.4040603@oracle.com>
On Sun, Sep 06, 2015 at 03:21:05PM -0400, Sasha Levin wrote:
> ==================================================================
> ThreadSanitizer: data-race in munlock_vma_pages_range
>
> Write of size 8 by thread T378 (K2633, CPU3):
> [<ffffffff81212579>] munlock_vma_pages_range+0x59/0x3e0 mm/mlock.c:425
> [<ffffffff81212ac9>] mlock_fixup+0x1c9/0x280 mm/mlock.c:549
> [<ffffffff81212ccc>] do_mlock+0x14c/0x180 mm/mlock.c:589
> [< inlined >] SyS_munlock+0x74/0xb0 SYSC_munlock mm/mlock.c:651
> [<ffffffff812130b4>] SyS_munlock+0x74/0xb0 mm/mlock.c:643
> [<ffffffff81eb352e>] entry_SYSCALL_64_fastpath+0x12/0x71
> arch/x86/entry/entry_64.S:186
...
> Previous read of size 8 by thread T398 (K2623, CPU2):
> [<ffffffff8121d198>] try_to_unmap_one+0x78/0x4f0 mm/rmap.c:1208
> [< inlined >] rmap_walk+0x147/0x450 rmap_walk_file mm/rmap.c:1540
> [<ffffffff8121e7b7>] rmap_walk+0x147/0x450 mm/rmap.c:1559
> [<ffffffff8121ef72>] try_to_munlock+0xa2/0xc0 mm/rmap.c:1423
> [<ffffffff81211bb0>] __munlock_isolated_page+0x30/0x60 mm/mlock.c:129
> [<ffffffff81212066>] __munlock_pagevec+0x236/0x3f0 mm/mlock.c:331
> [<ffffffff812128a0>] munlock_vma_pages_range+0x380/0x3e0 mm/mlock.c:476
> [<ffffffff81212ac9>] mlock_fixup+0x1c9/0x280 mm/mlock.c:549
> [<ffffffff81212ccc>] do_mlock+0x14c/0x180 mm/mlock.c:589
> [< inlined >] SyS_munlock+0x74/0xb0 SYSC_munlock mm/mlock.c:651
> [<ffffffff812130b4>] SyS_munlock+0x74/0xb0 mm/mlock.c:643
> [<ffffffff81eb352e>] entry_SYSCALL_64_fastpath+0x12/0x71
> arch/x86/entry/entry_64.S:186
Okay, the detected race is mlock/munlock vs. rmap.
On rmap side we check vma->vm_flags in few places without taking
vma->vm_mm->mmap_sem. The vma cannot be freed since we hold i_mmap_rwsem
or anon_vma_lock, but nothing prevent vma->vm_flags from changing under
us.
In this particular case, speculative check in beginning of
try_to_unmap_one() is fine, since we re-check it under mmap_sem later in
the function.
False-negative is fine too here, since we will mlock the page in
__mm_populate() on mlock side after mlock_fixup().
BUT.
We *must* have all speculative vm_flags accesses wrapped READ_ONCE() to
avoid all compiler trickery, like duplication vm_flags access with
inconsistent results.
I looked only on VM_LOCKED checks, but there are few other flags checked
in rmap. All of them must be handled carefully. At least READ_ONCE() is
required.
Other solution would be to introduce per-vma spinlock to protect
vma->vm_flags and probably other vma fields and offload this duty
from mmap_sem.
But that's much bigger project.
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Sasha Levin <sasha.levin@oracle.com>,
Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>,
Dmitry Vyukov <dvyukov@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Multiple potential races on vma->vm_flags
Date: Mon, 7 Sep 2015 14:40:48 +0300 [thread overview]
Message-ID: <20150907114048.GA5016@node.dhcp.inet.fi> (raw)
In-Reply-To: <55EC9221.4040603@oracle.com>
On Sun, Sep 06, 2015 at 03:21:05PM -0400, Sasha Levin wrote:
> ==================================================================
> ThreadSanitizer: data-race in munlock_vma_pages_range
>
> Write of size 8 by thread T378 (K2633, CPU3):
> [<ffffffff81212579>] munlock_vma_pages_range+0x59/0x3e0 mm/mlock.c:425
> [<ffffffff81212ac9>] mlock_fixup+0x1c9/0x280 mm/mlock.c:549
> [<ffffffff81212ccc>] do_mlock+0x14c/0x180 mm/mlock.c:589
> [< inlined >] SyS_munlock+0x74/0xb0 SYSC_munlock mm/mlock.c:651
> [<ffffffff812130b4>] SyS_munlock+0x74/0xb0 mm/mlock.c:643
> [<ffffffff81eb352e>] entry_SYSCALL_64_fastpath+0x12/0x71
> arch/x86/entry/entry_64.S:186
...
> Previous read of size 8 by thread T398 (K2623, CPU2):
> [<ffffffff8121d198>] try_to_unmap_one+0x78/0x4f0 mm/rmap.c:1208
> [< inlined >] rmap_walk+0x147/0x450 rmap_walk_file mm/rmap.c:1540
> [<ffffffff8121e7b7>] rmap_walk+0x147/0x450 mm/rmap.c:1559
> [<ffffffff8121ef72>] try_to_munlock+0xa2/0xc0 mm/rmap.c:1423
> [<ffffffff81211bb0>] __munlock_isolated_page+0x30/0x60 mm/mlock.c:129
> [<ffffffff81212066>] __munlock_pagevec+0x236/0x3f0 mm/mlock.c:331
> [<ffffffff812128a0>] munlock_vma_pages_range+0x380/0x3e0 mm/mlock.c:476
> [<ffffffff81212ac9>] mlock_fixup+0x1c9/0x280 mm/mlock.c:549
> [<ffffffff81212ccc>] do_mlock+0x14c/0x180 mm/mlock.c:589
> [< inlined >] SyS_munlock+0x74/0xb0 SYSC_munlock mm/mlock.c:651
> [<ffffffff812130b4>] SyS_munlock+0x74/0xb0 mm/mlock.c:643
> [<ffffffff81eb352e>] entry_SYSCALL_64_fastpath+0x12/0x71
> arch/x86/entry/entry_64.S:186
Okay, the detected race is mlock/munlock vs. rmap.
On rmap side we check vma->vm_flags in few places without taking
vma->vm_mm->mmap_sem. The vma cannot be freed since we hold i_mmap_rwsem
or anon_vma_lock, but nothing prevent vma->vm_flags from changing under
us.
In this particular case, speculative check in beginning of
try_to_unmap_one() is fine, since we re-check it under mmap_sem later in
the function.
False-negative is fine too here, since we will mlock the page in
__mm_populate() on mlock side after mlock_fixup().
BUT.
We *must* have all speculative vm_flags accesses wrapped READ_ONCE() to
avoid all compiler trickery, like duplication vm_flags access with
inconsistent results.
I looked only on VM_LOCKED checks, but there are few other flags checked
in rmap. All of them must be handled carefully. At least READ_ONCE() is
required.
Other solution would be to introduce per-vma spinlock to protect
vma->vm_flags and probably other vma fields and offload this duty
from mmap_sem.
But that's much bigger project.
--
Kirill A. Shutemov
next parent reply other threads:[~2015-09-07 11:40 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAAeHK+z8o96YeRF-fQXmoApOKXa0b9pWsQHDeP=5GC_hMTuoDg@mail.gmail.com>
[not found] ` <55EC9221.4040603@oracle.com>
2015-09-07 11:40 ` Kirill A. Shutemov [this message]
2015-09-07 11:40 ` Multiple potential races on vma->vm_flags Kirill A. Shutemov
2015-09-09 15:27 ` Vlastimil Babka
2015-09-09 15:27 ` Vlastimil Babka
2015-09-09 16:01 ` Kirill A. Shutemov
2015-09-09 16:01 ` Kirill A. Shutemov
2015-09-10 0:58 ` Sasha Levin
2015-09-10 0:58 ` Sasha Levin
2015-09-10 8:36 ` Kirill A. Shutemov
2015-09-10 8:36 ` Kirill A. Shutemov
2015-09-10 13:27 ` Andrey Konovalov
2015-09-10 13:27 ` Andrey Konovalov
2015-09-11 10:39 ` Kirill A. Shutemov
2015-09-11 10:39 ` Kirill A. Shutemov
2015-09-11 15:29 ` Vlastimil Babka
2015-09-11 15:29 ` Vlastimil Babka
2015-09-11 16:08 ` Vlastimil Babka
2015-09-11 16:08 ` Vlastimil Babka
2015-09-12 1:27 ` Hugh Dickins
2015-09-12 1:27 ` Hugh Dickins
2015-09-14 10:16 ` Kirill A. Shutemov
2015-09-14 10:16 ` Kirill A. Shutemov
2015-09-15 17:36 ` Sasha Levin
2015-09-15 17:36 ` Sasha Levin
2015-09-15 19:01 ` Kirill A. Shutemov
2015-09-15 19:01 ` Kirill A. Shutemov
2015-09-22 16:47 ` Andrey Konovalov
2015-09-22 16:47 ` Andrey Konovalov
2015-09-22 18:54 ` Hugh Dickins
2015-09-22 18:54 ` Hugh Dickins
2015-09-22 19:45 ` Andrey Konovalov
2015-09-22 19:45 ` Andrey Konovalov
2015-09-23 1:39 ` Hugh Dickins
2015-09-23 1:39 ` Hugh Dickins
2015-09-23 11:46 ` Kirill A. Shutemov
2015-09-23 11:46 ` Kirill A. Shutemov
2015-09-23 22:58 ` Davidlohr Bueso
2015-09-23 22:58 ` Davidlohr Bueso
2015-09-23 13:08 ` Andrey Konovalov
2015-09-23 13:08 ` Andrey Konovalov
2015-09-24 0:42 ` Sasha Levin
2015-09-24 0:42 ` Sasha Levin
2015-09-25 19:33 ` Kirill A. Shutemov
2015-09-25 19:33 ` Kirill A. Shutemov
2015-10-13 22:38 ` Hugh Dickins
2015-10-13 22:38 ` Hugh Dickins
2015-10-13 22:33 ` Hugh Dickins
2015-10-13 22:33 ` Hugh Dickins
2015-10-15 16:58 ` Andrey Konovalov
2015-10-15 16:58 ` Andrey Konovalov
2015-09-23 21:30 ` Sasha Levin
2015-09-23 21:30 ` Sasha Levin
2015-09-25 14:26 ` Oleg Nesterov
2015-09-25 14:26 ` Oleg Nesterov
2015-09-24 13:11 ` Oleg Nesterov
2015-09-24 13:11 ` Oleg Nesterov
2015-09-24 16:27 ` Sasha Levin
2015-09-24 16:27 ` Sasha Levin
2015-09-24 17:26 ` Oleg Nesterov
2015-09-24 17:26 ` Oleg Nesterov
2015-09-24 18:52 ` Andrey Ryabinin
2015-09-24 18:52 ` Andrey Ryabinin
2015-09-24 19:01 ` Sasha Levin
2015-09-24 19:01 ` Sasha Levin
2015-09-25 12:41 ` Oleg Nesterov
2015-09-25 12:41 ` Oleg Nesterov
2015-09-23 15:34 ` Oleg Nesterov
2015-09-23 15:34 ` Oleg Nesterov
2015-09-23 15:38 ` Oleg Nesterov
2015-09-23 15:38 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150907114048.GA5016@node.dhcp.inet.fi \
--to=kirill@shutemov.name \
--cc=akpm@linux-foundation.org \
--cc=andreyknvl@google.com \
--cc=dvyukov@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=riel@redhat.com \
--cc=sasha.levin@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.