From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Andrey Konovalov <andreyknvl@google.com>,
Hugh Dickins <hughd@google.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Oleg Nesterov <oleg@redhat.com>, Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Dmitry Vyukov <dvyukov@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Vlastimil Babka <vbabka@suse.cz>
Subject: Re: Multiple potential races on vma->vm_flags
Date: Fri, 25 Sep 2015 22:33:06 +0300
Message-ID: <20150925193306.GA2200@node.dhcp.inet.fi>
In-Reply-To: <560346F2.4050507@oracle.com>
On Wed, Sep 23, 2015 at 08:42:26PM -0400, Sasha Levin wrote:
> On 09/23/2015 09:08 AM, Andrey Konovalov wrote:
> > On Wed, Sep 23, 2015 at 3:39 AM, Hugh Dickins <hughd@google.com> wrote:
> >> > This is totally untested, and one of you may quickly prove me wrong;
> >> > but I went in to fix your "Bad page state (mlocked)" by holding pte
> >> > lock across the down_read_trylock of mmap_sem in try_to_unmap_one(),
> >> > then couldn't see why it would need mmap_sem at all, given how mlock
> >> > and munlock first assert intention by setting or clearing VM_LOCKED
> >> > in vm_flags, then work their way up the vma, taking pte locks.
> >> >
> >> > Calling mlock_vma_page() under pte lock may look suspicious
> >> > at first: but what it does is similar to clear_page_mlock(),
> >> > which we regularly call under pte lock from page_remove_rmap().
> >> >
> >> > I'd rather wait to hear whether this appears to work in practice,
> >> > and whether you agree that it should work in theory, before writing
> >> > the proper description. I'd love to lose that down_read_trylock.
> > No, unfortunately it doesn't work, I still see "Bad page state (mlocked)".
> >
> > It seems that your patch doesn't fix the race from the report below, since pte
> > lock is not taken when 'vma->vm_flags &= ~VM_LOCKED;' (mlock.c:425)
> > is being executed. (Line numbers are from kernel with your patch applied.)
>
> I've fired up my HZ_10000 patch,
Can we get the HZ_10000 thing upstream? Under KERNEL_DEBUG, or
something?
> and this seems to be a real race that is
> somewhat easy to reproduce under those conditions.
>
> Here's a fresh backtrace from my VMs:
>
> [1935109.882343] BUG: Bad page state in process trinity-subchil pfn:3ca200
> [1935109.884000] page:ffffea000f288000 count:0 mapcount:0 mapping: (null) index:0x1e00 compound_mapcount: 0
> [1935109.885772] flags: 0x22fffff80144008(uptodate|head|swapbacked|mlocked)
> [1935109.887174] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [1935109.888197] bad because of flags:
> [1935109.888759] flags: 0x100000(mlocked)
> [1935109.889525] Modules linked in:
> [1935109.890165] CPU: 8 PID: 2615 Comm: trinity-subchil Not tainted 4.3.0-rc2-next-20150923-sasha-00079-gec04207-dirty #2569
> [1935109.891876] 1ffffffff6445448 00000000e5dca494 ffff8803f7657708 ffffffffa70402da
> [1935109.893504] ffffea000f288000 ffff8803f7657738 ffffffffa56e522b 022fffff80144008
> [1935109.894947] ffffea000f288020 ffffea000f288000 00000000ffffffff ffff8803f76577a8
> [1935109.896413] Call Trace:
> [1935109.899102] [<ffffffffa70402da>] dump_stack+0x4e/0x84
> [1935109.899821] [<ffffffffa56e522b>] bad_page+0x17b/0x210
> [1935109.900469] [<ffffffffa56e85a8>] free_pages_prepare+0xb48/0x1110
> [1935109.902127] [<ffffffffa56ee0d1>] __free_pages_ok+0x21/0x260
> [1935109.904435] [<ffffffffa56ee373>] free_compound_page+0x63/0x80
> [1935109.905614] [<ffffffffa581b51e>] free_transhuge_page+0x6e/0x80
> [1935109.906752] [<ffffffffa5709f76>] __put_compound_page+0x76/0xa0
> [1935109.907884] [<ffffffffa570a475>] release_pages+0x4d5/0x9f0
> [1935109.913027] [<ffffffffa5769bea>] tlb_flush_mmu_free+0x8a/0x120
> [1935109.913957] [<ffffffffa576f993>] unmap_page_range+0xe73/0x1460
> [1935109.915737] [<ffffffffa57700a6>] unmap_single_vma+0x126/0x2f0
> [1935109.916646] [<ffffffffa577270d>] unmap_vmas+0xdd/0x190
> [1935109.917454] [<ffffffffa5790361>] exit_mmap+0x221/0x430
> [1935109.921176] [<ffffffffa5366da1>] mmput+0xb1/0x240
> [1935109.921919] [<ffffffffa537b3b2>] do_exit+0x732/0x27c0
> [1935109.928561] [<ffffffffa537d599>] do_group_exit+0xf9/0x300
> [1935109.929786] [<ffffffffa537d7bd>] SyS_exit_group+0x1d/0x20
> [1935109.930617] [<ffffffffaf59fbf6>] entry_SYSCALL_64_fastpath+0x16/0x7a
Would it make any difference if you added mmap_sem protection in
exit_mmap?
--
Kirill A. Shutemov