linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins <hughd@google.com>, Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrey Konovalov <andreyknvl@google.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Oleg Nesterov <oleg@redhat.com>, Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: Multiple potential races on vma->vm_flags
Date: Wed, 23 Sep 2015 14:46:37 +0300	[thread overview]
Message-ID: <20150923114636.GB25020@node.dhcp.inet.fi> (raw)
In-Reply-To: <alpine.LSU.2.11.1509221831570.19790@eggly.anvils>

On Tue, Sep 22, 2015 at 06:39:52PM -0700, Hugh Dickins wrote:
> On Tue, 22 Sep 2015, Andrey Konovalov wrote:
> > On Tue, Sep 22, 2015 at 8:54 PM, Hugh Dickins <hughd@google.com> wrote:
> > > On Tue, 22 Sep 2015, Andrey Konovalov wrote:
> > >> If anybody comes up with a patch to fix the original issue I easily
> > >> can test it, since I'm hitting "BUG: Bad page state" in a second when
> > >> fuzzing with KTSAN and Trinity.
> > >
> > > This "BUG: Bad page state" sounds more serious, but I cannot track down
> > > your report of it: please repost - thanks - though on seeing it, I may
> > > well end up with no ideas.
> > 
> > The report is below.
> 
> Thanks.
> 
> > 
> > I get it after a few seconds of running Trinity on a kernel with KTSAN
> > and targeting mlock, munlock and madvise syscalls.
> > Sasha also observed a very similar crash a while ago
> > (https://lkml.org/lkml/2014/11/6/1055).
> > I didn't manage to reproduce this in a kernel build without KTSAN though.
> > The idea was that data races KTSAN reports might be the explanation of
> > these crashes.
> > 
> > BUG: Bad page state in process trinity-c15  pfn:281999
> > page:ffffea000a066640 count:0 mapcount:0 mapping:          (null) index:0xd
> > flags: 0x20000000028000c(referenced|uptodate|swapbacked|mlocked)
> > page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> > bad because of flags:
> > flags: 0x200000(mlocked)
> > Modules linked in:
> > CPU: 3 PID: 11190 Comm: trinity-c15 Not tainted 4.2.0-tsan #1295
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >  ffffffff821c3b70 0000000000000000 0000000100004741 ffff8800b857f948
> >  ffffffff81e9926c 0000000000000003 ffffea000a066640 ffff8800b857f978
> >  ffffffff811ce045 ffffffff821c3b70 ffffea000a066640 0000000000000001
> > Call Trace:
> >  [<     inline     >] __dump_stack lib/dump_stack.c:15
> >  [<ffffffff81e9926c>] dump_stack+0x63/0x81 lib/dump_stack.c:50
> >  [<ffffffff811ce045>] bad_page+0x115/0x1a0 mm/page_alloc.c:409
> >  [<     inline     >] free_pages_check mm/page_alloc.c:731
> >  [<ffffffff811cf3b8>] free_pages_prepare+0x2f8/0x330 mm/page_alloc.c:922
> >  [<ffffffff811d2911>] free_hot_cold_page+0x51/0x2b0 mm/page_alloc.c:1908
> >  [<ffffffff811d2bcf>] free_hot_cold_page_list+0x5f/0x100
> > mm/page_alloc.c:1956 (discriminator 3)
> >  [<ffffffff811dd9c1>] release_pages+0x151/0x300 mm/swap.c:967
> >  [<ffffffff811de723>] __pagevec_release+0x43/0x60 mm/swap.c:984
> >  [<     inline     >] pagevec_release include/linux/pagevec.h:69
> >  [<ffffffff811ef36a>] shmem_undo_range+0x4fa/0x9d0 mm/shmem.c:446
> >  [<ffffffff811ef86f>] shmem_truncate_range+0x2f/0x60 mm/shmem.c:540
> >  [<ffffffff811f15d5>] shmem_fallocate+0x555/0x6e0 mm/shmem.c:2086
> >  [<ffffffff812568d0>] vfs_fallocate+0x1e0/0x310 fs/open.c:303
> >  [<     inline     >] madvise_remove mm/madvise.c:326
> >  [<     inline     >] madvise_vma mm/madvise.c:378
> >  [<     inline     >] SYSC_madvise mm/madvise.c:528
> >  [<ffffffff81225548>] SyS_madvise+0x378/0x760 mm/madvise.c:459
> >  [<ffffffff8124ef36>] ? kt_atomic64_store+0x76/0x130 mm/ktsan/sync_atomic.c:161
> >  [<ffffffff81ea8691>] entry_SYSCALL_64_fastpath+0x31/0x95
> > arch/x86/entry/entry_64.S:188
> > Disabling lock debugging due to kernel taint
> 
> This is totally untested, and one of you may quickly prove me wrong;
> but I went in to fix your "Bad page state (mlocked)" by holding pte
> lock across the down_read_trylock of mmap_sem in try_to_unmap_one(),
> then couldn't see why it would need mmap_sem at all, given how mlock
> and munlock first assert intention by setting or clearing VM_LOCKED
> in vm_flags, then work their way up the vma, taking pte locks.
> 
> Calling mlock_vma_page() under pte lock may look suspicious
> at first: but what it does is similar to clear_page_mlock(),
> which we regularly call under pte lock from page_remove_rmap().

Indeed. Looks fishy. But probably will work.

That was not obvious for me what makes clearing VM_LOCKED visible in
try_to_unmap_one() without mmap_sem.

After looking some more it seems we would rely on page_check_address()
in try_to_unmap_one() having acquire semantics and follow_page_mask() in
munlock_vma_pages_range() having release semantics.

But I would prefer to have something more explicit.

That's mess.

> 
> I'd rather wait to hear whether this appears to work in practice,
> and whether you agree that it should work in theory, before writing
> the proper description.  I'd love to lose that down_read_trylock.
> 
> You mention how Sasha hit the "Bad page state (mlocked)" back in
> November: that was one of the reasons we reverted Davidlohr's
> i_mmap_lock_read to i_mmap_lock_write in unmap_mapping_range(),
> without understanding why it was needed.  Yes, it would lock out
> a concurrent try_to_unmap(), whose setting of PageMlocked was not
> sufficiently serialized by the down_read_trylock of mmap_sem.
> 
> But I don't remember the other reasons for that revert (and
> haven't looked very hard as yet): anyone else remember?

I hoped Davidlohr will come back with something after the revert, but it
never happend. I think the reverted patch was responsible for most of
scalability boost from rwsem for i_mmap_lock...

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-09-23 11:46 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAAeHK+z8o96YeRF-fQXmoApOKXa0b9pWsQHDeP=5GC_hMTuoDg@mail.gmail.com>
     [not found] ` <55EC9221.4040603@oracle.com>
2015-09-07 11:40   ` Multiple potential races on vma->vm_flags Kirill A. Shutemov
2015-09-09 15:27     ` Vlastimil Babka
2015-09-09 16:01       ` Kirill A. Shutemov
2015-09-10  0:58     ` Sasha Levin
2015-09-10  8:36       ` Kirill A. Shutemov
2015-09-10 13:27         ` Andrey Konovalov
2015-09-11 10:39           ` Kirill A. Shutemov
2015-09-11 15:29             ` Vlastimil Babka
2015-09-11 16:08               ` Vlastimil Babka
2015-09-12  1:27             ` Hugh Dickins
2015-09-14 10:16               ` Kirill A. Shutemov
2015-09-15 17:36               ` Sasha Levin
2015-09-15 19:01                 ` Kirill A. Shutemov
2015-09-22 16:47                   ` Andrey Konovalov
2015-09-22 18:54                     ` Hugh Dickins
2015-09-22 19:45                       ` Andrey Konovalov
2015-09-23  1:39                         ` Hugh Dickins
2015-09-23 11:46                           ` Kirill A. Shutemov [this message]
2015-09-23 22:58                             ` Davidlohr Bueso
2015-09-23 13:08                           ` Andrey Konovalov
2015-09-24  0:42                             ` Sasha Levin
2015-09-25 19:33                               ` Kirill A. Shutemov
2015-10-13 22:38                               ` Hugh Dickins
2015-10-13 22:33                             ` Hugh Dickins
2015-10-15 16:58                               ` Andrey Konovalov
2015-09-23 21:30                 ` Sasha Levin
2015-09-25 14:26                   ` Oleg Nesterov
2015-09-24 13:11                 ` Oleg Nesterov
2015-09-24 16:27                   ` Sasha Levin
2015-09-24 17:26                     ` Oleg Nesterov
2015-09-24 18:52                       ` Andrey Ryabinin
2015-09-24 19:01                         ` Sasha Levin
2015-09-25 12:41                         ` Oleg Nesterov
2015-09-23 15:34             ` Oleg Nesterov
2015-09-23 15:38               ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150923114636.GB25020@node.dhcp.inet.fi \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=andreyknvl@google.com \
    --cc=dave@stgolabs.net \
    --cc=dvyukov@google.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=oleg@redhat.com \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).