Re: [PATCH v2 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked

From: Vlastimil Babka <vbabka@suse.cz>
To: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Christoph Lameter <cl@linux.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Oleg Nesterov <oleg@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Andrey Konovalov <andreyknvl@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH v2 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
Date: Thu, 5 Nov 2015 18:50:29 +0100
Message-ID: <563B96E5.4070600@suse.cz>
In-Reply-To: <alpine.LSU.2.11.1510291147150.3450@eggly.anvils>

On 10/29/2015 07:49 PM, Hugh Dickins wrote:
> KernelThreadSanitizer (ktsan) has shown that the down_read_trylock() of
> mmap_sem in try_to_unmap_one() (when going to set PageMlocked on a page
> found mapped in a VM_LOCKED vma) is ineffective against races with
> exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not held when
> tearing down an mm.
> 
> But that's okay, those races are benign; and although we've believed for
> years in that ugly down_read_trylock(), it's unsuitable for the job, and
> frustrates the good intention of setting PageMlocked when it fails.
> 
> It just doesn't matter if here we read vm_flags an instant before or after
> a racing mlock() or munlock() or exit_mmap() sets or clears VM_LOCKED: the
> syscalls (or exit) work their way up the address space (taking pt locks
> after updating vm_flags) to establish the final state.
> 
> We do still need to be careful never to mark a page Mlocked (hence
> unevictable) by any race that will not be corrected shortly after.  The
> page lock protects from many of the races, but not all (a page is not
> necessarily locked when it's unmapped).  But the pte lock we just dropped
> is good to cover the rest (and serializes even with
> munlock_vma_pages_all(), so no special barriers required): now hold on to
> the pte lock while calling mlock_vma_page().  Is that lock ordering safe? 
> Yes, that's how follow_page_pte() calls it, and how page_remove_rmap()
> calls the complementary clear_page_mlock().
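
A rough userspace model of the rule described above, only to make the ordering concrete: the "still mapped?" check and the setting of Mlocked happen under the same lock that the zap side holds while unmapping. Everything here is a stand-in and an assumption for illustration (struct page_model, struct pte_model, mlock_if_still_mapped(), zap_under_ptl(), and a C11 atomic_flag spinlock in place of the pte lock); it is not the kernel code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_LOCKED 0x00002000UL	/* illustrative value for this model */

struct page_model { bool mlocked; };
struct vma_model  { unsigned long vm_flags; };

/* One pte slot guarded by a tiny spinlock standing in for the pte lock. */
struct pte_model {
	atomic_flag ptl;
	struct page_model *page;	/* NULL once the page is unmapped */
};

static void pte_model_init(struct pte_model *pte, struct page_model *page)
{
	atomic_flag_clear(&pte->ptl);
	pte->page = page;
}

static void ptl_lock(struct pte_model *pte)
{
	while (atomic_flag_test_and_set_explicit(&pte->ptl, memory_order_acquire))
		;	/* spin until the lock is free */
}

static void ptl_unlock(struct pte_model *pte)
{
	atomic_flag_clear_explicit(&pte->ptl, memory_order_release);
}

/* Models the try_to_unmap_one() side: mark Mlocked only while still mapped. */
static bool mlock_if_still_mapped(struct pte_model *pte,
				  const struct vma_model *vma,
				  struct page_model *page)
{
	bool marked = false;

	assert(page != NULL);
	ptl_lock(pte);
	if (pte->page == page && (vma->vm_flags & VM_LOCKED)) {
		page->mlocked = true;
		marked = true;
	}
	ptl_unlock(pte);
	return marked;
}

/* Models the zap side: clear Mlocked and unmap under the same lock. */
static void zap_under_ptl(struct pte_model *pte)
{
	ptl_lock(pte);
	if (pte->page) {
		pte->page->mlocked = false;
		pte->page = NULL;
	}
	ptl_unlock(pte);
}
```

In this model, a call to mlock_if_still_mapped() that arrives after zap_under_ptl() finds the slot empty and leaves the page alone, which is the property the quoted paragraph relies on.
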
> 
> This fixes the following case (though not a case which anyone has
> complained of), which mmap_sem did not: truncation's preliminary
> unmap_mapping_range() is supposed to remove even the anonymous COWs of
> filecache pages, and that might race with try_to_unmap_one() on a
> VM_LOCKED vma, so that mlock_vma_page() sets PageMlocked just after
> zap_pte_range() unmaps the page, causing "Bad page state (mlocked)" when
> freed.  The pte lock protects against this.
> 
> You could say that it also protects against the more ordinary case, racing
> with the preliminary unmapping of a filecache page itself: but in our
> current tree, that's independently protected by i_mmap_rwsem; and that
> race would be why "Bad page state (mlocked)" was seen before commit
> 48ec833b7851 ("Revert mm/memory.c: share the i_mmap_rwsem").
> 
> Vlastimil Babka points out another race which this patch protects against.
> try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
> moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
> leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
> is ineffective because exit_mmap() does not hold it; page lock ineffective
> because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
> is effective because __munlock_pagevec_fill() takes it to get the page,
> after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
> 
> Kirill Shutemov points out that if the compiler chooses to implement a
> "vma->vm_flags &= VM_WHATEVER" or "vma->vm_flags |= VM_WHATEVER" operation
> with an intermediate store of unrelated bits set, since I'm here forgoing
> its usual protection by mmap_sem, try_to_unmap_one() might catch sight of
> a spurious VM_LOCKED in vm_flags, and make the wrong decision.  This does
> not appear to be an immediate problem, but we may want to define vm_flags
> accessors in future, to guard against such a possibility.
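
For what such accessors might look like, here is a minimal userspace sketch. The names vm_flags_peek(), vm_flags_set_bits() and vm_flags_clear_bits() are hypothetical, as are the READ_ONCE_UL()/WRITE_ONCE_UL() macros, which only imitate the kernel's READ_ONCE()/WRITE_ONCE() with volatile accesses. The idea is that the new value is computed in a local and then published with a single store, so a lockless reader should not observe an intermediate value. Writers are still assumed to be serialized against each other by mmap_sem.

```c
#include <assert.h>

#define VM_LOCKED 0x00002000UL	/* illustrative value for this model */

/* Userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE_UL(x)		(*(const volatile unsigned long *)&(x))
#define WRITE_ONCE_UL(x, v)	(*(volatile unsigned long *)&(x) = (v))

struct vma_model { unsigned long vm_flags; };

/* Lockless readers take one snapshot of the flags word. */
static inline unsigned long vm_flags_peek(const struct vma_model *vma)
{
	return READ_ONCE_UL(vma->vm_flags);
}

/* Compute the final value locally, then publish it with one store. */
static inline void vm_flags_set_bits(struct vma_model *vma, unsigned long bits)
{
	unsigned long new_flags = READ_ONCE_UL(vma->vm_flags) | bits;

	WRITE_ONCE_UL(vma->vm_flags, new_flags);
}

static inline void vm_flags_clear_bits(struct vma_model *vma, unsigned long bits)
{
	unsigned long new_flags = READ_ONCE_UL(vma->vm_flags) & ~bits;

	WRITE_ONCE_UL(vma->vm_flags, new_flags);
}
```
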
> 
> While we're here, make a related optimization in try_to_unmap_one(): if
> it's doing TTU_MUNLOCK, then there's no point at all in descending the
> page tables and getting the pt lock, unless the vma is VM_LOCKED.  Yes,
> that can change racily, but it can change racily even without the
> optimization: it's not critical.  Far better not to waste time here.
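
The early exit described above can be sketched as follows. This is a simplified userspace model, not the function body from the patch: unmap_one_model(), struct walk_stats and the flag values are assumptions chosen for illustration, and the page table walk is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_LOCKED 0x00002000UL	/* illustrative value for this model */

/* Illustrative flag values; only the TTU_MUNLOCK case matters here. */
enum ttu_flags_model { TTU_UNMAP_MODEL = 1, TTU_MUNLOCK_MODEL = 4 };

struct vma_model  { unsigned long vm_flags; };
struct walk_stats { unsigned int pt_walks; };

/*
 * Returns true if the (modelled) page table walk was performed.
 * A munlock pass has nothing to gain from a vma that is not VM_LOCKED,
 * so it returns before descending the page tables or taking the pt lock.
 */
static bool unmap_one_model(const struct vma_model *vma, unsigned int flags,
			    struct walk_stats *stats)
{
	assert(vma != NULL && stats != NULL);

	if ((flags & TTU_MUNLOCK_MODEL) && !(vma->vm_flags & VM_LOCKED))
		return false;

	stats->pt_walks++;	/* stands in for the walk plus pt lock */
	return true;
}
```

The vm_flags test is deliberately racy, as the quoted text says: a stale answer only means one skipped or one wasted walk.
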
> 
> Stopped short of separating try_to_munlock_one() from try_to_unmap_one()
> on this occasion, but that's probably the sensible next step - with a
> rename, given that try_to_munlock()'s business is to try to set Mlocked.
> 
> Updated the unevictable-lru Documentation, to remove its reference to mmap
> semaphore, but found a few more updates needed in just that area.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Sasha Levin <sasha.levin@oracle.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
