Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Vlastimil Babka <vbabka@suse.cz>
To: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Oleg Nesterov <oleg@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Andrey Konovalov <andreyknvl@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
Date: Mon, 19 Oct 2015 23:25:56 +0200	[thread overview]
Message-ID: <56255FE4.5070609@suse.cz> (raw)
In-Reply-To: <20151019201003.GA18106@node.shutemov.name>

On 10/19/2015 10:10 PM, Kirill A. Shutemov wrote:
> On Mon, Oct 19, 2015 at 12:53:17PM -0700, Hugh Dickins wrote:
>> On Mon, 19 Oct 2015, Kirill A. Shutemov wrote:
>>> On Mon, Oct 19, 2015 at 04:20:05AM -0700, Hugh Dickins wrote:
>>>>> Note how munlock_vma_pages_range() via __munlock_pagevec() does
>>>>> TestClearPageMlocked() without (or "between") pte or page lock. But the pte
>>>>> lock is being taken after clearing VM_LOCKED, so perhaps it's safe against
>>>>> try_to_unmap_one...
>>>>
>>>> A mind-trick I found helpful for understanding the barriers here, is
>>>> to imagine that the munlocker repeats its "vma->vm_flags &= ~VM_LOCKED"
>>>> every time it takes the pte lock: it does not actually do that, it
>>>> doesn't need to of course; but that does help show that ~VM_LOCKED
>>>> must be visible to anyone getting that pte lock afterwards.
>>>
>>> How can you make sure that any other codepath that changes vm_flags would
>>> not make (vm_flags & VM_LOCKED) temporary true while dealing with other
>>> flags?
>>>
>>> Compiler can convert things like "vma->vm_flags &= ~VM_FOO;" to whatever
>>> it wants as long as end result is the same. It's very unlikely that it
>>> will generate code to set all bits to one and then clear all which should
>>> be cleared, but it's theoretically possible.

I think Linus would be very vocal about such compiler implementation. 
And I can imagine a lot of things in the kernel would break by those 
spuriously set bits. There must be a lot of stuff that's "theoretically 
possible within the standard" but no sane compiler does. I believe even 
compiler guys are not that insane. IIRC we've seen bugs like this and 
they were always treated as bugs and fixed.
The example I've heard often used for theoretically possible but insane 
stuff is that the compiler could make code randomly write over anything 
that's not volatile, as long as it restored the original values upon 
e.g. returning from the function. That just can't happen.

>> I think that's in the realm of the fanciful.  But yes, it quite often
>> turns out that what I think is fanciful, is something that Paul has
>> heard compiler writers say they want to do, even if he has managed
>> to discourage them from doing it so far.
>
> Paul always has links to pdfs with this kind of horror. ;)
>
>> But more to the point, when you write of "end result", the compiler
>> would have no idea that releasing mmap_sem is the point at which
>> end result must be established:

Isn't releasing a lock one of those "release" barriers where previously
issued writes must become visible before the unlock takes place?

>> wouldn't it have to establish end
>> result before the next unlock operation, and before the end of the
>> compilation unit?

Now I'm lost in what you mean.

>> pte unlock being the relevant unlock operation
>> in this case, at least with my patch if not without.

Hm so IIUC Kirill's point is that try_to_unmap_one() is checking 
VM_LOCKED under pte lock, but somebody else might be modifying vm_flags 
under mmap_sem, and thus we have no protection.

>>>
>>> I think we need to have at lease WRITE_ONCE() everywhere we update
>>> vm_flags and READ_ONCE() where we read it without mmap_sem taken.

It wouldn't hurt to check if seeing a stale value or using non-atomic 
RMW can be a problem somewhere. In this case it's testing, not changing, 
so RMW is not an issue. But the check shouldn't consider arbitrary 
changes made by a potentially crazy compiler.

>> Not a series I'll embark upon myself,
>> and the patch at hand doesn't make things worse.
>
> I think it does.

So what's the alternative? Hm could we keep the trylock on mmap_sem 
under pte lock? The ordering is wrong, but it's a trylock, so no danger 
of deadlock?

> The patch changes locking rules for ->vm_flags without proper preparation
> and documentation. It will strike back one day.
> I know we have few other cases when we access ->vm_flags without mmap_sem,
> but this doesn't justify introducing one more potentially weak codepath.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2015-10-19 21:26 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
2015-10-19  4:45 ` [PATCH 1/12] mm Documentation: undoc non-linear vmas Hugh Dickins
2015-10-19  9:16   ` Kirill A. Shutemov
2015-11-05 17:29   ` Vlastimil Babka
2015-10-19  4:50 ` [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked Hugh Dickins
2015-10-19  6:23   ` Vlastimil Babka
2015-10-19 11:20     ` Hugh Dickins
2015-10-19 12:33       ` Vlastimil Babka
2015-10-19 19:17         ` Hugh Dickins
2015-10-19 20:52           ` Vlastimil Babka
2015-10-19 13:13       ` Kirill A. Shutemov
2015-10-19 19:53         ` Hugh Dickins
2015-10-19 20:10           ` Kirill A. Shutemov
2015-10-19 21:25             ` Vlastimil Babka [this message]
2015-10-19 21:53               ` Kirill A. Shutemov
2015-10-21 23:26               ` Hugh Dickins
2015-10-29 18:49                 ` [PATCH v2 " Hugh Dickins
2015-11-05 17:50                   ` Vlastimil Babka
2015-10-19 23:30         ` [PATCH " Davidlohr Bueso
2015-10-19  4:52 ` [PATCH 3/12] mm: page migration fix PageMlocked on migrated pages Hugh Dickins
2015-11-05 18:18   ` Vlastimil Babka
2015-10-19  4:54 ` [PATCH 4/12] mm: rename mem_cgroup_migrate to mem_cgroup_replace_page Hugh Dickins
2015-10-19 12:35   ` Johannes Weiner
2015-12-02  9:33   ` [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page Hugh Dickins
2015-12-02 10:17     ` Michal Hocko
2015-12-02 16:57     ` Johannes Weiner
2015-10-19  4:55 ` [PATCH 5/12] mm: correct a couple of page migration comments Hugh Dickins
2015-10-21 17:53   ` Rafael Aquini
2015-10-19  4:57 ` [PATCH 6/12] mm: page migration use the put_new_page whenever necessary Hugh Dickins
2015-11-05 18:31   ` Vlastimil Babka
2015-11-08 21:17     ` Hugh Dickins
2015-10-19  4:59 ` [PATCH 7/12] mm: page migration trylock newpage at same level as oldpage Hugh Dickins
2015-10-21 17:54   ` Rafael Aquini
2015-10-19  5:01 ` [PATCH 8/12] mm: page migration remove_migration_ptes at lock+unlock level Hugh Dickins
2015-10-19  5:03 ` [PATCH 9/12] mm: simplify page migration's anon_vma comment and flow Hugh Dickins
2015-10-19  5:05 ` [PATCH 10/12] mm: page migration use migration entry for swapcache too Hugh Dickins
2015-10-22 22:35   ` Cyrill Gorcunov
2015-10-19  5:07 ` [PATCH 11/12] mm: page migration avoid touching newpage until no going back Hugh Dickins
2015-10-19  5:11 ` [PATCH 12/12] mm: migrate dirty page without clear_page_dirty_for_io etc Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56255FE4.5070609@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=andreyknvl@google.com \
    --cc=cl@linux.com \
    --cc=dave@stgolabs.net \
    --cc=dvyukov@google.com \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).