From: Vlastimil Babka <vbabka@suse.cz>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sasha Levin <sasha.levin@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	Michel Lespinasse <walken@google.com>,
	Bob Liu <bob.liu@oracle.com>, Nick Piggin <npiggin@suse.de>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan@kernel.org>,
	Hugh Dickins <hughd@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm/mlock: fix BUG_ON unlocked page for nolinear VMAs
Date: Sat, 04 Jan 2014 09:09:14 +0100
Message-ID: <52C7C1AA.2070701@suse.cz>
In-Reply-To: <CA+55aFzq1iQqddGo-m=vutwMYn5CPf65Ergov5svKR4AWC3rUQ@mail.gmail.com>

On 01/04/2014 01:18 AM, Linus Torvalds wrote:
> On Fri, Jan 3, 2014 at 3:36 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> I'm for going with the removal of BUG_ON. The TestSetPageMlocked should provide enough
>> race protection.
> 
> Maybe. But dammit, that's subtle, and I don't think you're even right.
> 
> It basically depends on mlock_vma_page() and munlock_vma_page() being
> able to run CONCURRENTLY on the same page. In particular, you could
> have a mlock_vma_page() set the bit on one CPU, and munlock_vma_page()
> immediately clearing it on another, and then the rest of those
> functions could run with a totally arbitrary interleaving when working
> with the exact same page.
> 
> They both do basically
> 
>     if (!isolate_lru_page(page))
>         putback_lru_page(page);
> 
> but one or the other would randomly win the race (it's internally
> protected by the lru lock), and *if* the munlock_vma_page() wins it,
> it would also do
> 
>     try_to_munlock(page);
> 
> but if mlock_vma_page() wins it, that wouldn't happen. That looks
> entirely broken - you end up with the PageMlocked bit clear, but
> try_to_munlock() was never called on that page, because
> mlock_vma_page() got to the page isolation before the "subsequent"
> munlock_vma_page().
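The interleaving described above can be replayed deterministically in a small user-space model. This is a minimal sketch, not kernel code: `struct page_model`, `test_set_mlocked()`, `test_clear_mlocked()`, `isolate()`, `putback()` and `replay()` are hypothetical stand-ins for PG_mlocked, TestSetPageMlocked()/TestClearPageMlocked(), isolate_lru_page() and putback_lru_page(). Both paths are reduced to the simplified shape quoted above: test the bit, try to isolate, and call try_to_munlock() only on the munlock side and only if isolation succeeded.

```c
#include <stdbool.h>

/* Hypothetical single-page model of the simplified paths quoted above. */
struct page_model {
	bool mlocked;              /* stands in for PG_mlocked */
	bool isolated;             /* page currently taken off the LRU */
	int try_to_munlock_calls;  /* how often the munlock follow-up ran */
};

/* Both return the old bit value, like TestSetPageMlocked()/TestClearPageMlocked(). */
static bool test_set_mlocked(struct page_model *p)
{
	bool old = p->mlocked;
	p->mlocked = true;
	return old;
}

static bool test_clear_mlocked(struct page_model *p)
{
	bool old = p->mlocked;
	p->mlocked = false;
	return old;
}

/* Same convention as isolate_lru_page(): 0 on success, nonzero if already isolated. */
static int isolate(struct page_model *p)
{
	if (p->isolated)
		return -1;
	p->isolated = true;
	return 0;
}

static void putback(struct page_model *p)
{
	p->isolated = false;
}

/*
 * Replays: CPU A (mlock) sets the bit, CPU B (munlock) clears it, then both
 * race for isolation. munlock_isolates_first selects who wins that race.
 */
static struct page_model replay(bool munlock_isolates_first)
{
	struct page_model p = { .mlocked = false, .isolated = false,
				.try_to_munlock_calls = 0 };
	bool a_proceeds = !test_set_mlocked(&p);  /* bit was clear: A continues */
	bool b_proceeds = test_clear_mlocked(&p); /* bit was set: B continues */
	int a_iso = -1, b_iso = -1;

	if (munlock_isolates_first && b_proceeds) {
		b_iso = isolate(&p);
		if (b_iso == 0)
			p.try_to_munlock_calls++;
	}
	if (a_proceeds)
		a_iso = isolate(&p);
	if (!munlock_isolates_first && b_proceeds) {
		b_iso = isolate(&p);
		if (b_iso == 0)
			p.try_to_munlock_calls++;  /* not reached here: A holds the page */
	}
	if (a_iso == 0)
		putback(&p);
	if (b_iso == 0)
		putback(&p);
	return p;
}
```

With `replay(false)` the mlock side wins isolation, and the model ends with `mlocked == false` and `try_to_munlock_calls == 0`, which is the state objected to above. With `replay(true)` the bit also ends clear, but try_to_munlock() has run once.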

I got the impression (see e.g. the munlock_vma_page() comments) that the
whole thing is designed with this possibility in mind. isolate_lru_page()
may fail (presumably in other scenarios than this one, too), and if
try_to_munlock() was not called here, then yes, the page might lose the
PageMlocked bit and go to the normal LRU instead of the unevictable list,
but try_to_unmap() should catch and fix this. That would also explain why
mlock_vma_page() is called from try_to_unmap_cluster().
So if I understand correctly, the PageMlocked bit is not something that
has to be set correctly 100% of the time; as long as it is set correctly
most of the time, most of these pages will go to the unevictable list and
spare vmscan's time.
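The argument that a lost PG_mlocked bit is only a missed optimization, repaired later by reclaim, can be modelled as follows. This is a hypothetical sketch, not kernel code: `struct mapped_page`, `struct vma_model` and `reclaim_check_mlocked()` are illustrative names, the page records its mappings directly instead of going through the rmap, and the page-lock requirement under discussion is deliberately left out. The check loosely corresponds to try_to_unmap() finding a VM_LOCKED mapping, mlocking the page again and letting vmscan move it to the unevictable list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
enum lru_model { LRU_EVICTABLE, LRU_UNEVICTABLE };

struct vma_model {
	bool vm_locked;                  /* stands in for VM_LOCKED in vm_flags */
};

struct mapped_page {
	bool mlocked;                    /* stands in for PG_mlocked: only a hint */
	enum lru_model lru;
	const struct vma_model *maps[4]; /* simplified rmap: mappings of the page */
	size_t nr_maps;
};

/*
 * Reclaim-time check, loosely modelled on try_to_unmap(): a VM_LOCKED
 * mapping means the page must not be reclaimed, so the bit is set again
 * and the page is moved to the unevictable list. Returns true if rescued.
 */
static bool reclaim_check_mlocked(struct mapped_page *p)
{
	for (size_t i = 0; i < p->nr_maps; i++) {
		if (p->maps[i]->vm_locked) {
			p->mlocked = true;
			p->lru = LRU_UNEVICTABLE;
			return true;
		}
	}
	return false;  /* no locked mapping: the page may be reclaimed */
}
```

A page that lost its bit in the race but is still mapped by a locked VMA is rescued on the first reclaim pass, at the cost of vmscan having looked at it once; a page with no locked mapping is left evictable.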

> And this is very much what the page lock serialization would prevent.
> So no, the PageMlocked in *no* way gives serialization. It's an atomic
> bit op, yes, but that only "serializes" in one direction, not when you
> can have a mix of bit setting and clearing.
> 
> So quite frankly, I think you're wrong. The BUG_ON() is correct, or at
> least enforces some kind of ordering. And try_to_unmap_cluster() is
> just broken in calling that without the page being locked. That's my
> opinion. There may be some *other* reason why it all happens to work,
> but no, "TestSetPageMlocked should provide enough race protection" is
> simply not true, and even if it were, it's way too subtle and odd to
> be a good rule.
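The serialization that the page lock provides, as opposed to a bare atomic bit op, can be illustrated with a small pthreads model. In this hypothetical sketch a mutex stands in for lock_page()/unlock_page(), `assert_page_locked()` plays the role of BUG_ON(!PageLocked(page)), and each path holds the lock across its whole test-and-update sequence, so a munlock that clears the bit always completes its try_to_munlock() step before any mlock can run. All names are illustrative; the LRU isolation is elided. Replacing the mutex with a single atomic test-and-set/test-and-clear would make each bit flip atomic but leave the follow-up work free to interleave, which is the objection raised above.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define ROUNDS 100000

/* Hypothetical model: page_lock stands in for lock_page()/PG_locked. */
struct locked_page {
	pthread_mutex_t page_lock;
	bool mlocked;              /* stands in for PG_mlocked */
	int sets;                  /* times the mlock path set the bit */
	int try_to_munlock_calls;  /* times the munlock path did its follow-up work */
};

static void locked_page_init(struct locked_page *p)
{
	pthread_mutex_init(&p->page_lock, NULL);
	p->mlocked = false;
	p->sets = 0;
	p->try_to_munlock_calls = 0;
}

/* Like PageLocked(), this only checks that *someone* holds the lock. */
static void assert_page_locked(pthread_mutex_t *lock)
{
	if (pthread_mutex_trylock(lock) == 0) {
		pthread_mutex_unlock(lock);
		abort();  /* analogue of BUG_ON(!PageLocked(page)) */
	}
}

static void mlock_path(struct locked_page *p)
{
	pthread_mutex_lock(&p->page_lock);    /* lock_page() */
	assert_page_locked(&p->page_lock);
	if (!p->mlocked) {
		p->mlocked = true;
		p->sets++;                    /* LRU isolate/putback elided */
	}
	pthread_mutex_unlock(&p->page_lock);  /* unlock_page() */
}

static void munlock_path(struct locked_page *p)
{
	pthread_mutex_lock(&p->page_lock);
	assert_page_locked(&p->page_lock);
	if (p->mlocked) {
		p->mlocked = false;
		/* Completes before any mlock_path() can set the bit again. */
		p->try_to_munlock_calls++;
	}
	pthread_mutex_unlock(&p->page_lock);
}

static void *mlocker(void *arg)
{
	for (int i = 0; i < ROUNDS; i++)
		mlock_path(arg);
	return NULL;
}

static void *munlocker(void *arg)
{
	for (int i = 0; i < ROUNDS; i++)
		munlock_path(arg);
	return NULL;
}
```

Build with `-pthread` and run `mlocker()` and `munlocker()` on the same `struct locked_page` from two threads. Because every set is followed by at most one clear-plus-try_to_munlock() before the next set, `sets - try_to_munlock_calls` ends up 1 if the page finishes mlocked and 0 otherwise, regardless of scheduling.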

Right, it was stupid of me to make such a strong statement without any
details. I intended to review that patch when back at work next week, but
since the topic came up now, I just wanted to point out that the patch is
in the pipeline for this bug.

> So I really object to just removing the BUG_ON(). Not with a *lot*
> more explanation as to why these kinds of issues wouldn't matter.
> 
>                  Linus
> 

