Re: [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: John Hubbard <jhubbard@nvidia.com>, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	liubo <liubo254@huawei.com>, Peter Xu <peterx@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Hugh Dickins <hughd@google.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	stable@vger.kernel.org
Subject: Re: [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs
Date: Fri, 28 Jul 2023 11:08:26 +0200	[thread overview]
Message-ID: <9de80e22-e89f-2760-34f4-61be5f8fd39c@redhat.com> (raw)
In-Reply-To: <55c92738-e402-4657-3d46-162ad2c09d68@nvidia.com>

On 28.07.23 04:30, John Hubbard wrote:
> On 7/27/23 14:28, David Hildenbrand wrote:
>> We accidentally enforced PROT_NONE PTE/PMD permission checks for
>> follow_page() like we do for get_user_pages() and friends. That was
>> undesired, because follow_page() is usually only used to lookup a currently
>> mapped page, not to actually access it. Further, follow_page() does not
>> actually trigger fault handling, but instead simply fails.
> 
> I see that follow_page() is also completely undocumented. And that
> reduces us to deducing how it should be used...these things that
> change follow_page()'s behavior maybe should have a go at documenting
> it too, perhaps.

I can certainly be motivated to do that. :)

> 
>>
>> Let's restore that behavior by conditionally setting FOLL_FORCE if
>> FOLL_WRITE is not set. This way, for example KSM and migration code will
>> no longer fail on PROT_NONE mapped PTEs/PMDS.
>>
>> Handling this internally doesn't require us to add any new FOLL_FORCE
>> usage outside of GUP code.
>>
>> While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
>> permission checks like in check_vma_flags(), so especially
>> FOLL_FORCE|FOLL_WRITE would be dodgy.
>>
>> This issue was identified by code inspection. We'll add some
>> documentation regarding FOLL_FORCE next.
>>
>> Reported-by: Peter Xu <peterx@redhat.com>
>> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>    mm/gup.c | 10 +++++++++-
>>    1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 2493ffa10f4b..da9a5cc096ac 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>>    	if (vma_is_secretmem(vma))
>>    		return NULL;
>>    
>> -	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
>> +	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
>>    		return NULL;
> 
> This is not a super happy situation: follow_page() is now prohibited
> (see above: we should document that interface) from passing in
> FOLL_FORCE...

I guess you saw my patch #4.

If you take a look at the existing callers (that are fortunately very 
limited), you'll see that nobody cares.

Most of the FOLL flags don't make any sense for follow_page(), and 
limiting further (ab)use is at least to me very appealing.

> 
>>    
>> +	/*
>> +	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
>> +	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
>> +	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
>> +	 */
>> +	if (!(foll_flags & FOLL_WRITE))
>> +		foll_flags |= FOLL_FORCE;
>> +
> 
> ...but then we set it anyway, for special cases. It's awkward because
> FOLL_FORCE is not an "internal to gup" flag (yet?).
> 
> I don't yet have suggestions, other than:
> 
> 1) Yes, the FOLL_NUMA made things bad.
> 
> 2) And they are still very confusing, especially the new use of
>      FOLL_FORCE.
> 
> ...I'll try to let this soak in and maybe recommend something
> in a more productive way. :)

What I can offer that might be very appealing is the following:

Get rid of the flags parameter for follow_page() *completely*. Yes, then 
we can even rename FOLL_ to something reasonable in the context where it 
is nowadays used ;)


Internally, we'll then set

FOLL_GET | FOLL_DUMP | FOLL_FORCE

and document exactly what this functions does. Any user that needs 
something different should just look into using get_user_pages() instead.

I can prototype that on top of this work easily.

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2023-07-28  9:11 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-27 21:28 [PATCH v1 0/4] smaps / mm/gup: fix gup_can_follow_protnone fallout David Hildenbrand
2023-07-27 21:28 ` [PATCH v1 1/4] smaps: Fix the abnormal memory statistics obtained through /proc/pid/smaps David Hildenbrand
2023-07-27 21:28 ` [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs David Hildenbrand
2023-07-28  2:30   ` John Hubbard
2023-07-28  9:08     ` David Hildenbrand [this message]
2023-07-28 10:12       ` David Hildenbrand
2023-07-27 21:28 ` [PATCH v1 3/4] smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd() David Hildenbrand
2023-07-27 21:28 ` [PATCH v1 4/4] mm/gup: document FOLL_FORCE behavior David Hildenbrand
2023-07-28 16:18 ` [PATCH v1 0/4] smaps / mm/gup: fix gup_can_follow_protnone fallout Linus Torvalds
2023-07-28 17:30   ` David Hildenbrand
2023-07-28 17:54     ` David Hildenbrand
2023-07-28 19:40     ` David Hildenbrand
2023-07-28 19:50       ` Peter Xu
2023-07-28 20:00         ` David Hildenbrand
2023-08-02 10:24     ` Mel Gorman
2023-07-28 19:39   ` Peter Xu
2023-07-28 19:52     ` David Hildenbrand
2023-07-28 20:23     ` Linus Torvalds
2023-07-28 20:33       ` David Hildenbrand
2023-07-28 20:50         ` Linus Torvalds
2023-07-28 21:02           ` David Hildenbrand
2023-07-28 21:20             ` Peter Xu
2023-07-28 21:31               ` David Hildenbrand
2023-07-28 22:14                 ` Jason Gunthorpe
2023-07-31 16:01                   ` Peter Xu
2023-07-28 21:32               ` John Hubbard
2023-07-28 21:49                 ` Peter Xu
2023-07-28 22:00                   ` John Hubbard
2023-07-31 16:05                     ` Peter Xu
     [not found]   ` <412bb30f-0417-802c-3fc4-a4e9d5891c5d@redhat.com>
2023-07-29  9:35     ` David Hildenbrand
2023-07-31 16:10       ` Peter Xu
2023-07-31 16:20         ` David Hildenbrand
2023-07-31 18:23           ` Linus Torvalds
2023-07-31 18:51             ` Peter Xu
2023-07-31 19:00             ` David Hildenbrand
2023-07-31 19:07               ` Linus Torvalds
2023-07-31 19:22                 ` David Hildenbrand
2023-08-01 13:05               ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9de80e22-e89f-2760-34f4-61be5f8fd39c@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liubo254@huawei.com \
    --cc=peterx@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.