stable.vger.kernel.org archive mirror
* [PATCH v1 1/4] smaps: Fix the abnormal memory statistics obtained through /proc/pid/smaps
       [not found] <20230727212845.135673-1-david@redhat.com>
@ 2023-07-27 21:28 ` David Hildenbrand
  2023-07-27 21:28 ` [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs David Hildenbrand
  1 sibling, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2023-07-27 21:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-fsdevel, David Hildenbrand, Andrew Morton,
	Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
	Jason Gunthorpe, John Hubbard, stable

From: liubo <liubo254@huawei.com>

In commit 474098edac26 ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()"), FOLL_NUMA was removed and replaced by
the gup_can_follow_protnone interface.

However, when a user-space process uses transparent huge pages, the
memory usage reported by /proc/pid/smaps_rollup is not consistent with
the RSS reported in /proc/pid/status.

Related examples are as follows:
cat /proc/15427/status
VmRSS:  20973024 kB
RssAnon:        20971616 kB
RssFile:            1408 kB
RssShmem:              0 kB

cat /proc/15427/smaps_rollup
00400000-7ffcc372d000 ---p 00000000 00:00 0 [rollup]
Rss:            14419432 kB
Pss:            14418079 kB
Pss_Dirty:      14418016 kB
Pss_Anon:       14418016 kB
Pss_File:             63 kB
Pss_Shmem:             0 kB
Anonymous:      14418016 kB
LazyFree:              0 kB
AnonHugePages:  14417920 kB
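
The numbers above can be checked with a little arithmetic. A minimal sketch, assuming 2 MiB PMD-sized THPs (the x86-64 default with 4 KiB base pages); the helper names are hypothetical:

```c
#include <assert.h>

/* Assumed PMD-sized THP: 2 MiB = 2048 kB (x86-64 with 4 KiB base pages). */
#define PMD_THP_KB 2048UL

/* kB counted in /proc/pid/status but missing from smaps_rollup. */
static unsigned long missing_kb(unsigned long status_kb, unsigned long rollup_kb)
{
	assert(status_kb >= rollup_kb);
	return status_kb - rollup_kb;
}

/* Number of whole PMD-sized THPs covered by a gap of gap_kb. */
static unsigned long gap_in_thps(unsigned long gap_kb)
{
	return gap_kb / PMD_THP_KB;
}
```

For RssAnon (20971616 kB) versus Anonymous (14418016 kB), missing_kb() yields 6553600 kB and gap_in_thps() yields exactly 3200, consistent with whole PMD-mapped THPs being skipped rather than individual 4 KiB pages.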

The root cause is that, while walking the page table, smaps_pmd_entry()
does not count the pages mapped by PROT_NONE PMDs, so the smaps totals
end up lower than the RSS counters.

Therefore, pass FOLL_FORCE to follow_trans_huge_pmd() so that pages
mapped PROT_NONE are counted as well, which resolves the discrepancy.
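
The effect of the fix can be modeled in userspace. This is a simplified sketch, not kernel code: the M_FOLL_* bits are hypothetical stand-ins for the real FOLL_* flags (whose values differ), it assumes, as the fix implies, that gup_can_follow_protnone() only allows following PROT_NONE entries when FOLL_FORCE is set, and it assumes every PMD maps a 2 MiB THP. Like smaps_pmd_entry(), it skips both NULL and error results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for FOLL_* bits; the kernel's values differ. */
enum {
	M_FOLL_DUMP  = 1u << 0,
	M_FOLL_FORCE = 1u << 1,
};

enum model_result { M_PAGE, M_NULL, M_EFAULT };

struct model_pmd {
	bool protnone;   /* PMD is mapped PROT_NONE */
	bool huge_zero;  /* PMD maps the huge zero page */
};

/* Assumed rule: only FOLL_FORCE lookups may follow PROT_NONE entries. */
static bool model_can_follow_protnone(unsigned int flags)
{
	return flags & M_FOLL_FORCE;
}

/* Simplified model of the checks in follow_trans_huge_pmd(). */
static enum model_result model_follow_pmd(struct model_pmd pmd,
					  unsigned int flags)
{
	if ((flags & M_FOLL_DUMP) && pmd.huge_zero)
		return M_EFAULT;	/* FOLL_DUMP skips the huge zero page */
	if (pmd.protnone && !model_can_follow_protnone(flags))
		return M_NULL;		/* looks as if nothing were mapped */
	return M_PAGE;
}

/* Sum what smaps would account, skipping NULL and error results. */
static unsigned long model_smaps_kb(const struct model_pmd *pmds, size_t n,
				    unsigned int flags)
{
	unsigned long kb = 0;

	for (size_t i = 0; i < n; i++)
		if (model_follow_pmd(pmds[i], flags) == M_PAGE)
			kb += 2048;	/* assumed 2 MiB THP per PMD */
	return kb;
}
```

For one normal PMD, one PROT_NONE PMD and one huge-zero PMD, FOLL_DUMP alone accounts 2048 kB, while FOLL_DUMP | FOLL_FORCE accounts 4096 kB; the huge zero page is excluded either way.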

Signed-off-by: liubo <liubo254@huawei.com>
Cc: stable@vger.kernel.org
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Signed-off-by: David Hildenbrand <david@redhat.com> # AKPM fixups, cc stable
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 fs/proc/task_mmu.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c1e6531cb02a..7075ce11dc7d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -571,8 +571,12 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 	bool migration = false;
 
 	if (pmd_present(*pmd)) {
-		/* FOLL_DUMP will return -EFAULT on huge zero page */
-		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
+		/*
+		 * FOLL_DUMP will return -EFAULT on huge zero page
+		 * FOLL_FORCE follow a PROT_NONE mapped page
+		 */
+		page = follow_trans_huge_pmd(vma, addr, pmd,
+					     FOLL_DUMP | FOLL_FORCE);
 	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);
 
-- 
2.41.0



* [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs
       [not found] <20230727212845.135673-1-david@redhat.com>
  2023-07-27 21:28 ` [PATCH v1 1/4] smaps: Fix the abnormal memory statistics obtained through /proc/pid/smaps David Hildenbrand
@ 2023-07-27 21:28 ` David Hildenbrand
  2023-07-28  2:30   ` John Hubbard
  1 sibling, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2023-07-27 21:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-fsdevel, David Hildenbrand, Andrew Morton,
	Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
	Jason Gunthorpe, John Hubbard, stable

We accidentally enforced PROT_NONE PTE/PMD permission checks for
follow_page() like we do for get_user_pages() and friends. That was
undesired, because follow_page() is usually only used to look up a currently
mapped page, not to actually access it. Further, follow_page() does not
actually trigger fault handling, but instead simply fails.

Let's restore that behavior by conditionally setting FOLL_FORCE if
FOLL_WRITE is not set. This way, for example KSM and migration code will
no longer fail on PROT_NONE mapped PTEs/PMDs.

Handling this internally doesn't require us to add any new FOLL_FORCE
usage outside of GUP code.

While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
permission checks like in check_vma_flags(), so especially
FOLL_FORCE|FOLL_WRITE would be dodgy.
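
The flag handling this patch adds to follow_page() can be modeled as follows. This is a userspace sketch, not the kernel implementation: the M_FOLL_* values are hypothetical, and the per-PTE check keeps only the two protection rules discussed here. The model treats every FOLL_WRITE lookup of a read-only PTE as a failure; that simplification holds for follow_page() because, after this patch, FOLL_FORCE is never combined with FOLL_WRITE there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FOLL_* bits; the kernel's values differ. */
enum {
	M_FOLL_WRITE = 1u << 0,
	M_FOLL_FORCE = 1u << 1,
	M_FOLL_PIN   = 1u << 2,
};

/* Flag fixup from the patch; false where follow_page() would WARN and return NULL. */
static bool model_fixup_flags(unsigned int in, unsigned int *out)
{
	if (in & (M_FOLL_PIN | M_FOLL_FORCE))
		return false;		/* callers may not pass these */
	if (!(in & M_FOLL_WRITE))
		in |= M_FOLL_FORCE;	/* plain lookups may see PROT_NONE */
	*out = in;
	return true;
}

/* Simplified per-PTE permission checks. */
static bool model_pte_ok(bool protnone, bool writable, unsigned int flags)
{
	if (protnone && !(flags & M_FOLL_FORCE))
		return false;		/* gup_can_follow_protnone() fails */
	if ((flags & M_FOLL_WRITE) && !writable)
		return false;		/* FOLL_WRITE still fails on R/O PTEs */
	return true;
}

/* Whether a follow_page() call with caller_flags would find the page. */
static bool model_follow_page(bool protnone, bool writable,
			      unsigned int caller_flags)
{
	unsigned int flags;

	if (!model_fixup_flags(caller_flags, &flags))
		return false;
	return model_pte_ok(protnone, writable, flags);
}
```

In this model a plain lookup succeeds on a PROT_NONE PTE, a FOLL_WRITE lookup still fails on a read-only PTE, and any caller passing FOLL_FORCE is refused outright.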

This issue was identified by code inspection. We'll add some
documentation regarding FOLL_FORCE next.

Reported-by: Peter Xu <peterx@redhat.com>
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Cc: <stable@vger.kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/gup.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 2493ffa10f4b..da9a5cc096ac 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 	if (vma_is_secretmem(vma))
 		return NULL;
 
-	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
+	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
 		return NULL;
 
+	/*
+	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
+	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
+	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
+	 */
+	if (!(foll_flags & FOLL_WRITE))
+		foll_flags |= FOLL_FORCE;
+
 	page = follow_page_mask(vma, address, foll_flags, &ctx);
 	if (ctx.pgmap)
 		put_dev_pagemap(ctx.pgmap);
-- 
2.41.0



* Re: [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs
  2023-07-27 21:28 ` [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs David Hildenbrand
@ 2023-07-28  2:30   ` John Hubbard
  2023-07-28  9:08     ` David Hildenbrand
  0 siblings, 1 reply; 5+ messages in thread
From: John Hubbard @ 2023-07-28  2:30 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Linus Torvalds, liubo,
	Peter Xu, Matthew Wilcox, Hugh Dickins, Jason Gunthorpe, stable

On 7/27/23 14:28, David Hildenbrand wrote:
> We accidentally enforced PROT_NONE PTE/PMD permission checks for
> follow_page() like we do for get_user_pages() and friends. That was
> undesired, because follow_page() is usually only used to look up a currently
> mapped page, not to actually access it. Further, follow_page() does not
> actually trigger fault handling, but instead simply fails.

I see that follow_page() is also completely undocumented. And that
reduces us to deducing how it should be used...these things that
change follow_page()'s behavior maybe should have a go at documenting
it too, perhaps.

> 
> Let's restore that behavior by conditionally setting FOLL_FORCE if
> FOLL_WRITE is not set. This way, for example KSM and migration code will
> no longer fail on PROT_NONE mapped PTEs/PMDs.
> 
> Handling this internally doesn't require us to add any new FOLL_FORCE
> usage outside of GUP code.
> 
> While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
> permission checks like in check_vma_flags(), so especially
> FOLL_FORCE|FOLL_WRITE would be dodgy.
> 
> This issue was identified by code inspection. We'll add some
> documentation regarding FOLL_FORCE next.
> 
> Reported-by: Peter Xu <peterx@redhat.com>
> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   mm/gup.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 2493ffa10f4b..da9a5cc096ac 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>   	if (vma_is_secretmem(vma))
>   		return NULL;
>   
> -	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
> +	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
>   		return NULL;

This is not a super happy situation: follow_page() is now prohibited
(see above: we should document that interface) from passing in
FOLL_FORCE...

>   
> +	/*
> +	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
> +	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
> +	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
> +	 */
> +	if (!(foll_flags & FOLL_WRITE))
> +		foll_flags |= FOLL_FORCE;
> +

...but then we set it anyway, for special cases. It's awkward because
FOLL_FORCE is not an "internal to gup" flag (yet?).

I don't yet have suggestions, other than:

1) Yes, the FOLL_NUMA made things bad.

2) And they are still very confusing, especially the new use of
    FOLL_FORCE.

...I'll try to let this soak in and maybe recommend something
in a more productive way. :)

thanks,
-- 
John Hubbard
NVIDIA



* Re: [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs
  2023-07-28  2:30   ` John Hubbard
@ 2023-07-28  9:08     ` David Hildenbrand
  2023-07-28 10:12       ` David Hildenbrand
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2023-07-28  9:08 UTC (permalink / raw)
  To: John Hubbard, linux-kernel
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Linus Torvalds, liubo,
	Peter Xu, Matthew Wilcox, Hugh Dickins, Jason Gunthorpe, stable

On 28.07.23 04:30, John Hubbard wrote:
> On 7/27/23 14:28, David Hildenbrand wrote:
>> We accidentally enforced PROT_NONE PTE/PMD permission checks for
>> follow_page() like we do for get_user_pages() and friends. That was
>> undesired, because follow_page() is usually only used to look up a currently
>> mapped page, not to actually access it. Further, follow_page() does not
>> actually trigger fault handling, but instead simply fails.
> 
> I see that follow_page() is also completely undocumented. And that
> reduces us to deducing how it should be used...these things that
> change follow_page()'s behavior maybe should have a go at documenting
> it too, perhaps.

I can certainly be motivated to do that. :)

> 
>>
>> Let's restore that behavior by conditionally setting FOLL_FORCE if
>> FOLL_WRITE is not set. This way, for example KSM and migration code will
>> no longer fail on PROT_NONE mapped PTEs/PMDs.
>>
>> Handling this internally doesn't require us to add any new FOLL_FORCE
>> usage outside of GUP code.
>>
>> While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
>> permission checks like in check_vma_flags(), so especially
>> FOLL_FORCE|FOLL_WRITE would be dodgy.
>>
>> This issue was identified by code inspection. We'll add some
>> documentation regarding FOLL_FORCE next.
>>
>> Reported-by: Peter Xu <peterx@redhat.com>
>> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>    mm/gup.c | 10 +++++++++-
>>    1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 2493ffa10f4b..da9a5cc096ac 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>>    	if (vma_is_secretmem(vma))
>>    		return NULL;
>>    
>> -	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
>> +	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
>>    		return NULL;
> 
> This is not a super happy situation: follow_page() is now prohibited
> (see above: we should document that interface) from passing in
> FOLL_FORCE...

I guess you saw my patch #4.

If you take a look at the existing callers (that are fortunately very 
limited), you'll see that nobody cares.

Most of the FOLL flags don't make any sense for follow_page(), and 
limiting further (ab)use is at least to me very appealing.

> 
>>    
>> +	/*
>> +	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
>> +	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
>> +	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
>> +	 */
>> +	if (!(foll_flags & FOLL_WRITE))
>> +		foll_flags |= FOLL_FORCE;
>> +
> 
> ...but then we set it anyway, for special cases. It's awkward because
> FOLL_FORCE is not an "internal to gup" flag (yet?).
> 
> I don't yet have suggestions, other than:
> 
> 1) Yes, the FOLL_NUMA made things bad.
> 
> 2) And they are still very confusing, especially the new use of
>      FOLL_FORCE.
> 
> ...I'll try to let this soak in and maybe recommend something
> in a more productive way. :)

What I can offer that might be very appealing is the following:

Get rid of the flags parameter for follow_page() *completely*. Yes, then 
we can even rename FOLL_ to something reasonable in the context where it 
is nowadays used ;)


Internally, we'll then set

FOLL_GET | FOLL_DUMP | FOLL_FORCE

and document exactly what this function does. Any user that needs
something different should just look into using get_user_pages() instead.

I can prototype that on top of this work easily.

-- 
Cheers,

David / dhildenb



* Re: [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs
  2023-07-28  9:08     ` David Hildenbrand
@ 2023-07-28 10:12       ` David Hildenbrand
  0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2023-07-28 10:12 UTC (permalink / raw)
  To: John Hubbard, linux-kernel
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Linus Torvalds, liubo,
	Peter Xu, Matthew Wilcox, Hugh Dickins, Jason Gunthorpe, stable

On 28.07.23 11:08, David Hildenbrand wrote:
> On 28.07.23 04:30, John Hubbard wrote:
>> On 7/27/23 14:28, David Hildenbrand wrote:
>>> We accidentally enforced PROT_NONE PTE/PMD permission checks for
>>> follow_page() like we do for get_user_pages() and friends. That was
>>> undesired, because follow_page() is usually only used to look up a currently
>>> mapped page, not to actually access it. Further, follow_page() does not
>>> actually trigger fault handling, but instead simply fails.
>>
>> I see that follow_page() is also completely undocumented. And that
>> reduces us to deducing how it should be used...these things that
>> change follow_page()'s behavior maybe should have a go at documenting
>> it too, perhaps.
> 
> I can certainly be motivated to do that. :)
> 
>>
>>>
>>> Let's restore that behavior by conditionally setting FOLL_FORCE if
>>> FOLL_WRITE is not set. This way, for example KSM and migration code will
>>> no longer fail on PROT_NONE mapped PTEs/PMDs.
>>>
>>> Handling this internally doesn't require us to add any new FOLL_FORCE
>>> usage outside of GUP code.
>>>
>>> While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
>>> permission checks like in check_vma_flags(), so especially
>>> FOLL_FORCE|FOLL_WRITE would be dodgy.
>>>
>>> This issue was identified by code inspection. We'll add some
>>> documentation regarding FOLL_FORCE next.
>>>
>>> Reported-by: Peter Xu <peterx@redhat.com>
>>> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>     mm/gup.c | 10 +++++++++-
>>>     1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 2493ffa10f4b..da9a5cc096ac 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>>>     	if (vma_is_secretmem(vma))
>>>     		return NULL;
>>>     
>>> -	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
>>> +	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
>>>     		return NULL;
>>
>> This is not a super happy situation: follow_page() is now prohibited
>> (see above: we should document that interface) from passing in
>> FOLL_FORCE...
> 
> I guess you saw my patch #4.
> 
> If you take a look at the existing callers (that are fortunately very
> limited), you'll see that nobody cares.
> 
> Most of the FOLL flags don't make any sense for follow_page(), and
> limiting further (ab)use is at least to me very appealing.
> 
>>
>>>     
>>> +	/*
>>> +	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
>>> +	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
>>> +	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
>>> +	 */
>>> +	if (!(foll_flags & FOLL_WRITE))
>>> +		foll_flags |= FOLL_FORCE;
>>> +
>>
>> ...but then we set it anyway, for special cases. It's awkward because
>> FOLL_FORCE is not an "internal to gup" flag (yet?).
>>
>> I don't yet have suggestions, other than:
>>
>> 1) Yes, the FOLL_NUMA made things bad.
>>
>> 2) And they are still very confusing, especially the new use of
>>       FOLL_FORCE.
>>
>> ...I'll try to let this soak in and maybe recommend something
>> in a more productive way. :)
> 
> What I can offer that might be very appealing is the following:
> 
> Get rid of the flags parameter for follow_page() *completely*. Yes, then
> we can even rename FOLL_ to something reasonable in the context where it
> is nowadays used ;)
> 
> 
> Internally, we'll then set
> 
> FOLL_GET | FOLL_DUMP | FOLL_FORCE
> 
> and document exactly what this function does. Any user that needs
> something different should just look into using get_user_pages() instead.
> 
> I can prototype that on top of this work easily.

The end result looks something like:

/**
 * follow_page - look up and reference a page descriptor from a user-virtual
 *		 address
 * @vma: vm_area_struct mapping @address
 * @address: virtual address to look up
 *
 * follow_page() will look up the page mapped at the given address and
 * take a reference on the page. The returned page has to be released using
 * put_page().
 *
 * follow_page() will not return special (like zero) pages and does not check
 * PTE protection: the returned page might be mapped PROT_NONE, R/O or R/W.
 * Consequently, follow_page() will not trigger NUMA hinting faults.
 *
 * follow_page() does not trigger page faults. If no page is mapped, or
 * a special (like zero) page is mapped, it returns %NULL or an error pointer.
 *
 * Note: new users with different requirements are probably better off using
 *       one of the get_user_pages() variants or one of the walk_page_range()
 *       variants.
 *
 * Return: the mapped (struct page *), %NULL if no mapping exists, or
 * an error pointer if there is a mapping to something not represented
 * by a page descriptor (see also vm_normal_page()) or the zero page.
 */
struct page *follow_page(struct vm_area_struct *vma, unsigned long address)
{
	struct follow_page_context ctx = { NULL };
	unsigned int gup_flags;
	struct page *page;

	if (vma_is_secretmem(vma))
		return NULL;

	/*
	 * FOLL_GET: We always want a reference on the returned page.
	 * FOLL_DUMP: Ignore special (like zero) pages.
	 * FOLL_FORCE: Succeed on PROT_NONE-mapped pages.
	 */
	gup_flags = FOLL_GET | FOLL_DUMP | FOLL_FORCE;

	page = follow_page_mask(vma, address, gup_flags, &ctx);
	if (ctx.pgmap)
		put_dev_pagemap(ctx.pgmap);
	return page;
}
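
Callers of the proposed flag-less follow_page() would need to distinguish the three outcomes listed under "Return:". The sketch below models that in userspace; ERR_PTR(), PTR_ERR() and IS_ERR() are re-created here following the kernel's convention of encoding small negative errno values at the top of the pointer range, struct page is a hypothetical stand-in, and the classify() helper is invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct page { int id; };	/* hypothetical stand-in */

enum lookup_outcome { LOOKUP_PAGE, LOOKUP_NONE, LOOKUP_SPECIAL };

/* Classify a follow_page() result the way a caller would have to. */
static enum lookup_outcome classify(const struct page *page)
{
	if (!page)
		return LOOKUP_NONE;	/* no mapping exists */
	if (IS_ERR(page))
		return LOOKUP_SPECIAL;	/* zero page or no page descriptor */
	return LOOKUP_PAGE;		/* reference held: release with put_page() */
}
```

Only the LOOKUP_PAGE path holds a reference, so a caller would call put_page() there and nowhere else.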

-- 
Cheers,

David / dhildenb


