From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, david@redhat.com, ioworker0@gmail.com,
	wangkefeng.wang@huawei.com, ying.huang@intel.com,
	21cnbao@gmail.com, shy828301@gmail.com, ziy@nvidia.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
Date: Wed, 8 May 2024 11:44:00 +0800	[thread overview]
Message-ID: <d2bd3277-7ef5-4909-a149-6895ad95459e@linux.alibaba.com> (raw)
In-Reply-To: <13939ade-a99a-4075-8a26-9be7576b7e03@arm.com>



On 2024/5/7 18:37, Ryan Roberts wrote:
> On 06/05/2024 09:46, Baolin Wang wrote:
>> Add large folio mapping establishment support for finish_fault() as a preparation,
>> to support multi-size THP allocation of anonymous shmem pages in the following
>> patches.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>   1 file changed, 33 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index eea6e4984eae..936377220b77 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>   {
>>   	struct vm_area_struct *vma = vmf->vma;
>>   	struct page *page;
>> +	struct folio *folio;
>>   	vm_fault_t ret;
>>   	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>   		      !(vma->vm_flags & VM_SHARED);
>> +	int type, nr_pages, i;
>> +	unsigned long addr = vmf->address;
>>   
>>   	/* Did we COW the page? */
>>   	if (is_cow)
>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>   			return VM_FAULT_OOM;
>>   	}
>>   
>> +	folio = page_folio(page);
>> +	nr_pages = folio_nr_pages(folio);
>> +
>> +	if (unlikely(userfaultfd_armed(vma))) {
>> +		nr_pages = 1;
>> +	} else if (nr_pages > 1) {
>> +		unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>> +		unsigned long end = start + nr_pages * PAGE_SIZE;
>> +
>> +		/* In case the folio size in page cache beyond the VMA limits. */
>> +		addr = max(start, vma->vm_start);
>> +		nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>> +
>> +		page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
> 
> I still don't really follow the logic in this else if block. Isn't it possible
> that finish_fault() gets called with a page from a folio that isn't aligned with
> vmf->address?
> 
> For example, let's say we have a file who's size is 64K and which is cached in a
> single large folio in the page cache. But the file is mapped into a process at
> VA 16K to 80K. Let's say we fault on the first page (VA=16K). You will calculate

For shmem, this case doesn't happen, because the VA is aligned to the
hugepage size in shmem_get_unmapped_area(); see patch 7.
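
(A minimal sketch of that alignment idea, in plain C with a made-up helper
name -- not the actual shmem_get_unmapped_area() change from patch 7. The
point is only that the mapping's start VA gets rounded up to the mTHP size,
so ALIGN_DOWN(vmf->address, folio size) in finish_fault() can never fall
below vm_start.)

static unsigned long align_start_to_mthp(unsigned long addr,
					 unsigned long hpage_size)
{
	/* hpage_size is assumed to be a power of two (e.g. 64K). */
	return (addr + hpage_size - 1) & ~(hpage_size - 1);
}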

> start=0 and end=64K I think?

Yes. Unfortunately, some file systems that support large folio mappings do
not perform VA alignment for multi-size THP (non-PMD sizes, e.g. 64K). I
think this will require changes to
__get_unmapped_area() -> thp_get_unmapped_area_vmflags() or
file->f_op->get_unmapped_area() to align the VA for multi-size THP in the
future.
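
To make the mismatch concrete, here is the arithmetic for Ryan's example as
a small userspace sketch (illustrative only; the numbers come from the
example above: a 64K file in one 64K folio, mapped from offset 0 at
VA [16K, 80K), faulting at VA 16K):

#include <stdio.h>

int main(void)
{
	const unsigned long K = 1024;
	const unsigned long page_size = 4 * K, page_shift = 12;
	unsigned long vm_start = 16 * K, vm_end = 80 * K;	/* VMA [16K, 80K) */
	unsigned long address = 16 * K;				/* faulting VA */
	unsigned long nr_pages = 16;				/* one 64K folio */

	unsigned long start = address & ~(nr_pages * page_size - 1);	/* ALIGN_DOWN -> 0 */
	unsigned long end = start + nr_pages * page_size;		/* 64K */
	unsigned long addr = start > vm_start ? start : vm_start;	/* max() -> 16K */
	unsigned long n = ((end < vm_end ? end : vm_end) - addr) >> page_shift; /* 12 */
	unsigned long idx = (addr - start) >> page_shift;		/* folio page 4 */

	/*
	 * Folio pages 4..15 would be installed at VA 16K..64K, yet VA 16K
	 * actually backs file offset 0 (folio page 0), so without VA
	 * alignment the wrong pages would be mapped.
	 */
	printf("addr=%luK nr_pages=%lu first folio page=%lu\n", addr / K, n, idx);
	return 0;
}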

So until those VA alignment changes land, only allow building the
large folio mapping for anonymous shmem:

diff --git a/mm/memory.c b/mm/memory.c
index 936377220b77..9e4d51826d23 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4786,7 +4786,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
         folio = page_folio(page);
         nr_pages = folio_nr_pages(folio);

-       if (unlikely(userfaultfd_armed(vma))) {
+       if (unlikely(userfaultfd_armed(vma)) || !vma_is_anon_shmem(vma)) {
                 nr_pages = 1;
         } else if (nr_pages > 1) {
                unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);

> Additionally, I think this path will end up mapping the entire folio (as long as
> it fits in the VMA). But this bypasses the fault-around configuration. As I
> think I mentioned against the RFC, this will inflate the RSS of the process and
> can cause behavioural changes as a result. I believe the current advice is to
> disable fault-around to prevent this kind of bloat when needed.

With the above change, I do not think this is a problem, since users who 
enable mTHP for anonymous shmem have explicitly asked for these larger 
mappings.

> It might be that you need a special variant of finish_fault() for shmem?
> 
> 
>> +	}
>>   	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>> -				      vmf->address, &vmf->ptl);
>> +				       addr, &vmf->ptl);
>>   	if (!vmf->pte)
>>   		return VM_FAULT_NOPAGE;
>>   
>>   	/* Re-check under ptl */
>> -	if (likely(!vmf_pte_changed(vmf))) {
>> -		struct folio *folio = page_folio(page);
>> -		int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>> -
>> -		set_pte_range(vmf, folio, page, 1, vmf->address);
>> -		add_mm_counter(vma->vm_mm, type, 1);
>> -		ret = 0;
>> -	} else {
>> -		update_mmu_tlb(vma, vmf->address, vmf->pte);
>> +	if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>> +		update_mmu_tlb(vma, addr, vmf->pte);
>> +		ret = VM_FAULT_NOPAGE;
>> +		goto unlock;
>> +	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>> +		for (i = 0; i < nr_pages; i++)
>> +			update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
>>   		ret = VM_FAULT_NOPAGE;
>> +		goto unlock;
>>   	}
>>   
>> +	set_pte_range(vmf, folio, page, nr_pages, addr);
>> +	type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>> +	add_mm_counter(vma->vm_mm, type, nr_pages);
>> +	ret = 0;
>> +
>> +unlock:
>>   	pte_unmap_unlock(vmf->pte, vmf->ptl);
>>   	return ret;
>>   }

