Re: [PATCH v6 5/9] mm: thp: Extend THP to allocate anonymous large folios

From: John Hubbard <jhubbard@nvidia.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	"Yin Fengwei" <fengwei.yin@intel.com>,
	David Hildenbrand <david@redhat.com>,
	"Yu Zhao" <yuzhao@google.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	Yang Shi <shy828301@gmail.com>,
	"Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Itaru Kitayama <itaru.kitayama@gmail.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	David Rientjes <rientjes@google.com>,
	Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v6 5/9] mm: thp: Extend THP to allocate anonymous large folios
Date: Fri, 27 Oct 2023 16:04:19 -0700
Message-ID: <8a72da61-b2ef-48ad-ae59-0bae7ac2ce10@nvidia.com>
In-Reply-To: <20230929114421.3761121-6-ryan.roberts@arm.com>

On 9/29/23 04:44, Ryan Roberts wrote:

Hi Ryan,

A few clarifying questions below.

...
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2e7c338229a6..c4860476a1f5 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -68,9 +68,11 @@ extern struct kobj_attribute shmem_enabled_attr;
>   #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
>   
>   /*
> - * Mask of all large folio orders supported for anonymous THP.
> + * Mask of all large folio orders supported for anonymous THP; all orders up to
> + * and including PMD_ORDER, except order-0 (which is not "huge") and order-1
> + * (which is a limitation of the THP implementation).
>    */
> -#define THP_ORDERS_ALL_ANON	BIT(PMD_ORDER)
> +#define THP_ORDERS_ALL_ANON	((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))
>   
>   /*
>    * Mask of all large folio orders supported for file THP.
> diff --git a/mm/memory.c b/mm/memory.c
> index b5b82fc8e164..92ed9c782dc9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4059,6 +4059,87 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>   	return ret;
>   }
>   
> +static bool vmf_pte_range_changed(struct vm_fault *vmf, int nr_pages)
> +{
> +	int i;
> +
> +	if (nr_pages == 1)
> +		return vmf_pte_changed(vmf);
> +
> +	for (i = 0; i < nr_pages; i++) {
> +		if (!pte_none(ptep_get_lockless(vmf->pte + i)))
> +			return true;

This does something different from what the function name implies,
and I find that confusing. For the single-page case, it returns true
if the pte in the page tables has changed. That part is very clear.

But for the multi-page case, which is really the main focus here,
it reports that the range has changed if any pte in the range is
not none (!pte_none). Can you please help me understand what this
means?
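
To make the mismatch concrete, here is a small userspace sketch of
the two checks. It is only a model, not kernel code: model_pte_t and
every model_* name are hypothetical, a value of 0 stands in for a
"none" pte, and the single-entry check is assumed to compare against
a snapshot taken earlier in the fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t: 0 means "none" (nothing installed). */
typedef uint64_t model_pte_t;

static bool model_pte_none(model_pte_t pte)
{
	return pte == 0;
}

/* Single-entry check (assumed): "changed" means the entry differs from
 * the snapshot taken earlier in the fault. */
static bool model_pte_changed(const model_pte_t *table, size_t idx,
			      model_pte_t snapshot)
{
	return table[idx] != snapshot;
}

/* Multi-entry check, mirroring the loop in the patch: true if any
 * entry in [start, start + nr) is not none. No snapshot is involved. */
static bool model_pte_range_populated(const model_pte_t *table,
				      size_t start, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (!model_pte_none(table[start + i]))
			return true;
	}
	return false;
}

/* An entry installed long before the fault is "unchanged" relative to
 * its snapshot, yet the range check still reports true for it. */
void model_demo(void)
{
	model_pte_t table[4] = { 0, 0, 0x1000, 0 };

	assert(!model_pte_changed(table, 2, 0x1000));
	assert(model_pte_range_populated(table, 0, 4));
}
```

In this model the range check answers "is anything installed here?"
rather than "did anything change?", which is the distinction the
question above is about.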

> +	}
> +
> +	return false;
> +}
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> +{
> +	gfp_t gfp;
> +	pte_t *pte;
> +	unsigned long addr;
> +	struct folio *folio;
> +	struct vm_area_struct *vma = vmf->vma;
> +	unsigned int orders;
> +	int order;
> +
> +	/*
> +	 * If uffd is active for the vma we need per-page fault fidelity to
> +	 * maintain the uffd semantics.
> +	 */
> +	if (userfaultfd_armed(vma))
> +		goto fallback;
> +
> +	/*
> +	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
> +	 * for this vma. Then filter out the orders that can't be allocated over
> +	 * the faulting address and still be fully contained in the vma.
> +	 */
> +	orders = hugepage_vma_check(vma, vma->vm_flags, false, true, true,
> +				    BIT(PMD_ORDER) - 1);
> +	orders = transhuge_vma_suitable(vma, vmf->address, orders);
> +
> +	if (!orders)
> +		goto fallback;
> +
> +	pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
> +	if (!pte)
> +		return ERR_PTR(-EAGAIN);

pte_offset_map() can only fail due to:

     a) Wrong pmd type. These include:
         pmd_none
         pmd_bad
         pmd migration entry
         pmd_trans_huge
         pmd_devmap

     b) __pte_map() failure

For (a), why is -EAGAIN used here? I see that it will lead to a
re-fault, I got that far, but I am still missing something.

For (b), same question, actually. I'm not completely sure why
a retry is going to fix a __pte_map() failure.
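
As a way of framing the question, here is a toy userspace model of a
retry-on-failure fault loop. It does not describe what the kernel
does: the pmd states, the model_* names and the bounded attempt count
are all hypothetical. It only shows that a retry can help when the
failure comes from a stale view of the pmd, and cannot help when the
condition persists.

```c
#include <assert.h>

/* Hypothetical pmd states, loosely following the list above. */
enum model_pmd { MODEL_PMD_NONE, MODEL_PMD_TABLE, MODEL_PMD_HUGE };

enum model_result { MODEL_DONE_PTE, MODEL_DONE_HUGE, MODEL_RETRY };

struct model_mm {
	enum model_pmd pmd;	/* live state, may differ from a sample */
};

/* One fault attempt. The path is chosen from an earlier sample of the
 * pmd; mapping the PTE table then depends on the live state. */
static enum model_result model_fault_once(const struct model_mm *mm,
					  enum model_pmd sampled)
{
	if (sampled == MODEL_PMD_HUGE)
		return MODEL_DONE_HUGE;	/* huge path, no PTE table needed */

	if (mm->pmd != MODEL_PMD_TABLE)
		return MODEL_RETRY;	/* plays the role of -EAGAIN */

	return MODEL_DONE_PTE;
}

/* Re-fault loop: every retry takes a fresh sample of the pmd.
 * Returns the number of attempts made; the outcome goes to *out. */
int model_fault(const struct model_mm *mm, enum model_pmd first_sample,
		int max_attempts, enum model_result *out)
{
	enum model_pmd sampled = first_sample;
	int attempt = 0;

	*out = MODEL_RETRY;
	while (attempt < max_attempts) {
		attempt++;
		*out = model_fault_once(mm, sampled);
		if (*out != MODEL_RETRY)
			break;
		sampled = mm->pmd;	/* the re-fault re-reads the state */
	}
	return attempt;
}
```

In this model, a stale sample (sampled as a table, but the live pmd
is huge) resolves on the second attempt, while a pmd that stays
"none" returns MODEL_RETRY on every attempt. Which of these two
situations applies to each failure case in (a) and (b) is what I am
trying to understand.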


> +
> +	order = first_order(orders);
> +	while (orders) {
> +		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
> +		vmf->pte = pte + pte_index(addr);
> +		if (!vmf_pte_range_changed(vmf, 1 << order))
> +			break;
> +		order = next_order(&orders, order);
> +	}
> +
> +	vmf->pte = NULL;
> +	pte_unmap(pte);
> +
> +	gfp = vma_thp_gfp_mask(vma);
> +
> +	while (orders) {
> +		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
> +		folio = vma_alloc_folio(gfp, order, vma, addr, true);
> +		if (folio) {
> +			clear_huge_page(&folio->page, addr, 1 << order);
> +			return folio;
> +		}
> +		order = next_order(&orders, order);
> +	}

And finally: is it accurate to say that there are *no* special
page flags being set, for PTE-mapped THPs? I don't see any here,
but want to confirm.


thanks,
-- 
John Hubbard
NVIDIA
