All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yu Zhao <yuzhao@google.com>, Hugh Dickins <hughd@google.com>,
	 Matthew Wilcox <willy@infradead.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	 Yin Fengwei <fengwei.yin@intel.com>,
	David Hildenbrand <david@redhat.com>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	 Anshuman Khandual <anshuman.khandual@arm.com>,
	 Yang Shi <shy828301@gmail.com>,
	"Huang, Ying" <ying.huang@intel.com>,  Zi Yan <ziy@nvidia.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance
Date: Mon, 17 Jul 2023 16:37:31 -0700 (PDT)	[thread overview]
Message-ID: <8bdfd8d8-5662-4615-86dc-d60259bd16d@google.com> (raw)
In-Reply-To: <5df787a0-8e69-2472-cdd6-f96a3f7dfaaf@arm.com>

On Mon, 17 Jul 2023, Ryan Roberts wrote:

> >>>> +static int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio)
> >>>> +{
> >>>> +       int i;
> >>>> +       gfp_t gfp;
> >>>> +       pte_t *pte;
> >>>> +       unsigned long addr;
> >>>> +       struct vm_area_struct *vma = vmf->vma;
> >>>> +       int prefer = anon_folio_order(vma);
> >>>> +       int orders[] = {
> >>>> +               prefer,
> >>>> +               prefer > PAGE_ALLOC_COSTLY_ORDER ? PAGE_ALLOC_COSTLY_ORDER : 0,
> >>>> +               0,
> >>>> +       };
> >>>> +
> >>>> +       *folio = NULL;
> >>>> +
> >>>> +       if (vmf_orig_pte_uffd_wp(vmf))
> >>>> +               goto fallback;
> >>>> +
> >>>> +       for (i = 0; orders[i]; i++) {
> >>>> +               addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
> >>>> +               if (addr >= vma->vm_start &&
> >>>> +                   addr + (PAGE_SIZE << orders[i]) <= vma->vm_end)
> >>>> +                       break;
> >>>> +       }
> >>>> +
> >>>> +       if (!orders[i])
> >>>> +               goto fallback;
> >>>> +
> >>>> +       pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
> >>>> +       if (!pte)
> >>>> +               return -EAGAIN;
> >>>
> >>> It would be a bug if this happens. So probably -EINVAL?
> >>
> >> Not sure what you mean? Hugh Dickins' series that went into v6.5-rc1 makes it
> >> possible for pte_offset_map() to fail (if I understood correctly) and we have to
> >> handle this. The intent is that we will return from the fault without making any
> >> change, then we will refault and try again.
> > 
> > Thanks for checking that -- it's very relevant. One detail is that
> > that series doesn't affect anon. IOW, collapsing PTEs into a PMD can't
> > happen while we are holding mmap_lock for read here, and therefore,
> > the race that could cause pte_offset_map() on shmem/file PTEs to fail
> > doesn't apply here.
> 
> But Hugh's patches have changed do_anonymous_page() to handle failure from
> pte_offset_map_lock(). So I was just following that pattern. If this really
> can't happen, then I'd rather WARN/BUG on it, and simplify alloc_anon_folio()'s
> prototype to just return a `struct folio *` (and if it's null that means ENOMEM).
> 
> Hugh, perhaps you can comment?

I agree with your use of -EAGAIN there: I find it better to allow for the
possibility, than to go to great effort persuading that it's impossible;
especially because what's possible tomorrow may differ from today.

And notice that, before my changes, there used to be a pmd_trans_unstable()
check above, implying that it is possible for it to fail (for more reasons
than corruption causing pmd_bad()) - one scenario would be that the
pte_alloc() above succeeded *because* someone else had managed to insert
a huge pmd there already (maybe we have MMF_DISABLE_THP but they did not).

But I see from later mail that Yu Zhao now agrees with your -EAGAIN too,
so we are all on the same folio.

Hugh

p.s. while giving opinions, I'm one of those against using "THP" for
large but not pmd-mappable folios; and was glad to see Matthew arguing
the same way when considering THP_SWPOUT in another thread today.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Hugh Dickins <hughd@google.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yu Zhao <yuzhao@google.com>, Hugh Dickins <hughd@google.com>,
	 Matthew Wilcox <willy@infradead.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	 Yin Fengwei <fengwei.yin@intel.com>,
	David Hildenbrand <david@redhat.com>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	 Anshuman Khandual <anshuman.khandual@arm.com>,
	 Yang Shi <shy828301@gmail.com>,
	"Huang, Ying" <ying.huang@intel.com>,  Zi Yan <ziy@nvidia.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance
Date: Mon, 17 Jul 2023 16:37:31 -0700 (PDT)	[thread overview]
Message-ID: <8bdfd8d8-5662-4615-86dc-d60259bd16d@google.com> (raw)
In-Reply-To: <5df787a0-8e69-2472-cdd6-f96a3f7dfaaf@arm.com>

On Mon, 17 Jul 2023, Ryan Roberts wrote:

> >>>> +static int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio)
> >>>> +{
> >>>> +       int i;
> >>>> +       gfp_t gfp;
> >>>> +       pte_t *pte;
> >>>> +       unsigned long addr;
> >>>> +       struct vm_area_struct *vma = vmf->vma;
> >>>> +       int prefer = anon_folio_order(vma);
> >>>> +       int orders[] = {
> >>>> +               prefer,
> >>>> +               prefer > PAGE_ALLOC_COSTLY_ORDER ? PAGE_ALLOC_COSTLY_ORDER : 0,
> >>>> +               0,
> >>>> +       };
> >>>> +
> >>>> +       *folio = NULL;
> >>>> +
> >>>> +       if (vmf_orig_pte_uffd_wp(vmf))
> >>>> +               goto fallback;
> >>>> +
> >>>> +       for (i = 0; orders[i]; i++) {
> >>>> +               addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
> >>>> +               if (addr >= vma->vm_start &&
> >>>> +                   addr + (PAGE_SIZE << orders[i]) <= vma->vm_end)
> >>>> +                       break;
> >>>> +       }
> >>>> +
> >>>> +       if (!orders[i])
> >>>> +               goto fallback;
> >>>> +
> >>>> +       pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
> >>>> +       if (!pte)
> >>>> +               return -EAGAIN;
> >>>
> >>> It would be a bug if this happens. So probably -EINVAL?
> >>
> >> Not sure what you mean? Hugh Dickins' series that went into v6.5-rc1 makes it
> >> possible for pte_offset_map() to fail (if I understood correctly) and we have to
> >> handle this. The intent is that we will return from the fault without making any
> >> change, then we will refault and try again.
> > 
> > Thanks for checking that -- it's very relevant. One detail is that
> > that series doesn't affect anon. IOW, collapsing PTEs into a PMD can't
> > happen while we are holding mmap_lock for read here, and therefore,
> > the race that could cause pte_offset_map() on shmem/file PTEs to fail
> > doesn't apply here.
> 
> But Hugh's patches have changed do_anonymous_page() to handle failure from
> pte_offset_map_lock(). So I was just following that pattern. If this really
> can't happen, then I'd rather WARN/BUG on it, and simplify alloc_anon_folio()'s
> prototype to just return a `struct folio *` (and if it's null that means ENOMEM).
> 
> Hugh, perhaps you can comment?

I agree with your use of -EAGAIN there: I find it better to allow for the
possibility, than to go to great effort persuading that it's impossible;
especially because what's possible tomorrow may differ from today.

And notice that, before my changes, there used to be a pmd_trans_unstable()
check above, implying that it is possible for it to fail (for more reasons
than corruption causing pmd_bad()) - one scenario would be that the
pte_alloc() above succeeded *because* someone else had managed to insert
a huge pmd there already (maybe we have MMF_DISABLE_THP but they did not).

But I see from later mail that Yu Zhao now agrees with your -EAGAIN too,
so we are all on the same folio.

Hugh

p.s. while giving opinions, I'm one of those against using "THP" for
large but not pmd-mappable folios; and was glad to see Matthew arguing
the same way when considering THP_SWPOUT in another thread today.


  parent reply	other threads:[~2023-07-19  8:21 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-14 16:04 [PATCH v3 0/4] variable-order, large folios for anonymous memory Ryan Roberts
2023-07-14 16:04 ` Ryan Roberts
2023-07-14 16:17 ` [PATCH v3 1/4] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap() Ryan Roberts
2023-07-14 16:17   ` Ryan Roberts
2023-07-14 16:52   ` Yu Zhao
2023-07-14 16:52     ` Yu Zhao
2023-07-14 18:01     ` Ryan Roberts
2023-07-14 18:01       ` Ryan Roberts
2023-07-17 13:00   ` David Hildenbrand
2023-07-17 13:00     ` David Hildenbrand
2023-07-17 13:13     ` Ryan Roberts
2023-07-17 13:13       ` Ryan Roberts
2023-07-17 13:19       ` David Hildenbrand
2023-07-17 13:19         ` David Hildenbrand
2023-07-17 13:21         ` Ryan Roberts
2023-07-17 13:21           ` Ryan Roberts
2023-07-14 16:17 ` [PATCH v3 2/4] mm: Default implementation of arch_wants_pte_order() Ryan Roberts
2023-07-14 16:17   ` Ryan Roberts
2023-07-14 16:54   ` Yu Zhao
2023-07-14 16:54     ` Yu Zhao
2023-07-17 11:13   ` Yin Fengwei
2023-07-17 11:13     ` Yin Fengwei
2023-07-17 13:01   ` David Hildenbrand
2023-07-17 13:01     ` David Hildenbrand
2023-07-17 13:15     ` Ryan Roberts
2023-07-17 13:15       ` Ryan Roberts
2023-07-14 16:17 ` [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance Ryan Roberts
2023-07-14 16:17   ` Ryan Roberts
2023-07-14 17:17   ` Yu Zhao
2023-07-14 17:17     ` Yu Zhao
2023-07-14 17:59     ` Ryan Roberts
2023-07-14 17:59       ` Ryan Roberts
2023-07-14 22:11       ` Yu Zhao
2023-07-14 22:11         ` Yu Zhao
2023-07-17 13:36         ` Ryan Roberts
2023-07-17 13:36           ` Ryan Roberts
2023-07-17 19:31           ` Yu Zhao
2023-07-17 19:31             ` Yu Zhao
2023-07-17 20:35             ` Yu Zhao
2023-07-17 20:35               ` Yu Zhao
2023-07-17 23:37           ` Hugh Dickins [this message]
2023-07-17 23:37             ` Hugh Dickins
2023-07-18 10:36             ` Ryan Roberts
2023-07-18 10:36               ` Ryan Roberts
2023-07-17 13:06     ` David Hildenbrand
2023-07-17 13:06       ` David Hildenbrand
2023-07-17 13:20       ` Ryan Roberts
2023-07-17 13:20         ` Ryan Roberts
2023-07-17 13:56         ` David Hildenbrand
2023-07-17 13:56           ` David Hildenbrand
2023-07-17 14:47           ` Ryan Roberts
2023-07-17 14:47             ` Ryan Roberts
2023-07-17 14:55             ` David Hildenbrand
2023-07-17 14:55               ` David Hildenbrand
2023-07-17 17:07       ` Yu Zhao
2023-07-17 17:07         ` Yu Zhao
2023-07-17 17:16         ` David Hildenbrand
2023-07-17 17:16           ` David Hildenbrand
2023-07-21 10:57   ` Ryan Roberts
2023-07-21 10:57     ` Ryan Roberts
2023-07-14 16:17 ` [PATCH v3 4/4] arm64: mm: Override arch_wants_pte_order() Ryan Roberts
2023-07-14 16:17   ` Ryan Roberts
2023-07-14 16:47   ` Yu Zhao
2023-07-14 16:47     ` Yu Zhao
2023-07-24 11:59 ` [PATCH v3 0/4] variable-order, large folios for anonymous memory Ryan Roberts
2023-07-24 14:58   ` Zi Yan
2023-07-24 15:41     ` Ryan Roberts
2023-07-26  7:36       ` Itaru Kitayama
2023-07-26  8:42         ` Ryan Roberts
2023-07-26  8:47           ` Itaru Kitayama

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8bdfd8d8-5662-4615-86dc-d60259bd16d@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=david@redhat.com \
    --cc=fengwei.yin@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.