From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: Hugh Dickins <hughd@google.com>
Cc: linux-ia64@vger.kernel.org, David Hildenbrand <david@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
linux-kernel@vger.kernel.org, Max Filippov <jcmvbkbc@gmail.com>,
sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org,
Will Deacon <will@kernel.org>, Greg Ungerer <gerg@linux-m68k.org>,
linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
Helge Deller <deller@gmx.de>,
x86@kernel.org, Russell King <linux@armlinux.org.uk>,
Matthew Wilcox <willy@infradead.org>,
Geert Uytterhoeven <geert@linux-m68k.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Alexandre Ghiti <alexghiti@rivosinc.com>,
Heiko Carstens <hca@linux.ibm.com>,
linux-m68k@lists.linux-m68k.org,
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>,
John David Anglin <dave.anglin@bell.net>,
Suren Baghdasaryan <surenb@google.com>,
linux-arm-kernel@lists.infradead.org,
Chris Zankel <chris@zankel.net>, Michal Simek <monstr@monstr.eu>,
Thomas Bogendoerfer <tsbogen d@alpha.franken.de>,
linux-parisc@vger.kernel.org, linux-mm@kvack.org,
linux-mips@vger.kernel.org, Palmer Dabbelt <palmer@dabbelt.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
linuxppc-dev@lists.ozlabs.org,
"David S. Miller" <davem@davemloft.net>,
Mike Rapoport <rppt@kernel.org>,
Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [PATCH 15/23] s390: allow pte_offset_map_lock() to fail
Date: Tue, 23 May 2023 14:00:56 +0200 [thread overview]
Message-ID: <20230523140056.55b664b1@p-imbrenda> (raw)
In-Reply-To: <4a15dbaa-1614-ce-ce1f-f73959cef895@google.com>
On Wed, 17 May 2023 14:50:28 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:
> On Wed, 17 May 2023, Claudio Imbrenda wrote:
> > On Tue, 9 May 2023 22:01:16 -0700 (PDT)
> > Hugh Dickins <hughd@google.com> wrote:
> >
> > > In rare transient cases, not yet made possible, pte_offset_map() and
> > > pte_offset_map_lock() may not find a page table: handle appropriately.
> > >
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > ---
> > > arch/s390/kernel/uv.c | 2 ++
> > > arch/s390/mm/gmap.c | 2 ++
> > > arch/s390/mm/pgtable.c | 12 +++++++++---
> > > 3 files changed, 13 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> > > index cb2ee06df286..3c62d1b218b1 100644
> > > --- a/arch/s390/kernel/uv.c
> > > +++ b/arch/s390/kernel/uv.c
> > > @@ -294,6 +294,8 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> > >
> > > rc = -ENXIO;
> > > ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
> > > + if (!ptep)
> > > + goto out;
>
> You may or may not be asking about this instance too. When I looked at
actually no, because of the reasons you give here :)
> how the code lower down handles -ENXIO (promoting it to -EFAULT if an
> access fails, or to -EAGAIN to ask for a retry), this looked just right
> (whereas using -EAGAIN here would be wrong: that expects a "page" which
> has not been initialized at this point).
>
> > > if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
> > > page = pte_page(*ptep);
> > > rc = -EAGAIN;
> > > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> > > index dc90d1eb0d55..d198fc9475a2 100644
> > > --- a/arch/s390/mm/gmap.c
> > > +++ b/arch/s390/mm/gmap.c
> > > @@ -2549,6 +2549,8 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
> > > spinlock_t *ptl;
> > >
> > > ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> > > + if (!ptep)
> > > + break;
> >
> > so if pte_offset_map_lock fails, we abort and skip both the failed
> > entry and the rest of the entries?
>
> Yes.
>
> >
> > can pte_offset_map_lock be retried immediately if it fails? (consider
> > that we currently don't allow THP with KVM guests)
> >
> > Would something like this:
> >
> > do {
> > ptep = pte_offset_map_lock(...);
> > mb(); /* maybe? */
> > } while (!ptep);
> >
> > make sense?
>
> No. But you're absolutely right to be asking: thank you for looking
> into it so carefully - and I realize that it's hard at this stage to
> judge what's appropriate, when I've not yet even posted the endpoint
> of these changes, the patches which make it possible not to find a
> page table here. And I'm intentionally keeping that vague, because
> although I shall only introduce a THP case, I do expect it to be built
> upon later in reclaiming empty page tables: it would be nice not to
> have to change the arch code again when extending further.
>
> My "rare transient cases" phrase may be somewhat misleading: one thing
> that's wrong with your tight pte_offset_map_lock() loop above is that
> the pmd entry pointing to page table may have been suddenly replaced
> by a pmd_none() entry; and there's nothing in your loop above to
> break out if that is so.
>
> But if a page table is suddenly removed, that would be because it was
> either empty, or replaced by a THP entry, or easily reconstructable on
> demand (by that, I probably mean it was only mapping shared file
> pages, which can just be refaulted if needed again).
>
> The case you're wary of, is if the page table were removed briefly,
> then put back shortly after: and still contains zero pages further
> down. That's not something mm does now, nor at the end of my several
> series, nor that I imagine us wanting to do in future: but I am
> struggling to find a killer argument to persuade you that it could
> never be done - most pages in a page table do need rmap tracking,
> which will BUG if it's broken, but that argument happens not to apply
> to the zero page.
>
> (Hmm, there could be somewhere, where we would find it convenient to
> remove a page table with intent to do ...something, then validation
> of that isolated page table fails, so we just put it back again.)
>
> Is it good enough for me to promise you that we won't do that?
>
> There are several ways in which we could change __zap_zero_pages(),
> but I don't see them as actually dealing with the concern at hand.
>
> One change, I've tended to make at the mm end but did not dare
> to interfere here: it would seem more sensible to do a single
> pte_offset_map_lock() outside the loop, return if that fails,
> increment ptep inside the loop, pte_unmap_unlock() after the loop.
>
> But perhaps you have preemption reasons for not wanting that; and
> although it would eliminate the oddity of half-processing a page
> table, it would not really resolve the problem at hand: because,
> what if this page table got removed just before __zap_zero_pages()
> tries to take the lock, then got put back just after?
>
> Another change: I see __zap_zero_pages() is driven by
> walk_page_range(), and over at the mm end I'm usually setting
> walk->action to ACTION_AGAIN in these failure cases; but thought that
> an unnecessary piece of magic here, and cannot see how it could
> actually help. Your "retry the whole walk_page_range()" suggestion
> below would be a heavier equivalent of that: but neither way gives
> confidence, if a page table could actually be removed then reinserted
> without mmap_write_lock().
>
> I think I want to keep this s390 __zap_zero_pages() issue in mind, it
> is important and thank you for raising it; but don't see any change
> to the patch as actually needed.
>
> Hugh
so if I understand the above correctly, pte_offset_map_lock will only
fail if the whole page table has disappeared, and in that case, it will
never reappear with zero pages, therefore we can safely skip (in that
case just break). if we were to do a continue instead of a break, we
would most likely fail again anyway.
in that case I would still like a small change in your patch: please
write a short (2~3 lines max) comment about why it's ok to do things
that way
next prev parent reply other threads:[~2023-05-23 12:09 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-10 4:39 [PATCH 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-05-10 4:42 ` [PATCH 01/23] arm: " Hugh Dickins
2023-05-10 14:28 ` Matthew Wilcox
2023-05-11 3:40 ` Hugh Dickins
2023-05-10 4:43 ` [PATCH 02/23] arm64: allow pte_offset_map() " Hugh Dickins
2023-05-25 16:37 ` Catalin Marinas
2023-05-10 4:45 ` [PATCH 03/23] arm64/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
2023-05-25 16:37 ` Catalin Marinas
2023-05-10 4:47 ` [PATCH 04/23] ia64/hugetlb: " Hugh Dickins
2023-05-10 4:48 ` [PATCH 05/23] m68k: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-05-10 7:13 ` Geert Uytterhoeven
2023-05-11 2:57 ` Hugh Dickins
2023-05-11 6:53 ` Geert Uytterhoeven
2023-05-10 4:49 ` [PATCH 06/23] microblaze: allow pte_offset_map() " Hugh Dickins
2023-05-10 4:51 ` [PATCH 07/23] mips: update_mmu_cache() can replace __update_tlb() Hugh Dickins
2023-05-10 4:52 ` [PATCH 08/23] parisc: add pte_unmap() to balance get_ptep() Hugh Dickins
2023-05-13 21:35 ` Helge Deller
2023-05-14 18:20 ` Hugh Dickins
2023-05-10 4:54 ` [PATCH 09/23] parisc: unmap_uncached_pte() use pte_offset_kernel() Hugh Dickins
2023-05-10 4:55 ` [PATCH 10/23] parisc/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
2023-05-10 4:56 ` [PATCH 11/23] powerpc: kvmppc_unmap_free_pmd() pte_offset_kernel() Hugh Dickins
2023-05-10 4:57 ` [PATCH 12/23] powerpc: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-05-10 4:58 ` [PATCH 13/23] powerpc/hugetlb: pte_alloc_huge() Hugh Dickins
2023-05-10 4:59 ` [PATCH 14/23] riscv/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
2023-05-10 8:01 ` Alexandre Ghiti
2023-05-10 14:01 ` Palmer Dabbelt
2023-05-10 5:01 ` [PATCH 15/23] s390: allow pte_offset_map_lock() to fail Hugh Dickins
2023-05-17 10:35 ` Claudio Imbrenda
2023-05-17 21:50 ` Hugh Dickins
2023-05-23 12:00 ` Claudio Imbrenda [this message]
2023-05-24 1:49 ` Hugh Dickins
2023-05-25 7:23 ` Claudio Imbrenda
2023-05-10 5:02 ` [PATCH 16/23] s390: gmap use pte_unmap_unlock() not spin_unlock() Hugh Dickins
2023-05-17 11:28 ` Alexander Gordeev
2023-05-10 5:03 ` [PATCH 17/23] sh/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
2023-05-10 5:04 ` [PATCH 18/23] sparc/hugetlb: " Hugh Dickins
2023-05-10 5:05 ` [PATCH 19/23] sparc: allow pte_offset_map() to fail Hugh Dickins
2023-05-10 5:07 ` [PATCH 20/23] sparc: iounit and iommu use pte_offset_kernel() Hugh Dickins
2023-05-10 5:08 ` [PATCH 21/23] x86: Allow get_locked_pte() to fail Hugh Dickins
2023-05-10 8:18 ` Peter Zijlstra
2023-05-11 3:16 ` Hugh Dickins
2023-05-11 7:29 ` Peter Zijlstra
2023-05-10 5:09 ` [PATCH 22/23] x86: sme_populate_pgd() use pte_offset_kernel() Hugh Dickins
2023-05-10 5:11 ` [PATCH 23/23] xtensa: add pte_unmap() to balance pte_offset_map() Hugh Dickins
2023-05-10 6:07 ` [PATCH 00/23] arch: allow pte_offset_map[_lock]() to fail Matthew Wilcox
2023-05-11 4:35 ` Hugh Dickins
2023-05-11 14:02 ` Matthew Wilcox
2023-05-11 22:37 ` Hugh Dickins
2023-05-12 3:38 ` Mike Rapoport
2023-05-16 10:41 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230523140056.55b664b1@p-imbrenda \
--to=imbrenda@linux.ibm.com \
--cc=alexghiti@rivosinc.com \
--cc=borntraeger@linux.ibm.com \
--cc=catalin.marinas@arm.com \
--cc=chris@zankel.net \
--cc=dave.anglin@bell.net \
--cc=david@redhat.com \
--cc=deller@gmx.de \
--cc=geert@linux-m68k.org \
--cc=gerg@linux-m68k.org \
--cc=glaubitz@physik.fu-berlin.de \
--cc=hca@linux.ibm.com \
--cc=hughd@google.com \
--cc=jcmvbkbc@gmail.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-ia64@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-m68k@lists.linux-m68k.org \
--cc=linux-riscv@lists.infradead.org \
--cc=linux-s390@vger.kernel.org \
--cc=linux-sh@vger.kernel.org \
--cc=linux@armlinux.org.uk \
--cc=monstr@monstr.eu \
--cc=sparclinux@vger.kernel.org \
--cc=surenb@google.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
--cc=zhengqi.arch@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).