linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 00/23] arch: allow pte_offset_map[_lock]() to fail
       [not found]     ` <ZFz1j1slZHCQmwMJ@casper.infradead.org>
@ 2023-05-16 10:41       ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2023-05-16 10:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
	Kirill A. Shutemov, David Hildenbrand, Suren Baghdasaryan,
	Qi Zheng, Russell King, Catalin Marinas, Will Deacon,
	Geert Uytterhoeven, Greg Ungerer, Michal Simek,
	Thomas Bogendoerfer, Helge Deller, John David Anglin,
	Aneesh Kumar K.V, Michael Ellerman, Alexandre Ghiti,
	Palmer Dabbelt, Heiko Carstens, Christian Borntraeger,
	Claudio Imbrenda, John Paul Adrian Glaubitz, David S. Miller,
	Chris Zankel, Max Filippov, x86, linux-arm-kernel, linux-ia64,
	linux-m68k, linux-mips, linux-parisc, linuxppc-dev, linux-riscv,
	linux-s390, linux-sh, sparclinux, linux-kernel, linux-mm,
	Michel Lespinasse

On Thu, May 11, 2023 at 03:02:55PM +0100, Matthew Wilcox wrote:

> We also talked about moving x86 to always RCU-free page tables in
> order to make accessing /proc/$pid/smaps lockless.  I believe Michel
> is going to take a swing at this project.

Shouldn't be too controversial I think -- effectively everybody already
has it enabled because everybody builds with KVM enabled.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 15/23] s390: allow pte_offset_map_lock() to fail
       [not found] ` <94aec8fe-383f-892-dcbf-d4c14e460a7@google.com>
@ 2023-05-17 10:35   ` Claudio Imbrenda
  2023-05-17 21:50     ` Hugh Dickins
  0 siblings, 1 reply; 9+ messages in thread
From: Claudio Imbrenda @ 2023-05-17 10:35 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, John Paul Adrian Glaubitz, David S. Miller,
	Chris Zankel, Max Filippov, x86, linux-arm-kernel, linux-ia64,
	linux-m68k, linux-mips, linux-parisc, linuxppc-dev, linux-riscv,
	linux-s390, linux-sh, sparclinux, linux-kernel, linux-mm

On Tue, 9 May 2023 22:01:16 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:

> In rare transient cases, not yet made possible, pte_offset_map() and
> pte_offset_map_lock() may not find a page table: handle appropriately.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  arch/s390/kernel/uv.c  |  2 ++
>  arch/s390/mm/gmap.c    |  2 ++
>  arch/s390/mm/pgtable.c | 12 +++++++++---
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index cb2ee06df286..3c62d1b218b1 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -294,6 +294,8 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
>  
>  	rc = -ENXIO;
>  	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
> +	if (!ptep)
> +		goto out;
>  	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
>  		page = pte_page(*ptep);
>  		rc = -EAGAIN;
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index dc90d1eb0d55..d198fc9475a2 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2549,6 +2549,8 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
>  		spinlock_t *ptl;
>  
>  		ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> +		if (!ptep)
> +			break;

so if pte_offset_map_lock fails, we abort and skip both the failed
entry and the rest of the entries?

can pte_offset_map_lock be retried immediately if it fails? (consider
that we currently don't allow THP with KVM guests)

Would something like this:

do {
	ptep = pte_offset_map_lock(...);
	mb();	/* maybe? */
} while (!ptep);

make sense?


otherwise maybe it's better to return an error and retry the whole
walk_page_range() in s390_enable_sie() ? it's a slow path anyway.

>  		if (is_zero_pfn(pte_pfn(*ptep)))
>  			ptep_xchg_direct(walk->mm, addr, ptep, __pte(_PAGE_INVALID));
>  		pte_unmap_unlock(ptep, ptl);

[...]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 16/23] s390: gmap use pte_unmap_unlock() not spin_unlock()
       [not found] ` <5579873-d7b-65e-5de0-a2ba8a144e7@google.com>
@ 2023-05-17 11:28   ` Alexander Gordeev
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander Gordeev @ 2023-05-17 11:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On Tue, May 09, 2023 at 10:02:32PM -0700, Hugh Dickins wrote:
> pte_alloc_map_lock() expects to be followed by pte_unmap_unlock(): to
> keep balance in future, pass ptep as well as ptl to gmap_pte_op_end(),
> and use pte_unmap_unlock() instead of direct spin_unlock() (even though
> ptep ends up unused inside the macro).
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  arch/s390/mm/gmap.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 15/23] s390: allow pte_offset_map_lock() to fail
  2023-05-17 10:35   ` [PATCH 15/23] s390: allow pte_offset_map_lock() " Claudio Imbrenda
@ 2023-05-17 21:50     ` Hugh Dickins
  2023-05-23 12:00       ` Claudio Imbrenda
  0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2023-05-17 21:50 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
	Kirill A. Shutemov, Matthew Wilcox, David Hildenbrand,
	Suren Baghdasaryan, Qi Zheng, Russell King, Catalin Marinas,
	Will Deacon, Geert Uytterhoeven, Greg Ungerer, Michal Simek,
	Thomas Bogendoerfer, Helge Deller, John David Anglin,
	Aneesh Kumar K.V, Michael Ellerman, Alexandre Ghiti,
	Palmer Dabbelt, Heiko Carstens, Christian Borntraeger,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On Wed, 17 May 2023, Claudio Imbrenda wrote:
> On Tue, 9 May 2023 22:01:16 -0700 (PDT)
> Hugh Dickins <hughd@google.com> wrote:
> 
> > In rare transient cases, not yet made possible, pte_offset_map() and
> > pte_offset_map_lock() may not find a page table: handle appropriately.
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> >  arch/s390/kernel/uv.c  |  2 ++
> >  arch/s390/mm/gmap.c    |  2 ++
> >  arch/s390/mm/pgtable.c | 12 +++++++++---
> >  3 files changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> > index cb2ee06df286..3c62d1b218b1 100644
> > --- a/arch/s390/kernel/uv.c
> > +++ b/arch/s390/kernel/uv.c
> > @@ -294,6 +294,8 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> >  
> >  	rc = -ENXIO;
> >  	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
> > +	if (!ptep)
> > +		goto out;

You may or may not be asking about this instance too.  When I looked at
how the code lower down handles -ENXIO (promoting it to -EFAULT if an
access fails, or to -EAGAIN to ask for a retry), this looked just right
(whereas using -EAGAIN here would be wrong: that expects a "page" which
has not been initialized at this point).

> >  	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
> >  		page = pte_page(*ptep);
> >  		rc = -EAGAIN;
> > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> > index dc90d1eb0d55..d198fc9475a2 100644
> > --- a/arch/s390/mm/gmap.c
> > +++ b/arch/s390/mm/gmap.c
> > @@ -2549,6 +2549,8 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
> >  		spinlock_t *ptl;
> >  
> >  		ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> > +		if (!ptep)
> > +			break;
> 
> so if pte_offset_map_lock fails, we abort and skip both the failed
> entry and the rest of the entries?

Yes.

> 
> can pte_offset_map_lock be retried immediately if it fails? (consider
> that we currently don't allow THP with KVM guests)
> 
> Would something like this:
> 
> do {
> 	ptep = pte_offset_map_lock(...);
> 	mb();	/* maybe? */
> } while (!ptep);
> 
> make sense?

No.  But you're absolutely right to be asking: thank you for looking
into it so carefully - and I realize that it's hard at this stage to
judge what's appropriate, when I've not yet even posted the endpoint
of these changes, the patches which make it possible not to find a
page table here.  And I'm intentionally keeping that vague, because
although I shall only introduce a THP case, I do expect it to be built
upon later in reclaiming empty page tables: it would be nice not to
have to change the arch code again when extending further.

My "rare transient cases" phrase may be somewhat misleading: one thing
that's wrong with your tight pte_offset_map_lock() loop above is that
the pmd entry pointing to page table may have been suddenly replaced by
a pmd_none() entry; and there's nothing in your loop above to break out
if that is so.

But if a page table is suddenly removed, that would be because it was
either empty, or replaced by a THP entry, or easily reconstructable on
demand (by that, I probably mean it was only mapping shared file pages,
which can just be refaulted if needed again).

The case you're wary of, is if the page table were removed briefly,
then put back shortly after: and still contains zero pages further down.
That's not something mm does now, nor at the end of my several series,
nor that I imagine us wanting to do in future: but I am struggling to
find a killer argument to persuade you that it could never be done -
most pages in a page table do need rmap tracking, which will BUG if
it's broken, but that argument happens not to apply to the zero page.

(Hmm, there could be somewhere, where we would find it convenient to
remove a page table with intent to do ...something, then validation
of that isolated page table fails, so we just put it back again.)

Is it good enough for me to promise you that we won't do that?

There are several ways in which we could change __zap_zero_pages(),
but I don't see them as actually dealing with the concern at hand.

One change, I've tended to make at the mm end but did not dare
to interfere here: it would seem more sensible to do a single
pte_offset_map_lock() outside the loop, return if that fails,
increment ptep inside the loop, pte_unmap_unlock() after the loop.

But perhaps you have preemption reasons for not wanting that; and
although it would eliminate the oddity of half-processing a page
table, it would not really resolve the problem at hand: because,
what if this page table got removed just before __zap_zero_pages()
tries to take the lock, then got put back just after?

Another change: I see __zap_zero_pages() is driven by walk_page_range(),
and over at the mm end I'm usually setting walk->action to ACTION_AGAIN
in these failure cases; but thought that an unnecessary piece of magic
here, and cannot see how it could actually help.  Your "retry the whole
walk_page_range()" suggestion below would be a heavier equivalent of
that: but neither way gives confidence, if a page table could actually
be removed then reinserted without mmap_write_lock().

I think I want to keep this s390 __zap_zero_pages() issue in mind, it is
important and thank you for raising it; but don't see any change to the
patch as actually needed.

Hugh

> 
> 
> otherwise maybe it's better to return an error and retry the whole
> walk_page_range() in s390_enable_sie() ? it's a slow path anyway.
> 
> >  		if (is_zero_pfn(pte_pfn(*ptep)))
> >  			ptep_xchg_direct(walk->mm, addr, ptep, __pte(_PAGE_INVALID));
> >  		pte_unmap_unlock(ptep, ptl);
> 
> [...]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 15/23] s390: allow pte_offset_map_lock() to fail
  2023-05-17 21:50     ` Hugh Dickins
@ 2023-05-23 12:00       ` Claudio Imbrenda
  2023-05-24  1:49         ` Hugh Dickins
  0 siblings, 1 reply; 9+ messages in thread
From: Claudio Imbrenda @ 2023-05-23 12:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, John Paul Adrian Glaubitz, David S. Miller,
	Chris Zankel, Max Filippov, x86, linux-arm-kernel, linux-ia64,
	linux-m68k, linux-mips, linux-parisc, linuxppc-dev, linux-riscv,
	linux-s390, linux-sh, sparclinux, linux-kernel, linux-mm

On Wed, 17 May 2023 14:50:28 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:

> On Wed, 17 May 2023, Claudio Imbrenda wrote:
> > On Tue, 9 May 2023 22:01:16 -0700 (PDT)
> > Hugh Dickins <hughd@google.com> wrote:
> >   
> > > In rare transient cases, not yet made possible, pte_offset_map() and
> > > pte_offset_map_lock() may not find a page table: handle appropriately.
> > > 
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > ---
> > >  arch/s390/kernel/uv.c  |  2 ++
> > >  arch/s390/mm/gmap.c    |  2 ++
> > >  arch/s390/mm/pgtable.c | 12 +++++++++---
> > >  3 files changed, 13 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> > > index cb2ee06df286..3c62d1b218b1 100644
> > > --- a/arch/s390/kernel/uv.c
> > > +++ b/arch/s390/kernel/uv.c
> > > @@ -294,6 +294,8 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> > >  
> > >  	rc = -ENXIO;
> > >  	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
> > > +	if (!ptep)
> > > +		goto out;  
> 
> You may or may not be asking about this instance too.  When I looked at

actually no, because of the reasons you give here :)

> how the code lower down handles -ENXIO (promoting it to -EFAULT if an
> access fails, or to -EAGAIN to ask for a retry), this looked just right
> (whereas using -EAGAIN here would be wrong: that expects a "page" which
> has not been initialized at this point).
> 
> > >  	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
> > >  		page = pte_page(*ptep);
> > >  		rc = -EAGAIN;
> > > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> > > index dc90d1eb0d55..d198fc9475a2 100644
> > > --- a/arch/s390/mm/gmap.c
> > > +++ b/arch/s390/mm/gmap.c
> > > @@ -2549,6 +2549,8 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
> > >  		spinlock_t *ptl;
> > >  
> > >  		ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> > > +		if (!ptep)
> > > +			break;  
> > 
> > so if pte_offset_map_lock fails, we abort and skip both the failed
> > entry and the rest of the entries?  
> 
> Yes.
> 
> > 
> > can pte_offset_map_lock be retried immediately if it fails? (consider
> > that we currently don't allow THP with KVM guests)
> > 
> > Would something like this:
> > 
> > do {
> > 	ptep = pte_offset_map_lock(...);
> > 	mb();	/* maybe? */
> > } while (!ptep);
> > 
> > make sense?  
> 
> No.  But you're absolutely right to be asking: thank you for looking
> into it so carefully - and I realize that it's hard at this stage to
> judge what's appropriate, when I've not yet even posted the endpoint
> of these changes, the patches which make it possible not to find a
> page table here.  And I'm intentionally keeping that vague, because
> although I shall only introduce a THP case, I do expect it to be built
> upon later in reclaiming empty page tables: it would be nice not to
> have to change the arch code again when extending further.
> 
> My "rare transient cases" phrase may be somewhat misleading: one thing
> that's wrong with your tight pte_offset_map_lock() loop above is that
> the pmd entry pointing to page table may have been suddenly replaced
> by a pmd_none() entry; and there's nothing in your loop above to
> break out if that is so.
> 
> But if a page table is suddenly removed, that would be because it was
> either empty, or replaced by a THP entry, or easily reconstructable on
> demand (by that, I probably mean it was only mapping shared file
> pages, which can just be refaulted if needed again).
> 
> The case you're wary of, is if the page table were removed briefly,
> then put back shortly after: and still contains zero pages further
> down. That's not something mm does now, nor at the end of my several
> series, nor that I imagine us wanting to do in future: but I am
> struggling to find a killer argument to persuade you that it could
> never be done - most pages in a page table do need rmap tracking,
> which will BUG if it's broken, but that argument happens not to apply
> to the zero page.
> 
> (Hmm, there could be somewhere, where we would find it convenient to
> remove a page table with intent to do ...something, then validation
> of that isolated page table fails, so we just put it back again.)
> 
> Is it good enough for me to promise you that we won't do that?
> 
> There are several ways in which we could change __zap_zero_pages(),
> but I don't see them as actually dealing with the concern at hand.
> 
> One change, I've tended to make at the mm end but did not dare
> to interfere here: it would seem more sensible to do a single
> pte_offset_map_lock() outside the loop, return if that fails,
> increment ptep inside the loop, pte_unmap_unlock() after the loop.
> 
> But perhaps you have preemption reasons for not wanting that; and
> although it would eliminate the oddity of half-processing a page
> table, it would not really resolve the problem at hand: because,
> what if this page table got removed just before __zap_zero_pages()
> tries to take the lock, then got put back just after?
> 
> Another change: I see __zap_zero_pages() is driven by
> walk_page_range(), and over at the mm end I'm usually setting
> walk->action to ACTION_AGAIN in these failure cases; but thought that
> an unnecessary piece of magic here, and cannot see how it could
> actually help.  Your "retry the whole walk_page_range()" suggestion
> below would be a heavier equivalent of that: but neither way gives
> confidence, if a page table could actually be removed then reinserted
> without mmap_write_lock().
> 
> I think I want to keep this s390 __zap_zero_pages() issue in mind, it
> is important and thank you for raising it; but don't see any change
> to the patch as actually needed.
> 
> Hugh

so if I understand the above correctly, pte_offset_map_lock will only
fail if the whole page table has disappeared, and in that case, it will
never reappear with zero pages, therefore we can safely skip (in that
case just break). if we were to do a continue instead of a break, we
would most likely fail again anyway.

in that case I would still like a small change in your patch: please
write a short (2~3 lines max) comment about why it's ok to do things
that way

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 15/23] s390: allow pte_offset_map_lock() to fail
  2023-05-23 12:00       ` Claudio Imbrenda
@ 2023-05-24  1:49         ` Hugh Dickins
  2023-05-25  7:23           ` Claudio Imbrenda
  0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2023-05-24  1:49 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
	Kirill A. Shutemov, Matthew Wilcox, David Hildenbrand,
	Suren Baghdasaryan, Qi Zheng, Russell King, Catalin Marinas,
	Will Deacon, Geert Uytterhoeven, Greg Ungerer, Michal Simek,
	Thomas Bogendoerfer, Helge Deller, John David Anglin,
	Aneesh Kumar K.V, Michael Ellerman, Alexandre Ghiti,
	Palmer Dabbelt, Heiko Carstens, Christian Borntraeger,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On Tue, 23 May 2023, Claudio Imbrenda wrote:
> 
> so if I understand the above correctly, pte_offset_map_lock will only
> fail if the whole page table has disappeared, and in that case, it will
> never reappear with zero pages, therefore we can safely skip (in that
> case just break). if we were to do a continue instead of a break, we
> would most likely fail again anyway.

Yes, that's the most likely; and you hold mmap_write_lock() there,
and VM_NOHUGEPAGE on all vmas, so I think it's the only foreseeable
possibility.

> 
> in that case I would still like a small change in your patch: please
> write a short (2~3 lines max) comment about why it's ok to do things
> that way

Sure.

But I now see that I've disobeyed you, and gone to 4 lines (but in the
comment above the function, so as not to distract from the code itself):
is this good wording to you?  I needed to research how they were stopped
from coming in afterwards, so wanted to put something greppable in there.

And, unless I'm misunderstanding, that "after THP was enabled" was
always supposed to say "after THP was disabled" (because splitting a
huge zero page pmd inserts a a page table full of little zero ptes).

Or would you prefer the comment in the commit message instead,
or down just above the pte_offset_map_lock() line?

It would much better if I could find one place at the mm end, to
enforce its end of the contract; but cannot think how to do that.

Hugh

--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2537,7 +2537,12 @@ static inline void thp_split_mm(struct mm_struct *mm)
  * Remove all empty zero pages from the mapping for lazy refaulting
  * - This must be called after mm->context.has_pgste is set, to avoid
  *   future creation of zero pages
- * - This must be called after THP was enabled
+ * - This must be called after THP was disabled.
+ *
+ * mm contracts with s390, that even if mm were to remove a page table,
+ * racing with the loop below and so causing pte_offset_map_lock() to fail,
+ * it will never insert a page table containing empty zero pages once
+ * mm_forbids_zeropage(mm) i.e. mm->context.has_pgste is set.
  */
 static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
 			   unsigned long end, struct mm_walk *walk)

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 15/23] s390: allow pte_offset_map_lock() to fail
  2023-05-24  1:49         ` Hugh Dickins
@ 2023-05-25  7:23           ` Claudio Imbrenda
  0 siblings, 0 replies; 9+ messages in thread
From: Claudio Imbrenda @ 2023-05-25  7:23 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, John Paul Adrian Glaubitz, David S. Miller,
	Chris Zankel, Max Filippov, x86, linux-arm-kernel, linux-ia64,
	linux-m68k, linux-mips, linux-parisc, linuxppc-dev, linux-riscv,
	linux-s390, linux-sh, sparclinux, linux-kernel, linux-mm

On Tue, 23 May 2023 18:49:14 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:

> On Tue, 23 May 2023, Claudio Imbrenda wrote:
> > 
> > so if I understand the above correctly, pte_offset_map_lock will only
> > fail if the whole page table has disappeared, and in that case, it will
> > never reappear with zero pages, therefore we can safely skip (in that
> > case just break). if we were to do a continue instead of a break, we
> > would most likely fail again anyway.  
> 
> Yes, that's the most likely; and you hold mmap_write_lock() there,
> and VM_NOHUGEPAGE on all vmas, so I think it's the only foreseeable
> possibility.
> 
> > 
> > in that case I would still like a small change in your patch: please
> > write a short (2~3 lines max) comment about why it's ok to do things
> > that way  
> 
> Sure.
> 
> But I now see that I've disobeyed you, and gone to 4 lines (but in the
> comment above the function, so as not to distract from the code itself):
> is this good wording to you?  I needed to research how they were stopped
> from coming in afterwards, so wanted to put something greppable in there.
> 
> And, unless I'm misunderstanding, that "after THP was enabled" was
> always supposed to say "after THP was disabled" (because splitting a
> huge zero page pmd inserts a a page table full of little zero ptes).

indeed, thanks for noticing and fixing it

> 
> Or would you prefer the comment in the commit message instead,
> or down just above the pte_offset_map_lock() line?
> 
> It would much better if I could find one place at the mm end, to
> enforce its end of the contract; but cannot think how to do that.
> 
> Hugh
> 
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2537,7 +2537,12 @@ static inline void thp_split_mm(struct mm_struct *mm)
>   * Remove all empty zero pages from the mapping for lazy refaulting
>   * - This must be called after mm->context.has_pgste is set, to avoid
>   *   future creation of zero pages
> - * - This must be called after THP was enabled
> + * - This must be called after THP was disabled.
> + *
> + * mm contracts with s390, that even if mm were to remove a page table,
> + * racing with the loop below and so causing pte_offset_map_lock() to fail,
> + * it will never insert a page table containing empty zero pages once
> + * mm_forbids_zeropage(mm) i.e. mm->context.has_pgste is set.
>   */
>  static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
>  			   unsigned long end, struct mm_walk *walk)

looks good, thanks

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 02/23] arm64: allow pte_offset_map() to fail
       [not found] ` <e72f6f3e-a8d4-3ed-2b4-5d3ced41484@google.com>
@ 2023-05-25 16:37   ` Catalin Marinas
  0 siblings, 0 replies; 9+ messages in thread
From: Catalin Marinas @ 2023-05-25 16:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Russell King, Will Deacon, Geert Uytterhoeven, Greg Ungerer,
	Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On Tue, May 09, 2023 at 09:43:47PM -0700, Hugh Dickins wrote:
> In rare transient cases, not yet made possible, pte_offset_map() and
> pte_offset_map_lock() may not find a page table: handle appropriately.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 03/23] arm64/hugetlb: pte_alloc_huge() pte_offset_huge()
       [not found] ` <dda2885-929b-c278-14e-5f447e9eec52@google.com>
@ 2023-05-25 16:37   ` Catalin Marinas
  0 siblings, 0 replies; 9+ messages in thread
From: Catalin Marinas @ 2023-05-25 16:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Russell King, Will Deacon, Geert Uytterhoeven, Greg Ungerer,
	Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On Tue, May 09, 2023 at 09:45:57PM -0700, Hugh Dickins wrote:
> pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits
> that: to keep balance in future, use the recently added pte_alloc_huge()
> instead; with pte_offset_huge() a better name for pte_offset_kernel().
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-05-25 16:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <77a5d8c-406b-7068-4f17-23b7ac53bc83@google.com>
     [not found] ` <ZFs0k2rrLPH9A/UU@casper.infradead.org>
     [not found]   ` <d7f3c7b2-25b8-ef66-98a8-43d68f4499f@google.com>
     [not found]     ` <ZFz1j1slZHCQmwMJ@casper.infradead.org>
2023-05-16 10:41       ` [PATCH 00/23] arch: allow pte_offset_map[_lock]() to fail Peter Zijlstra
     [not found] ` <94aec8fe-383f-892-dcbf-d4c14e460a7@google.com>
2023-05-17 10:35   ` [PATCH 15/23] s390: allow pte_offset_map_lock() " Claudio Imbrenda
2023-05-17 21:50     ` Hugh Dickins
2023-05-23 12:00       ` Claudio Imbrenda
2023-05-24  1:49         ` Hugh Dickins
2023-05-25  7:23           ` Claudio Imbrenda
     [not found] ` <5579873-d7b-65e-5de0-a2ba8a144e7@google.com>
2023-05-17 11:28   ` [PATCH 16/23] s390: gmap use pte_unmap_unlock() not spin_unlock() Alexander Gordeev
     [not found] ` <e72f6f3e-a8d4-3ed-2b4-5d3ced41484@google.com>
2023-05-25 16:37   ` [PATCH 02/23] arm64: allow pte_offset_map() to fail Catalin Marinas
     [not found] ` <dda2885-929b-c278-14e-5f447e9eec52@google.com>
2023-05-25 16:37   ` [PATCH 03/23] arm64/hugetlb: pte_alloc_huge() pte_offset_huge() Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).