* [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()
[not found] <20250827220141.262669-1-david@redhat.com>
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 7:31 ` Wei Yang
` (2 more replies)
2025-08-27 22:01 ` [PATCH v1 07/36] mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages() David Hildenbrand
` (24 subsequent siblings)
25 siblings, 3 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Zi Yan, SeongJae Park, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86
Let's reject them early, which in turn makes folio_alloc_gigantic() reject
them properly.
To avoid converting from order to nr_pages, let's just add MAX_FOLIO_ORDER
and calculate MAX_FOLIO_NR_PAGES based on that.
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: SeongJae Park <sj@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 6 ++++--
mm/page_alloc.c | 5 ++++-
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00c8a54127d37..77737cbf2216a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2055,11 +2055,13 @@ static inline long folio_nr_pages(const struct folio *folio)
/* Only hugetlbfs can allocate folios larger than MAX_ORDER */
#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
-#define MAX_FOLIO_NR_PAGES (1UL << PUD_ORDER)
+#define MAX_FOLIO_ORDER PUD_ORDER
#else
-#define MAX_FOLIO_NR_PAGES MAX_ORDER_NR_PAGES
+#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
#endif
+#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
+
/*
* compound_nr() returns the number of pages in this potentially compound
* page. compound_nr() can be called on a tail page, and is defined to
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index baead29b3e67b..426bc404b80cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6833,6 +6833,7 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
int alloc_contig_range_noprof(unsigned long start, unsigned long end,
acr_flags_t alloc_flags, gfp_t gfp_mask)
{
+ const unsigned int order = ilog2(end - start);
unsigned long outer_start, outer_end;
int ret = 0;
@@ -6850,6 +6851,9 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
PB_ISOLATE_MODE_CMA_ALLOC :
PB_ISOLATE_MODE_OTHER;
+ if (WARN_ON_ONCE((gfp_mask & __GFP_COMP) && order > MAX_FOLIO_ORDER))
+ return -EINVAL;
+
gfp_mask = current_gfp_context(gfp_mask);
if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask))
return -EINVAL;
@@ -6947,7 +6951,6 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
free_contig_range(end, outer_end - end);
} else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
struct page *head = pfn_to_page(start);
- int order = ilog2(end - start);
check_new_pages(head, order);
prep_new_page(head, order, gfp_mask, 0);
--
2.50.1
* Re: [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()
2025-08-27 22:01 ` [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof() David Hildenbrand
@ 2025-08-28 7:31 ` Wei Yang
2025-08-28 14:37 ` Lorenzo Stoakes
2025-08-29 0:33 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2025-08-28 7:31 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, SeongJae Park, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86
On Thu, Aug 28, 2025 at 12:01:10AM +0200, David Hildenbrand wrote:
>Let's reject them early, which in turn makes folio_alloc_gigantic() reject
>them properly.
>
>To avoid converting from order to nr_pages, let's just add MAX_FOLIO_ORDER
>and calculate MAX_FOLIO_NR_PAGES based on that.
>
>Reviewed-by: Zi Yan <ziy@nvidia.com>
>Acked-by: SeongJae Park <sj@kernel.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
--
Wei Yang
Help you, Help me
* Re: [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()
2025-08-27 22:01 ` [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof() David Hildenbrand
2025-08-28 7:31 ` Wei Yang
@ 2025-08-28 14:37 ` Lorenzo Stoakes
2025-08-29 10:06 ` David Hildenbrand
2025-08-29 0:33 ` Liam R. Howlett
2 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 14:37 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, SeongJae Park, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86
On Thu, Aug 28, 2025 at 12:01:10AM +0200, David Hildenbrand wrote:
> Let's reject them early, which in turn makes folio_alloc_gigantic() reject
> them properly.
>
> To avoid converting from order to nr_pages, let's just add MAX_FOLIO_ORDER
> and calculate MAX_FOLIO_NR_PAGES based on that.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Acked-by: SeongJae Park <sj@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Some nits, but overall LGTM so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/mm.h | 6 ++++--
> mm/page_alloc.c | 5 ++++-
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 00c8a54127d37..77737cbf2216a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2055,11 +2055,13 @@ static inline long folio_nr_pages(const struct folio *folio)
>
> /* Only hugetlbfs can allocate folios larger than MAX_ORDER */
> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> -#define MAX_FOLIO_NR_PAGES (1UL << PUD_ORDER)
> +#define MAX_FOLIO_ORDER PUD_ORDER
> #else
> -#define MAX_FOLIO_NR_PAGES MAX_ORDER_NR_PAGES
> +#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
> #endif
>
> +#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
BIT()?
> +
> /*
> * compound_nr() returns the number of pages in this potentially compound
> * page. compound_nr() can be called on a tail page, and is defined to
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index baead29b3e67b..426bc404b80cc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6833,6 +6833,7 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
> int alloc_contig_range_noprof(unsigned long start, unsigned long end,
> acr_flags_t alloc_flags, gfp_t gfp_mask)
> {
> + const unsigned int order = ilog2(end - start);
> unsigned long outer_start, outer_end;
> int ret = 0;
>
> @@ -6850,6 +6851,9 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
> PB_ISOLATE_MODE_CMA_ALLOC :
> PB_ISOLATE_MODE_OTHER;
>
> + if (WARN_ON_ONCE((gfp_mask & __GFP_COMP) && order > MAX_FOLIO_ORDER))
> + return -EINVAL;
Possibly not worth it for a one off, but be nice to have this as a helper function, like:
static bool is_valid_order(gfp_t gfp_mask, unsigned int order)
{
return !(gfp_mask & __GFP_COMP) || order <= MAX_FOLIO_ORDER;
}
Then makes this:
if (WARN_ON_ONCE(!is_valid_order(gfp_mask, order)))
return -EINVAL;
Kinda self-documenting!
> +
> gfp_mask = current_gfp_context(gfp_mask);
> if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask))
> return -EINVAL;
> @@ -6947,7 +6951,6 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
> free_contig_range(end, outer_end - end);
> } else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
> struct page *head = pfn_to_page(start);
> - int order = ilog2(end - start);
>
> check_new_pages(head, order);
> prep_new_page(head, order, gfp_mask, 0);
> --
> 2.50.1
>
* Re: [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()
2025-08-28 14:37 ` Lorenzo Stoakes
@ 2025-08-29 10:06 ` David Hildenbrand
2025-08-29 12:31 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 10:06 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Zi Yan, SeongJae Park, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86
On 28.08.25 16:37, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:10AM +0200, David Hildenbrand wrote:
>> Let's reject them early, which in turn makes folio_alloc_gigantic() reject
>> them properly.
>>
>> To avoid converting from order to nr_pages, let's just add MAX_FOLIO_ORDER
>> and calculate MAX_FOLIO_NR_PAGES based on that.
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Acked-by: SeongJae Park <sj@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Some nits, but overall LGTM so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
>> ---
>> include/linux/mm.h | 6 ++++--
>> mm/page_alloc.c | 5 ++++-
>> 2 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 00c8a54127d37..77737cbf2216a 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2055,11 +2055,13 @@ static inline long folio_nr_pages(const struct folio *folio)
>>
>> /* Only hugetlbfs can allocate folios larger than MAX_ORDER */
>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>> -#define MAX_FOLIO_NR_PAGES (1UL << PUD_ORDER)
>> +#define MAX_FOLIO_ORDER PUD_ORDER
>> #else
>> -#define MAX_FOLIO_NR_PAGES MAX_ORDER_NR_PAGES
>> +#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
>> #endif
>>
>> +#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
>
> BIT()?
I don't think we want to use BIT whenever we convert from order -> folio
-- which is why we also don't do that in other code.
BIT() is nice in the context of flags and bitmaps, but not really in the
context of converting orders to pages.
One could argue that maybe one would want a order_to_pages() helper
(that could use BIT() internally), but I am certainly not someone that
would suggest that at this point ... :)
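If someone really wanted that, it would probably be as trivial as the
following (hypothetical sketch, not part of this series):

	/* Hypothetical helper, just to illustrate the idea. */
	static inline unsigned long order_to_pages(unsigned int order)
	{
		return 1UL << order;	/* could just as well be BIT(order) */
	}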
>
>> +
>> /*
>> * compound_nr() returns the number of pages in this potentially compound
>> * page. compound_nr() can be called on a tail page, and is defined to
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index baead29b3e67b..426bc404b80cc 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6833,6 +6833,7 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
>> int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>> acr_flags_t alloc_flags, gfp_t gfp_mask)
>> {
>> + const unsigned int order = ilog2(end - start);
>> unsigned long outer_start, outer_end;
>> int ret = 0;
>>
>> @@ -6850,6 +6851,9 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>> PB_ISOLATE_MODE_CMA_ALLOC :
>> PB_ISOLATE_MODE_OTHER;
>>
>> + if (WARN_ON_ONCE((gfp_mask & __GFP_COMP) && order > MAX_FOLIO_ORDER))
>> + return -EINVAL;
>
> Possibly not worth it for a one off, but be nice to have this as a helper function, like:
>
> static bool is_valid_order(gfp_t gfp_mask, unsigned int order)
> {
> return !(gfp_mask & __GFP_COMP) || order <= MAX_FOLIO_ORDER;
> }
>
> Then makes this:
>
> if (WARN_ON_ONCE(!is_valid_order(gfp_mask, order)))
> return -EINVAL;
>
> Kinda self-documenting!
I don't like it -- especially forwarding __GFP_COMP.
is_valid_folio_order() to wrap the order check? Also not sure.
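Something like the following, I guess (just a sketch, nothing I plan to add
right now):

	static inline bool is_valid_folio_order(unsigned int order)
	{
		/* Orders above MAX_FOLIO_ORDER cannot be represented as a folio. */
		return order <= MAX_FOLIO_ORDER;
	}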
So I'll leave it as is I think.
Thanks for all the review!
--
Cheers
David / dhildenb
* Re: [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()
2025-08-29 10:06 ` David Hildenbrand
@ 2025-08-29 12:31 ` Lorenzo Stoakes
2025-08-29 13:09 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 12:31 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, SeongJae Park, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86
On Fri, Aug 29, 2025 at 12:06:21PM +0200, David Hildenbrand wrote:
> On 28.08.25 16:37, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:10AM +0200, David Hildenbrand wrote:
> > > Let's reject them early, which in turn makes folio_alloc_gigantic() reject
> > > them properly.
> > >
> > > To avoid converting from order to nr_pages, let's just add MAX_FOLIO_ORDER
> > > and calculate MAX_FOLIO_NR_PAGES based on that.
> > >
> > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > Acked-by: SeongJae Park <sj@kernel.org>
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> >
> > Some nits, but overall LGTM so:
> >
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >
> > > ---
> > > include/linux/mm.h | 6 ++++--
> > > mm/page_alloc.c | 5 ++++-
> > > 2 files changed, 8 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 00c8a54127d37..77737cbf2216a 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -2055,11 +2055,13 @@ static inline long folio_nr_pages(const struct folio *folio)
> > >
> > > /* Only hugetlbfs can allocate folios larger than MAX_ORDER */
> > > #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> > > -#define MAX_FOLIO_NR_PAGES (1UL << PUD_ORDER)
> > > +#define MAX_FOLIO_ORDER PUD_ORDER
> > > #else
> > > -#define MAX_FOLIO_NR_PAGES MAX_ORDER_NR_PAGES
> > > +#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
> > > #endif
> > >
> > > +#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
> >
> > BIT()?
>
> I don't think we want to use BIT whenever we convert from order -> folio --
> which is why we also don't do that in other code.
It seems a bit arbitrary, like we open-code this (at risk of making a mistake)
in some places but not others.
>
> BIT() is nice in the context of flags and bitmaps, but not really in the
> context of converting orders to pages.
It's nice for setting a specific bit :)
>
> One could argue that maybe one would want a order_to_pages() helper (that
> could use BIT() internally), but I am certainly not someone that would
> suggest that at this point ... :)
I mean maybe.
Anyway as I said none of this is massively important, the open-coding here is
correct, just seems silly.
>
> >
> > > +
> > > /*
> > > * compound_nr() returns the number of pages in this potentially compound
> > > * page. compound_nr() can be called on a tail page, and is defined to
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index baead29b3e67b..426bc404b80cc 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -6833,6 +6833,7 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
> > > int alloc_contig_range_noprof(unsigned long start, unsigned long end,
> > > acr_flags_t alloc_flags, gfp_t gfp_mask)
Funny btw th
> > > {
> > > + const unsigned int order = ilog2(end - start);
> > > unsigned long outer_start, outer_end;
> > > int ret = 0;
> > >
> > > @@ -6850,6 +6851,9 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
> > > PB_ISOLATE_MODE_CMA_ALLOC :
> > > PB_ISOLATE_MODE_OTHER;
> > >
> > > + if (WARN_ON_ONCE((gfp_mask & __GFP_COMP) && order > MAX_FOLIO_ORDER))
> > > + return -EINVAL;
> >
> > Possibly not worth it for a one off, but be nice to have this as a helper function, like:
> >
> > static bool is_valid_order(gfp_t gfp_mask, unsigned int order)
> > {
> > return !(gfp_mask & __GFP_COMP) || order <= MAX_FOLIO_ORDER;
> > }
> >
> > Then makes this:
> >
> > if (WARN_ON_ONCE(!is_valid_order(gfp_mask, order)))
> > return -EINVAL;
> >
> > Kinda self-documenting!
>
> I don't like it -- especially forwarding __GFP_COMP.
>
> is_valid_folio_order() to wrap the order check? Also not sure.
OK, it's not a big deal.
Can we have a comment explaining this though? As people might be confused
as to why we check this here and not elsewhere.
>
> So I'll leave it as is I think.
Right fine.
>
> Thanks for all the review!
>
> --
> Cheers
>
> David / dhildenb
>
* Re: [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()
2025-08-29 12:31 ` Lorenzo Stoakes
@ 2025-08-29 13:09 ` David Hildenbrand
0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 13:09 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Zi Yan, SeongJae Park, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86
>
> It seems a bit arbitrary, like we open-code this (at risk of making a mistake)
> in some places but not others.
[...]
>>
>> One could argue that maybe one would want a order_to_pages() helper (that
>> could use BIT() internally), but I am certainly not someone that would
>> suggest that at this point ... :)
>
> I mean maybe.
>
> Anyway as I said none of this is massively important, the open-coding here is
> correct, just seems silly.
Maybe we really want an ORDER_PAGES() and PAGES_ORDER().
But I mean, we also have PHYS_PFN() and PFN_PHYS(), and see how many
"<< PAGE_SHIFT" etc. we are using all over the place.
>
>>
>>>
>>>> +
>>>> /*
>>>> * compound_nr() returns the number of pages in this potentially compound
>>>> * page. compound_nr() can be called on a tail page, and is defined to
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index baead29b3e67b..426bc404b80cc 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -6833,6 +6833,7 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
>>>> int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>>> acr_flags_t alloc_flags, gfp_t gfp_mask)
>
> Funny btw th
>
>>>> {
>>>> + const unsigned int order = ilog2(end - start);
>>>> unsigned long outer_start, outer_end;
>>>> int ret = 0;
>>>>
>>>> @@ -6850,6 +6851,9 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>>> PB_ISOLATE_MODE_CMA_ALLOC :
>>>> PB_ISOLATE_MODE_OTHER;
>>>>
>>>> + if (WARN_ON_ONCE((gfp_mask & __GFP_COMP) && order > MAX_FOLIO_ORDER))
>>>> + return -EINVAL;
>>>
>>> Possibly not worth it for a one off, but be nice to have this as a helper function, like:
>>>
>>> static bool is_valid_order(gfp_t gfp_mask, unsigned int order)
>>> {
>>> return !(gfp_mask & __GFP_COMP) || order <= MAX_FOLIO_ORDER;
>>> }
>>>
>>> Then makes this:
>>>
>>> if (WARN_ON_ONCE(!is_valid_order(gfp_mask, order)))
>>> return -EINVAL;
>>>
>>> Kinda self-documenting!
>>
>> I don't like it -- especially forwarding __GFP_COMP.
>>
>> is_valid_folio_order() to wrap the order check? Also not sure.
>
> OK, it's not a big deal.
>
> Can we have a comment explaining this though? As people might be confused
> as to why we check this here and not elsewhere.
I can add a comment.
--
Cheers
David / dhildenb
* Re: [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()
2025-08-27 22:01 ` [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof() David Hildenbrand
2025-08-28 7:31 ` Wei Yang
2025-08-28 14:37 ` Lorenzo Stoakes
@ 2025-08-29 0:33 ` Liam R. Howlett
2025-08-29 9:58 ` David Hildenbrand
2 siblings, 1 reply; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 0:33 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, SeongJae Park, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86
* David Hildenbrand <david@redhat.com> [250827 18:04]:
> Let's reject them early, which in turn makes folio_alloc_gigantic() reject
> them properly.
>
> To avoid converting from order to nr_pages, let's just add MAX_FOLIO_ORDER
> and calculate MAX_FOLIO_NR_PAGES based on that.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Acked-by: SeongJae Park <sj@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Nit below, but..
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> include/linux/mm.h | 6 ++++--
> mm/page_alloc.c | 5 ++++-
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 00c8a54127d37..77737cbf2216a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2055,11 +2055,13 @@ static inline long folio_nr_pages(const struct folio *folio)
>
> /* Only hugetlbfs can allocate folios larger than MAX_ORDER */
> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> -#define MAX_FOLIO_NR_PAGES (1UL << PUD_ORDER)
> +#define MAX_FOLIO_ORDER PUD_ORDER
> #else
> -#define MAX_FOLIO_NR_PAGES MAX_ORDER_NR_PAGES
> +#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
> #endif
>
> +#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
> +
> /*
> * compound_nr() returns the number of pages in this potentially compound
> * page. compound_nr() can be called on a tail page, and is defined to
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index baead29b3e67b..426bc404b80cc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6833,6 +6833,7 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
> int alloc_contig_range_noprof(unsigned long start, unsigned long end,
> acr_flags_t alloc_flags, gfp_t gfp_mask)
> {
> + const unsigned int order = ilog2(end - start);
> unsigned long outer_start, outer_end;
> int ret = 0;
>
> @@ -6850,6 +6851,9 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
> PB_ISOLATE_MODE_CMA_ALLOC :
> PB_ISOLATE_MODE_OTHER;
>
> + if (WARN_ON_ONCE((gfp_mask & __GFP_COMP) && order > MAX_FOLIO_ORDER))
> + return -EINVAL;
> +
> gfp_mask = current_gfp_context(gfp_mask);
> if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask))
> return -EINVAL;
> @@ -6947,7 +6951,6 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
> free_contig_range(end, outer_end - end);
> } else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
> struct page *head = pfn_to_page(start);
> - int order = ilog2(end - start);
You have changed this from an int to a const unsigned int, which is
totally fine but it was left out of the change log. Probably not really
worth mentioning but curious why the change to unsigned here?
>
> check_new_pages(head, order);
> prep_new_page(head, order, gfp_mask, 0);
> --
> 2.50.1
>
* Re: [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()
2025-08-29 0:33 ` Liam R. Howlett
@ 2025-08-29 9:58 ` David Hildenbrand
0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 9:58 UTC (permalink / raw)
To: Liam R. Howlett, linux-kernel, Zi Yan, SeongJae Park,
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Linus Torvalds,
linux-arm-kernel, linux-arm-kernel, linux-crypto, linux-ide,
linux-kselftest, linux-mips, linux-mmc, linux-mm, linux-riscv,
linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86
On 29.08.25 02:33, Liam R. Howlett wrote:
> * David Hildenbrand <david@redhat.com> [250827 18:04]:
>> Let's reject them early, which in turn makes folio_alloc_gigantic() reject
>> them properly.
>>
>> To avoid converting from order to nr_pages, let's just add MAX_FOLIO_ORDER
>> and calculate MAX_FOLIO_NR_PAGES based on that.
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Acked-by: SeongJae Park <sj@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Nit below, but..
>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>
>> ---
>> include/linux/mm.h | 6 ++++--
>> mm/page_alloc.c | 5 ++++-
>> 2 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 00c8a54127d37..77737cbf2216a 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2055,11 +2055,13 @@ static inline long folio_nr_pages(const struct folio *folio)
>>
>> /* Only hugetlbfs can allocate folios larger than MAX_ORDER */
>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>> -#define MAX_FOLIO_NR_PAGES (1UL << PUD_ORDER)
>> +#define MAX_FOLIO_ORDER PUD_ORDER
>> #else
>> -#define MAX_FOLIO_NR_PAGES MAX_ORDER_NR_PAGES
>> +#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
>> #endif
>>
>> +#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
>> +
>> /*
>> * compound_nr() returns the number of pages in this potentially compound
>> * page. compound_nr() can be called on a tail page, and is defined to
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index baead29b3e67b..426bc404b80cc 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6833,6 +6833,7 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
>> int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>> acr_flags_t alloc_flags, gfp_t gfp_mask)
>> {
>> + const unsigned int order = ilog2(end - start);
>> unsigned long outer_start, outer_end;
>> int ret = 0;
>>
>> @@ -6850,6 +6851,9 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>> PB_ISOLATE_MODE_CMA_ALLOC :
>> PB_ISOLATE_MODE_OTHER;
>>
>> + if (WARN_ON_ONCE((gfp_mask & __GFP_COMP) && order > MAX_FOLIO_ORDER))
>> + return -EINVAL;
>> +
>> gfp_mask = current_gfp_context(gfp_mask);
>> if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask))
>> return -EINVAL;
>> @@ -6947,7 +6951,6 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>> free_contig_range(end, outer_end - end);
>> } else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
>> struct page *head = pfn_to_page(start);
>> - int order = ilog2(end - start);
>
> You have changed this from an int to a const unsigned int, which is
> totally fine but it was left out of the change log.
Considered it too trivial to document, but I can add a sentence about that.
> Probably not really
> worth mentioning but curious why the change to unsigned here?
orders are always unsigned, like folio_order().
Thanks!
--
Cheers
David / dhildenb
* [PATCH v1 07/36] mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages()
[not found] <20250827220141.262669-1-david@redhat.com>
2025-08-27 22:01 ` [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 14:39 ` Lorenzo Stoakes
2025-08-29 0:34 ` Liam R. Howlett
2025-08-27 22:01 ` [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate David Hildenbrand
` (23 subsequent siblings)
25 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, SeongJae Park, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
Let's reject unreasonable folio sizes early, where we can still fail.
We'll add sanity checks to prepare_compound_head/prepare_compound_page
next.
Is there a way to configure a system such that unreasonable folio sizes
would be possible? It would already be rather questionable.
If so, we'd probably want to bail out earlier, where we can avoid a
WARN and just report a proper error message that indicates where
something went wrong such that we messed up.
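For illustration only (not part of this patch), a hypothetical caller could
look like the sketch below; the concrete pgmap type, order and error handling
are made up:

	/* Hypothetical example: a driver requesting compound device pages. */
	static int example_map_device_pages(struct dev_pagemap *pgmap)
	{
		void *addr;

		pgmap->type = MEMORY_DEVICE_FS_DAX;
		pgmap->vmemmap_shift = PMD_ORDER;	/* folio order of the device pages */
		/* pgmap->range, pgmap->nr_range and pgmap->ops set up by the driver */

		/*
		 * With this patch, a vmemmap_shift > MAX_FOLIO_ORDER now fails
		 * here with -EINVAL (and a one-time warning) instead of being
		 * accepted and misbehaving later.
		 */
		addr = memremap_pages(pgmap, NUMA_NO_NODE);
		if (IS_ERR(addr))
			return PTR_ERR(addr);
		return 0;
	}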
Acked-by: SeongJae Park <sj@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/memremap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/memremap.c b/mm/memremap.c
index b0ce0d8254bd8..a2d4bb88f64b6 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -275,6 +275,9 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
return ERR_PTR(-EINVAL);
+ if (WARN_ONCE(pgmap->vmemmap_shift > MAX_FOLIO_ORDER,
+ "requested folio size unsupported\n"))
+ return ERR_PTR(-EINVAL);
switch (pgmap->type) {
case MEMORY_DEVICE_PRIVATE:
--
2.50.1
* Re: [PATCH v1 07/36] mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages()
2025-08-27 22:01 ` [PATCH v1 07/36] mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages() David Hildenbrand
@ 2025-08-28 14:39 ` Lorenzo Stoakes
2025-08-29 0:34 ` Liam R. Howlett
1 sibling, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 14:39 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, SeongJae Park, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:11AM +0200, David Hildenbrand wrote:
> Let's reject unreasonable folio sizes early, where we can still fail.
> We'll add sanity checks to prepare_compound_head/prepare_compound_page
> next.
>
> Is there a way to configure a system such that unreasonable folio sizes
> would be possible? It would already be rather questionable.
>
> If so, we'd probably want to bail out earlier, where we can avoid a
> WARN and just report a proper error message that indicates where
> something went wrong such that we messed up.
>
> Acked-by: SeongJae Park <sj@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/memremap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/memremap.c b/mm/memremap.c
> index b0ce0d8254bd8..a2d4bb88f64b6 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -275,6 +275,9 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>
> if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
> return ERR_PTR(-EINVAL);
> + if (WARN_ONCE(pgmap->vmemmap_shift > MAX_FOLIO_ORDER,
> + "requested folio size unsupported\n"))
> + return ERR_PTR(-EINVAL);
>
> switch (pgmap->type) {
> case MEMORY_DEVICE_PRIVATE:
> --
> 2.50.1
>
* Re: [PATCH v1 07/36] mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages()
2025-08-27 22:01 ` [PATCH v1 07/36] mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages() David Hildenbrand
2025-08-28 14:39 ` Lorenzo Stoakes
@ 2025-08-29 0:34 ` Liam R. Howlett
1 sibling, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 0:34 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, SeongJae Park, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
* David Hildenbrand <david@redhat.com> [250827 18:04]:
> Let's reject unreasonable folio sizes early, where we can still fail.
> We'll add sanity checks to prepare_compound_head/prepare_compound_page
> next.
>
> Is there a way to configure a system such that unreasonable folio sizes
> would be possible? It would already be rather questionable.
>
> If so, we'd probably want to bail out earlier, where we can avoid a
> WARN and just report a proper error message that indicates where
> something went wrong such that we messed up.
>
> Acked-by: SeongJae Park <sj@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/memremap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/memremap.c b/mm/memremap.c
> index b0ce0d8254bd8..a2d4bb88f64b6 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -275,6 +275,9 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>
> if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
> return ERR_PTR(-EINVAL);
> + if (WARN_ONCE(pgmap->vmemmap_shift > MAX_FOLIO_ORDER,
> + "requested folio size unsupported\n"))
> + return ERR_PTR(-EINVAL);
>
> switch (pgmap->type) {
> case MEMORY_DEVICE_PRIVATE:
> --
> 2.50.1
>
* [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate
[not found] <20250827220141.262669-1-david@redhat.com>
2025-08-27 22:01 ` [PATCH v1 06/36] mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof() David Hildenbrand
2025-08-27 22:01 ` [PATCH v1 07/36] mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 1:04 ` Zi Yan
` (2 more replies)
2025-08-27 22:01 ` [PATCH v1 09/36] mm/mm_init: make memmap_init_compound() look more like prep_compound_page() David Hildenbrand
` (22 subsequent siblings)
25 siblings, 3 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
Let's check that no hstate that corresponds to an unreasonable folio size
is registered by an architecture. If we were to succeed registering, we
could later try allocating an unsupported gigantic folio size.
Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
is sane at build time. As HUGETLB_PAGE_ORDER is dynamic on powerpc, we have
to use a BUILD_BUG_ON_INVALID() to make it compile.
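For reference, BUILD_BUG_ON_INVALID() only forces the expression to compile
without evaluating it at build time; roughly (going from
include/linux/build_bug.h, so double-check the exact definition there):

	/*
	 * BUILD_BUG_ON_INVALID() permits the compiler to check the validity
	 * of the expression but avoids the generation of any code, even if
	 * that expression has side effects.
	 */
	#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))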
No existing kernel configuration should be able to trigger this check:
either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
gigantic folios will not exceed a memory section (the case on sparse).
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/hugetlb.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 572b6f7772841..4a97e4f14c0dc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE <
__NR_HPAGEFLAGS);
+ BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
if (!hugepages_supported()) {
if (hugetlb_max_hstate || default_hstate_max_huge_pages)
@@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int order)
}
BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
+ WARN_ON(order > MAX_FOLIO_ORDER);
h = &hstates[hugetlb_max_hstate++];
__mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
h->order = order;
--
2.50.1
* Re: [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate
2025-08-27 22:01 ` [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate David Hildenbrand
@ 2025-08-28 1:04 ` Zi Yan
2025-08-28 14:45 ` Lorenzo Stoakes
2025-08-29 0:35 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Zi Yan @ 2025-08-28 1:04 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86
On 27 Aug 2025, at 18:01, David Hildenbrand wrote:
> Let's check that no hstate that corresponds to an unreasonable folio size
> is registered by an architecture. If we were to succeed registering, we
> could later try allocating an unsupported gigantic folio size.
>
> Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
> is sane at build time. As HUGETLB_PAGE_ORDER is dynamic on powerpc, we have
> to use a BUILD_BUG_ON_INVALID() to make it compile.
>
> No existing kernel configuration should be able to trigger this check:
> either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
> gigantic folios will not exceed a memory section (the case on sparse).
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> mm/hugetlb.c | 2 ++
> 1 file changed, 2 insertions(+)
>
LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate
2025-08-27 22:01 ` [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate David Hildenbrand
2025-08-28 1:04 ` Zi Yan
@ 2025-08-28 14:45 ` Lorenzo Stoakes
2025-08-29 10:07 ` David Hildenbrand
2025-08-29 0:35 ` Liam R. Howlett
2 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 14:45 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:12AM +0200, David Hildenbrand wrote:
> Let's check that no hstate that corresponds to an unreasonable folio size
> is registered by an architecture. If we were to succeed registering, we
> could later try allocating an unsupported gigantic folio size.
>
> Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
> is sane at build time. As HUGETLB_PAGE_ORDER is dynamic on powerpc, we have
> to use a BUILD_BUG_ON_INVALID() to make it compile.
>
> No existing kernel configuration should be able to trigger this check:
> either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
> gigantic folios will not exceed a memory section (the case on sparse).
I am guessing it's implicit that MAX_FOLIO_ORDER <= section size?
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/hugetlb.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 572b6f7772841..4a97e4f14c0dc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>
> BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE <
> __NR_HPAGEFLAGS);
> + BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>
> if (!hugepages_supported()) {
> if (hugetlb_max_hstate || default_hstate_max_huge_pages)
> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int order)
> }
> BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
> BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
> + WARN_ON(order > MAX_FOLIO_ORDER);
> h = &hstates[hugetlb_max_hstate++];
> __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
> h->order = order;
> --
> 2.50.1
>
* Re: [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate
2025-08-28 14:45 ` Lorenzo Stoakes
@ 2025-08-29 10:07 ` David Hildenbrand
2025-08-29 12:18 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 10:07 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On 28.08.25 16:45, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:12AM +0200, David Hildenbrand wrote:
>> Let's check that no hstate that corresponds to an unreasonable folio size
>> is registered by an architecture. If we were to succeed registering, we
>> could later try allocating an unsupported gigantic folio size.
>>
>> Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
>> is sane at build time. As HUGETLB_PAGE_ORDER is dynamic on powerpc, we have
>> to use a BUILD_BUG_ON_INVALID() to make it compile.
>>
>> No existing kernel configuration should be able to trigger this check:
>> either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
>> gigantic folios will not exceed a memory section (the case on sparse).
>
> I am guessing it's implicit that MAX_FOLIO_ORDER <= section size?
Yes, we have a build-time check for that somewhere.
--
Cheers
David / dhildenb
* Re: [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate
2025-08-29 10:07 ` David Hildenbrand
@ 2025-08-29 12:18 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 12:18 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Fri, Aug 29, 2025 at 12:07:44PM +0200, David Hildenbrand wrote:
> On 28.08.25 16:45, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:12AM +0200, David Hildenbrand wrote:
> > > Let's check that no hstate that corresponds to an unreasonable folio size
> > > is registered by an architecture. If we were to succeed registering, we
> > > could later try allocating an unsupported gigantic folio size.
> > >
> > > Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
> > > is sane at build time. As HUGETLB_PAGE_ORDER is dynamic on powerpc, we have
> > > to use a BUILD_BUG_ON_INVALID() to make it compile.
> > >
> > > No existing kernel configuration should be able to trigger this check:
> > > either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
> > > gigantic folios will not exceed a memory section (the case on sparse).
> >
> > I am guessing it's implicit that MAX_FOLIO_ORDER <= section size?
>
> Yes, we have a build-time check for that somewhere.
OK cool thanks!
>
> --
> Cheers
>
> David / dhildenb
>
Cheers, Lorenzo
* Re: [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate
2025-08-27 22:01 ` [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate David Hildenbrand
2025-08-28 1:04 ` Zi Yan
2025-08-28 14:45 ` Lorenzo Stoakes
@ 2025-08-29 0:35 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 0:35 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Linus Torvalds,
linux-arm-kernel, linux-arm-kernel, linux-crypto, linux-ide,
linux-kselftest, linux-mips, linux-mmc, linux-mm, linux-riscv,
linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
* David Hildenbrand <david@redhat.com> [250827 18:04]:
> Let's check that no hstate that corresponds to an unreasonable folio size
> is registered by an architecture. If we were to succeed registering, we
> could later try allocating an unsupported gigantic folio size.
>
> Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
> is sane at build time. As HUGETLB_PAGE_ORDER is dynamic on powerpc, we have
> to use a BUILD_BUG_ON_INVALID() to make it compile.
>
> No existing kernel configuration should be able to trigger this check:
> either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
> gigantic folios will not exceed a memory section (the case on sparse).
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/hugetlb.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 572b6f7772841..4a97e4f14c0dc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>
> BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE <
> __NR_HPAGEFLAGS);
> + BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>
> if (!hugepages_supported()) {
> if (hugetlb_max_hstate || default_hstate_max_huge_pages)
> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int order)
> }
> BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
> BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
> + WARN_ON(order > MAX_FOLIO_ORDER);
> h = &hstates[hugetlb_max_hstate++];
> __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
> h->order = order;
> --
> 2.50.1
>
>
* [PATCH v1 09/36] mm/mm_init: make memmap_init_compound() look more like prep_compound_page()
[not found] <20250827220141.262669-1-david@redhat.com>
` (2 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 08/36] mm/hugetlb: check for unreasonable folio sizes when registering hstate David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 7:35 ` Wei Yang
` (2 more replies)
2025-08-27 22:01 ` [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order() David Hildenbrand
` (21 subsequent siblings)
25 siblings, 3 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Mike Rapoport (Microsoft), Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
Grepping for "prep_compound_page" leaves one clueless as to how devdax gets
its compound pages initialized.
Let's add a comment that might help finding this open-coded
prep_compound_page() initialization more easily.
Further, let's be less smart about the ordering of initialization and just
perform the prep_compound_head() call after all tail pages were
initialized: just like prep_compound_page() does.
No need for a comment to describe the initialization order: again,
just like prep_compound_page().
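For reference, prep_compound_page() itself is structured roughly like this
(simplified from mm/page_alloc.c):

	void prep_compound_page(struct page *page, unsigned int order)
	{
		int i;

		__SetPageHead(page);
		for (i = 1; i < (1 << order); i++)
			prep_compound_tail(page, i);

		prep_compound_head(page, order);
	}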
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/mm_init.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 5c21b3af216b2..df614556741a4 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1091,6 +1091,12 @@ static void __ref memmap_init_compound(struct page *head,
unsigned long pfn, end_pfn = head_pfn + nr_pages;
unsigned int order = pgmap->vmemmap_shift;
+ /*
+ * We have to initialize the pages, including setting up page links.
+ * prep_compound_page() does not take care of that, so instead we
+ * open-code prep_compound_page() so we can take care of initializing
+ * the pages in the same go.
+ */
__SetPageHead(head);
for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_page(pfn);
@@ -1098,15 +1104,8 @@ static void __ref memmap_init_compound(struct page *head,
__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
prep_compound_tail(head, pfn - head_pfn);
set_page_count(page, 0);
-
- /*
- * The first tail page stores important compound page info.
- * Call prep_compound_head() after the first tail page has
- * been initialized, to not have the data overwritten.
- */
- if (pfn == head_pfn + 1)
- prep_compound_head(head, order);
}
+ prep_compound_head(head, order);
}
void __ref memmap_init_zone_device(struct zone *zone,
--
2.50.1
* Re: [PATCH v1 09/36] mm/mm_init: make memmap_init_compound() look more like prep_compound_page()
2025-08-27 22:01 ` [PATCH v1 09/36] mm/mm_init: make memmap_init_compound() look more like prep_compound_page() David Hildenbrand
@ 2025-08-28 7:35 ` Wei Yang
2025-08-28 14:54 ` Lorenzo Stoakes
2025-08-29 0:37 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2025-08-28 7:35 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Mike Rapoport (Microsoft), Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:13AM +0200, David Hildenbrand wrote:
>Grepping for "prep_compound_page" leaves one clueless as to how devdax gets
>its compound pages initialized.
>
>Let's add a comment that might help finding this open-coded
>prep_compound_page() initialization more easily.
>
>Further, let's be less smart about the ordering of initialization and just
>perform the prep_compound_head() call after all tail pages were
>initialized: just like prep_compound_page() does.
>
>No need for a comment to describe the initialization order: again,
>just like prep_compound_page().
>
>Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 09/36] mm/mm_init: make memmap_init_compound() look more like prep_compound_page()
2025-08-27 22:01 ` [PATCH v1 09/36] mm/mm_init: make memmap_init_compound() look more like prep_compound_page() David Hildenbrand
2025-08-28 7:35 ` Wei Yang
@ 2025-08-28 14:54 ` Lorenzo Stoakes
2025-08-29 0:37 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 14:54 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Mike Rapoport (Microsoft), Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Muchun Song, netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:13AM +0200, David Hildenbrand wrote:
> Grepping for "prep_compound_page" leaves on clueless how devdax gets its
> compound pages initialized.
>
> Let's add a comment that might help finding this open-coded
> prep_compound_page() initialization more easily.
>
> Further, let's be less smart about the ordering of initialization and just
> perform the prep_compound_head() call after all tail pages were
> initialized: just like prep_compound_page() does.
>
> No need for a comment to describe the initialization order: again,
> just like prep_compound_page().
Wow this is great, thank you for putting a quality comment for this and
thinking of this :)
We have too much 'special case you just have to know' stuff sitting around,
so this kind of thing is always great to see.
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/mm_init.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 5c21b3af216b2..df614556741a4 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1091,6 +1091,12 @@ static void __ref memmap_init_compound(struct page *head,
> unsigned long pfn, end_pfn = head_pfn + nr_pages;
> unsigned int order = pgmap->vmemmap_shift;
>
> + /*
> + * We have to initialize the pages, including setting up page links.
> + * prep_compound_page() does not take care of that, so instead we
> + * open-code prep_compound_page() so we can take care of initializing
> + * the pages in the same go.
> + */
> __SetPageHead(head);
> for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
> struct page *page = pfn_to_page(pfn);
> @@ -1098,15 +1104,8 @@ static void __ref memmap_init_compound(struct page *head,
> __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
> prep_compound_tail(head, pfn - head_pfn);
> set_page_count(page, 0);
> -
> - /*
> - * The first tail page stores important compound page info.
> - * Call prep_compound_head() after the first tail page has
> - * been initialized, to not have the data overwritten.
> - */
> - if (pfn == head_pfn + 1)
> - prep_compound_head(head, order);
> }
> + prep_compound_head(head, order);
> }
>
> void __ref memmap_init_zone_device(struct zone *zone,
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 09/36] mm/mm_init: make memmap_init_compound() look more like prep_compound_page()
2025-08-27 22:01 ` [PATCH v1 09/36] mm/mm_init: make memmap_init_compound() look more like prep_compound_page() David Hildenbrand
2025-08-28 7:35 ` Wei Yang
2025-08-28 14:54 ` Lorenzo Stoakes
@ 2025-08-29 0:37 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 0:37 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Mike Rapoport (Microsoft), Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
* David Hildenbrand <david@redhat.com> [250827 18:05]:
> Grepping for "prep_compound_page" leaves on clueless how devdax gets its
> compound pages initialized.
>
> Let's add a comment that might help finding this open-coded
> prep_compound_page() initialization more easily.
Thanks for the comment here.
>
> Further, let's be less smart about the ordering of initialization and just
> perform the prep_compound_head() call after all tail pages were
> initialized: just like prep_compound_page() does.
>
> No need for a comment to describe the initialization order: again,
> just like prep_compound_page().
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/mm_init.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 5c21b3af216b2..df614556741a4 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1091,6 +1091,12 @@ static void __ref memmap_init_compound(struct page *head,
> unsigned long pfn, end_pfn = head_pfn + nr_pages;
> unsigned int order = pgmap->vmemmap_shift;
>
> + /*
> + * We have to initialize the pages, including setting up page links.
> + * prep_compound_page() does not take care of that, so instead we
> + * open-code prep_compound_page() so we can take care of initializing
> + * the pages in the same go.
> + */
> __SetPageHead(head);
> for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
> struct page *page = pfn_to_page(pfn);
> @@ -1098,15 +1104,8 @@ static void __ref memmap_init_compound(struct page *head,
> __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
> prep_compound_tail(head, pfn - head_pfn);
> set_page_count(page, 0);
> -
> - /*
> - * The first tail page stores important compound page info.
> - * Call prep_compound_head() after the first tail page has
> - * been initialized, to not have the data overwritten.
> - */
> - if (pfn == head_pfn + 1)
> - prep_compound_head(head, order);
> }
> + prep_compound_head(head, order);
> }
>
> void __ref memmap_init_zone_device(struct zone *zone,
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order()
[not found] <20250827220141.262669-1-david@redhat.com>
` (3 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 09/36] mm/mm_init: make memmap_init_compound() look more like prep_compound_page() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 7:35 ` Wei Yang
` (2 more replies)
2025-08-27 22:01 ` [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs David Hildenbrand
` (20 subsequent siblings)
25 siblings, 3 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86
Let's sanity-check in folio_set_order() whether we would be trying to
create a folio with an order that would make it exceed MAX_FOLIO_ORDER.
This will enable the check whenever a folio/compound page is initialized
through prep_compound_head() / prep_compound_page().
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/internal.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/internal.h b/mm/internal.h
index 45da9ff5694f6..9b0129531d004 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -755,6 +755,7 @@ static inline void folio_set_order(struct folio *folio, unsigned int order)
{
if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
return;
+ VM_WARN_ON_ONCE(order > MAX_FOLIO_ORDER);
folio->_flags_1 = (folio->_flags_1 & ~0xffUL) | order;
#ifdef NR_PAGES_IN_LARGE_FOLIO
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order()
2025-08-27 22:01 ` [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order() David Hildenbrand
@ 2025-08-28 7:35 ` Wei Yang
2025-08-28 15:00 ` Lorenzo Stoakes
2025-08-29 14:24 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2025-08-28 7:35 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86
On Thu, Aug 28, 2025 at 12:01:14AM +0200, David Hildenbrand wrote:
>Let's sanity-check in folio_set_order() whether we would be trying to
>create a folio with an order that would make it exceed MAX_FOLIO_ORDER.
>
>This will enable the check whenever a folio/compound page is initialized
>through prepare_compound_head() / prepare_compound_page().
>
>Reviewed-by: Zi Yan <ziy@nvidia.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order()
2025-08-27 22:01 ` [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order() David Hildenbrand
2025-08-28 7:35 ` Wei Yang
@ 2025-08-28 15:00 ` Lorenzo Stoakes
2025-08-29 10:10 ` David Hildenbrand
2025-08-29 14:24 ` Liam R. Howlett
2 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 15:00 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86
On Thu, Aug 28, 2025 at 12:01:14AM +0200, David Hildenbrand wrote:
> Let's sanity-check in folio_set_order() whether we would be trying to
> create a folio with an order that would make it exceed MAX_FOLIO_ORDER.
>
> This will enable the check whenever a folio/compound page is initialized
> through prepare_compound_head() / prepare_compound_page().
NIT: with CONFIG_DEBUG_VM set :)
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM (apart from nit below), so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/internal.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 45da9ff5694f6..9b0129531d004 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -755,6 +755,7 @@ static inline void folio_set_order(struct folio *folio, unsigned int order)
> {
> if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
> return;
> + VM_WARN_ON_ONCE(order > MAX_FOLIO_ORDER);
Given we have 'full-fat' WARN_ON*()'s above, maybe worth making this one too?
>
> folio->_flags_1 = (folio->_flags_1 & ~0xffUL) | order;
> #ifdef NR_PAGES_IN_LARGE_FOLIO
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order()
2025-08-28 15:00 ` Lorenzo Stoakes
@ 2025-08-29 10:10 ` David Hildenbrand
2025-08-29 12:18 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 10:10 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86
On 28.08.25 17:00, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:14AM +0200, David Hildenbrand wrote:
>> Let's sanity-check in folio_set_order() whether we would be trying to
>> create a folio with an order that would make it exceed MAX_FOLIO_ORDER.
>>
>> This will enable the check whenever a folio/compound page is initialized
>> through prepare_compound_head() / prepare_compound_page().
>
> NIT: with CONFIG_DEBUG_VM set :)
Yes, will add that.
>
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> LGTM (apart from nit below), so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
>> ---
>> mm/internal.h | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 45da9ff5694f6..9b0129531d004 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -755,6 +755,7 @@ static inline void folio_set_order(struct folio *folio, unsigned int order)
>> {
>> if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
>> return;
>> + VM_WARN_ON_ONCE(order > MAX_FOLIO_ORDER);
>
> Given we have 'full-fat' WARN_ON*()'s above, maybe worth making this one too?
The idea is that if you reach this point here, previous such checks I
added failed. So this is the safety net, and for that VM_WARN_ON_ONCE()
is sufficient.
I think we should rather convert the WARN_ON_ONCE to VM_WARN_ON_ONCE()
at some point, because no sane code should ever trigger that.
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order()
2025-08-29 10:10 ` David Hildenbrand
@ 2025-08-29 12:18 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 12:18 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86
On Fri, Aug 29, 2025 at 12:10:30PM +0200, David Hildenbrand wrote:
> On 28.08.25 17:00, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:14AM +0200, David Hildenbrand wrote:
> > > Let's sanity-check in folio_set_order() whether we would be trying to
> > > create a folio with an order that would make it exceed MAX_FOLIO_ORDER.
> > >
> > > This will enable the check whenever a folio/compound page is initialized
> > > through prepare_compound_head() / prepare_compound_page().
> >
> > NIT: with CONFIG_DEBUG_VM set :)
>
> Yes, will add that.
Thanks!
>
> >
> > >
> > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> >
> > LGTM (apart from nit below), so:
> >
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >
> > > ---
> > > mm/internal.h | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/mm/internal.h b/mm/internal.h
> > > index 45da9ff5694f6..9b0129531d004 100644
> > > --- a/mm/internal.h
> > > +++ b/mm/internal.h
> > > @@ -755,6 +755,7 @@ static inline void folio_set_order(struct folio *folio, unsigned int order)
> > > {
> > > if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
> > > return;
> > > + VM_WARN_ON_ONCE(order > MAX_FOLIO_ORDER);
> >
> > Given we have 'full-fat' WARN_ON*()'s above, maybe worth making this one too?
>
> The idea is that if you reach this point here, previous such checks I added
> failed. So this is the safety net, and for that VM_WARN_ON_ONCE() is
> sufficient.
>
> I think we should rather convert the WARN_ON_ONCE to VM_WARN_ON_ONCE() at
> some point, because no sane code should ever trigger that.
Ack, ok. I don't think vital for this series though!
>
> --
> Cheers
>
> David / dhildenb
>
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order()
2025-08-27 22:01 ` [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order() David Hildenbrand
2025-08-28 7:35 ` Wei Yang
2025-08-28 15:00 ` Lorenzo Stoakes
@ 2025-08-29 14:24 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 14:24 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86
* David Hildenbrand <david@redhat.com> [250827 18:05]:
> Let's sanity-check in folio_set_order() whether we would be trying to
> create a folio with an order that would make it exceed MAX_FOLIO_ORDER.
>
> This will enable the check whenever a folio/compound page is initialized
> through prepare_compound_head() / prepare_compound_page().
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/internal.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 45da9ff5694f6..9b0129531d004 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -755,6 +755,7 @@ static inline void folio_set_order(struct folio *folio, unsigned int order)
> {
> if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
> return;
> + VM_WARN_ON_ONCE(order > MAX_FOLIO_ORDER);
>
> folio->_flags_1 = (folio->_flags_1 & ~0xffUL) | order;
> #ifdef NR_PAGES_IN_LARGE_FOLIO
> --
> 2.50.1
>
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs
[not found] <20250827220141.262669-1-david@redhat.com>
` (4 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 10/36] mm: sanity-check maximum folio size in folio_set_order() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 7:37 ` Wei Yang
` (2 more replies)
2025-08-27 22:01 ` [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx() David Hildenbrand
` (19 subsequent siblings)
25 siblings, 3 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Zi Yan, Mike Rapoport (Microsoft),
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86
Let's limit the maximum folio size in problematic kernel configs where
the memmap is allocated per memory section (SPARSEMEM without
SPARSEMEM_VMEMMAP) to a single memory section.
Currently, only a single architecture supports ARCH_HAS_GIGANTIC_PAGE
but not SPARSEMEM_VMEMMAP: sh.
Fortunately, the biggest hugetlb size sh supports is 64 MiB
(HUGETLB_PAGE_SIZE_64MB) and the section size is at least 64 MiB
(SECTION_SIZE_BITS == 26), so their use case is not degraded.
As folios and memory sections are naturally aligned to their power-of-2 size
in memory, a single folio can consequently no longer span multiple memory
sections on these problematic kernel configs.
nth_page() is no longer required when operating within a single compound
page / folio.
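As a reminder of why nth_page() mattered here, the existing definitions in
include/linux/mm.h (also visible in the next patch of this series) are
roughly:
	#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
	/* the memmap is allocated per section; "page + n" may leave it */
	#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
	#else
	/* the memmap is (virtually) contiguous; pointer arithmetic works */
	#define nth_page(page,n) ((page) + (n))
	#endif
With MAX_FOLIO_ORDER capped at PFN_SECTION_SHIFT on such configs, and folios
being naturally aligned, pointer arithmetic within a single folio can never
cross a section boundary, so the pfn round-trip becomes unnecessary there.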
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77737cbf2216a..2dee79fa2efcf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2053,11 +2053,25 @@ static inline long folio_nr_pages(const struct folio *folio)
return folio_large_nr_pages(folio);
}
-/* Only hugetlbfs can allocate folios larger than MAX_ORDER */
-#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
-#define MAX_FOLIO_ORDER PUD_ORDER
-#else
+#if !defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE)
+/*
+ * We don't expect any folios that exceed buddy sizes (and consequently
+ * memory sections).
+ */
#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
+#elif defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+/*
+ * Only pages within a single memory section are guaranteed to be
+ * contiguous. By limiting folios to a single memory section, all folio
+ * pages are guaranteed to be contiguous.
+ */
+#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
+#else
+/*
+ * There is no real limit on the folio size. We limit them to the maximum we
+ * currently expect (e.g., hugetlb, dax).
+ */
+#define MAX_FOLIO_ORDER PUD_ORDER
#endif
#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs
2025-08-27 22:01 ` [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs David Hildenbrand
@ 2025-08-28 7:37 ` Wei Yang
2025-08-28 15:10 ` Lorenzo Stoakes
2025-08-29 14:27 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2025-08-28 7:37 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Mike Rapoport (Microsoft),
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86
On Thu, Aug 28, 2025 at 12:01:15AM +0200, David Hildenbrand wrote:
>Let's limit the maximum folio size in problematic kernel config where
>the memmap is allocated per memory section (SPARSEMEM without
>SPARSEMEM_VMEMMAP) to a single memory section.
>
>Currently, only a single architectures supports ARCH_HAS_GIGANTIC_PAGE
>but not SPARSEMEM_VMEMMAP: sh.
>
>Fortunately, the biggest hugetlb size sh supports is 64 MiB
>(HUGETLB_PAGE_SIZE_64MB) and the section size is at least 64 MiB
>(SECTION_SIZE_BITS == 26), so their use case is not degraded.
>
>As folios and memory sections are naturally aligned to their order-2 size
>in memory, consequently a single folio can no longer span multiple memory
>sections on these problematic kernel configs.
>
>nth_page() is no longer required when operating within a single compound
>page / folio.
>
>Reviewed-by: Zi Yan <ziy@nvidia.com>
>Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs
2025-08-27 22:01 ` [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs David Hildenbrand
2025-08-28 7:37 ` Wei Yang
@ 2025-08-28 15:10 ` Lorenzo Stoakes
2025-08-29 11:57 ` David Hildenbrand
2025-08-29 14:27 ` Liam R. Howlett
2 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 15:10 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Mike Rapoport (Microsoft),
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86
On Thu, Aug 28, 2025 at 12:01:15AM +0200, David Hildenbrand wrote:
> Let's limit the maximum folio size in problematic kernel config where
> the memmap is allocated per memory section (SPARSEMEM without
> SPARSEMEM_VMEMMAP) to a single memory section.
>
> Currently, only a single architectures supports ARCH_HAS_GIGANTIC_PAGE
> but not SPARSEMEM_VMEMMAP: sh.
>
> Fortunately, the biggest hugetlb size sh supports is 64 MiB
> (HUGETLB_PAGE_SIZE_64MB) and the section size is at least 64 MiB
> (SECTION_SIZE_BITS == 26), so their use case is not degraded.
>
> As folios and memory sections are naturally aligned to their order-2 size
> in memory, consequently a single folio can no longer span multiple memory
> sections on these problematic kernel configs.
>
> nth_page() is no longer required when operating within a single compound
> page / folio.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Really great comments, like this!
I wonder if we could have this be part of the first patch where you fiddle
with MAX_FOLIO_ORDER etc. but not a big deal.
Anyway LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/mm.h | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 77737cbf2216a..2dee79fa2efcf 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2053,11 +2053,25 @@ static inline long folio_nr_pages(const struct folio *folio)
> return folio_large_nr_pages(folio);
> }
>
> -/* Only hugetlbfs can allocate folios larger than MAX_ORDER */
> -#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> -#define MAX_FOLIO_ORDER PUD_ORDER
> -#else
> +#if !defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE)
> +/*
> + * We don't expect any folios that exceed buddy sizes (and consequently
> + * memory sections).
> + */
> #define MAX_FOLIO_ORDER MAX_PAGE_ORDER
> +#elif defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> +/*
> + * Only pages within a single memory section are guaranteed to be
> + * contiguous. By limiting folios to a single memory section, all folio
> + * pages are guaranteed to be contiguous.
> + */
> +#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
Hmmm, was this implicit before somehow? I mean surely by the fact as you say
that physical contiguity would not otherwise be guaranteed :))
> +#else
> +/*
> + * There is no real limit on the folio size. We limit them to the maximum we
> + * currently expect (e.g., hugetlb, dax).
> + */
This is nice.
> +#define MAX_FOLIO_ORDER PUD_ORDER
> #endif
>
> #define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs
2025-08-28 15:10 ` Lorenzo Stoakes
@ 2025-08-29 11:57 ` David Hildenbrand
2025-08-29 12:01 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 11:57 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Zi Yan, Mike Rapoport (Microsoft),
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86
On 28.08.25 17:10, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:15AM +0200, David Hildenbrand wrote:
>> Let's limit the maximum folio size in problematic kernel config where
>> the memmap is allocated per memory section (SPARSEMEM without
>> SPARSEMEM_VMEMMAP) to a single memory section.
>>
>> Currently, only a single architectures supports ARCH_HAS_GIGANTIC_PAGE
>> but not SPARSEMEM_VMEMMAP: sh.
>>
>> Fortunately, the biggest hugetlb size sh supports is 64 MiB
>> (HUGETLB_PAGE_SIZE_64MB) and the section size is at least 64 MiB
>> (SECTION_SIZE_BITS == 26), so their use case is not degraded.
>>
>> As folios and memory sections are naturally aligned to their order-2 size
>> in memory, consequently a single folio can no longer span multiple memory
>> sections on these problematic kernel configs.
>>
>> nth_page() is no longer required when operating within a single compound
>> page / folio.
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Realy great comments, like this!
>
> I wonder if we could have this be part of the first patch where you fiddle
> with MAX_FOLIO_ORDER etc. but not a big deal.
I think it belongs into this patch where we actually impose the
restrictions.
[...]
>> +/*
>> + * Only pages within a single memory section are guaranteed to be
>> + * contiguous. By limiting folios to a single memory section, all folio
>> + * pages are guaranteed to be contiguous.
>> + */
>> +#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
>
> Hmmm, was this implicit before somehow? I mean surely by the fact as you say
> that physical contiguity would not otherwise be guaranteed :))
Well, my patches until this point made sure that any attempt to use a
larger folio would fail in a way that we could spot now if there is any
offender.
That is why before this change, nth_page() was required within a folio.
Hope that clarifies it, thanks!
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs
2025-08-29 11:57 ` David Hildenbrand
@ 2025-08-29 12:01 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 12:01 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Mike Rapoport (Microsoft),
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86
On Fri, Aug 29, 2025 at 01:57:22PM +0200, David Hildenbrand wrote:
> On 28.08.25 17:10, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:15AM +0200, David Hildenbrand wrote:
> > > Let's limit the maximum folio size in problematic kernel config where
> > > the memmap is allocated per memory section (SPARSEMEM without
> > > SPARSEMEM_VMEMMAP) to a single memory section.
> > >
> > > Currently, only a single architectures supports ARCH_HAS_GIGANTIC_PAGE
> > > but not SPARSEMEM_VMEMMAP: sh.
> > >
> > > Fortunately, the biggest hugetlb size sh supports is 64 MiB
> > > (HUGETLB_PAGE_SIZE_64MB) and the section size is at least 64 MiB
> > > (SECTION_SIZE_BITS == 26), so their use case is not degraded.
> > >
> > > As folios and memory sections are naturally aligned to their order-2 size
> > > in memory, consequently a single folio can no longer span multiple memory
> > > sections on these problematic kernel configs.
> > >
> > > nth_page() is no longer required when operating within a single compound
> > > page / folio.
> > >
> > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> >
> > Realy great comments, like this!
> >
> > I wonder if we could have this be part of the first patch where you fiddle
> > with MAX_FOLIO_ORDER etc. but not a big deal.
>
> I think it belongs into this patch where we actually impose the
> restrictions.
Sure it's not a big deal.
>
> [...]
>
> > > +/*
> > > + * Only pages within a single memory section are guaranteed to be
> > > + * contiguous. By limiting folios to a single memory section, all folio
> > > + * pages are guaranteed to be contiguous.
> > > + */
> > > +#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
> >
> > Hmmm, was this implicit before somehow? I mean surely by the fact as you say
> > that physical contiguity would not otherwise be guaranteed :))
>
> Well, my patches until this point made sure that any attempt to use a larger
> folio would fail in a way that we could spot now if there is any offender.
Ack yeah.
>
> That is why before this change, nth_page() was required within a folio.
>
> Hope that clarifies it, thanks!
Yes thanks! :)
>
> --
> Cheers
>
> David / dhildenb
>
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs
2025-08-27 22:01 ` [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs David Hildenbrand
2025-08-28 7:37 ` Wei Yang
2025-08-28 15:10 ` Lorenzo Stoakes
@ 2025-08-29 14:27 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 14:27 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Mike Rapoport (Microsoft),
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Linus Torvalds,
linux-arm-kernel, linux-arm-kernel, linux-crypto, linux-ide,
linux-kselftest, linux-mips, linux-mmc, linux-mm, linux-riscv,
linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86
* David Hildenbrand <david@redhat.com> [250827 18:05]:
> Let's limit the maximum folio size in problematic kernel config where
> the memmap is allocated per memory section (SPARSEMEM without
> SPARSEMEM_VMEMMAP) to a single memory section.
>
> Currently, only a single architectures supports ARCH_HAS_GIGANTIC_PAGE
> but not SPARSEMEM_VMEMMAP: sh.
>
> Fortunately, the biggest hugetlb size sh supports is 64 MiB
> (HUGETLB_PAGE_SIZE_64MB) and the section size is at least 64 MiB
> (SECTION_SIZE_BITS == 26), so their use case is not degraded.
>
> As folios and memory sections are naturally aligned to their order-2 size
> in memory, consequently a single folio can no longer span multiple memory
> sections on these problematic kernel configs.
>
> nth_page() is no longer required when operating within a single compound
> page / folio.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> include/linux/mm.h | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 77737cbf2216a..2dee79fa2efcf 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2053,11 +2053,25 @@ static inline long folio_nr_pages(const struct folio *folio)
> return folio_large_nr_pages(folio);
> }
>
> -/* Only hugetlbfs can allocate folios larger than MAX_ORDER */
> -#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> -#define MAX_FOLIO_ORDER PUD_ORDER
> -#else
> +#if !defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE)
> +/*
> + * We don't expect any folios that exceed buddy sizes (and consequently
> + * memory sections).
> + */
> #define MAX_FOLIO_ORDER MAX_PAGE_ORDER
> +#elif defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> +/*
> + * Only pages within a single memory section are guaranteed to be
> + * contiguous. By limiting folios to a single memory section, all folio
> + * pages are guaranteed to be contiguous.
> + */
> +#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
> +#else
> +/*
> + * There is no real limit on the folio size. We limit them to the maximum we
> + * currently expect (e.g., hugetlb, dax).
> + */
> +#define MAX_FOLIO_ORDER PUD_ORDER
> #endif
>
> #define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx()
[not found] <20250827220141.262669-1-david@redhat.com>
` (5 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 11/36] mm: limit folio/compound page sizes in problematic kernel configs David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 7:43 ` Wei Yang
` (2 more replies)
2025-08-27 22:01 ` [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap() David Hildenbrand
` (18 subsequent siblings)
25 siblings, 3 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86
Now that a single folio/compound page can no longer span memory sections
in problematic kernel configurations, we can stop using nth_page().
While at it, turn both macros into static inline functions and add
kernel doc for folio_page_idx().
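As a usage sketch (a hypothetical caller, purely for illustration;
walk_folio_pages() is made up), iterating a folio without nth_page() now
looks like:
	/* Hypothetical example; not part of this patch. */
	static void walk_folio_pages(struct folio *folio)
	{
		unsigned long i, nr = folio_nr_pages(folio);

		for (i = 0; i < nr; i++) {
			struct page *page = folio_page(folio, i);

			/* folio_page_idx() is simply the inverse of folio_page() */
			VM_WARN_ON_ONCE(folio_page_idx(folio, page) != i);
		}
	}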
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 16 ++++++++++++++--
include/linux/page-flags.h | 5 ++++-
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2dee79fa2efcf..f6880e3225c5c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -210,10 +210,8 @@ extern unsigned long sysctl_admin_reserve_kbytes;
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
-#define folio_page_idx(folio, p) (page_to_pfn(p) - folio_pfn(folio))
#else
#define nth_page(page,n) ((page) + (n))
-#define folio_page_idx(folio, p) ((p) - &(folio)->page)
#endif
/* to align the pointer to the (next) page boundary */
@@ -225,6 +223,20 @@ extern unsigned long sysctl_admin_reserve_kbytes;
/* test whether an address (unsigned long or pointer) is aligned to PAGE_SIZE */
#define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)(addr), PAGE_SIZE)
+/**
+ * folio_page_idx - Return the number of a page in a folio.
+ * @folio: The folio.
+ * @page: The folio page.
+ *
+ * This function expects that the page is actually part of the folio.
+ * The returned number is relative to the start of the folio.
+ */
+static inline unsigned long folio_page_idx(const struct folio *folio,
+ const struct page *page)
+{
+ return page - &folio->page;
+}
+
static inline struct folio *lru_to_folio(struct list_head *head)
{
return list_entry((head)->prev, struct folio, lru);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5ee6ffbdbf831..faf17ca211b4f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -316,7 +316,10 @@ static __always_inline unsigned long _compound_head(const struct page *page)
* check that the page number lies within @folio; the caller is presumed
* to have a reference to the page.
*/
-#define folio_page(folio, n) nth_page(&(folio)->page, n)
+static inline struct page *folio_page(struct folio *folio, unsigned long n)
+{
+ return &folio->page + n;
+}
static __always_inline int PageTail(const struct page *page)
{
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx()
2025-08-27 22:01 ` [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx() David Hildenbrand
@ 2025-08-28 7:43 ` Wei Yang
2025-08-28 7:46 ` David Hildenbrand
2025-08-28 15:24 ` Lorenzo Stoakes
2025-08-29 14:41 ` Liam R. Howlett
2 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2025-08-28 7:43 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86
On Thu, Aug 28, 2025 at 12:01:16AM +0200, David Hildenbrand wrote:
>Now that a single folio/compound page can no longer span memory sections
>in problematic kernel configurations, we can stop using nth_page().
>
>While at it, turn both macros into static inline functions and add
>kernel doc for folio_page_idx().
>
>Reviewed-by: Zi Yan <ziy@nvidia.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
The code looks good, while one nit below.
>---
> include/linux/mm.h | 16 ++++++++++++++--
> include/linux/page-flags.h | 5 ++++-
> 2 files changed, 18 insertions(+), 3 deletions(-)
>
>diff --git a/include/linux/mm.h b/include/linux/mm.h
>index 2dee79fa2efcf..f6880e3225c5c 100644
>--- a/include/linux/mm.h
>+++ b/include/linux/mm.h
>@@ -210,10 +210,8 @@ extern unsigned long sysctl_admin_reserve_kbytes;
>
> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
>-#define folio_page_idx(folio, p) (page_to_pfn(p) - folio_pfn(folio))
> #else
> #define nth_page(page,n) ((page) + (n))
>-#define folio_page_idx(folio, p) ((p) - &(folio)->page)
> #endif
>
> /* to align the pointer to the (next) page boundary */
>@@ -225,6 +223,20 @@ extern unsigned long sysctl_admin_reserve_kbytes;
> /* test whether an address (unsigned long or pointer) is aligned to PAGE_SIZE */
> #define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)(addr), PAGE_SIZE)
>
>+/**
>+ * folio_page_idx - Return the number of a page in a folio.
>+ * @folio: The folio.
>+ * @page: The folio page.
>+ *
>+ * This function expects that the page is actually part of the folio.
>+ * The returned number is relative to the start of the folio.
>+ */
>+static inline unsigned long folio_page_idx(const struct folio *folio,
>+ const struct page *page)
>+{
>+ return page - &folio->page;
>+}
>+
> static inline struct folio *lru_to_folio(struct list_head *head)
> {
> return list_entry((head)->prev, struct folio, lru);
>diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>index 5ee6ffbdbf831..faf17ca211b4f 100644
>--- a/include/linux/page-flags.h
>+++ b/include/linux/page-flags.h
>@@ -316,7 +316,10 @@ static __always_inline unsigned long _compound_head(const struct page *page)
> * check that the page number lies within @folio; the caller is presumed
> * to have a reference to the page.
> */
>-#define folio_page(folio, n) nth_page(&(folio)->page, n)
>+static inline struct page *folio_page(struct folio *folio, unsigned long n)
>+{
>+ return &folio->page + n;
>+}
>
Curious about why it is in page-flags.h. It seems not related to page-flags.
> static __always_inline int PageTail(const struct page *page)
> {
>--
>2.50.1
>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx()
2025-08-28 7:43 ` Wei Yang
@ 2025-08-28 7:46 ` David Hildenbrand
2025-08-28 8:18 ` Wei Yang
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-28 7:46 UTC (permalink / raw)
To: Wei Yang
Cc: linux-kernel, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86
>
> Curious about why it is in page-flags.h. It seems not related to page-flags.
Likely because we have the page_folio() in there as well.
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx()
2025-08-28 7:46 ` David Hildenbrand
@ 2025-08-28 8:18 ` Wei Yang
0 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2025-08-28 8:18 UTC (permalink / raw)
To: David Hildenbrand
Cc: Wei Yang, linux-kernel, Zi Yan, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86
On Thu, Aug 28, 2025 at 09:46:25AM +0200, David Hildenbrand wrote:
>>
>> Curious about why it is in page-flags.h. It seems not related to page-flags.
>
>Likely because we have the page_folio() in there as well.
>
Hmm... sorry for this silly question.
>--
>Cheers
>
>David / dhildenb
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx()
2025-08-27 22:01 ` [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx() David Hildenbrand
2025-08-28 7:43 ` Wei Yang
@ 2025-08-28 15:24 ` Lorenzo Stoakes
2025-08-29 14:41 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 15:24 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86
On Thu, Aug 28, 2025 at 12:01:16AM +0200, David Hildenbrand wrote:
> Now that a single folio/compound page can no longer span memory sections
> in problematic kernel configurations, we can stop using nth_page().
>
> While at it, turn both macros into static inline functions and add
> kernel doc for folio_page_idx().
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/mm.h | 16 ++++++++++++++--
> include/linux/page-flags.h | 5 ++++-
> 2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2dee79fa2efcf..f6880e3225c5c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -210,10 +210,8 @@ extern unsigned long sysctl_admin_reserve_kbytes;
>
> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> -#define folio_page_idx(folio, p) (page_to_pfn(p) - folio_pfn(folio))
> #else
> #define nth_page(page,n) ((page) + (n))
> -#define folio_page_idx(folio, p) ((p) - &(folio)->page)
> #endif
>
> /* to align the pointer to the (next) page boundary */
> @@ -225,6 +223,20 @@ extern unsigned long sysctl_admin_reserve_kbytes;
> /* test whether an address (unsigned long or pointer) is aligned to PAGE_SIZE */
> #define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)(addr), PAGE_SIZE)
>
> +/**
> + * folio_page_idx - Return the number of a page in a folio.
> + * @folio: The folio.
> + * @page: The folio page.
> + *
> + * This function expects that the page is actually part of the folio.
> + * The returned number is relative to the start of the folio.
> + */
> +static inline unsigned long folio_page_idx(const struct folio *folio,
> + const struct page *page)
> +{
> + return page - &folio->page;
Ahh now I see why we did all this stuff with regard to the sparse things before
:) very nice.
> +}
> +
> static inline struct folio *lru_to_folio(struct list_head *head)
> {
> return list_entry((head)->prev, struct folio, lru);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5ee6ffbdbf831..faf17ca211b4f 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -316,7 +316,10 @@ static __always_inline unsigned long _compound_head(const struct page *page)
> * check that the page number lies within @folio; the caller is presumed
> * to have a reference to the page.
> */
> -#define folio_page(folio, n) nth_page(&(folio)->page, n)
> +static inline struct page *folio_page(struct folio *folio, unsigned long n)
> +{
> + return &folio->page + n;
> +}
>
> static __always_inline int PageTail(const struct page *page)
> {
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx()
2025-08-27 22:01 ` [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx() David Hildenbrand
2025-08-28 7:43 ` Wei Yang
2025-08-28 15:24 ` Lorenzo Stoakes
@ 2025-08-29 14:41 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 14:41 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Zi Yan, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86
* David Hildenbrand <david@redhat.com> [250827 18:06]:
> Now that a single folio/compound page can no longer span memory sections
> in problematic kernel configurations, we can stop using nth_page().
..but only in a subset of nth_page uses, considering mm.h still has the
define.
>
> While at it, turn both macros into static inline functions and add
> kernel doc for folio_page_idx().
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> include/linux/mm.h | 16 ++++++++++++++--
> include/linux/page-flags.h | 5 ++++-
> 2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2dee79fa2efcf..f6880e3225c5c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -210,10 +210,8 @@ extern unsigned long sysctl_admin_reserve_kbytes;
>
> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> -#define folio_page_idx(folio, p) (page_to_pfn(p) - folio_pfn(folio))
> #else
> #define nth_page(page,n) ((page) + (n))
> -#define folio_page_idx(folio, p) ((p) - &(folio)->page)
> #endif
>
> /* to align the pointer to the (next) page boundary */
> @@ -225,6 +223,20 @@ extern unsigned long sysctl_admin_reserve_kbytes;
> /* test whether an address (unsigned long or pointer) is aligned to PAGE_SIZE */
> #define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)(addr), PAGE_SIZE)
>
> +/**
> + * folio_page_idx - Return the number of a page in a folio.
> + * @folio: The folio.
> + * @page: The folio page.
> + *
> + * This function expects that the page is actually part of the folio.
> + * The returned number is relative to the start of the folio.
> + */
> +static inline unsigned long folio_page_idx(const struct folio *folio,
> + const struct page *page)
> +{
> + return page - &folio->page;
> +}
> +
> static inline struct folio *lru_to_folio(struct list_head *head)
> {
> return list_entry((head)->prev, struct folio, lru);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5ee6ffbdbf831..faf17ca211b4f 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -316,7 +316,10 @@ static __always_inline unsigned long _compound_head(const struct page *page)
> * check that the page number lies within @folio; the caller is presumed
> * to have a reference to the page.
> */
> -#define folio_page(folio, n) nth_page(&(folio)->page, n)
> +static inline struct page *folio_page(struct folio *folio, unsigned long n)
> +{
> + return &folio->page + n;
> +}
>
> static __always_inline int PageTail(const struct page *page)
> {
> --
> 2.50.1
>
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
[not found] <20250827220141.262669-1-david@redhat.com>
` (6 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 12/36] mm: simplify folio_page() and folio_page_idx() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 7:21 ` Mike Rapoport
` (2 more replies)
2025-08-27 22:01 ` [PATCH v1 14/36] mm/mm/percpu-km: drop nth_page() usage within single allocation David Hildenbrand
` (17 subsequent siblings)
25 siblings, 3 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
We can now safely iterate over all pages in a folio, so no need for the
pfn_to_page().
Also, as we already force the refcount in __init_single_page() to 1,
we can just set the refcount to 0 and avoid page_ref_freeze() +
VM_BUG_ON. Likely, in the future, we would just want to tell
__init_single_page() to which value to initialize the refcount.
Further, adjust the comments to highlight that we are dealing with an
open-coded prep_compound_page() variant, and add another comment explaining
why we really need the __init_single_page() only on the tail pages.
Note that the current code was likely problematic, but we never ran into
it: prep_compound_tail() would have been called with an offset that might
exceed a memory section, and prep_compound_tail() would have simply
added that offset to the page pointer -- which would not have done the
right thing on sparsemem without vmemmap.
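For context, a simplified sketch of the two nth_page() flavours this refers to
(adapted from the include/linux/mm.h hunk quoted earlier in the thread):

#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
/* struct pages are only virtually contiguous within one memory section */
#define nth_page(page, n)	pfn_to_page(page_to_pfn((page)) + (n))
#else
#define nth_page(page, n)	((page) + (n))
#endif

prep_compound_tail() adds the tail index directly to the head page pointer --
effectively the second variant unconditionally -- which is only safe now that a
folio can no longer cross a memory section.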
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/hugetlb.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4a97e4f14c0dc..1f42186a85ea4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
{
enum zone_type zone = zone_idx(folio_zone(folio));
int nid = folio_nid(folio);
+ struct page *page = folio_page(folio, start_page_number);
unsigned long head_pfn = folio_pfn(folio);
unsigned long pfn, end_pfn = head_pfn + end_page_number;
- int ret;
-
- for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
- struct page *page = pfn_to_page(pfn);
+ /*
+ * We mark all tail pages with memblock_reserved_mark_noinit(),
+ * so these pages are completely uninitialized.
+ */
+ for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
__init_single_page(page, pfn, zone, nid);
prep_compound_tail((struct page *)folio, pfn - head_pfn);
- ret = page_ref_freeze(page, 1);
- VM_BUG_ON(!ret);
+ set_page_count(page, 0);
}
}
@@ -3257,12 +3258,15 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
{
int ret;
- /* Prepare folio head */
+ /*
+ * This is an open-coded prep_compound_page() whereby we avoid
+ * walking pages twice by initializing/preparing+freezing them in the
+ * same go.
+ */
__folio_clear_reserved(folio);
__folio_set_head(folio);
ret = folio_ref_freeze(folio, 1);
VM_BUG_ON(!ret);
- /* Initialize the necessary tail struct pages */
hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
prep_compound_head((struct page *)folio, huge_page_order(h));
}
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-27 22:01 ` [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap() David Hildenbrand
@ 2025-08-28 7:21 ` Mike Rapoport
2025-08-28 7:44 ` David Hildenbrand
2025-08-28 15:37 ` Lorenzo Stoakes
2025-08-29 14:57 ` Liam R. Howlett
2 siblings, 1 reply; 108+ messages in thread
From: Mike Rapoport @ 2025-08-28 7:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> We can now safely iterate over all pages in a folio, so no need for the
> pfn_to_page().
>
> Also, as we already force the refcount in __init_single_page() to 1,
> we can just set the refcount to 0 and avoid page_ref_freeze() +
> VM_BUG_ON. Likely, in the future, we would just want to tell
> __init_single_page() to which value to initialize the refcount.
>
> Further, adjust the comments to highlight that we are dealing with an
> open-coded prep_compound_page() variant, and add another comment explaining
> why we really need the __init_single_page() only on the tail pages.
>
> Note that the current code was likely problematic, but we never ran into
> it: prep_compound_tail() would have been called with an offset that might
> exceed a memory section, and prep_compound_tail() would have simply
> added that offset to the page pointer -- which would not have done the
> right thing on sparsemem without vmemmap.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> mm/hugetlb.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4a97e4f14c0dc..1f42186a85ea4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> {
> enum zone_type zone = zone_idx(folio_zone(folio));
> int nid = folio_nid(folio);
> + struct page *page = folio_page(folio, start_page_number);
> unsigned long head_pfn = folio_pfn(folio);
> unsigned long pfn, end_pfn = head_pfn + end_page_number;
> - int ret;
> -
> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
> - struct page *page = pfn_to_page(pfn);
>
> + /*
> + * We mark all tail pages with memblock_reserved_mark_noinit(),
> + * so these pages are completely uninitialized.
^ not? ;-)
> + */
> + for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
> __init_single_page(page, pfn, zone, nid);
> prep_compound_tail((struct page *)folio, pfn - head_pfn);
> - ret = page_ref_freeze(page, 1);
> - VM_BUG_ON(!ret);
> + set_page_count(page, 0);
> }
> }
>
> @@ -3257,12 +3258,15 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
> {
> int ret;
>
> - /* Prepare folio head */
> + /*
> + * This is an open-coded prep_compound_page() whereby we avoid
> + * walking pages twice by initializing/preparing+freezing them in the
> + * same go.
> + */
> __folio_clear_reserved(folio);
> __folio_set_head(folio);
> ret = folio_ref_freeze(folio, 1);
> VM_BUG_ON(!ret);
> - /* Initialize the necessary tail struct pages */
> hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
> prep_compound_head((struct page *)folio, huge_page_order(h));
> }
> --
> 2.50.1
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-28 7:21 ` Mike Rapoport
@ 2025-08-28 7:44 ` David Hildenbrand
2025-08-28 8:06 ` Mike Rapoport
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-28 7:44 UTC (permalink / raw)
To: Mike Rapoport
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
On 28.08.25 09:21, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>> We can now safely iterate over all pages in a folio, so no need for the
>> pfn_to_page().
>>
>> Also, as we already force the refcount in __init_single_page() to 1,
>> we can just set the refcount to 0 and avoid page_ref_freeze() +
>> VM_BUG_ON. Likely, in the future, we would just want to tell
>> __init_single_page() to which value to initialize the refcount.
>>
>> Further, adjust the comments to highlight that we are dealing with an
>> open-coded prep_compound_page() variant, and add another comment explaining
>> why we really need the __init_single_page() only on the tail pages.
>>
>> Note that the current code was likely problematic, but we never ran into
>> it: prep_compound_tail() would have been called with an offset that might
>> exceed a memory section, and prep_compound_tail() would have simply
>> added that offset to the page pointer -- which would not have done the
>> right thing on sparsemem without vmemmap.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> mm/hugetlb.c | 20 ++++++++++++--------
>> 1 file changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 4a97e4f14c0dc..1f42186a85ea4 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
>> {
>> enum zone_type zone = zone_idx(folio_zone(folio));
>> int nid = folio_nid(folio);
>> + struct page *page = folio_page(folio, start_page_number);
>> unsigned long head_pfn = folio_pfn(folio);
>> unsigned long pfn, end_pfn = head_pfn + end_page_number;
>> - int ret;
>> -
>> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
>> - struct page *page = pfn_to_page(pfn);
>>
>> + /*
>> + * We mark all tail pages with memblock_reserved_mark_noinit(),
>> + * so these pages are completely uninitialized.
>
> ^ not? ;-)
Can you elaborate?
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-28 7:44 ` David Hildenbrand
@ 2025-08-28 8:06 ` Mike Rapoport
2025-08-28 8:18 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Mike Rapoport @ 2025-08-28 8:06 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
> On 28.08.25 09:21, Mike Rapoport wrote:
> > On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> > > We can now safely iterate over all pages in a folio, so no need for the
> > > pfn_to_page().
> > >
> > > Also, as we already force the refcount in __init_single_page() to 1,
> > > we can just set the refcount to 0 and avoid page_ref_freeze() +
> > > VM_BUG_ON. Likely, in the future, we would just want to tell
> > > __init_single_page() to which value to initialize the refcount.
> > >
> > > Further, adjust the comments to highlight that we are dealing with an
> > > open-coded prep_compound_page() variant, and add another comment explaining
> > > why we really need the __init_single_page() only on the tail pages.
> > >
> > > Note that the current code was likely problematic, but we never ran into
> > > it: prep_compound_tail() would have been called with an offset that might
> > > exceed a memory section, and prep_compound_tail() would have simply
> > > added that offset to the page pointer -- which would not have done the
> > > right thing on sparsemem without vmemmap.
> > >
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > > mm/hugetlb.c | 20 ++++++++++++--------
> > > 1 file changed, 12 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 4a97e4f14c0dc..1f42186a85ea4 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> > > {
> > > enum zone_type zone = zone_idx(folio_zone(folio));
> > > int nid = folio_nid(folio);
> > > + struct page *page = folio_page(folio, start_page_number);
> > > unsigned long head_pfn = folio_pfn(folio);
> > > unsigned long pfn, end_pfn = head_pfn + end_page_number;
> > > - int ret;
> > > -
> > > - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
> > > - struct page *page = pfn_to_page(pfn);
> > > + /*
> > > + * We mark all tail pages with memblock_reserved_mark_noinit(),
> > > + * so these pages are completely uninitialized.
> >
> > ^ not? ;-)
>
> Can you elaborate?
Oh, sorry, I misread "uninitialized".
Still, I'd phrase it as
/*
* We marked all tail pages with memblock_reserved_mark_noinit(),
* so we must initialize them here.
*/
> --
> Cheers
>
> David / dhildenb
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-28 8:06 ` Mike Rapoport
@ 2025-08-28 8:18 ` David Hildenbrand
2025-08-28 8:37 ` Mike Rapoport
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-28 8:18 UTC (permalink / raw)
To: Mike Rapoport
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
On 28.08.25 10:06, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
>> On 28.08.25 09:21, Mike Rapoport wrote:
>>> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>>>> We can now safely iterate over all pages in a folio, so no need for the
>>>> pfn_to_page().
>>>>
>>>> Also, as we already force the refcount in __init_single_page() to 1,
>>>> we can just set the refcount to 0 and avoid page_ref_freeze() +
>>>> VM_BUG_ON. Likely, in the future, we would just want to tell
>>>> __init_single_page() to which value to initialize the refcount.
>>>>
>>>> Further, adjust the comments to highlight that we are dealing with an
>>>> open-coded prep_compound_page() variant, and add another comment explaining
>>>> why we really need the __init_single_page() only on the tail pages.
>>>>
>>>> Note that the current code was likely problematic, but we never ran into
>>>> it: prep_compound_tail() would have been called with an offset that might
>>>> exceed a memory section, and prep_compound_tail() would have simply
>>>> added that offset to the page pointer -- which would not have done the
>>>> right thing on sparsemem without vmemmap.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> mm/hugetlb.c | 20 ++++++++++++--------
>>>> 1 file changed, 12 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index 4a97e4f14c0dc..1f42186a85ea4 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
>>>> {
>>>> enum zone_type zone = zone_idx(folio_zone(folio));
>>>> int nid = folio_nid(folio);
>>>> + struct page *page = folio_page(folio, start_page_number);
>>>> unsigned long head_pfn = folio_pfn(folio);
>>>> unsigned long pfn, end_pfn = head_pfn + end_page_number;
>>>> - int ret;
>>>> -
>>>> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
>>>> - struct page *page = pfn_to_page(pfn);
>>>> + /*
>>>> + * We mark all tail pages with memblock_reserved_mark_noinit(),
>>>> + * so these pages are completely uninitialized.
>>>
>>> ^ not? ;-)
>>
>> Can you elaborate?
>
> Oh, sorry, I misread "uninitialized".
> Still, I'd phrase it as
>
> /*
> * We marked all tail pages with memblock_reserved_mark_noinit(),
> * so we must initialize them here.
> */
I prefer what I currently have, but thanks for the review.
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-28 8:18 ` David Hildenbrand
@ 2025-08-28 8:37 ` Mike Rapoport
2025-08-29 12:00 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Mike Rapoport @ 2025-08-28 8:37 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
On Thu, Aug 28, 2025 at 10:18:23AM +0200, David Hildenbrand wrote:
> On 28.08.25 10:06, Mike Rapoport wrote:
> > On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
> > > On 28.08.25 09:21, Mike Rapoport wrote:
> > > > On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> > > > > + /*
> > > > > + * We mark all tail pages with memblock_reserved_mark_noinit(),
> > > > > + * so these pages are completely uninitialized.
> > > >
> > > > ^ not? ;-)
> > >
> > > Can you elaborate?
> >
> > Oh, sorry, I misread "uninitialized".
> > Still, I'd phrase it as
> >
> > /*
> > * We marked all tail pages with memblock_reserved_mark_noinit(),
> > * so we must initialize them here.
> > */
>
> I prefer what I currently have, but thanks for the review.
No strong feelings, feel free to add
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-28 8:37 ` Mike Rapoport
@ 2025-08-29 12:00 ` David Hildenbrand
0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 12:00 UTC (permalink / raw)
To: Mike Rapoport
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
On 28.08.25 10:37, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 10:18:23AM +0200, David Hildenbrand wrote:
>> On 28.08.25 10:06, Mike Rapoport wrote:
>>> On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
>>>> On 28.08.25 09:21, Mike Rapoport wrote:
>>>>> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>>>>>> + /*
>>>>>> + * We mark all tail pages with memblock_reserved_mark_noinit(),
>>>>>> + * so these pages are completely uninitialized.
>>>>>
>>>>> ^ not? ;-)
>>>>
>>>> Can you elaborate?
>>>
>>> Oh, sorry, I misread "uninitialized".
>>> Still, I'd phrase it as
>>>
>>> /*
>>> * We marked all tail pages with memblock_reserved_mark_noinit(),
>>> * so we must initialize them here.
>>> */
>>
>> I prefer what I currently have, but thanks for the review.
>
> No strong feelings, feel free to add
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
I now have
"As we marked all tail pages with memblock_reserved_mark_noinit(), we
must initialize them ourselves here."
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-27 22:01 ` [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap() David Hildenbrand
2025-08-28 7:21 ` Mike Rapoport
@ 2025-08-28 15:37 ` Lorenzo Stoakes
2025-08-29 11:59 ` David Hildenbrand
2025-08-29 14:57 ` Liam R. Howlett
2 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 15:37 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> We can now safely iterate over all pages in a folio, so no need for the
> pfn_to_page().
>
> Also, as we already force the refcount in __init_single_page() to 1,
Mega huge nit (ignore if you want), but maybe worth saying 'via
init_page_count()'.
> we can just set the refcount to 0 and avoid page_ref_freeze() +
> VM_BUG_ON. Likely, in the future, we would just want to tell
> __init_single_page() to which value to initialize the refcount.
Right yes :)
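A purely illustrative sketch of that idea (hypothetical name and signature, not
an existing kernel API):

static void __meminit __init_single_page_refcounted(struct page *page,
		unsigned long pfn, unsigned long zone, int nid, int refcount)
{
	/* ... the usual __init_single_page() initialization ... */
	set_page_count(page, refcount);	/* caller picks the initial refcount */
}

so hugetlb could ask for 0 directly instead of initializing to 1 and then
overwriting it.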
>
> Further, adjust the comments to highlight that we are dealing with an
> open-coded prep_compound_page() variant, and add another comment explaining
> why we really need the __init_single_page() only on the tail pages.
Ah nice another 'anchor' to grep for!
>
> Note that the current code was likely problematic, but we never ran into
> it: prep_compound_tail() would have been called with an offset that might
> exceed a memory section, and prep_compound_tail() would have simply
> added that offset to the page pointer -- which would not have done the
> right thing on sparsemem without vmemmap.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/hugetlb.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4a97e4f14c0dc..1f42186a85ea4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> {
> enum zone_type zone = zone_idx(folio_zone(folio));
> int nid = folio_nid(folio);
> + struct page *page = folio_page(folio, start_page_number);
> unsigned long head_pfn = folio_pfn(folio);
> unsigned long pfn, end_pfn = head_pfn + end_page_number;
> - int ret;
> -
> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
> - struct page *page = pfn_to_page(pfn);
>
> + /*
> + * We mark all tail pages with memblock_reserved_mark_noinit(),
> + * so these pages are completely uninitialized.
> + */
> + for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
> __init_single_page(page, pfn, zone, nid);
> prep_compound_tail((struct page *)folio, pfn - head_pfn);
> - ret = page_ref_freeze(page, 1);
> - VM_BUG_ON(!ret);
> + set_page_count(page, 0);
> }
> }
>
> @@ -3257,12 +3258,15 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
> {
> int ret;
>
> - /* Prepare folio head */
> + /*
> + * This is an open-coded prep_compound_page() whereby we avoid
> + * walking pages twice by initializing/preparing+freezing them in the
> + * same go.
> + */
> __folio_clear_reserved(folio);
> __folio_set_head(folio);
> ret = folio_ref_freeze(folio, 1);
> VM_BUG_ON(!ret);
> - /* Initialize the necessary tail struct pages */
> hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
> prep_compound_head((struct page *)folio, huge_page_order(h));
> }
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-28 15:37 ` Lorenzo Stoakes
@ 2025-08-29 11:59 ` David Hildenbrand
2025-08-29 12:02 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 11:59 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On 28.08.25 17:37, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>> We can now safely iterate over all pages in a folio, so no need for the
>> pfn_to_page().
>>
>> Also, as we already force the refcount in __init_single_page() to 1,
>
> Mega huge nit (ignore if you want), but maybe worth saying 'via
> init_page_count()'.
Will add, thanks!
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-29 11:59 ` David Hildenbrand
@ 2025-08-29 12:02 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 12:02 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Fri, Aug 29, 2025 at 01:59:19PM +0200, David Hildenbrand wrote:
> On 28.08.25 17:37, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> > > We can now safely iterate over all pages in a folio, so no need for the
> > > pfn_to_page().
> > >
> > > Also, as we already force the refcount in __init_single_page() to 1,
> >
> > Mega huge nit (ignore if you want), but maybe worth saying 'via
> > init_page_count()'.
>
> Will add, thanks!
Thanks!
>
> --
> Cheers
>
> David / dhildenb
>
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
2025-08-27 22:01 ` [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap() David Hildenbrand
2025-08-28 7:21 ` Mike Rapoport
2025-08-28 15:37 ` Lorenzo Stoakes
@ 2025-08-29 14:57 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 14:57 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Linus Torvalds,
linux-arm-kernel, linux-arm-kernel, linux-crypto, linux-ide,
linux-kselftest, linux-mips, linux-mmc, linux-mm, linux-riscv,
linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
* David Hildenbrand <david@redhat.com> [250827 18:06]:
> We can now safely iterate over all pages in a folio, so no need for the
> pfn_to_page().
>
> Also, as we already force the refcount in __init_single_page() to 1,
> we can just set the refcount to 0 and avoid page_ref_freeze() +
> VM_BUG_ON. Likely, in the future, we would just want to tell
> __init_single_page() to which value to initialize the refcount.
>
> Further, adjust the comments to highlight that we are dealing with an
> open-coded prep_compound_page() variant, and add another comment explaining
> why we really need the __init_single_page() only on the tail pages.
>
> Note that the current code was likely problematic, but we never ran into
> it: prep_compound_tail() would have been called with an offset that might
> exceed a memory section, and prep_compound_tail() would have simply
> added that offset to the page pointer -- which would not have done the
> right thing on sparsemem without vmemmap.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/hugetlb.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4a97e4f14c0dc..1f42186a85ea4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> {
> enum zone_type zone = zone_idx(folio_zone(folio));
> int nid = folio_nid(folio);
> + struct page *page = folio_page(folio, start_page_number);
> unsigned long head_pfn = folio_pfn(folio);
> unsigned long pfn, end_pfn = head_pfn + end_page_number;
> - int ret;
> -
> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
> - struct page *page = pfn_to_page(pfn);
>
> + /*
> + * We mark all tail pages with memblock_reserved_mark_noinit(),
> + * so these pages are completely uninitialized.
> + */
> + for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
> __init_single_page(page, pfn, zone, nid);
> prep_compound_tail((struct page *)folio, pfn - head_pfn);
> - ret = page_ref_freeze(page, 1);
> - VM_BUG_ON(!ret);
> + set_page_count(page, 0);
> }
> }
>
> @@ -3257,12 +3258,15 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
> {
> int ret;
>
> - /* Prepare folio head */
> + /*
> + * This is an open-coded prep_compound_page() whereby we avoid
> + * walking pages twice by initializing/preparing+freezing them in the
> + * same go.
> + */
> __folio_clear_reserved(folio);
> __folio_set_head(folio);
> ret = folio_ref_freeze(folio, 1);
> VM_BUG_ON(!ret);
> - /* Initialize the necessary tail struct pages */
> hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
> prep_compound_head((struct page *)folio, huge_page_order(h));
> }
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 14/36] mm/mm/percpu-km: drop nth_page() usage within single allocation
[not found] <20250827220141.262669-1-david@redhat.com>
` (7 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 13/36] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 15:43 ` Lorenzo Stoakes
2025-08-29 14:59 ` Liam R. Howlett
2025-08-27 22:01 ` [PATCH v1 15/36] fs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison() David Hildenbrand
` (16 subsequent siblings)
25 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
We're allocating a higher-order page from the buddy. For these pages
(that are guaranteed to not exceed a single memory section) there is no
need to use nth_page().
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/percpu-km.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/percpu-km.c b/mm/percpu-km.c
index fe31aa19db81a..4efa74a495cb6 100644
--- a/mm/percpu-km.c
+++ b/mm/percpu-km.c
@@ -69,7 +69,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp)
}
for (i = 0; i < nr_pages; i++)
- pcpu_set_page_chunk(nth_page(pages, i), chunk);
+ pcpu_set_page_chunk(pages + i, chunk);
chunk->data = pages;
chunk->base_addr = page_address(pages);
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 14/36] mm/mm/percpu-km: drop nth_page() usage within single allocation
2025-08-27 22:01 ` [PATCH v1 14/36] mm/mm/percpu-km: drop nth_page() usage within single allocation David Hildenbrand
@ 2025-08-28 15:43 ` Lorenzo Stoakes
2025-08-29 14:59 ` Liam R. Howlett
1 sibling, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 15:43 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:18AM +0200, David Hildenbrand wrote:
> We're allocating a higher-order page from the buddy. For these pages
> (that are guaranteed to not exceed a single memory section) there is no
> need to use nth_page().
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Oh hello! Now it all comes together :)
nth_tag():
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/percpu-km.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/percpu-km.c b/mm/percpu-km.c
> index fe31aa19db81a..4efa74a495cb6 100644
> --- a/mm/percpu-km.c
> +++ b/mm/percpu-km.c
> @@ -69,7 +69,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp)
> }
>
> for (i = 0; i < nr_pages; i++)
> - pcpu_set_page_chunk(nth_page(pages, i), chunk);
> + pcpu_set_page_chunk(pages + i, chunk);
>
> chunk->data = pages;
> chunk->base_addr = page_address(pages);
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 14/36] mm/mm/percpu-km: drop nth_page() usage within single allocation
2025-08-27 22:01 ` [PATCH v1 14/36] mm/mm/percpu-km: drop nth_page() usage within single allocation David Hildenbrand
2025-08-28 15:43 ` Lorenzo Stoakes
@ 2025-08-29 14:59 ` Liam R. Howlett
1 sibling, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 14:59 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Linus Torvalds,
linux-arm-kernel, linux-arm-kernel, linux-crypto, linux-ide,
linux-kselftest, linux-mips, linux-mmc, linux-mm, linux-riscv,
linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
* David Hildenbrand <david@redhat.com> [250827 18:06]:
> We're allocating a higher-order page from the buddy. For these pages
> (that are guaranteed to not exceed a single memory section) there is no
> need to use nth_page().
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/percpu-km.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/percpu-km.c b/mm/percpu-km.c
> index fe31aa19db81a..4efa74a495cb6 100644
> --- a/mm/percpu-km.c
> +++ b/mm/percpu-km.c
> @@ -69,7 +69,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp)
> }
>
> for (i = 0; i < nr_pages; i++)
> - pcpu_set_page_chunk(nth_page(pages, i), chunk);
> + pcpu_set_page_chunk(pages + i, chunk);
>
> chunk->data = pages;
> chunk->base_addr = page_address(pages);
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 15/36] fs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison()
[not found] <20250827220141.262669-1-david@redhat.com>
` (8 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 14/36] mm/mm/percpu-km: drop nth_page() usage within single allocation David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 1:14 ` [PATCH v1 15/36] " Zi Yan
2025-08-28 15:45 ` [PATCH v1 15/36] fs: " Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 16/36] fs: hugetlbfs: cleanup " David Hildenbrand
` (15 subsequent siblings)
25 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
The nth_page() is not really required anymore, so let's remove it.
While at it, cleanup and simplify the code a bit.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/hugetlbfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 34d496a2b7de6..c5a46d10afaa0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -217,7 +217,7 @@ static size_t adjust_range_hwpoison(struct folio *folio, size_t offset,
break;
offset += n;
if (offset == PAGE_SIZE) {
- page = nth_page(page, 1);
+ page++;
offset = 0;
}
}
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 15/36] hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison()
2025-08-27 22:01 ` [PATCH v1 15/36] fs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison() David Hildenbrand
@ 2025-08-28 1:14 ` Zi Yan
2025-08-28 15:45 ` [PATCH v1 15/36] fs: " Lorenzo Stoakes
1 sibling, 0 replies; 108+ messages in thread
From: Zi Yan @ 2025-08-28 1:14 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86
On 27 Aug 2025, at 18:01, David Hildenbrand wrote:
> The nth_page() is not really required anymore, so let's remove it.
> While at it, cleanup and simplify the code a bit.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> fs/hugetlbfs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 15/36] fs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison()
2025-08-27 22:01 ` [PATCH v1 15/36] fs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison() David Hildenbrand
2025-08-28 1:14 ` [PATCH v1 15/36] " Zi Yan
@ 2025-08-28 15:45 ` Lorenzo Stoakes
2025-08-29 12:02 ` David Hildenbrand
1 sibling, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 15:45 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:19AM +0200, David Hildenbrand wrote:
> The nth_page() is not really required anymore, so let's remove it.
> While at it, cleanup and simplify the code a bit.
Hm, not sure which bit is the cleanup? Was there meant to be more here, or?
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> fs/hugetlbfs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 34d496a2b7de6..c5a46d10afaa0 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -217,7 +217,7 @@ static size_t adjust_range_hwpoison(struct folio *folio, size_t offset,
> break;
> offset += n;
> if (offset == PAGE_SIZE) {
> - page = nth_page(page, 1);
> + page++;
LOL at that diff. Great!
> offset = 0;
> }
> }
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 15/36] fs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison()
2025-08-28 15:45 ` [PATCH v1 15/36] fs: " Lorenzo Stoakes
@ 2025-08-29 12:02 ` David Hildenbrand
2025-08-29 12:09 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 12:02 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On 28.08.25 17:45, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:19AM +0200, David Hildenbrand wrote:
>> The nth_page() is not really required anymore, so let's remove it.
>> While at it, cleanup and simplify the code a bit.
>
> Hm Not sure which bit is the cleanup? Was there meant to be more here or?
Thanks, leftover from the pre-split of this patch!
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 15/36] fs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison()
2025-08-29 12:02 ` David Hildenbrand
@ 2025-08-29 12:09 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 12:09 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Fri, Aug 29, 2025 at 02:02:02PM +0200, David Hildenbrand wrote:
> On 28.08.25 17:45, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:19AM +0200, David Hildenbrand wrote:
> > > The nth_page() is not really required anymore, so let's remove it.
> > > While at it, cleanup and simplify the code a bit.
> >
> > Hm Not sure which bit is the cleanup? Was there meant to be more here or?
>
> Thanks, leftover from the pre-split of this patch!
Thanks! :)
(Am replying even on 'not really needing a reply' like this so I know which
stuff I replied to :P)
>
> --
> Cheers
>
> David / dhildenb
>
>
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 16/36] fs: hugetlbfs: cleanup folio in adjust_range_hwpoison()
[not found] <20250827220141.262669-1-david@redhat.com>
` (9 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 15/36] fs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 1:18 ` [PATCH v1 16/36] " Zi Yan
2025-08-28 16:20 ` [PATCH v1 16/36] fs: " Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 17/36] mm/pagewalk: drop nth_page() usage within folio in folio_walk_start() David Hildenbrand
` (14 subsequent siblings)
25 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
Let's cleanup and simplify the function a bit.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/hugetlbfs/inode.c | 33 +++++++++++----------------------
1 file changed, 11 insertions(+), 22 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c5a46d10afaa0..6ca1f6b45c1e5 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -198,31 +198,20 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
static size_t adjust_range_hwpoison(struct folio *folio, size_t offset,
size_t bytes)
{
- struct page *page;
- size_t n = 0;
- size_t res = 0;
-
- /* First page to start the loop. */
- page = folio_page(folio, offset / PAGE_SIZE);
- offset %= PAGE_SIZE;
- while (1) {
- if (is_raw_hwpoison_page_in_hugepage(page))
- break;
+ struct page *page = folio_page(folio, offset / PAGE_SIZE);
+ size_t safe_bytes;
+
+ if (is_raw_hwpoison_page_in_hugepage(page))
+ return 0;
+ /* Safe to read the remaining bytes in this page. */
+ safe_bytes = PAGE_SIZE - (offset % PAGE_SIZE);
+ page++;
- /* Safe to read n bytes without touching HWPOISON subpage. */
- n = min(bytes, (size_t)PAGE_SIZE - offset);
- res += n;
- bytes -= n;
- if (!bytes || !n)
+ for (; safe_bytes < bytes; safe_bytes += PAGE_SIZE, page++)
+ if (is_raw_hwpoison_page_in_hugepage(page))
break;
- offset += n;
- if (offset == PAGE_SIZE) {
- page++;
- offset = 0;
- }
- }
- return res;
+ return min(safe_bytes, bytes);
}
/*
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 16/36] hugetlbfs: cleanup folio in adjust_range_hwpoison()
2025-08-27 22:01 ` [PATCH v1 16/36] fs: hugetlbfs: cleanup " David Hildenbrand
@ 2025-08-28 1:18 ` Zi Yan
2025-08-28 16:20 ` [PATCH v1 16/36] fs: " Lorenzo Stoakes
1 sibling, 0 replies; 108+ messages in thread
From: Zi Yan @ 2025-08-28 1:18 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86
On 27 Aug 2025, at 18:01, David Hildenbrand wrote:
> Let's cleanup and simplify the function a bit.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> fs/hugetlbfs/inode.c | 33 +++++++++++----------------------
> 1 file changed, 11 insertions(+), 22 deletions(-)
>
LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 16/36] fs: hugetlbfs: cleanup folio in adjust_range_hwpoison()
2025-08-27 22:01 ` [PATCH v1 16/36] fs: hugetlbfs: cleanup " David Hildenbrand
2025-08-28 1:18 ` [PATCH v1 16/36] " Zi Yan
@ 2025-08-28 16:20 ` Lorenzo Stoakes
2025-08-29 13:22 ` David Hildenbrand
1 sibling, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 16:20 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:20AM +0200, David Hildenbrand wrote:
> Let's cleanup and simplify the function a bit.
Ah I guess you separated this out from the previous patch? :)
I feel like it might be worth talking about the implementation here in the
commit message as it took me a while to figure this out.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
This original implementation is SO GROSS.
God this hurts my mind
n = min(bytes, (size_t)PAGE_SIZE - offset);
So the first time round it'll either be the remaining bytes in the page, or we're
only spanning one page.
Then we
res += n;
bytes -= n;
So bytes now counts from the end of the first page onwards, if spanning multiple.
Then offset, if spanning multiple pages, will be offset + (PAGE_SIZE - offset) (!!!),
therefore PAGE_SIZE. And we move to the next page and reset offset to 0:
offset += n;
if (offset == PAGE_SIZE) {
page = nth_page(page, 1);
offset = 0;
}
Then from then on n = min(bytes, PAGE_SIZE) (!!!!!!)
So res = remaining safe bytes in the first page + the further safe pages' bytes,
OR just bytes if we don't span more than one page.
Lord above.
Also semantics of 'if bytes == 0, then check first page anyway' which you do
capture.
OK, I think I have convinced myself this is right, so hopefully no deeply subtle
off-by-one issues here :P
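For what it's worth, a quick userspace model of the reworked flow below (stubbed
poison check and hypothetical names, not the kernel code):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Pretend page index 2 of the folio is HWPOISON, purely for the example. */
static bool page_is_poisoned(size_t page_idx)
{
	return page_idx == 2;
}

static size_t adjust_range_model(size_t offset, size_t bytes)
{
	size_t page_idx = offset / PAGE_SIZE;
	size_t safe_bytes;

	if (page_is_poisoned(page_idx))
		return 0;
	/* Safe to read the remaining bytes in this page. */
	safe_bytes = PAGE_SIZE - (offset % PAGE_SIZE);
	page_idx++;

	/* Check each remaining page as long as we are not done yet. */
	for (; safe_bytes < bytes; safe_bytes += PAGE_SIZE, page_idx++)
		if (page_is_poisoned(page_idx))
			break;

	return safe_bytes < bytes ? safe_bytes : bytes;
}

int main(void)
{
	/* 100 bytes into page 0, asking for three pages' worth: stops before page 2. */
	printf("%zu\n", adjust_range_model(100, 3 * PAGE_SIZE));	/* 8092 == 2 * PAGE_SIZE - 100 */
	/* offset=0, bytes == PAGE_SIZE: the loop is never entered, everything is safe. */
	printf("%zu\n", adjust_range_model(0, PAGE_SIZE));		/* 4096 */
	return 0;
}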
Anyway, LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> fs/hugetlbfs/inode.c | 33 +++++++++++----------------------
> 1 file changed, 11 insertions(+), 22 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index c5a46d10afaa0..6ca1f6b45c1e5 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -198,31 +198,20 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> static size_t adjust_range_hwpoison(struct folio *folio, size_t offset,
> size_t bytes)
> {
> - struct page *page;
> - size_t n = 0;
> - size_t res = 0;
> -
> - /* First page to start the loop. */
> - page = folio_page(folio, offset / PAGE_SIZE);
> - offset %= PAGE_SIZE;
> - while (1) {
> - if (is_raw_hwpoison_page_in_hugepage(page))
> - break;
> + struct page *page = folio_page(folio, offset / PAGE_SIZE);
> + size_t safe_bytes;
> +
> + if (is_raw_hwpoison_page_in_hugepage(page))
> + return 0;
> + /* Safe to read the remaining bytes in this page. */
> + safe_bytes = PAGE_SIZE - (offset % PAGE_SIZE);
> + page++;
>
> - /* Safe to read n bytes without touching HWPOISON subpage. */
> - n = min(bytes, (size_t)PAGE_SIZE - offset);
> - res += n;
> - bytes -= n;
> - if (!bytes || !n)
> + for (; safe_bytes < bytes; safe_bytes += PAGE_SIZE, page++)
OK this is quite subtle - so if safe_bytes == bytes, this means we've confirmed
that all requested bytes are safe.
So offset=0, bytes = 4096 would fail this (as safe_bytes == 4096).
Maybe worth putting something like:
/*
* Now we check page-by-page in the folio to see if any bytes we don't
* yet know to be safe are contained within poisoned pages or not.
*/
Above the loop. Or something like this.
> + if (is_raw_hwpoison_page_in_hugepage(page))
> break;
> - offset += n;
> - if (offset == PAGE_SIZE) {
> - page++;
> - offset = 0;
> - }
> - }
>
> - return res;
> + return min(safe_bytes, bytes);
Yeah given above analysis this seems correct.
You must have torn your hair out over this :)
> }
>
> /*
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 16/36] fs: hugetlbfs: cleanup folio in adjust_range_hwpoison()
2025-08-28 16:20 ` [PATCH v1 16/36] fs: " Lorenzo Stoakes
@ 2025-08-29 13:22 ` David Hildenbrand
0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 13:22 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
>
> Lord above.
>
> Also semantics of 'if bytes == 0, then check first page anyway' which you do
> capture.
Yeah, I think bytes == 0 would not make any sense, though. Staring
briefly at the single caller, that seems to be the case (bytes != 0).
>
> OK, I think I have convinced myself this is right, so hopefully no deeply subtle
> off-by-one issues here :P
>
> Anyway, LGTM, so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
>> ---
>> fs/hugetlbfs/inode.c | 33 +++++++++++----------------------
>> 1 file changed, 11 insertions(+), 22 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index c5a46d10afaa0..6ca1f6b45c1e5 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -198,31 +198,20 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>> static size_t adjust_range_hwpoison(struct folio *folio, size_t offset,
>> size_t bytes)
>> {
>> - struct page *page;
>> - size_t n = 0;
>> - size_t res = 0;
>> -
>> - /* First page to start the loop. */
>> - page = folio_page(folio, offset / PAGE_SIZE);
>> - offset %= PAGE_SIZE;
>> - while (1) {
>> - if (is_raw_hwpoison_page_in_hugepage(page))
>> - break;
>> + struct page *page = folio_page(folio, offset / PAGE_SIZE);
>> + size_t safe_bytes;
>> +
>> + if (is_raw_hwpoison_page_in_hugepage(page))
>> + return 0;
>> + /* Safe to read the remaining bytes in this page. */
>> + safe_bytes = PAGE_SIZE - (offset % PAGE_SIZE);
>> + page++;
>>
>> - /* Safe to read n bytes without touching HWPOISON subpage. */
>> - n = min(bytes, (size_t)PAGE_SIZE - offset);
>> - res += n;
>> - bytes -= n;
>> - if (!bytes || !n)
>> + for (; safe_bytes < bytes; safe_bytes += PAGE_SIZE, page++)
>
> OK this is quite subtle - so if safe_bytes == bytes, this means we've confirmed
> that all requested bytes are safe.
>
> So offset=0, bytes = 4096 would fail the loop condition and skip the loop entirely
> (as safe_bytes == 4096 already).
>
> Maybe worth putting something like:
>
> /*
> * Now we check page-by-page in the folio to see if any bytes we don't
> * yet know to be safe are contained within poisoned pages or not.
> */
>
> Above the loop. Or something like this.
"Check each remaining page as long as we are not done yet."
>
>> + if (is_raw_hwpoison_page_in_hugepage(page))
>> break;
>> - offset += n;
>> - if (offset == PAGE_SIZE) {
>> - page++;
>> - offset = 0;
>> - }
>> - }
>>
>> - return res;
>> + return min(safe_bytes, bytes);
>
> Yeah given above analysis this seems correct.
>
> You must have torn your hair out over this :)
I couldn't resist the urge to clean that up, yes.
I'll also drop the "The implementation borrows the iteration logic from
copy_page_to_iter*." part, because I suspect this comment no longer
makes sense.
Thanks!
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 17/36] mm/pagewalk: drop nth_page() usage within folio in folio_walk_start()
[not found] <20250827220141.262669-1-david@redhat.com>
` (10 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 16/36] fs: hugetlbfs: cleanup " David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 16:21 ` Lorenzo Stoakes
2025-08-29 15:11 ` Liam R. Howlett
2025-08-27 22:01 ` [PATCH v1 18/36] mm/gup: drop nth_page() usage within folio when recording subpages David Hildenbrand
` (13 subsequent siblings)
25 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
It's no longer required to use nth_page() within a folio, so let's just
drop the nth_page() in folio_walk_start().
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/pagewalk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index c6753d370ff4e..9e4225e5fcf5c 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -1004,7 +1004,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
found:
if (expose_page)
/* Note: Offset from the mapped page, not the folio start. */
- fw->page = nth_page(page, (addr & (entry_size - 1)) >> PAGE_SHIFT);
+ fw->page = page + ((addr & (entry_size - 1)) >> PAGE_SHIFT);
else
fw->page = NULL;
fw->ptl = ptl;
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 17/36] mm/pagewalk: drop nth_page() usage within folio in folio_walk_start()
2025-08-27 22:01 ` [PATCH v1 17/36] mm/pagewalk: drop nth_page() usage within folio in folio_walk_start() David Hildenbrand
@ 2025-08-28 16:21 ` Lorenzo Stoakes
2025-08-29 15:11 ` Liam R. Howlett
1 sibling, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 16:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:21AM +0200, David Hildenbrand wrote:
> It's no longer required to use nth_page() within a folio, so let's just
> drop the nth_page() in folio_walk_start().
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/pagewalk.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index c6753d370ff4e..9e4225e5fcf5c 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -1004,7 +1004,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> found:
> if (expose_page)
> /* Note: Offset from the mapped page, not the folio start. */
> - fw->page = nth_page(page, (addr & (entry_size - 1)) >> PAGE_SHIFT);
> + fw->page = page + ((addr & (entry_size - 1)) >> PAGE_SHIFT);
Be nice to clean this horrid one liner up a bit also but that's out of
scope here :)
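Something like the below is all I mean - 'idx' is just a made-up local to name the
offset calculation (untested sketch, and as said, out of scope here):

	if (expose_page) {
		/* Note: Offset from the mapped page, not the folio start. */
		const unsigned long idx = (addr & (entry_size - 1)) >> PAGE_SHIFT;

		fw->page = page + idx;
	} else {
		fw->page = NULL;
	}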
> else
> fw->page = NULL;
> fw->ptl = ptl;
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 17/36] mm/pagewalk: drop nth_page() usage within folio in folio_walk_start()
2025-08-27 22:01 ` [PATCH v1 17/36] mm/pagewalk: drop nth_page() usage within folio in folio_walk_start() David Hildenbrand
2025-08-28 16:21 ` Lorenzo Stoakes
@ 2025-08-29 15:11 ` Liam R. Howlett
1 sibling, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 15:11 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Linus Torvalds,
linux-arm-kernel, linux-arm-kernel, linux-crypto, linux-ide,
linux-kselftest, linux-mips, linux-mmc, linux-mm, linux-riscv,
linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
* David Hildenbrand <david@redhat.com> [250827 18:07]:
> It's no longer required to use nth_page() within a folio, so let's just
> drop the nth_page() in folio_walk_start().
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/pagewalk.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index c6753d370ff4e..9e4225e5fcf5c 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -1004,7 +1004,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> found:
> if (expose_page)
> /* Note: Offset from the mapped page, not the folio start. */
> - fw->page = nth_page(page, (addr & (entry_size - 1)) >> PAGE_SHIFT);
> + fw->page = page + ((addr & (entry_size - 1)) >> PAGE_SHIFT);
> else
> fw->page = NULL;
> fw->ptl = ptl;
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 18/36] mm/gup: drop nth_page() usage within folio when recording subpages
[not found] <20250827220141.262669-1-david@redhat.com>
` (11 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 17/36] mm/pagewalk: drop nth_page() usage within folio in folio_walk_start() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 16:37 ` Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 19/36] io_uring/zcrx: remove nth_page() usage within folio David Hildenbrand
` (12 subsequent siblings)
25 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
nth_page() is no longer required when iterating over pages within a
single folio, so let's just drop it when recording subpages.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/gup.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index b2a78f0291273..89ca0813791ab 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -488,12 +488,11 @@ static int record_subpages(struct page *page, unsigned long sz,
unsigned long addr, unsigned long end,
struct page **pages)
{
- struct page *start_page;
int nr;
- start_page = nth_page(page, (addr & (sz - 1)) >> PAGE_SHIFT);
+ page += (addr & (sz - 1)) >> PAGE_SHIFT;
for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
- pages[nr] = nth_page(start_page, nr);
+ pages[nr] = page++;
return nr;
}
@@ -1512,7 +1511,7 @@ static long __get_user_pages(struct mm_struct *mm,
}
for (j = 0; j < page_increm; j++) {
- subpage = nth_page(page, j);
+ subpage = page + j;
pages[i + j] = subpage;
flush_anon_page(vma, subpage, start + j * PAGE_SIZE);
flush_dcache_page(subpage);
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 18/36] mm/gup: drop nth_page() usage within folio when recording subpages
2025-08-27 22:01 ` [PATCH v1 18/36] mm/gup: drop nth_page() usage within folio when recording subpages David Hildenbrand
@ 2025-08-28 16:37 ` Lorenzo Stoakes
2025-08-29 13:41 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 16:37 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:22AM +0200, David Hildenbrand wrote:
> nth_page() is no longer required when iterating over pages within a
> single folio, so let's just drop it when recording subpages.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
This looks correct to me, so notwithstanding the suggestion below, LGTM and:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/gup.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index b2a78f0291273..89ca0813791ab 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -488,12 +488,11 @@ static int record_subpages(struct page *page, unsigned long sz,
> unsigned long addr, unsigned long end,
> struct page **pages)
> {
> - struct page *start_page;
> int nr;
>
> - start_page = nth_page(page, (addr & (sz - 1)) >> PAGE_SHIFT);
> + page += (addr & (sz - 1)) >> PAGE_SHIFT;
> for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
> - pages[nr] = nth_page(start_page, nr);
> + pages[nr] = page++;
This is really nice, but I wonder if (while we're here) we can't be even
more clear as to what's going on here, e.g.:
static int record_subpages(struct page *page, unsigned long sz,
unsigned long addr, unsigned long end,
struct page **pages)
{
size_t offset_in_folio = (addr & (sz - 1)) >> PAGE_SHIFT;
struct page *subpage = page + offset_in_folio;
int nr;

for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
*pages++ = subpage++;
return nr;
}
Or some variant of that with the masking stuff self-documented.
>
> return nr;
> }
> @@ -1512,7 +1511,7 @@ static long __get_user_pages(struct mm_struct *mm,
> }
>
> for (j = 0; j < page_increm; j++) {
> - subpage = nth_page(page, j);
> + subpage = page + j;
> pages[i + j] = subpage;
> flush_anon_page(vma, subpage, start + j * PAGE_SIZE);
> flush_dcache_page(subpage);
> --
> 2.50.1
>
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 18/36] mm/gup: drop nth_page() usage within folio when recording subpages
2025-08-28 16:37 ` Lorenzo Stoakes
@ 2025-08-29 13:41 ` David Hildenbrand
2025-08-29 15:19 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 13:41 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On 28.08.25 18:37, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:22AM +0200, David Hildenbrand wrote:
>> nth_page() is no longer required when iterating over pages within a
>> single folio, so let's just drop it when recording subpages.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> This looks correct to me, so notwithstanding the suggestion below, LGTM and:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
>> ---
>> mm/gup.c | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index b2a78f0291273..89ca0813791ab 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -488,12 +488,11 @@ static int record_subpages(struct page *page, unsigned long sz,
>> unsigned long addr, unsigned long end,
>> struct page **pages)
>> {
>> - struct page *start_page;
>> int nr;
>>
>> - start_page = nth_page(page, (addr & (sz - 1)) >> PAGE_SHIFT);
>> + page += (addr & (sz - 1)) >> PAGE_SHIFT;
>> for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
>> - pages[nr] = nth_page(start_page, nr);
>> + pages[nr] = page++;
>
>
> This is really nice, but I wonder if (while we're here) we can't be even
> more clear as to what's going on here, e.g.:
>
> static int record_subpages(struct page *page, unsigned long sz,
> unsigned long addr, unsigned long end,
> struct page **pages)
> {
> size_t offset_in_folio = (addr & (sz - 1)) >> PAGE_SHIFT;
> struct page *subpage = page + offset_in_folio;
>
> for (; addr != end; addr += PAGE_SIZE)
> *pages++ = subpage++;
>
> return nr;
> }
>
> Or some variant of that with the masking stuff self-documented.
What about the following cleanup on top:
diff --git a/mm/gup.c b/mm/gup.c
index 89ca0813791ab..5a72a135ec70b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -484,19 +484,6 @@ static inline void mm_set_has_pinned_flag(struct mm_struct *mm)
#ifdef CONFIG_MMU
#ifdef CONFIG_HAVE_GUP_FAST
-static int record_subpages(struct page *page, unsigned long sz,
- unsigned long addr, unsigned long end,
- struct page **pages)
-{
- int nr;
-
- page += (addr & (sz - 1)) >> PAGE_SHIFT;
- for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
- pages[nr] = page++;
-
- return nr;
-}
-
/**
* try_grab_folio_fast() - Attempt to get or pin a folio in fast path.
* @page: pointer to page to be grabbed
@@ -2963,8 +2950,8 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
if (pmd_special(orig))
return 0;
- page = pmd_page(orig);
- refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
+ refs = (end - addr) >> PAGE_SHIFT;
+ page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
@@ -2985,6 +2972,8 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
}
*nr += refs;
+ for (; refs; refs--)
+ *(pages++) = page++;
folio_set_referenced(folio);
return 1;
}
@@ -3003,8 +2992,8 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
if (pud_special(orig))
return 0;
- page = pud_page(orig);
- refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
+ refs = (end - addr) >> PAGE_SHIFT;
+ page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
@@ -3026,6 +3015,8 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
}
*nr += refs;
+ for (; refs; refs--)
+ *(pages++) = page++;
folio_set_referenced(folio);
return 1;
}
The nice thing is that we only record pages in the array if they actually passed our tests.
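To spell out the new arithmetic with made-up numbers (assuming x86-64 with 4 KiB
pages and a 2 MiB PMD leaf, purely illustrative):

	/*
	 * Say the PMD leaf maps folio pages 0..511 and GUP-fast is asked for
	 * addr = base + 0x7000 .. end = base + 0xb000 within it:
	 *
	 *   refs = (end - addr) >> PAGE_SHIFT                          -> 4
	 *   page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT) -> pmd_page(orig) + 7
	 *
	 * so we take 4 references and, only once all checks passed, store
	 * pages 7, 8, 9 and 10 of the folio into the pages array.
	 */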
--
Cheers
David / dhildenb
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 18/36] mm/gup: drop nth_page() usage within folio when recording subpages
2025-08-29 13:41 ` David Hildenbrand
@ 2025-08-29 15:19 ` Lorenzo Stoakes
2025-09-01 11:35 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 15:19 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Fri, Aug 29, 2025 at 03:41:40PM +0200, David Hildenbrand wrote:
> On 28.08.25 18:37, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:22AM +0200, David Hildenbrand wrote:
> > > nth_page() is no longer required when iterating over pages within a
> > > single folio, so let's just drop it when recording subpages.
> > >
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> >
> > This looks correct to me, so notwithstanding the suggestion below, LGTM and:
> >
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >
> > > ---
> > > mm/gup.c | 7 +++----
> > > 1 file changed, 3 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index b2a78f0291273..89ca0813791ab 100644
> > > --- a/mm/gup.c
> > > +++ b/mm/gup.c
> > > @@ -488,12 +488,11 @@ static int record_subpages(struct page *page, unsigned long sz,
> > > unsigned long addr, unsigned long end,
> > > struct page **pages)
> > > {
> > > - struct page *start_page;
> > > int nr;
> > >
> > > - start_page = nth_page(page, (addr & (sz - 1)) >> PAGE_SHIFT);
> > > + page += (addr & (sz - 1)) >> PAGE_SHIFT;
> > > for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
> > > - pages[nr] = nth_page(start_page, nr);
> > > + pages[nr] = page++;
> >
> >
> > This is really nice, but I wonder if (while we're here) we can't be even
> > more clear as to what's going on here, e.g.:
> >
> > static int record_subpages(struct page *page, unsigned long sz,
> > unsigned long addr, unsigned long end,
> > struct page **pages)
> > {
> > size_t offset_in_folio = (addr & (sz - 1)) >> PAGE_SHIFT;
> > struct page *subpage = page + offset_in_folio;
> > int nr;
> >
> > for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
> > *pages++ = subpage++;
> >
> > return nr;
> > }
> >
> > Or some variant of that with the masking stuff self-documented.
>
> What about the following cleanup on top:
>
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 89ca0813791ab..5a72a135ec70b 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -484,19 +484,6 @@ static inline void mm_set_has_pinned_flag(struct mm_struct *mm)
> #ifdef CONFIG_MMU
> #ifdef CONFIG_HAVE_GUP_FAST
> -static int record_subpages(struct page *page, unsigned long sz,
> - unsigned long addr, unsigned long end,
> - struct page **pages)
> -{
> - int nr;
> -
> - page += (addr & (sz - 1)) >> PAGE_SHIFT;
> - for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
> - pages[nr] = page++;
> -
> - return nr;
> -}
> -
> /**
> * try_grab_folio_fast() - Attempt to get or pin a folio in fast path.
> * @page: pointer to page to be grabbed
> @@ -2963,8 +2950,8 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> if (pmd_special(orig))
> return 0;
> - page = pmd_page(orig);
> - refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
> + refs = (end - addr) >> PAGE_SHIFT;
> + page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
Ah I see we use page_folio() in try_grab_folio_fast() so this being within PMD is ok.
> folio = try_grab_folio_fast(page, refs, flags);
> if (!folio)
> @@ -2985,6 +2972,8 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> }
> *nr += refs;
> + for (; refs; refs--)
> + *(pages++) = page++;
> folio_set_referenced(folio);
> return 1;
> }
> @@ -3003,8 +2992,8 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
> if (pud_special(orig))
> return 0;
> - page = pud_page(orig);
> - refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
> + refs = (end - addr) >> PAGE_SHIFT;
> + page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> folio = try_grab_folio_fast(page, refs, flags);
> if (!folio)
> @@ -3026,6 +3015,8 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
> }
> *nr += refs;
> + for (; refs; refs--)
> + *(pages++) = page++;
> folio_set_referenced(folio);
> return 1;
> }
>
>
> The nice thing is that we only record pages in the array if they actually passed our tests.
Yeah that's nice actually.
This is fine (not the meme :P)
So yes let's do this!
>
>
> --
> Cheers
>
> David / dhildenb
>
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 18/36] mm/gup: drop nth_page() usage within folio when recording subpages
2025-08-29 15:19 ` Lorenzo Stoakes
@ 2025-09-01 11:35 ` David Hildenbrand
0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-09-01 11:35 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
>>
>>
>> The nice thing is that we only record pages in the array if they actually passed our tests.
>
> Yeah that's nice actually.
>
> This is fine (not the meme :P)
:D
>
> So yes let's do this!
That leaves us with the following on top of this patch:
From 4533c6e3590cab0c53e81045624d5949e0ad9015 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Fri, 29 Aug 2025 15:41:45 +0200
Subject: [PATCH] mm/gup: remove record_subpages()
We can just cleanup the code by calculating the #refs earlier,
so we can just inline what remains of record_subpages().
Calculate the number of references/pages ahead of times, and record them
only once all our tests passed.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/gup.c | 25 ++++++++-----------------
1 file changed, 8 insertions(+), 17 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 89ca0813791ab..5a72a135ec70b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -484,19 +484,6 @@ static inline void mm_set_has_pinned_flag(struct mm_struct *mm)
#ifdef CONFIG_MMU
#ifdef CONFIG_HAVE_GUP_FAST
-static int record_subpages(struct page *page, unsigned long sz,
- unsigned long addr, unsigned long end,
- struct page **pages)
-{
- int nr;
-
- page += (addr & (sz - 1)) >> PAGE_SHIFT;
- for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
- pages[nr] = page++;
-
- return nr;
-}
-
/**
* try_grab_folio_fast() - Attempt to get or pin a folio in fast path.
* @page: pointer to page to be grabbed
@@ -2963,8 +2950,8 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
if (pmd_special(orig))
return 0;
- page = pmd_page(orig);
- refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
+ refs = (end - addr) >> PAGE_SHIFT;
+ page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
@@ -2985,6 +2972,8 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
}
*nr += refs;
+ for (; refs; refs--)
+ *(pages++) = page++;
folio_set_referenced(folio);
return 1;
}
@@ -3003,8 +2992,8 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
if (pud_special(orig))
return 0;
- page = pud_page(orig);
- refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
+ refs = (end - addr) >> PAGE_SHIFT;
+ page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
@@ -3026,6 +3015,8 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
}
*nr += refs;
+ for (; refs; refs--)
+ *(pages++) = page++;
folio_set_referenced(folio);
return 1;
}
--
2.50.1
--
Cheers
David / dhildenb
^ permalink raw reply related [flat|nested] 108+ messages in thread
* [PATCH v1 19/36] io_uring/zcrx: remove nth_page() usage within folio
[not found] <20250827220141.262669-1-david@redhat.com>
` (12 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 18/36] mm/gup: drop nth_page() usage within folio when recording subpages David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 16:48 ` Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 20/36] mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages() David Hildenbrand
` (11 subsequent siblings)
25 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Pavel Begunkov, Jens Axboe,
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Johannes Weiner,
John Hubbard, kasan-dev, kvm, Liam R. Howlett, Linus Torvalds,
linux-arm-kernel, linux-arm-kernel, linux-crypto, linux-ide,
linux-kselftest, linux-mips, linux-mmc, linux-mm, linux-riscv,
linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
Within a folio/compound page, nth_page() is no longer required.
Given that we call folio_test_partial_kmap()+kmap_local_page(), the code
would already be problematic if the pages spanned multiple folios.
So let's just assume that all src pages belong to a single
folio/compound page and can be iterated ordinarily. The dst page is
currently always a single page, so we're not actually iterating
anything.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
io_uring/zcrx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index e5ff49f3425e0..18c12f4b56b6c 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -975,9 +975,9 @@ static ssize_t io_copy_page(struct io_copy_cache *cc, struct page *src_page,
if (folio_test_partial_kmap(page_folio(dst_page)) ||
folio_test_partial_kmap(page_folio(src_page))) {
- dst_page = nth_page(dst_page, dst_offset / PAGE_SIZE);
+ dst_page += dst_offset / PAGE_SIZE;
dst_offset = offset_in_page(dst_offset);
- src_page = nth_page(src_page, src_offset / PAGE_SIZE);
+ src_page += src_offset / PAGE_SIZE;
src_offset = offset_in_page(src_offset);
n = min(PAGE_SIZE - src_offset, PAGE_SIZE - dst_offset);
n = min(n, len);
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 19/36] io_uring/zcrx: remove nth_page() usage within folio
2025-08-27 22:01 ` [PATCH v1 19/36] io_uring/zcrx: remove nth_page() usage within folio David Hildenbrand
@ 2025-08-28 16:48 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 16:48 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Pavel Begunkov, Jens Axboe, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:23AM +0200, David Hildenbrand wrote:
> Within a folio/compound page, nth_page() is no longer required.
> Given that we call folio_test_partial_kmap()+kmap_local_page(), the code
> would already be problematic if the pages spanned multiple folios.
>
> So let's just assume that all src pages belong to a single
> folio/compound page and can be iterated ordinarily. The dst page is
> currently always a single page, so we're not actually iterating
> anything.
>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
On basis of src pages being within the same folio, LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
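Just to spell out the offset normalization with made-up numbers (4 KiB pages, purely
illustrative): for, say, src_offset = 6000,

	src_page  += src_offset / PAGE_SIZE;		/* += 1, second page of the folio  */
	src_offset = offset_in_page(src_offset);	/* = 1904, offset within that page */

which only works as plain pointer arithmetic because all src pages sit within one
folio/compound page, as the commit message says.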
> ---
> io_uring/zcrx.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index e5ff49f3425e0..18c12f4b56b6c 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -975,9 +975,9 @@ static ssize_t io_copy_page(struct io_copy_cache *cc, struct page *src_page,
>
> if (folio_test_partial_kmap(page_folio(dst_page)) ||
> folio_test_partial_kmap(page_folio(src_page))) {
> - dst_page = nth_page(dst_page, dst_offset / PAGE_SIZE);
> + dst_page += dst_offset / PAGE_SIZE;
> dst_offset = offset_in_page(dst_offset);
> - src_page = nth_page(src_page, src_offset / PAGE_SIZE);
> + src_page += src_offset / PAGE_SIZE;
> src_offset = offset_in_page(src_offset);
> n = min(PAGE_SIZE - src_offset, PAGE_SIZE - dst_offset);
> n = min(n, len);
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 20/36] mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages()
[not found] <20250827220141.262669-1-david@redhat.com>
` (13 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 19/36] io_uring/zcrx: remove nth_page() usage within folio David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 16:57 ` Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 21/36] mm/cma: refuse handing out non-contiguous page ranges David Hildenbrand
` (10 subsequent siblings)
25 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Thomas Bogendoerfer, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
Let's make it clearer that we are operating within a single folio by
providing both the folio and the page.
This implies that for flush_dcache_folio() we'll now avoid one more
page->folio lookup, and that we can safely drop the "nth_page" usage.
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/mips/include/asm/cacheflush.h | 11 +++++++----
arch/mips/mm/cache.c | 8 ++++----
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index 5d283ef89d90d..8d79bfc687d21 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -50,13 +50,14 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
extern void (*flush_cache_range)(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
-extern void __flush_dcache_pages(struct page *page, unsigned int nr);
+extern void __flush_dcache_folio_pages(struct folio *folio, struct page *page, unsigned int nr);
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
static inline void flush_dcache_folio(struct folio *folio)
{
if (cpu_has_dc_aliases)
- __flush_dcache_pages(&folio->page, folio_nr_pages(folio));
+ __flush_dcache_folio_pages(folio, folio_page(folio, 0),
+ folio_nr_pages(folio));
else if (!cpu_has_ic_fills_f_dc)
folio_set_dcache_dirty(folio);
}
@@ -64,10 +65,12 @@ static inline void flush_dcache_folio(struct folio *folio)
static inline void flush_dcache_page(struct page *page)
{
+ struct folio *folio = page_folio(page);
+
if (cpu_has_dc_aliases)
- __flush_dcache_pages(page, 1);
+ __flush_dcache_folio_pages(folio, page, folio_nr_pages(folio));
else if (!cpu_has_ic_fills_f_dc)
- folio_set_dcache_dirty(page_folio(page));
+ folio_set_dcache_dirty(folio);
}
#define flush_dcache_mmap_lock(mapping) do { } while (0)
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index bf9a37c60e9f0..e3b4224c9a406 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -99,9 +99,9 @@ SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, bytes,
return 0;
}
-void __flush_dcache_pages(struct page *page, unsigned int nr)
+void __flush_dcache_folio_pages(struct folio *folio, struct page *page,
+ unsigned int nr)
{
- struct folio *folio = page_folio(page);
struct address_space *mapping = folio_flush_mapping(folio);
unsigned long addr;
unsigned int i;
@@ -117,12 +117,12 @@ void __flush_dcache_pages(struct page *page, unsigned int nr)
* get faulted into the tlb (and thus flushed) anyways.
*/
for (i = 0; i < nr; i++) {
- addr = (unsigned long)kmap_local_page(nth_page(page, i));
+ addr = (unsigned long)kmap_local_page(page + i);
flush_data_cache_page(addr);
kunmap_local((void *)addr);
}
}
-EXPORT_SYMBOL(__flush_dcache_pages);
+EXPORT_SYMBOL(__flush_dcache_folio_pages);
void __flush_anon_page(struct page *page, unsigned long vmaddr)
{
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 20/36] mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages()
2025-08-27 22:01 ` [PATCH v1 20/36] mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages() David Hildenbrand
@ 2025-08-28 16:57 ` Lorenzo Stoakes
2025-08-28 20:51 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 16:57 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Thomas Bogendoerfer, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:24AM +0200, David Hildenbrand wrote:
> Let's make it clearer that we are operating within a single folio by
> providing both the folio and the page.
>
> This implies that for flush_dcache_folio() we'll now avoid one more
> page->folio lookup, and that we can safely drop the "nth_page" usage.
>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> arch/mips/include/asm/cacheflush.h | 11 +++++++----
> arch/mips/mm/cache.c | 8 ++++----
> 2 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> index 5d283ef89d90d..8d79bfc687d21 100644
> --- a/arch/mips/include/asm/cacheflush.h
> +++ b/arch/mips/include/asm/cacheflush.h
> @@ -50,13 +50,14 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
> extern void (*flush_cache_range)(struct vm_area_struct *vma,
> unsigned long start, unsigned long end);
> extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> -extern void __flush_dcache_pages(struct page *page, unsigned int nr);
> +extern void __flush_dcache_folio_pages(struct folio *folio, struct page *page, unsigned int nr);
NIT: Be good to drop the extern.
>
> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> static inline void flush_dcache_folio(struct folio *folio)
> {
> if (cpu_has_dc_aliases)
> - __flush_dcache_pages(&folio->page, folio_nr_pages(folio));
> + __flush_dcache_folio_pages(folio, folio_page(folio, 0),
> + folio_nr_pages(folio));
> else if (!cpu_has_ic_fills_f_dc)
> folio_set_dcache_dirty(folio);
> }
> @@ -64,10 +65,12 @@ static inline void flush_dcache_folio(struct folio *folio)
>
> static inline void flush_dcache_page(struct page *page)
> {
> + struct folio *folio = page_folio(page);
> +
> if (cpu_has_dc_aliases)
> - __flush_dcache_pages(page, 1);
> + __flush_dcache_folio_pages(folio, page, folio_nr_pages(folio));
Hmmm, shouldn't this be 1 not folio_nr_pages()? Seems that the original
implementation only flushed a single page even if contained within a larger
folio?
> else if (!cpu_has_ic_fills_f_dc)
> - folio_set_dcache_dirty(page_folio(page));
> + folio_set_dcache_dirty(folio);
> }
>
> #define flush_dcache_mmap_lock(mapping) do { } while (0)
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index bf9a37c60e9f0..e3b4224c9a406 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -99,9 +99,9 @@ SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, bytes,
> return 0;
> }
>
> -void __flush_dcache_pages(struct page *page, unsigned int nr)
> +void __flush_dcache_folio_pages(struct folio *folio, struct page *page,
> + unsigned int nr)
> {
> - struct folio *folio = page_folio(page);
> struct address_space *mapping = folio_flush_mapping(folio);
> unsigned long addr;
> unsigned int i;
> @@ -117,12 +117,12 @@ void __flush_dcache_pages(struct page *page, unsigned int nr)
> * get faulted into the tlb (and thus flushed) anyways.
> */
> for (i = 0; i < nr; i++) {
> - addr = (unsigned long)kmap_local_page(nth_page(page, i));
> + addr = (unsigned long)kmap_local_page(page + i);
> flush_data_cache_page(addr);
> kunmap_local((void *)addr);
> }
> }
> -EXPORT_SYMBOL(__flush_dcache_pages);
> +EXPORT_SYMBOL(__flush_dcache_folio_pages);
>
> void __flush_anon_page(struct page *page, unsigned long vmaddr)
> {
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 20/36] mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages()
2025-08-28 16:57 ` Lorenzo Stoakes
@ 2025-08-28 20:51 ` David Hildenbrand
2025-08-29 12:51 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-28 20:51 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Thomas Bogendoerfer, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On 28.08.25 18:57, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:24AM +0200, David Hildenbrand wrote:
>> Let's make it clearer that we are operating within a single folio by
>> providing both the folio and the page.
>>
>> This implies that for flush_dcache_folio() we'll now avoid one more
>> page->folio lookup, and that we can safely drop the "nth_page" usage.
>>
>> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> arch/mips/include/asm/cacheflush.h | 11 +++++++----
>> arch/mips/mm/cache.c | 8 ++++----
>> 2 files changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
>> index 5d283ef89d90d..8d79bfc687d21 100644
>> --- a/arch/mips/include/asm/cacheflush.h
>> +++ b/arch/mips/include/asm/cacheflush.h
>> @@ -50,13 +50,14 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
>> extern void (*flush_cache_range)(struct vm_area_struct *vma,
>> unsigned long start, unsigned long end);
>> extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
>> -extern void __flush_dcache_pages(struct page *page, unsigned int nr);
>> +extern void __flush_dcache_folio_pages(struct folio *folio, struct page *page, unsigned int nr);
>
> NIT: Be good to drop the extern.
I think I'll leave the one in, though, someone should clean up all of
them in one go.
Just imagine how the other functions would think about the new guy
showing off here. :)
>
>>
>> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>> static inline void flush_dcache_folio(struct folio *folio)
>> {
>> if (cpu_has_dc_aliases)
>> - __flush_dcache_pages(&folio->page, folio_nr_pages(folio));
>> + __flush_dcache_folio_pages(folio, folio_page(folio, 0),
>> + folio_nr_pages(folio));
>> else if (!cpu_has_ic_fills_f_dc)
>> folio_set_dcache_dirty(folio);
>> }
>> @@ -64,10 +65,12 @@ static inline void flush_dcache_folio(struct folio *folio)
>>
>> static inline void flush_dcache_page(struct page *page)
>> {
>> + struct folio *folio = page_folio(page);
>> +
>> if (cpu_has_dc_aliases)
>> - __flush_dcache_pages(page, 1);
>> + __flush_dcache_folio_pages(folio, page, folio_nr_pages(folio));
>
> Hmmm, shouldn't this be 1 not folio_nr_pages()? Seems that the original
> implementation only flushed a single page even if contained within a larger
> folio?
Yes, reworked it 3 times and messed it up during the last rework. Thanks!
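IOW, flush_dcache_page() should just go back to flushing the single page, something
like (untested here, just to spell it out):

static inline void flush_dcache_page(struct page *page)
{
	struct folio *folio = page_folio(page);

	if (cpu_has_dc_aliases)
		__flush_dcache_folio_pages(folio, page, 1);
	else if (!cpu_has_ic_fills_f_dc)
		folio_set_dcache_dirty(folio);
}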
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 20/36] mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages()
2025-08-28 20:51 ` David Hildenbrand
@ 2025-08-29 12:51 ` Lorenzo Stoakes
2025-08-29 13:44 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 12:51 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Thomas Bogendoerfer, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 10:51:46PM +0200, David Hildenbrand wrote:
> On 28.08.25 18:57, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:24AM +0200, David Hildenbrand wrote:
> > > Let's make it clearer that we are operating within a single folio by
> > > providing both the folio and the page.
> > >
> > > This implies that for flush_dcache_folio() we'll now avoid one more
> > > page->folio lookup, and that we can safely drop the "nth_page" usage.
> > >
> > > Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > > arch/mips/include/asm/cacheflush.h | 11 +++++++----
> > > arch/mips/mm/cache.c | 8 ++++----
> > > 2 files changed, 11 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> > > index 5d283ef89d90d..8d79bfc687d21 100644
> > > --- a/arch/mips/include/asm/cacheflush.h
> > > +++ b/arch/mips/include/asm/cacheflush.h
> > > @@ -50,13 +50,14 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
> > > extern void (*flush_cache_range)(struct vm_area_struct *vma,
> > > unsigned long start, unsigned long end);
> > > extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> > > -extern void __flush_dcache_pages(struct page *page, unsigned int nr);
> > > +extern void __flush_dcache_folio_pages(struct folio *folio, struct page *page, unsigned int nr);
> >
> > NIT: Be good to drop the extern.
>
> I think I'll leave the one in, though, someone should clean up all of them
> in one go.
This is how we always clean these up though, buuut to be fair that's in mm.
>
> Just imagine how the other functions would think about the new guy showing
> off here. :)
;)
>
> >
> > >
> > > #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> > > static inline void flush_dcache_folio(struct folio *folio)
> > > {
> > > if (cpu_has_dc_aliases)
> > > - __flush_dcache_pages(&folio->page, folio_nr_pages(folio));
> > > + __flush_dcache_folio_pages(folio, folio_page(folio, 0),
> > > + folio_nr_pages(folio));
> > > else if (!cpu_has_ic_fills_f_dc)
> > > folio_set_dcache_dirty(folio);
> > > }
> > > @@ -64,10 +65,12 @@ static inline void flush_dcache_folio(struct folio *folio)
> > >
> > > static inline void flush_dcache_page(struct page *page)
> > > {
> > > + struct folio *folio = page_folio(page);
> > > +
> > > if (cpu_has_dc_aliases)
> > > - __flush_dcache_pages(page, 1);
> > > + __flush_dcache_folio_pages(folio, page, folio_nr_pages(folio));
> >
> > Hmmm, shouldn't this be 1 not folio_nr_pages()? Seems that the original
> > implementation only flushed a single page even if contained within a larger
> > folio?
>
> Yes, reworked it 3 times and messed it up during the last rework. Thanks!
Woot I found an actual bug :P
Yeah it's fiddly so understandable. :)
>
> --
> Cheers
>
> David / dhildenb
>
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 20/36] mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages()
2025-08-29 12:51 ` Lorenzo Stoakes
@ 2025-08-29 13:44 ` David Hildenbrand
2025-08-29 14:45 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 13:44 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Thomas Bogendoerfer, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On 29.08.25 14:51, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 10:51:46PM +0200, David Hildenbrand wrote:
>> On 28.08.25 18:57, Lorenzo Stoakes wrote:
>>> On Thu, Aug 28, 2025 at 12:01:24AM +0200, David Hildenbrand wrote:
>>>> Let's make it clearer that we are operating within a single folio by
>>>> providing both the folio and the page.
>>>>
>>>> This implies that for flush_dcache_folio() we'll now avoid one more
>>>> page->folio lookup, and that we can safely drop the "nth_page" usage.
>>>>
>>>> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> arch/mips/include/asm/cacheflush.h | 11 +++++++----
>>>> arch/mips/mm/cache.c | 8 ++++----
>>>> 2 files changed, 11 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
>>>> index 5d283ef89d90d..8d79bfc687d21 100644
>>>> --- a/arch/mips/include/asm/cacheflush.h
>>>> +++ b/arch/mips/include/asm/cacheflush.h
>>>> @@ -50,13 +50,14 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
>>>> extern void (*flush_cache_range)(struct vm_area_struct *vma,
>>>> unsigned long start, unsigned long end);
>>>> extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
>>>> -extern void __flush_dcache_pages(struct page *page, unsigned int nr);
>>>> +extern void __flush_dcache_folio_pages(struct folio *folio, struct page *page, unsigned int nr);
>>>
>>> NIT: Be good to drop the extern.
>>
>> I think I'll leave the one in, though, someone should clean up all of them
>> in one go.
>
> This is how we always clean these up though, buuut to be fair that's in mm.
>
Well, okay, I'll make all the other functions jealous and blame it on
you! :P
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 20/36] mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages()
2025-08-29 13:44 ` David Hildenbrand
@ 2025-08-29 14:45 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 14:45 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Thomas Bogendoerfer, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Fri, Aug 29, 2025 at 03:44:20PM +0200, David Hildenbrand wrote:
> On 29.08.25 14:51, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 10:51:46PM +0200, David Hildenbrand wrote:
> > > On 28.08.25 18:57, Lorenzo Stoakes wrote:
> > > > On Thu, Aug 28, 2025 at 12:01:24AM +0200, David Hildenbrand wrote:
> > > > > Let's make it clearer that we are operating within a single folio by
> > > > > providing both the folio and the page.
> > > > >
> > > > > This implies that for flush_dcache_folio() we'll now avoid one more
> > > > > page->folio lookup, and that we can safely drop the "nth_page" usage.
> > > > >
> > > > > Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> > > > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > > > ---
> > > > > arch/mips/include/asm/cacheflush.h | 11 +++++++----
> > > > > arch/mips/mm/cache.c | 8 ++++----
> > > > > 2 files changed, 11 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> > > > > index 5d283ef89d90d..8d79bfc687d21 100644
> > > > > --- a/arch/mips/include/asm/cacheflush.h
> > > > > +++ b/arch/mips/include/asm/cacheflush.h
> > > > > @@ -50,13 +50,14 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
> > > > > extern void (*flush_cache_range)(struct vm_area_struct *vma,
> > > > > unsigned long start, unsigned long end);
> > > > > extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> > > > > -extern void __flush_dcache_pages(struct page *page, unsigned int nr);
> > > > > +extern void __flush_dcache_folio_pages(struct folio *folio, struct page *page, unsigned int nr);
> > > >
> > > > NIT: Be good to drop the extern.
> > >
> > > I think I'll leave the one in, though, someone should clean up all of them
> > > in one go.
> >
> > This is how we always clean these up though, buuut to be fair that's in mm.
> >
>
> Well, okay, I'll make all the other functions jealous and blame it on you!
> :P
;)
>
> --
> Cheers
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 21/36] mm/cma: refuse handing out non-contiguous page ranges
[not found] <20250827220141.262669-1-david@redhat.com>
` (14 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 20/36] mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 17:28 ` Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 22/36] dma-remap: drop nth_page() in dma_common_contiguous_remap() David Hildenbrand
` (9 subsequent siblings)
25 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexandru Elisei, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
Let's disallow handing out PFN ranges with non-contiguous pages, so we
can remove the nth-page usage in __cma_alloc(), and so any callers don't
have to worry about that either when wanting to blindly iterate pages.
This is really only a problem in configs with SPARSEMEM but without
SPARSEMEM_VMEMMAP, and only when we would cross memory sections in some
cases.
Will this cause harm? Probably not, because it's mostly 32bit that does
not support SPARSEMEM_VMEMMAP. If this ever becomes a problem we could
look into allocating the memmap for the memory sections spanned by a
single CMA region in one go from memblock.
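For context, a minimal userspace model (illustrative only, with a made-up
section size) of why this can break at all: with SPARSEMEM but without
SPARSEMEM_VMEMMAP each memory section gets its own memmap allocation, so
the struct pages of PFN-adjacent pages in different sections need not be
adjacent in memory -- which is exactly what the new check detects.

#include <stdio.h>
#include <stdlib.h>

struct page { unsigned long flags; };

#define PAGES_PER_SECTION 8	/* made up; the real value is much larger */

int main(void)
{
	/* each memory section gets its own, separately allocated memmap */
	struct page *memmap0 = calloc(PAGES_PER_SECTION, sizeof(struct page));
	struct page *memmap1 = calloc(PAGES_PER_SECTION, sizeof(struct page));
	struct page *last_of_section0 = &memmap0[PAGES_PER_SECTION - 1];
	struct page *first_of_section1 = &memmap1[0];

	/* usually prints "no": naive "page + 1" iteration across the
	 * section boundary would point at the wrong struct page */
	printf("contiguous? %s\n",
	       last_of_section0 + 1 == first_of_section1 ? "yes" : "no");
	free(memmap0);
	free(memmap1);
	return 0;
}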
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 6 ++++++
mm/cma.c | 39 ++++++++++++++++++++++++---------------
mm/util.c | 33 +++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+), 15 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f6880e3225c5c..2ca1eb2db63ec 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -209,9 +209,15 @@ extern unsigned long sysctl_user_reserve_kbytes;
extern unsigned long sysctl_admin_reserve_kbytes;
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+bool page_range_contiguous(const struct page *page, unsigned long nr_pages);
#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
#else
#define nth_page(page,n) ((page) + (n))
+static inline bool page_range_contiguous(const struct page *page,
+ unsigned long nr_pages)
+{
+ return true;
+}
#endif
/* to align the pointer to the (next) page boundary */
diff --git a/mm/cma.c b/mm/cma.c
index e56ec64d0567e..813e6dc7b0954 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -780,10 +780,8 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
unsigned long count, unsigned int align,
struct page **pagep, gfp_t gfp)
{
- unsigned long mask, offset;
- unsigned long pfn = -1;
- unsigned long start = 0;
unsigned long bitmap_maxno, bitmap_no, bitmap_count;
+ unsigned long start, pfn, mask, offset;
int ret = -EBUSY;
struct page *page = NULL;
@@ -795,7 +793,7 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
if (bitmap_count > bitmap_maxno)
goto out;
- for (;;) {
+ for (start = 0; ; start = bitmap_no + mask + 1) {
spin_lock_irq(&cma->lock);
/*
* If the request is larger than the available number
@@ -812,6 +810,22 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
spin_unlock_irq(&cma->lock);
break;
}
+
+ pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
+ page = pfn_to_page(pfn);
+
+ /*
+ * Do not hand out page ranges that are not contiguous, so
+ * callers can just iterate the pages without having to worry
+ * about these corner cases.
+ */
+ if (!page_range_contiguous(page, count)) {
+ spin_unlock_irq(&cma->lock);
+ pr_warn_ratelimited("%s: %s: skipping incompatible area [0x%lx-0x%lx]",
+ __func__, cma->name, pfn, pfn + count - 1);
+ continue;
+ }
+
bitmap_set(cmr->bitmap, bitmap_no, bitmap_count);
cma->available_count -= count;
/*
@@ -821,29 +835,24 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
*/
spin_unlock_irq(&cma->lock);
- pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
mutex_lock(&cma->alloc_mutex);
ret = alloc_contig_range(pfn, pfn + count, ACR_FLAGS_CMA, gfp);
mutex_unlock(&cma->alloc_mutex);
- if (ret == 0) {
- page = pfn_to_page(pfn);
+ if (!ret)
break;
- }
cma_clear_bitmap(cma, cmr, pfn, count);
if (ret != -EBUSY)
break;
pr_debug("%s(): memory range at pfn 0x%lx %p is busy, retrying\n",
- __func__, pfn, pfn_to_page(pfn));
+ __func__, pfn, page);
- trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
- count, align);
- /* try again with a bit different memory target */
- start = bitmap_no + mask + 1;
+ trace_cma_alloc_busy_retry(cma->name, pfn, page, count, align);
}
out:
- *pagep = page;
+ if (!ret)
+ *pagep = page;
return ret;
}
@@ -882,7 +891,7 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
*/
if (page) {
for (i = 0; i < count; i++)
- page_kasan_tag_reset(nth_page(page, i));
+ page_kasan_tag_reset(page + i);
}
if (ret && !(gfp & __GFP_NOWARN)) {
diff --git a/mm/util.c b/mm/util.c
index d235b74f7aff7..0bf349b19b652 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1280,4 +1280,37 @@ unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
{
return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr, 0);
}
+
+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+/**
+ * page_range_contiguous - test whether the page range is contiguous
+ * @page: the start of the page range.
+ * @nr_pages: the number of pages in the range.
+ *
+ * Test whether the page range is contiguous, such that they can be iterated
+ * naively, corresponding to iterating a contiguous PFN range.
+ *
+ * This function should primarily only be used for debug checks, or when
+ * working with page ranges that are not naturally contiguous (e.g., pages
+ * within a folio are).
+ *
+ * Returns true if contiguous, otherwise false.
+ */
+bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
+{
+ const unsigned long start_pfn = page_to_pfn(page);
+ const unsigned long end_pfn = start_pfn + nr_pages;
+ unsigned long pfn;
+
+ /*
+ * The memmap is allocated per memory section. We need to check
+ * each involved memory section once.
+ */
+ for (pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
+ pfn < end_pfn; pfn += PAGES_PER_SECTION)
+ if (unlikely(page + (pfn - start_pfn) != pfn_to_page(pfn)))
+ return false;
+ return true;
+}
+#endif
#endif /* CONFIG_MMU */
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 21/36] mm/cma: refuse handing out non-contiguous page ranges
2025-08-27 22:01 ` [PATCH v1 21/36] mm/cma: refuse handing out non-contiguous page ranges David Hildenbrand
@ 2025-08-28 17:28 ` Lorenzo Stoakes
2025-08-29 14:34 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 17:28 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexandru Elisei, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:25AM +0200, David Hildenbrand wrote:
> Let's disallow handing out PFN ranges with non-contiguous pages, so we
> can remove the nth-page usage in __cma_alloc(), and so any callers don't
> have to worry about that either when wanting to blindly iterate pages.
>
> This is really only a problem in configs with SPARSEMEM but without
> SPARSEMEM_VMEMMAP, and only when we would cross memory sections in some
> cases.
I'm guessing this is something that we don't need to worry about in
reality?
>
> Will this cause harm? Probably not, because it's mostly 32bit that does
> not support SPARSEMEM_VMEMMAP. If this ever becomes a problem we could
> look into allocating the memmap for the memory sections spanned by a
> single CMA region in one go from memblock.
>
> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM other than refactoring point below.
CMA stuff looks fine afaict after staring at it for a while, on the proviso
that handing out ranges within the same section is always going to be the
case.
Anyway overall,
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/mm.h | 6 ++++++
> mm/cma.c | 39 ++++++++++++++++++++++++---------------
> mm/util.c | 33 +++++++++++++++++++++++++++++++++
> 3 files changed, 63 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f6880e3225c5c..2ca1eb2db63ec 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -209,9 +209,15 @@ extern unsigned long sysctl_user_reserve_kbytes;
> extern unsigned long sysctl_admin_reserve_kbytes;
>
> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> +bool page_range_contiguous(const struct page *page, unsigned long nr_pages);
> #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> #else
> #define nth_page(page,n) ((page) + (n))
> +static inline bool page_range_contiguous(const struct page *page,
> + unsigned long nr_pages)
> +{
> + return true;
> +}
> #endif
>
> /* to align the pointer to the (next) page boundary */
> diff --git a/mm/cma.c b/mm/cma.c
> index e56ec64d0567e..813e6dc7b0954 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -780,10 +780,8 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
> unsigned long count, unsigned int align,
> struct page **pagep, gfp_t gfp)
> {
> - unsigned long mask, offset;
> - unsigned long pfn = -1;
> - unsigned long start = 0;
> unsigned long bitmap_maxno, bitmap_no, bitmap_count;
> + unsigned long start, pfn, mask, offset;
> int ret = -EBUSY;
> struct page *page = NULL;
>
> @@ -795,7 +793,7 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
> if (bitmap_count > bitmap_maxno)
> goto out;
>
> - for (;;) {
> + for (start = 0; ; start = bitmap_no + mask + 1) {
> spin_lock_irq(&cma->lock);
> /*
> * If the request is larger than the available number
> @@ -812,6 +810,22 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
> spin_unlock_irq(&cma->lock);
> break;
> }
> +
> + pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
> + page = pfn_to_page(pfn);
> +
> + /*
> + * Do not hand out page ranges that are not contiguous, so
> + * callers can just iterate the pages without having to worry
> + * about these corner cases.
> + */
> + if (!page_range_contiguous(page, count)) {
> + spin_unlock_irq(&cma->lock);
> + pr_warn_ratelimited("%s: %s: skipping incompatible area [0x%lx-0x%lx]",
> + __func__, cma->name, pfn, pfn + count - 1);
> + continue;
> + }
> +
> bitmap_set(cmr->bitmap, bitmap_no, bitmap_count);
> cma->available_count -= count;
> /*
> @@ -821,29 +835,24 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
> */
> spin_unlock_irq(&cma->lock);
>
> - pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
> mutex_lock(&cma->alloc_mutex);
> ret = alloc_contig_range(pfn, pfn + count, ACR_FLAGS_CMA, gfp);
> mutex_unlock(&cma->alloc_mutex);
> - if (ret == 0) {
> - page = pfn_to_page(pfn);
> + if (!ret)
> break;
> - }
>
> cma_clear_bitmap(cma, cmr, pfn, count);
> if (ret != -EBUSY)
> break;
>
> pr_debug("%s(): memory range at pfn 0x%lx %p is busy, retrying\n",
> - __func__, pfn, pfn_to_page(pfn));
> + __func__, pfn, page);
>
> - trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
> - count, align);
> - /* try again with a bit different memory target */
> - start = bitmap_no + mask + 1;
> + trace_cma_alloc_busy_retry(cma->name, pfn, page, count, align);
> }
> out:
> - *pagep = page;
> + if (!ret)
> + *pagep = page;
> return ret;
> }
>
> @@ -882,7 +891,7 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
> */
> if (page) {
> for (i = 0; i < count; i++)
> - page_kasan_tag_reset(nth_page(page, i));
> + page_kasan_tag_reset(page + i);
> }
>
> if (ret && !(gfp & __GFP_NOWARN)) {
> diff --git a/mm/util.c b/mm/util.c
> index d235b74f7aff7..0bf349b19b652 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1280,4 +1280,37 @@ unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
> {
> return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr, 0);
> }
> +
> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> +/**
> + * page_range_contiguous - test whether the page range is contiguous
> + * @page: the start of the page range.
> + * @nr_pages: the number of pages in the range.
> + *
> + * Test whether the page range is contiguous, such that they can be iterated
> + * naively, corresponding to iterating a contiguous PFN range.
> + *
> + * This function should primarily only be used for debug checks, or when
> + * working with page ranges that are not naturally contiguous (e.g., pages
> + * within a folio are).
> + *
> + * Returns true if contiguous, otherwise false.
> + */
> +bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
> +{
> + const unsigned long start_pfn = page_to_pfn(page);
> + const unsigned long end_pfn = start_pfn + nr_pages;
> + unsigned long pfn;
> +
> + /*
> + * The memmap is allocated per memory section. We need to check
> + * each involved memory section once.
> + */
> + for (pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
> + pfn < end_pfn; pfn += PAGES_PER_SECTION)
> + if (unlikely(page + (pfn - start_pfn) != pfn_to_page(pfn)))
> + return false;
I find this pretty confusing, my test for this is how many times I have to read
the code to understand what it's doing :)
So we have something like:
(pfn of page)
start_pfn pfn = align UP
| |
v v
| section |
<----------------->
pfn - start_pfn
Then check page + (pfn - start_pfn) == pfn_to_page(pfn)
And loop such that:
(pfn of page)
start_pfn pfn
| |
v v
| section | section |
<------------------------------------------>
pfn - start_pfn
Again check page + (pfn - start_pfn) == pfn_to_page(pfn)
And so on.
So the logic looks good, but it's just... that took me a hot second to
parse :)
I think a few simple fixups
bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
{
const unsigned long start_pfn = page_to_pfn(page);
const unsigned long end_pfn = start_pfn + nr_pages;
/* The PFN of the start of the next section. */
unsigned long pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
/* The page we'd expected to see if the range were contiguous. */
struct page *expected = page + (pfn - start_pfn);
/*
* The memmap is allocated per memory section. We need to check
* each involved memory section once.
*/
for (; pfn < end_pfn; pfn += PAGES_PER_SECTION, expected += PAGES_PER_SECTION)
if (unlikely(expected != pfn_to_page(pfn)))
return false;
return true;
}
> + return true;
> +}
> +#endif
> #endif /* CONFIG_MMU */
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 21/36] mm/cma: refuse handing out non-contiguous page ranges
2025-08-28 17:28 ` Lorenzo Stoakes
@ 2025-08-29 14:34 ` David Hildenbrand
2025-08-29 14:44 ` Lorenzo Stoakes
0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 14:34 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Alexandru Elisei, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On 28.08.25 19:28, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:25AM +0200, David Hildenbrand wrote:
>> Let's disallow handing out PFN ranges with non-contiguous pages, so we
>> can remove the nth-page usage in __cma_alloc(), and so any callers don't
>> have to worry about that either when wanting to blindly iterate pages.
>>
>> This is really only a problem in configs with SPARSEMEM but without
>> SPARSEMEM_VMEMMAP, and only when we would cross memory sections in some
>> cases.
>
> I'm guessing this is something that we don't need to worry about in
> reality?
That's my theory, yes.
>
>>
>> Will this cause harm? Probably not, because it's mostly 32bit that does
>> not support SPARSEMEM_VMEMMAP. If this ever becomes a problem we could
>> look into allocating the memmap for the memory sections spanned by a
>> single CMA region in one go from memblock.
>>
>> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> LGTM other than refactoring point below.
>
> CMA stuff looks fine afaict after staring at it for a while, on the proviso
> that handing out ranges within the same section is always going to be the
> case.
>
> Anyway overall,
>
> LGTM, so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
>
>> ---
>> include/linux/mm.h | 6 ++++++
>> mm/cma.c | 39 ++++++++++++++++++++++++---------------
>> mm/util.c | 33 +++++++++++++++++++++++++++++++++
>> 3 files changed, 63 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index f6880e3225c5c..2ca1eb2db63ec 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -209,9 +209,15 @@ extern unsigned long sysctl_user_reserve_kbytes;
>> extern unsigned long sysctl_admin_reserve_kbytes;
>>
>> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>> +bool page_range_contiguous(const struct page *page, unsigned long nr_pages);
>> #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
>> #else
>> #define nth_page(page,n) ((page) + (n))
>> +static inline bool page_range_contiguous(const struct page *page,
>> + unsigned long nr_pages)
>> +{
>> + return true;
>> +}
>> #endif
>>
>> /* to align the pointer to the (next) page boundary */
>> diff --git a/mm/cma.c b/mm/cma.c
>> index e56ec64d0567e..813e6dc7b0954 100644
>> --- a/mm/cma.c
>> +++ b/mm/cma.c
>> @@ -780,10 +780,8 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
>> unsigned long count, unsigned int align,
>> struct page **pagep, gfp_t gfp)
>> {
>> - unsigned long mask, offset;
>> - unsigned long pfn = -1;
>> - unsigned long start = 0;
>> unsigned long bitmap_maxno, bitmap_no, bitmap_count;
>> + unsigned long start, pfn, mask, offset;
>> int ret = -EBUSY;
>> struct page *page = NULL;
>>
>> @@ -795,7 +793,7 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
>> if (bitmap_count > bitmap_maxno)
>> goto out;
>>
>> - for (;;) {
>> + for (start = 0; ; start = bitmap_no + mask + 1) {
>> spin_lock_irq(&cma->lock);
>> /*
>> * If the request is larger than the available number
>> @@ -812,6 +810,22 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
>> spin_unlock_irq(&cma->lock);
>> break;
>> }
>> +
>> + pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
>> + page = pfn_to_page(pfn);
>> +
>> + /*
>> + * Do not hand out page ranges that are not contiguous, so
>> + * callers can just iterate the pages without having to worry
>> + * about these corner cases.
>> + */
>> + if (!page_range_contiguous(page, count)) {
>> + spin_unlock_irq(&cma->lock);
>> + pr_warn_ratelimited("%s: %s: skipping incompatible area [0x%lx-0x%lx]",
>> + __func__, cma->name, pfn, pfn + count - 1);
>> + continue;
>> + }
>> +
>> bitmap_set(cmr->bitmap, bitmap_no, bitmap_count);
>> cma->available_count -= count;
>> /*
>> @@ -821,29 +835,24 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
>> */
>> spin_unlock_irq(&cma->lock);
>>
>> - pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
>> mutex_lock(&cma->alloc_mutex);
>> ret = alloc_contig_range(pfn, pfn + count, ACR_FLAGS_CMA, gfp);
>> mutex_unlock(&cma->alloc_mutex);
>> - if (ret == 0) {
>> - page = pfn_to_page(pfn);
>> + if (!ret)
>> break;
>> - }
>>
>> cma_clear_bitmap(cma, cmr, pfn, count);
>> if (ret != -EBUSY)
>> break;
>>
>> pr_debug("%s(): memory range at pfn 0x%lx %p is busy, retrying\n",
>> - __func__, pfn, pfn_to_page(pfn));
>> + __func__, pfn, page);
>>
>> - trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
>> - count, align);
>> - /* try again with a bit different memory target */
>> - start = bitmap_no + mask + 1;
>> + trace_cma_alloc_busy_retry(cma->name, pfn, page, count, align);
>> }
>> out:
>> - *pagep = page;
>> + if (!ret)
>> + *pagep = page;
>> return ret;
>> }
>>
>> @@ -882,7 +891,7 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
>> */
>> if (page) {
>> for (i = 0; i < count; i++)
>> - page_kasan_tag_reset(nth_page(page, i));
>> + page_kasan_tag_reset(page + i);
>> }
>>
>> if (ret && !(gfp & __GFP_NOWARN)) {
>> diff --git a/mm/util.c b/mm/util.c
>> index d235b74f7aff7..0bf349b19b652 100644
>> --- a/mm/util.c
>> +++ b/mm/util.c
>> @@ -1280,4 +1280,37 @@ unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
>> {
>> return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr, 0);
>> }
>> +
>> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>> +/**
>> + * page_range_contiguous - test whether the page range is contiguous
>> + * @page: the start of the page range.
>> + * @nr_pages: the number of pages in the range.
>> + *
>> + * Test whether the page range is contiguous, such that they can be iterated
>> + * naively, corresponding to iterating a contiguous PFN range.
>> + *
>> + * This function should primarily only be used for debug checks, or when
>> + * working with page ranges that are not naturally contiguous (e.g., pages
>> + * within a folio are).
>> + *
>> + * Returns true if contiguous, otherwise false.
>> + */
>> +bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
>> +{
>> + const unsigned long start_pfn = page_to_pfn(page);
>> + const unsigned long end_pfn = start_pfn + nr_pages;
>> + unsigned long pfn;
>> +
>> + /*
>> + * The memmap is allocated per memory section. We need to check
>> + * each involved memory section once.
>> + */
>> + for (pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
>> + pfn < end_pfn; pfn += PAGES_PER_SECTION)
>> + if (unlikely(page + (pfn - start_pfn) != pfn_to_page(pfn)))
>> + return false;
>
> I find this pretty confusing, my test for this is how many times I have to read
> the code to understand what it's doing :)
>
> So we have something like:
>
> (pfn of page)
> start_pfn pfn = align UP
> | |
> v v
> | section |
> <----------------->
> pfn - start_pfn
>
> Then check page + (pfn - start_pfn) == pfn_to_page(pfn)
>
> And loop such that:
>
> (pfn of page)
> start_pfn pfn
> | |
> v v
> | section | section |
> <------------------------------------------>
> pfn - start_pfn
>
> Again check page + (pfn - start_pfn) == pfn_to_page(pfn)
>
> And so on.
>
> So the logic looks good, but it's just... that took me a hot second to
> parse :)
>
> I think a few simple fixups
>
> bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
> {
> const unsigned long start_pfn = page_to_pfn(page);
> const unsigned long end_pfn = start_pfn + nr_pages;
> /* The PFN of the start of the next section. */
> unsigned long pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
> /* The page we'd expected to see if the range were contiguous. */
> struct page *expected = page + (pfn - start_pfn);
>
> /*
> * The memmap is allocated per memory section. We need to check
> * each involved memory section once.
> */
> for (; pfn < end_pfn; pfn += PAGES_PER_SECTION, expected += PAGES_PER_SECTION)
> if (unlikely(expected != pfn_to_page(pfn)))
> return false;
> return true;
> }
>
Hm, I prefer my variant, especially where the pfn is calculated in the for loop. Likely a
matter of personal taste.
But I can see why skipping the first section might be a surprise when not
having the semantics of ALIGN() in the cache.
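For reference, a tiny standalone sketch of the round-up behaviour
(simplified power-of-two ALIGN(), made-up section size):

#include <stdio.h>

#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define PAGES_PER_SECTION	0x8000UL	/* made-up value */

int main(void)
{
	unsigned long start_pfn = 0x12345;	/* mid-section */

	/* prints 0x18000: the first PFN of the *next* section, so the
	 * section containing start_pfn itself is never checked */
	printf("0x%lx\n", ALIGN(start_pfn, PAGES_PER_SECTION));
	return 0;
}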
So I'll add the following on top:
diff --git a/mm/util.c b/mm/util.c
index 0bf349b19b652..fbdb73aaf35fe 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1303,8 +1303,10 @@ bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
unsigned long pfn;
/*
- * The memmap is allocated per memory section. We need to check
- * each involved memory section once.
+ * The memmap is allocated per memory section, so no need to check
+ * within the first section. However, we need to check each other
+ * spanned memory section once, making sure the first page in a
+ * section could similarly be reached by just iterating pages.
*/
for (pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
pfn < end_pfn; pfn += PAGES_PER_SECTION)
Thanks!
--
Cheers
David / dhildenb
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 21/36] mm/cma: refuse handing out non-contiguous page ranges
2025-08-29 14:34 ` David Hildenbrand
@ 2025-08-29 14:44 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-29 14:44 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexandru Elisei, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Fri, Aug 29, 2025 at 04:34:54PM +0200, David Hildenbrand wrote:
> On 28.08.25 19:28, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:25AM +0200, David Hildenbrand wrote:
> > > Let's disallow handing out PFN ranges with non-contiguous pages, so we
> > > can remove the nth-page usage in __cma_alloc(), and so any callers don't
> > > have to worry about that either when wanting to blindly iterate pages.
> > >
> > > This is really only a problem in configs with SPARSEMEM but without
> > > SPARSEMEM_VMEMMAP, and only when we would cross memory sections in some
> > > cases.
> >
> > I'm guessing this is something that we don't need to worry about in
> > reality?
>
> That's my theory, yes.
Let's hope that's correct haha, but seems reasonable.
>
> >
> > >
> > > Will this cause harm? Probably not, because it's mostly 32bit that does
> > > not support SPARSEMEM_VMEMMAP. If this ever becomes a problem we could
> > > look into allocating the memmap for the memory sections spanned by a
> > > single CMA region in one go from memblock.
> > >
> > > Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> >
> > LGTM other than refactoring point below.
> >
> > CMA stuff looks fine afaict after staring at it for a while, on the proviso
> > that handing out ranges within the same section is always going to be the
> > case.
> >
> > Anyway overall,
> >
> > LGTM, so:
> >
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >
> >
> > > ---
> > > include/linux/mm.h | 6 ++++++
> > > mm/cma.c | 39 ++++++++++++++++++++++++---------------
> > > mm/util.c | 33 +++++++++++++++++++++++++++++++++
> > > 3 files changed, 63 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index f6880e3225c5c..2ca1eb2db63ec 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -209,9 +209,15 @@ extern unsigned long sysctl_user_reserve_kbytes;
> > > extern unsigned long sysctl_admin_reserve_kbytes;
> > >
> > > #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> > > +bool page_range_contiguous(const struct page *page, unsigned long nr_pages);
> > > #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> > > #else
> > > #define nth_page(page,n) ((page) + (n))
> > > +static inline bool page_range_contiguous(const struct page *page,
> > > + unsigned long nr_pages)
> > > +{
> > > + return true;
> > > +}
> > > #endif
> > >
> > > /* to align the pointer to the (next) page boundary */
> > > diff --git a/mm/cma.c b/mm/cma.c
> > > index e56ec64d0567e..813e6dc7b0954 100644
> > > --- a/mm/cma.c
> > > +++ b/mm/cma.c
> > > @@ -780,10 +780,8 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
> > > unsigned long count, unsigned int align,
> > > struct page **pagep, gfp_t gfp)
> > > {
> > > - unsigned long mask, offset;
> > > - unsigned long pfn = -1;
> > > - unsigned long start = 0;
> > > unsigned long bitmap_maxno, bitmap_no, bitmap_count;
> > > + unsigned long start, pfn, mask, offset;
> > > int ret = -EBUSY;
> > > struct page *page = NULL;
> > >
> > > @@ -795,7 +793,7 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
> > > if (bitmap_count > bitmap_maxno)
> > > goto out;
> > >
> > > - for (;;) {
> > > + for (start = 0; ; start = bitmap_no + mask + 1) {
> > > spin_lock_irq(&cma->lock);
> > > /*
> > > * If the request is larger than the available number
> > > @@ -812,6 +810,22 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
> > > spin_unlock_irq(&cma->lock);
> > > break;
> > > }
> > > +
> > > + pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
> > > + page = pfn_to_page(pfn);
> > > +
> > > + /*
> > > + * Do not hand out page ranges that are not contiguous, so
> > > + * callers can just iterate the pages without having to worry
> > > + * about these corner cases.
> > > + */
> > > + if (!page_range_contiguous(page, count)) {
> > > + spin_unlock_irq(&cma->lock);
> > > + pr_warn_ratelimited("%s: %s: skipping incompatible area [0x%lx-0x%lx]",
> > > + __func__, cma->name, pfn, pfn + count - 1);
> > > + continue;
> > > + }
> > > +
> > > bitmap_set(cmr->bitmap, bitmap_no, bitmap_count);
> > > cma->available_count -= count;
> > > /*
> > > @@ -821,29 +835,24 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
> > > */
> > > spin_unlock_irq(&cma->lock);
> > >
> > > - pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
> > > mutex_lock(&cma->alloc_mutex);
> > > ret = alloc_contig_range(pfn, pfn + count, ACR_FLAGS_CMA, gfp);
> > > mutex_unlock(&cma->alloc_mutex);
> > > - if (ret == 0) {
> > > - page = pfn_to_page(pfn);
> > > + if (!ret)
> > > break;
> > > - }
> > >
> > > cma_clear_bitmap(cma, cmr, pfn, count);
> > > if (ret != -EBUSY)
> > > break;
> > >
> > > pr_debug("%s(): memory range at pfn 0x%lx %p is busy, retrying\n",
> > > - __func__, pfn, pfn_to_page(pfn));
> > > + __func__, pfn, page);
> > >
> > > - trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
> > > - count, align);
> > > - /* try again with a bit different memory target */
> > > - start = bitmap_no + mask + 1;
> > > + trace_cma_alloc_busy_retry(cma->name, pfn, page, count, align);
> > > }
> > > out:
> > > - *pagep = page;
> > > + if (!ret)
> > > + *pagep = page;
> > > return ret;
> > > }
> > >
> > > @@ -882,7 +891,7 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
> > > */
> > > if (page) {
> > > for (i = 0; i < count; i++)
> > > - page_kasan_tag_reset(nth_page(page, i));
> > > + page_kasan_tag_reset(page + i);
> > > }
> > >
> > > if (ret && !(gfp & __GFP_NOWARN)) {
> > > diff --git a/mm/util.c b/mm/util.c
> > > index d235b74f7aff7..0bf349b19b652 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -1280,4 +1280,37 @@ unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
> > > {
> > > return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr, 0);
> > > }
> > > +
> > > +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> > > +/**
> > > + * page_range_contiguous - test whether the page range is contiguous
> > > + * @page: the start of the page range.
> > > + * @nr_pages: the number of pages in the range.
> > > + *
> > > + * Test whether the page range is contiguous, such that they can be iterated
> > > + * naively, corresponding to iterating a contiguous PFN range.
> > > + *
> > > + * This function should primarily only be used for debug checks, or when
> > > + * working with page ranges that are not naturally contiguous (e.g., pages
> > > + * within a folio are).
> > > + *
> > > + * Returns true if contiguous, otherwise false.
> > > + */
> > > +bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
> > > +{
> > > + const unsigned long start_pfn = page_to_pfn(page);
> > > + const unsigned long end_pfn = start_pfn + nr_pages;
> > > + unsigned long pfn;
> > > +
> > > + /*
> > > + * The memmap is allocated per memory section. We need to check
> > > + * each involved memory section once.
> > > + */
> > > + for (pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
> > > + pfn < end_pfn; pfn += PAGES_PER_SECTION)
> > > + if (unlikely(page + (pfn - start_pfn) != pfn_to_page(pfn)))
> > > + return false;
> >
> > I find this pretty confusing, my test for this is how many times I have to read
> > the code to understand what it's doing :)
> >
> > So we have something like:
> >
> > (pfn of page)
> > start_pfn pfn = align UP
> > | |
> > v v
> > | section |
> > <----------------->
> > pfn - start_pfn
> >
> > Then check page + (pfn - start_pfn) == pfn_to_page(pfn)
> >
> > And loop such that:
> >
> > (pfn of page)
> > start_pfn pfn
> > | |
> > v v
> > | section | section |
> > <------------------------------------------>
> > pfn - start_pfn
> >
> > Again check page + (pfn - start_pfn) == pfn_to_page(pfn)
> >
> > And so on.
> >
> > So the logic looks good, but it's just... that took me a hot second to
> > parse :)
> >
> > I think a few simple fixups
> >
> > bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
> > {
> > const unsigned long start_pfn = page_to_pfn(page);
> > const unsigned long end_pfn = start_pfn + nr_pages;
> > /* The PFN of the start of the next section. */
> > unsigned long pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
> > /* The page we'd expected to see if the range were contiguous. */
> > struct page *expected = page + (pfn - start_pfn);
> >
> > /*
> > * The memmap is allocated per memory section. We need to check
> > * each involved memory section once.
> > */
> > for (; pfn < end_pfn; pfn += PAGES_PER_SECTION, expected += PAGES_PER_SECTION)
> > if (unlikely(expected != pfn_to_page(pfn)))
> > return false;
> > return true;
> > }
> >
>
> Hm, I prefer my variant, especially where the pfn is calculated in the for loop. Likely a
> matter of personal taste.
Sure this is always a factor in code :)
>
> But I can see why skipping the first section might be a surprise when not
> having the semantics of ALIGN() in the cache.
Yup!
>
> So I'll add the following on top:
>
> diff --git a/mm/util.c b/mm/util.c
> index 0bf349b19b652..fbdb73aaf35fe 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1303,8 +1303,10 @@ bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
> unsigned long pfn;
> /*
> - * The memmap is allocated per memory section. We need to check
> - * each involved memory section once.
> + * The memmap is allocated per memory section, so no need to check
> + * within the first section. However, we need to check each other
> + * spanned memory section once, making sure the first page in a
> + * section could similarly be reached by just iterating pages.
> */
> for (pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
> pfn < end_pfn; pfn += PAGES_PER_SECTION)
Cool this helps clarify things, that'll do fine!
>
> Thanks!
>
> --
> Cheers
>
> David / dhildenb
>
>
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 22/36] dma-remap: drop nth_page() in dma_common_contiguous_remap()
[not found] <20250827220141.262669-1-david@redhat.com>
` (15 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 21/36] mm/cma: refuse handing out non-contiguous page ranges David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 17:29 ` Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 23/36] scatterlist: disallow non-contiguous page ranges in a single SG entry David Hildenbrand
` (8 subsequent siblings)
25 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Marek Szyprowski, Robin Murphy,
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
dma_common_contiguous_remap() is used to remap an "allocated contiguous
region". Within a single allocation, there is no need to use nth_page()
anymore.
Neither the buddy, nor hugetlb, nor CMA will hand out problematic page
ranges.
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
kernel/dma/remap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index 9e2afad1c6152..b7c1c0c92d0c8 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -49,7 +49,7 @@ void *dma_common_contiguous_remap(struct page *page, size_t size,
if (!pages)
return NULL;
for (i = 0; i < count; i++)
- pages[i] = nth_page(page, i);
+ pages[i] = page++;
vaddr = vmap(pages, count, VM_DMA_COHERENT, prot);
kvfree(pages);
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 22/36] dma-remap: drop nth_page() in dma_common_contiguous_remap()
2025-08-27 22:01 ` [PATCH v1 22/36] dma-remap: drop nth_page() in dma_common_contiguous_remap() David Hildenbrand
@ 2025-08-28 17:29 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 17:29 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Marek Szyprowski, Robin Murphy, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:26AM +0200, David Hildenbrand wrote:
> dma_common_contiguous_remap() is used to remap an "allocated contiguous
> region". Within a single allocation, there is no need to use nth_page()
> anymore.
>
> Neither the buddy, nor hugetlb, nor CMA will hand out problematic page
> ranges.
>
> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Nice!
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> kernel/dma/remap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
> index 9e2afad1c6152..b7c1c0c92d0c8 100644
> --- a/kernel/dma/remap.c
> +++ b/kernel/dma/remap.c
> @@ -49,7 +49,7 @@ void *dma_common_contiguous_remap(struct page *page, size_t size,
> if (!pages)
> return NULL;
> for (i = 0; i < count; i++)
> - pages[i] = nth_page(page, i);
> + pages[i] = page++;
> vaddr = vmap(pages, count, VM_DMA_COHERENT, prot);
> kvfree(pages);
>
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 23/36] scatterlist: disallow non-contiguous page ranges in a single SG entry
[not found] <20250827220141.262669-1-david@redhat.com>
` (16 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 22/36] dma-remap: drop nth_page() in dma_common_contiguous_remap() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 17:53 ` Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 33/36] mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock() David Hildenbrand
` (7 subsequent siblings)
25 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Marek Szyprowski, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
The expectation is that there is currently no user that would pass in
non-contiguous page ranges: no allocator, not even VMA, will hand these
out.
The only problematic part would be if someone would provide a range
obtained directly from memblock, or manually merge problematic ranges.
If we find such cases, we should fix them to create separate
SG entries.
Let's check in sg_set_page() that this is really the case. No need to
check in sg_set_folio(), as pages in a folio are guaranteed to be
contiguous. As sg_set_page() gets inlined into modules, we have to
export the page_range_contiguous() helper -- use EXPORT_SYMBOL, there is
nothing special about this helper such that we would want to enforce
GPL-only modules.
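As a rough standalone sketch (not the kernel code itself) of the page count
that check derives from an SG entry -- the number of struct pages touched
by @len bytes starting @offset bytes into the first page:

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

static unsigned long sg_entry_nr_pages(unsigned long len, unsigned long offset)
{
	return ALIGN(len + offset, PAGE_SIZE) / PAGE_SIZE;
}

int main(void)
{
	/* two pages worth of data starting 512 bytes into the first page
	 * touches three struct pages, and all three must be contiguous */
	printf("%lu\n", sg_entry_nr_pages(2 * PAGE_SIZE, 512));
	return 0;
}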
We can now drop the nth_page() usage in sg_page_iter_page().
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/scatterlist.h | 3 ++-
mm/util.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 6f8a4965f9b98..29f6ceb98d74b 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -158,6 +158,7 @@ static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
static inline void sg_set_page(struct scatterlist *sg, struct page *page,
unsigned int len, unsigned int offset)
{
+ VM_WARN_ON_ONCE(!page_range_contiguous(page, ALIGN(len + offset, PAGE_SIZE) / PAGE_SIZE));
sg_assign_page(sg, page);
sg->offset = offset;
sg->length = len;
@@ -600,7 +601,7 @@ void __sg_page_iter_start(struct sg_page_iter *piter,
*/
static inline struct page *sg_page_iter_page(struct sg_page_iter *piter)
{
- return nth_page(sg_page(piter->sg), piter->sg_pgoffset);
+ return sg_page(piter->sg) + piter->sg_pgoffset;
}
/**
diff --git a/mm/util.c b/mm/util.c
index 0bf349b19b652..e8b9da6b13230 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1312,5 +1312,6 @@ bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
return false;
return true;
}
+EXPORT_SYMBOL(page_range_contiguous);
#endif
#endif /* CONFIG_MMU */
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 23/36] scatterlist: disallow non-contiguous page ranges in a single SG entry
2025-08-27 22:01 ` [PATCH v1 23/36] scatterlist: disallow non-contiguous page ranges in a single SG entry David Hildenbrand
@ 2025-08-28 17:53 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 17:53 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Marek Szyprowski, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:27AM +0200, David Hildenbrand wrote:
> The expectation is that there is currently no user that would pass in
> non-contiguous page ranges: no allocator, not even VMA, will hand these
> out.
>
> The only problematic part would be if someone would provide a range
> obtained directly from memblock, or manually merge problematic ranges.
> If we find such cases, we should fix them to create separate
> SG entries.
>
> Let's check in sg_set_page() that this is really the case. No need to
> check in sg_set_folio(), as pages in a folio are guaranteed to be
> contiguous. As sg_set_page() gets inlined into modules, we have to
> export the page_range_contiguous() helper -- use EXPORT_SYMBOL, there is
> nothing special about this helper such that we would want to enforce
> GPL-only modules.
Ah you mention this here (I wrote end of this first :)
>
> We can now drop the nth_page() usage in sg_page_iter_page().
>
> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
All LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/scatterlist.h | 3 ++-
> mm/util.c | 1 +
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 6f8a4965f9b98..29f6ceb98d74b 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -158,6 +158,7 @@ static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
> static inline void sg_set_page(struct scatterlist *sg, struct page *page,
> unsigned int len, unsigned int offset)
> {
> + VM_WARN_ON_ONCE(!page_range_contiguous(page, ALIGN(len + offset, PAGE_SIZE) / PAGE_SIZE));
This is pretty horrible as one statement, but I guess we can't really do better,
I had a quick look around for some helper that could work but nothing is clearly
suitable.
So this should be fine.
> sg_assign_page(sg, page);
> sg->offset = offset;
> sg->length = len;
> @@ -600,7 +601,7 @@ void __sg_page_iter_start(struct sg_page_iter *piter,
> */
> static inline struct page *sg_page_iter_page(struct sg_page_iter *piter)
> {
> - return nth_page(sg_page(piter->sg), piter->sg_pgoffset);
> + return sg_page(piter->sg) + piter->sg_pgoffset;
> }
>
> /**
> diff --git a/mm/util.c b/mm/util.c
> index 0bf349b19b652..e8b9da6b13230 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1312,5 +1312,6 @@ bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
> return false;
> return true;
> }
> +EXPORT_SYMBOL(page_range_contiguous);
Kinda sad that we're doing this as EXPORT_SYMBOL() rather than
EXPORT_SYMBOL_GPL() :( but I guess necessary to stay consistent...
> #endif
> #endif /* CONFIG_MMU */
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 33/36] mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock()
[not found] <20250827220141.262669-1-david@redhat.com>
` (17 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 23/36] scatterlist: disallow non-contiguous page ranges in a single SG entry David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 18:09 ` Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 34/36] kfence: drop nth_page() usage David Hildenbrand
` (6 subsequent siblings)
25 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
There is the concern that unpin_user_page_range_dirty_lock() might do
some weird merging of PFN ranges -- either now or in the future -- such
that PFN range is contiguous but the page range might not be.
Let's sanity-check for that and drop the nth_page() usage.
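To illustrate the concern with a hypothetical caller-side sketch (a_page,
b_page and the merging logic are made up for illustration): merging two
PFN-contiguous pinned ranges is only safe when the struct pages are
contiguous as well, which is what the new sanity-check enforces.

	if (page_to_pfn(a_page) + a_npages == page_to_pfn(b_page) &&
	    page_range_contiguous(a_page, a_npages + b_npages))
		/* the merged range can be iterated and unpinned naively */
		unpin_user_page_range_dirty_lock(a_page, a_npages + b_npages,
						 make_dirty);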
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/gup.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/gup.c b/mm/gup.c
index 89ca0813791ab..c24f6009a7a44 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -237,7 +237,7 @@ void folio_add_pin(struct folio *folio)
static inline struct folio *gup_folio_range_next(struct page *start,
unsigned long npages, unsigned long i, unsigned int *ntails)
{
- struct page *next = nth_page(start, i);
+ struct page *next = start + i;
struct folio *folio = page_folio(next);
unsigned int nr = 1;
@@ -342,6 +342,9 @@ EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
* "gup-pinned page range" refers to a range of pages that has had one of the
* pin_user_pages() variants called on that page.
*
+ * The page range must be truly contiguous: the page range corresponds
+ * to a contiguous PFN range and all pages can be iterated naturally.
+ *
* For the page ranges defined by [page .. page+npages], make that range (or
* its head pages, if a compound page) dirty, if @make_dirty is true, and if the
* page range was previously listed as clean.
@@ -359,6 +362,8 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
struct folio *folio;
unsigned int nr;
+ VM_WARN_ON_ONCE(!page_range_contiguous(page, npages));
+
for (i = 0; i < npages; i += nr) {
folio = gup_folio_range_next(page, npages, i, &nr);
if (make_dirty && !folio_test_dirty(folio)) {
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 33/36] mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock()
2025-08-27 22:01 ` [PATCH v1 33/36] mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock() David Hildenbrand
@ 2025-08-28 18:09 ` Lorenzo Stoakes
2025-08-29 14:41 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 18:09 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:37AM +0200, David Hildenbrand wrote:
> There is the concern that unpin_user_page_range_dirty_lock() might do
> some weird merging of PFN ranges -- either now or in the future -- such
> that PFN range is contiguous but the page range might not be.
>
> Let's sanity-check for that and drop the nth_page() usage.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Seems one user uses SG and the other is IOMMU and in each instance you'd
expect physical contiguity (maybe Jason G. or somebody else more familiar
with these uses can also chime in).
Anyway, on that basis, LGTM (though 1 small nit below), so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/gup.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 89ca0813791ab..c24f6009a7a44 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -237,7 +237,7 @@ void folio_add_pin(struct folio *folio)
> static inline struct folio *gup_folio_range_next(struct page *start,
> unsigned long npages, unsigned long i, unsigned int *ntails)
> {
> - struct page *next = nth_page(start, i);
> + struct page *next = start + i;
> struct folio *folio = page_folio(next);
> unsigned int nr = 1;
>
> @@ -342,6 +342,9 @@ EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
> * "gup-pinned page range" refers to a range of pages that has had one of the
> * pin_user_pages() variants called on that page.
> *
> + * The page range must be truly contiguous: the page range corresponds
NIT: maybe 'physically contiguous'?
> + * to a contiguous PFN range and all pages can be iterated naturally.
> + *
> * For the page ranges defined by [page .. page+npages], make that range (or
> * its head pages, if a compound page) dirty, if @make_dirty is true, and if the
> * page range was previously listed as clean.
> @@ -359,6 +362,8 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
> struct folio *folio;
> unsigned int nr;
>
> + VM_WARN_ON_ONCE(!page_range_contiguous(page, npages));
> +
> for (i = 0; i < npages; i += nr) {
> folio = gup_folio_range_next(page, npages, i, &nr);
> if (make_dirty && !folio_test_dirty(folio)) {
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 33/36] mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock()
2025-08-28 18:09 ` Lorenzo Stoakes
@ 2025-08-29 14:41 ` David Hildenbrand
0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 14:41 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On 28.08.25 20:09, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:37AM +0200, David Hildenbrand wrote:
>> There is the concern that unpin_user_page_range_dirty_lock() might do
>> some weird merging of PFN ranges -- either now or in the future -- such
>> that PFN range is contiguous but the page range might not be.
>>
>> Let's sanity-check for that and drop the nth_page() usage.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Seems one user uses SG and the other is IOMMU and in each instance you'd
> expect physical contiguity (maybe Jason G. or somebody else more familiar
> with these uses can also chime in).
Right, and I added the sanity-check so we can identify and fix any such
wrong merging of ranges.
Thanks!
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 34/36] kfence: drop nth_page() usage
[not found] <20250827220141.262669-1-david@redhat.com>
` (18 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 33/36] mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 8:43 ` Marco Elver
2025-08-28 18:19 ` Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 35/36] block: update comment of "struct bio_vec" regarding nth_page() David Hildenbrand
` (5 subsequent siblings)
25 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Marco Elver,
Dmitry Vyukov, Andrew Morton, Brendan Jackman, Christoph Lameter,
Dennis Zhou, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
We want to get rid of nth_page(), and kfence init code is the last user.
Unfortunately, we might actually walk a PFN range where the pages are
not contiguous, because we might be allocating an area from memblock
that could span memory sections in problematic kernel configs (SPARSEMEM
without SPARSEMEM_VMEMMAP).
We could check whether the page range is contiguous using
page_range_contiguous() and fail kfence init if it is not, or make kfence
incompatible with these problematic kernel configs.
Let's keep it simple and just use pfn_to_page(), iterating over PFNs.
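For reference, the resulting loop shape is roughly the following -- a
condensed sketch of the hunks below, not the complete function:

        unsigned long start_pfn = PHYS_PFN(virt_to_phys(__kfence_pool));
        int i;

        for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
                struct slab *slab;

                /* Skip pages that do not back KFENCE object pages. */
                if (!i || (i % 2))
                        continue;

                /* pfn_to_page() is fine even across section boundaries. */
                slab = page_slab(pfn_to_page(start_pfn + i));
                __folio_set_slab(slab_folio(slab));
                /* ... remaining setup as in the diff ... */
        }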
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/kfence/core.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 0ed3be100963a..727c20c94ac59 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -594,15 +594,14 @@ static void rcu_guarded_free(struct rcu_head *h)
*/
static unsigned long kfence_init_pool(void)
{
- unsigned long addr;
- struct page *pages;
+ unsigned long addr, start_pfn;
int i;
if (!arch_kfence_init_pool())
return (unsigned long)__kfence_pool;
addr = (unsigned long)__kfence_pool;
- pages = virt_to_page(__kfence_pool);
+ start_pfn = PHYS_PFN(virt_to_phys(__kfence_pool));
/*
* Set up object pages: they must have PGTY_slab set to avoid freeing
@@ -613,11 +612,12 @@ static unsigned long kfence_init_pool(void)
* enters __slab_free() slow-path.
*/
for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
- struct slab *slab = page_slab(nth_page(pages, i));
+ struct slab *slab;
if (!i || (i % 2))
continue;
+ slab = page_slab(pfn_to_page(start_pfn + i));
__folio_set_slab(slab_folio(slab));
#ifdef CONFIG_MEMCG
slab->obj_exts = (unsigned long)&kfence_metadata_init[i / 2 - 1].obj_exts |
@@ -665,10 +665,12 @@ static unsigned long kfence_init_pool(void)
reset_slab:
for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
- struct slab *slab = page_slab(nth_page(pages, i));
+ struct slab *slab;
if (!i || (i % 2))
continue;
+
+ slab = page_slab(pfn_to_page(start_pfn + i));
#ifdef CONFIG_MEMCG
slab->obj_exts = 0;
#endif
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 34/36] kfence: drop nth_page() usage
2025-08-27 22:01 ` [PATCH v1 34/36] kfence: drop nth_page() usage David Hildenbrand
@ 2025-08-28 8:43 ` Marco Elver
2025-08-28 18:19 ` Lorenzo Stoakes
1 sibling, 0 replies; 108+ messages in thread
From: Marco Elver @ 2025-08-28 8:43 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Dmitry Vyukov, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, 28 Aug 2025 at 00:11, 'David Hildenbrand' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> We want to get rid of nth_page(), and kfence init code is the last user.
>
> Unfortunately, we might actually walk a PFN range where the pages are
> not contiguous, because we might be allocating an area from memblock
> that could span memory sections in problematic kernel configs (SPARSEMEM
> without SPARSEMEM_VMEMMAP).
>
> We could check whether the page range is contiguous using
> page_range_contiguous() and fail kfence init if it is not, or make kfence
> incompatible with these problematic kernel configs.
>
> Let's keep it simple and just use pfn_to_page(), iterating over PFNs.
>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Marco Elver <elver@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Marco Elver <elver@google.com>
Thanks.
> ---
> mm/kfence/core.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/kfence/core.c b/mm/kfence/core.c
> index 0ed3be100963a..727c20c94ac59 100644
> --- a/mm/kfence/core.c
> +++ b/mm/kfence/core.c
> @@ -594,15 +594,14 @@ static void rcu_guarded_free(struct rcu_head *h)
> */
> static unsigned long kfence_init_pool(void)
> {
> - unsigned long addr;
> - struct page *pages;
> + unsigned long addr, start_pfn;
> int i;
>
> if (!arch_kfence_init_pool())
> return (unsigned long)__kfence_pool;
>
> addr = (unsigned long)__kfence_pool;
> - pages = virt_to_page(__kfence_pool);
> + start_pfn = PHYS_PFN(virt_to_phys(__kfence_pool));
>
> /*
> * Set up object pages: they must have PGTY_slab set to avoid freeing
> @@ -613,11 +612,12 @@ static unsigned long kfence_init_pool(void)
> * enters __slab_free() slow-path.
> */
> for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
> - struct slab *slab = page_slab(nth_page(pages, i));
> + struct slab *slab;
>
> if (!i || (i % 2))
> continue;
>
> + slab = page_slab(pfn_to_page(start_pfn + i));
> __folio_set_slab(slab_folio(slab));
> #ifdef CONFIG_MEMCG
> slab->obj_exts = (unsigned long)&kfence_metadata_init[i / 2 - 1].obj_exts |
> @@ -665,10 +665,12 @@ static unsigned long kfence_init_pool(void)
>
> reset_slab:
> for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
> - struct slab *slab = page_slab(nth_page(pages, i));
> + struct slab *slab;
>
> if (!i || (i % 2))
> continue;
> +
> + slab = page_slab(pfn_to_page(start_pfn + i));
> #ifdef CONFIG_MEMCG
> slab->obj_exts = 0;
> #endif
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 34/36] kfence: drop nth_page() usage
2025-08-27 22:01 ` [PATCH v1 34/36] kfence: drop nth_page() usage David Hildenbrand
2025-08-28 8:43 ` Marco Elver
@ 2025-08-28 18:19 ` Lorenzo Stoakes
1 sibling, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 18:19 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Marco Elver, Dmitry Vyukov,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marek Szyprowski, Michal Hocko, Mike Rapoport,
Muchun Song, netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:38AM +0200, David Hildenbrand wrote:
> We want to get rid of nth_page(), and kfence init code is the last user.
>
> Unfortunately, we might actually walk a PFN range where the pages are
> not contiguous, because we might be allocating an area from memblock
> that could span memory sections in problematic kernel configs (SPARSEMEM
> without SPARSEMEM_VMEMMAP).
Sad.
>
> We could check whether the page range is contiguous using
> page_range_contiguous() and fail kfence init if it is not, or make kfence
> incompatible with these problematic kernel configs.
Sounds iffy though.
>
> Let's keep it simple and just use pfn_to_page(), iterating over PFNs.
Yes.
>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Marco Elver <elver@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Stared at this and can't see anything wrong, so - LGTM and:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/kfence/core.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/kfence/core.c b/mm/kfence/core.c
> index 0ed3be100963a..727c20c94ac59 100644
> --- a/mm/kfence/core.c
> +++ b/mm/kfence/core.c
> @@ -594,15 +594,14 @@ static void rcu_guarded_free(struct rcu_head *h)
> */
> static unsigned long kfence_init_pool(void)
> {
> - unsigned long addr;
> - struct page *pages;
> + unsigned long addr, start_pfn;
> int i;
>
> if (!arch_kfence_init_pool())
> return (unsigned long)__kfence_pool;
>
> addr = (unsigned long)__kfence_pool;
> - pages = virt_to_page(__kfence_pool);
> + start_pfn = PHYS_PFN(virt_to_phys(__kfence_pool));
>
> /*
> * Set up object pages: they must have PGTY_slab set to avoid freeing
> @@ -613,11 +612,12 @@ static unsigned long kfence_init_pool(void)
> * enters __slab_free() slow-path.
> */
> for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
> - struct slab *slab = page_slab(nth_page(pages, i));
> + struct slab *slab;
>
> if (!i || (i % 2))
> continue;
>
> + slab = page_slab(pfn_to_page(start_pfn + i));
> __folio_set_slab(slab_folio(slab));
> #ifdef CONFIG_MEMCG
> slab->obj_exts = (unsigned long)&kfence_metadata_init[i / 2 - 1].obj_exts |
> @@ -665,10 +665,12 @@ static unsigned long kfence_init_pool(void)
>
> reset_slab:
> for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
> - struct slab *slab = page_slab(nth_page(pages, i));
> + struct slab *slab;
>
> if (!i || (i % 2))
> continue;
> +
> + slab = page_slab(pfn_to_page(start_pfn + i));
> #ifdef CONFIG_MEMCG
> slab->obj_exts = 0;
> #endif
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 35/36] block: update comment of "struct bio_vec" regarding nth_page()
[not found] <20250827220141.262669-1-david@redhat.com>
` (19 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 34/36] kfence: drop nth_page() usage David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 18:19 ` Lorenzo Stoakes
2025-08-27 22:01 ` [PATCH v1 36/36] mm: remove nth_page() David Hildenbrand
` (4 subsequent siblings)
25 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
Ever since commit 858c708d9efb ("block: move the bi_size update out of
__bio_try_merge_page"), page_is_mergeable() no longer exists, and the
logic in bvec_try_merge_page() is now a simple page pointer
comparison.
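What remains is essentially a check of the following shape (a simplified
sketch; the helper name is made up, and the real code in
bvec_try_merge_page() also checks physical contiguity first):

        /*
         * The page backing the end of the existing bvec and the page being
         * appended must line up by plain pointer arithmetic.
         */
        static bool bvec_page_ptrs_line_up(const struct bio_vec *bv,
                                           const struct page *page,
                                           unsigned int offset)
        {
                return bv->bv_page + (bv->bv_offset + bv->bv_len) / PAGE_SIZE ==
                       page + offset / PAGE_SIZE;
        }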
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/bvec.h | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 0a80e1f9aa201..3fc0efa0825b1 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -22,11 +22,8 @@ struct page;
* @bv_len: Number of bytes in the address range.
* @bv_offset: Start of the address range relative to the start of @bv_page.
*
- * The following holds for a bvec if n * PAGE_SIZE < bv_offset + bv_len:
- *
- * nth_page(@bv_page, n) == @bv_page + n
- *
- * This holds because page_is_mergeable() checks the above property.
+ * All pages within a bio_vec starting from @bv_page are contiguous and
+ * can simply be iterated (see bvec_advance()).
*/
struct bio_vec {
struct page *bv_page;
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 35/36] block: update comment of "struct bio_vec" regarding nth_page()
2025-08-27 22:01 ` [PATCH v1 35/36] block: update comment of "struct bio_vec" regarding nth_page() David Hildenbrand
@ 2025-08-28 18:19 ` Lorenzo Stoakes
0 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 18:19 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:39AM +0200, David Hildenbrand wrote:
> Ever since commit 858c708d9efb ("block: move the bi_size update out of
> __bio_try_merge_page"), page_is_mergeable() no longer exists, and the
> logic in bvec_try_merge_page() is now a simple page pointer
> comparison.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Nice! :)
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/bvec.h | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> index 0a80e1f9aa201..3fc0efa0825b1 100644
> --- a/include/linux/bvec.h
> +++ b/include/linux/bvec.h
> @@ -22,11 +22,8 @@ struct page;
> * @bv_len: Number of bytes in the address range.
> * @bv_offset: Start of the address range relative to the start of @bv_page.
> *
> - * The following holds for a bvec if n * PAGE_SIZE < bv_offset + bv_len:
> - *
> - * nth_page(@bv_page, n) == @bv_page + n
> - *
> - * This holds because page_is_mergeable() checks the above property.
> + * All pages within a bio_vec starting from @bv_page are contiguous and
> + * can simply be iterated (see bvec_advance()).
> */
> struct bio_vec {
> struct page *bv_page;
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* [PATCH v1 36/36] mm: remove nth_page()
[not found] <20250827220141.262669-1-david@redhat.com>
` (20 preceding siblings ...)
2025-08-27 22:01 ` [PATCH v1 35/36] block: update comment of "struct bio_vec" regarding nth_page() David Hildenbrand
@ 2025-08-27 22:01 ` David Hildenbrand
2025-08-28 18:25 ` Lorenzo Stoakes
[not found] ` <20250827220141.262669-25-david@redhat.com>
` (3 subsequent siblings)
25 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2025-08-27 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: David Hildenbrand, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
Now that all users are gone, let's remove it.
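For context, the conversions earlier in the series replaced it either with
plain pointer arithmetic (where the memmap is known to be contiguous) or
with pfn_to_page() (where only PFN contiguity is guaranteed); a minimal
sketch of both patterns:

        /* Memmap known to be contiguous across the range: */
        struct page *next = page + i;           /* was nth_page(page, i) */

        /* Only PFN contiguity guaranteed: */
        struct page *other = pfn_to_page(page_to_pfn(page) + i);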
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 2 --
tools/testing/scatterlist/linux/mm.h | 1 -
2 files changed, 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2ca1eb2db63ec..b26ca8b2162d9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -210,9 +210,7 @@ extern unsigned long sysctl_admin_reserve_kbytes;
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
bool page_range_contiguous(const struct page *page, unsigned long nr_pages);
-#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
#else
-#define nth_page(page,n) ((page) + (n))
static inline bool page_range_contiguous(const struct page *page,
unsigned long nr_pages)
{
diff --git a/tools/testing/scatterlist/linux/mm.h b/tools/testing/scatterlist/linux/mm.h
index 5bd9e6e806254..121ae78d6e885 100644
--- a/tools/testing/scatterlist/linux/mm.h
+++ b/tools/testing/scatterlist/linux/mm.h
@@ -51,7 +51,6 @@ static inline unsigned long page_to_phys(struct page *page)
#define page_to_pfn(page) ((unsigned long)(page) / PAGE_SIZE)
#define pfn_to_page(pfn) (void *)((pfn) * PAGE_SIZE)
-#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
#define __min(t1, t2, min1, min2, x, y) ({ \
t1 min1 = (x); \
--
2.50.1
^ permalink raw reply related [flat|nested] 108+ messages in thread
* Re: [PATCH v1 36/36] mm: remove nth_page()
2025-08-27 22:01 ` [PATCH v1 36/36] mm: remove nth_page() David Hildenbrand
@ 2025-08-28 18:25 ` Lorenzo Stoakes
2025-08-29 14:42 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 18:25 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:40AM +0200, David Hildenbrand wrote:
> Now that all users are gone, let's remove it.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
HAPPY DAYYS!!!!
Happy to have reached this bit, great work! :)
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/mm.h | 2 --
> tools/testing/scatterlist/linux/mm.h | 1 -
> 2 files changed, 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2ca1eb2db63ec..b26ca8b2162d9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -210,9 +210,7 @@ extern unsigned long sysctl_admin_reserve_kbytes;
>
> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> bool page_range_contiguous(const struct page *page, unsigned long nr_pages);
> -#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> #else
> -#define nth_page(page,n) ((page) + (n))
> static inline bool page_range_contiguous(const struct page *page,
> unsigned long nr_pages)
> {
> diff --git a/tools/testing/scatterlist/linux/mm.h b/tools/testing/scatterlist/linux/mm.h
> index 5bd9e6e806254..121ae78d6e885 100644
> --- a/tools/testing/scatterlist/linux/mm.h
> +++ b/tools/testing/scatterlist/linux/mm.h
> @@ -51,7 +51,6 @@ static inline unsigned long page_to_phys(struct page *page)
>
> #define page_to_pfn(page) ((unsigned long)(page) / PAGE_SIZE)
> #define pfn_to_page(pfn) (void *)((pfn) * PAGE_SIZE)
> -#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
>
> #define __min(t1, t2, min1, min2, x, y) ({ \
> t1 min1 = (x); \
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 36/36] mm: remove nth_page()
2025-08-28 18:25 ` Lorenzo Stoakes
@ 2025-08-29 14:42 ` David Hildenbrand
0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 14:42 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-kernel, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Mike Rapoport, Muchun Song,
netdev, Oscar Salvador, Peter Xu, Robin Murphy,
Suren Baghdasaryan, Tejun Heo, virtualization, Vlastimil Babka,
wireguard, x86, Zi Yan
On 28.08.25 20:25, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:40AM +0200, David Hildenbrand wrote:
>> Now that all users are gone, let's remove it.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> HAPPY DAYYS!!!!
>
> Happy to have reached this bit, great work! :)
I was just as happy when I made it to the end of this series :)
Thanks for all the review!!
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
[parent not found: <20250827220141.262669-25-david@redhat.com>]
* Re: [PATCH v1 24/36] ata: libata-eh: drop nth_page() usage within SG entry
[not found] ` <20250827220141.262669-25-david@redhat.com>
@ 2025-08-28 4:24 ` Damien Le Moal
2025-08-28 17:53 ` Lorenzo Stoakes
1 sibling, 0 replies; 108+ messages in thread
From: Damien Le Moal @ 2025-08-28 4:24 UTC (permalink / raw)
To: David Hildenbrand, linux-kernel
Cc: Niklas Cassel, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
On 8/28/25 7:01 AM, David Hildenbrand wrote:
> It's no longer required to use nth_page() when iterating pages within a
> single SG entry, so let's drop the nth_page() usage.
>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Niklas Cassel <cassel@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 24/36] ata: libata-eh: drop nth_page() usage within SG entry
[not found] ` <20250827220141.262669-25-david@redhat.com>
2025-08-28 4:24 ` [PATCH v1 24/36] ata: libata-eh: drop nth_page() usage within SG entry Damien Le Moal
@ 2025-08-28 17:53 ` Lorenzo Stoakes
2025-08-29 0:22 ` Damien Le Moal
1 sibling, 1 reply; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 17:53 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Damien Le Moal, Niklas Cassel, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:28AM +0200, David Hildenbrand wrote:
> It's no longer required to use nth_page() when iterating pages within a
> single SG entry, so let's drop the nth_page() usage.
>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Niklas Cassel <cassel@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> drivers/ata/libata-sff.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> index 7fc407255eb46..1e2a2c33cdc80 100644
> --- a/drivers/ata/libata-sff.c
> +++ b/drivers/ata/libata-sff.c
> @@ -614,7 +614,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
> offset = qc->cursg->offset + qc->cursg_ofs;
>
> /* get the current page and offset */
> - page = nth_page(page, (offset >> PAGE_SHIFT));
> + page += offset >> PAGE_SHIFT;
> offset %= PAGE_SIZE;
>
> /* don't overrun current sg */
> @@ -631,7 +631,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
> unsigned int split_len = PAGE_SIZE - offset;
>
> ata_pio_xfer(qc, page, offset, split_len);
> - ata_pio_xfer(qc, nth_page(page, 1), 0, count - split_len);
> + ata_pio_xfer(qc, page + 1, 0, count - split_len);
> } else {
> ata_pio_xfer(qc, page, offset, count);
> }
> @@ -751,7 +751,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
> offset = sg->offset + qc->cursg_ofs;
>
> /* get the current page and offset */
> - page = nth_page(page, (offset >> PAGE_SHIFT));
> + page += offset >> PAGE_SHIFT;
> offset %= PAGE_SIZE;
>
> /* don't overrun current sg */
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 24/36] ata: libata-eh: drop nth_page() usage within SG entry
2025-08-28 17:53 ` Lorenzo Stoakes
@ 2025-08-29 0:22 ` Damien Le Moal
2025-08-29 14:37 ` David Hildenbrand
0 siblings, 1 reply; 108+ messages in thread
From: Damien Le Moal @ 2025-08-29 0:22 UTC (permalink / raw)
To: Lorenzo Stoakes, David Hildenbrand
Cc: linux-kernel, Niklas Cassel, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On 8/29/25 2:53 AM, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:28AM +0200, David Hildenbrand wrote:
>> It's no longer required to use nth_page() when iterating pages within a
>> single SG entry, so let's drop the nth_page() usage.
>>
>> Cc: Damien Le Moal <dlemoal@kernel.org>
>> Cc: Niklas Cassel <cassel@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> LGTM, so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Just noticed this:
s/libata-eh/libata-sff
in the commit title please.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 24/36] ata: libata-eh: drop nth_page() usage within SG entry
2025-08-29 0:22 ` Damien Le Moal
@ 2025-08-29 14:37 ` David Hildenbrand
0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2025-08-29 14:37 UTC (permalink / raw)
To: Damien Le Moal, Lorenzo Stoakes
Cc: linux-kernel, Niklas Cassel, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On 29.08.25 02:22, Damien Le Moal wrote:
> On 8/29/25 2:53 AM, Lorenzo Stoakes wrote:
>> On Thu, Aug 28, 2025 at 12:01:28AM +0200, David Hildenbrand wrote:
>>> It's no longer required to use nth_page() when iterating pages within a
>>> single SG entry, so let's drop the nth_page() usage.
>>>
>>> Cc: Damien Le Moal <dlemoal@kernel.org>
>>> Cc: Niklas Cassel <cassel@kernel.org>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>
>> LGTM, so:
>>
>> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Just noticed this:
>
> s/libata-eh/libata-sff
>
> in the commit title please.
>
Sure, I think a quick git-log search misled me.
Thanks!
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 108+ messages in thread
[parent not found: <20250827220141.262669-3-david@redhat.com>]
* Re: [PATCH v1 02/36] arm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
[not found] ` <20250827220141.262669-3-david@redhat.com>
@ 2025-08-28 10:43 ` Catalin Marinas
2025-08-28 14:12 ` Lorenzo Stoakes
2025-08-29 0:27 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Catalin Marinas @ 2025-08-28 10:43 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Mike Rapoport (Microsoft), Will Deacon,
Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
On Thu, Aug 28, 2025 at 12:01:06AM +0200, David Hildenbrand wrote:
> Now handled by the core automatically once SPARSEMEM_VMEMMAP_ENABLE
> is selected.
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 02/36] arm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
[not found] ` <20250827220141.262669-3-david@redhat.com>
2025-08-28 10:43 ` [PATCH v1 02/36] arm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP" Catalin Marinas
@ 2025-08-28 14:12 ` Lorenzo Stoakes
2025-08-29 0:27 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 14:12 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Mike Rapoport (Microsoft), Catalin Marinas,
Will Deacon, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
On Thu, Aug 28, 2025 at 12:01:06AM +0200, David Hildenbrand wrote:
> Now handled by the core automatically once SPARSEMEM_VMEMMAP_ENABLE
> is selected.
Do you plan to do this for other cases too, then? Or was this an
outlier? I guess I will see :)
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> arch/arm64/Kconfig | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e9bbfacc35a64..b1d1f2ff2493b 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1570,7 +1570,6 @@ source "kernel/Kconfig.hz"
> config ARCH_SPARSEMEM_ENABLE
> def_bool y
> select SPARSEMEM_VMEMMAP_ENABLE
> - select SPARSEMEM_VMEMMAP
>
> config HW_PERF_EVENTS
> def_bool y
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 02/36] arm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
[not found] ` <20250827220141.262669-3-david@redhat.com>
2025-08-28 10:43 ` [PATCH v1 02/36] arm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP" Catalin Marinas
2025-08-28 14:12 ` Lorenzo Stoakes
@ 2025-08-29 0:27 ` Liam R. Howlett
2 siblings, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 0:27 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Mike Rapoport (Microsoft), Catalin Marinas,
Will Deacon, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Linus Torvalds,
linux-arm-kernel, linux-arm-kernel, linux-crypto, linux-ide,
linux-kselftest, linux-mips, linux-mmc, linux-mm, linux-riscv,
linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
* David Hildenbrand <david@redhat.com> [250827 18:03]:
> Now handled by the core automatically once SPARSEMEM_VMEMMAP_ENABLE
> is selected.
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> arch/arm64/Kconfig | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e9bbfacc35a64..b1d1f2ff2493b 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1570,7 +1570,6 @@ source "kernel/Kconfig.hz"
> config ARCH_SPARSEMEM_ENABLE
> def_bool y
> select SPARSEMEM_VMEMMAP_ENABLE
> - select SPARSEMEM_VMEMMAP
>
> config HW_PERF_EVENTS
> def_bool y
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
[parent not found: <20250827220141.262669-33-david@redhat.com>]
* Re: [PATCH v1 32/36] crypto: remove nth_page() usage within SG entry
[not found] ` <20250827220141.262669-33-david@redhat.com>
@ 2025-08-28 18:02 ` Lorenzo Stoakes
2025-08-30 8:50 ` Herbert Xu
1 sibling, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 18:02 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Herbert Xu, David S. Miller, Alexander Potapenko,
Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
kasan-dev, kvm, Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:36AM +0200, David Hildenbrand wrote:
> It's no longer required to use nth_page() when iterating pages within a
> single SG entry, so let's drop the nth_page() usage.
>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> crypto/ahash.c | 4 ++--
> crypto/scompress.c | 8 ++++----
> include/crypto/scatterwalk.h | 4 ++--
> 3 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/crypto/ahash.c b/crypto/ahash.c
> index a227793d2c5b5..dfb4f5476428f 100644
> --- a/crypto/ahash.c
> +++ b/crypto/ahash.c
> @@ -88,7 +88,7 @@ static int hash_walk_new_entry(struct crypto_hash_walk *walk)
>
> sg = walk->sg;
> walk->offset = sg->offset;
> - walk->pg = nth_page(sg_page(walk->sg), (walk->offset >> PAGE_SHIFT));
> + walk->pg = sg_page(walk->sg) + (walk->offset >> PAGE_SHIFT);
> walk->offset = offset_in_page(walk->offset);
> walk->entrylen = sg->length;
>
> @@ -226,7 +226,7 @@ int shash_ahash_digest(struct ahash_request *req, struct shash_desc *desc)
> if (!IS_ENABLED(CONFIG_HIGHMEM))
> return crypto_shash_digest(desc, data, nbytes, req->result);
>
> - page = nth_page(page, offset >> PAGE_SHIFT);
> + page += offset >> PAGE_SHIFT;
> offset = offset_in_page(offset);
>
> if (nbytes > (unsigned int)PAGE_SIZE - offset)
> diff --git a/crypto/scompress.c b/crypto/scompress.c
> index c651e7f2197a9..1a7ed8ae65b07 100644
> --- a/crypto/scompress.c
> +++ b/crypto/scompress.c
> @@ -198,7 +198,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> } else
> return -ENOSYS;
>
> - dpage = nth_page(dpage, doff / PAGE_SIZE);
> + dpage += doff / PAGE_SIZE;
> doff = offset_in_page(doff);
>
> n = (dlen - 1) / PAGE_SIZE;
> @@ -220,12 +220,12 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> } else
> break;
>
> - spage = nth_page(spage, soff / PAGE_SIZE);
> + spage = spage + soff / PAGE_SIZE;
> soff = offset_in_page(soff);
>
> n = (slen - 1) / PAGE_SIZE;
> n += (offset_in_page(slen - 1) + soff) / PAGE_SIZE;
> - if (PageHighMem(nth_page(spage, n)) &&
> + if (PageHighMem(spage + n) &&
> size_add(soff, slen) > PAGE_SIZE)
> break;
> src = kmap_local_page(spage) + soff;
> @@ -270,7 +270,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> if (dlen <= PAGE_SIZE)
> break;
> dlen -= PAGE_SIZE;
> - dpage = nth_page(dpage, 1);
> + dpage++;
Can't help but chuckle when I see this simplification each time, really nice! :)
> }
> }
>
> diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h
> index 15ab743f68c8f..83d14376ff2bc 100644
> --- a/include/crypto/scatterwalk.h
> +++ b/include/crypto/scatterwalk.h
> @@ -159,7 +159,7 @@ static inline void scatterwalk_map(struct scatter_walk *walk)
> if (IS_ENABLED(CONFIG_HIGHMEM)) {
> struct page *page;
>
> - page = nth_page(base_page, offset >> PAGE_SHIFT);
> + page = base_page + (offset >> PAGE_SHIFT);
> offset = offset_in_page(offset);
> addr = kmap_local_page(page) + offset;
> } else {
> @@ -259,7 +259,7 @@ static inline void scatterwalk_done_dst(struct scatter_walk *walk,
> end += (offset_in_page(offset) + offset_in_page(nbytes) +
> PAGE_SIZE - 1) >> PAGE_SHIFT;
> for (i = start; i < end; i++)
> - flush_dcache_page(nth_page(base_page, i));
> + flush_dcache_page(base_page + i);
> }
> scatterwalk_advance(walk, nbytes);
> }
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 32/36] crypto: remove nth_page() usage within SG entry
[not found] ` <20250827220141.262669-33-david@redhat.com>
2025-08-28 18:02 ` [PATCH v1 32/36] crypto: remove nth_page() usage within SG entry Lorenzo Stoakes
@ 2025-08-30 8:50 ` Herbert Xu
1 sibling, 0 replies; 108+ messages in thread
From: Herbert Xu @ 2025-08-30 8:50 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, David S. Miller, Alexander Potapenko, Andrew Morton,
Brendan Jackman, Christoph Lameter, Dennis Zhou, Dmitry Vyukov,
dri-devel, intel-gfx, iommu, io-uring, Jason Gunthorpe,
Jens Axboe, Johannes Weiner, John Hubbard, kasan-dev, kvm,
Liam R. Howlett, Linus Torvalds, linux-arm-kernel,
linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
linux-scsi, Lorenzo Stoakes, Marco Elver, Marek Szyprowski,
Michal Hocko, Mike Rapoport, Muchun Song, netdev, Oscar Salvador,
Peter Xu, Robin Murphy, Suren Baghdasaryan, Tejun Heo,
virtualization, Vlastimil Babka, wireguard, x86, Zi Yan
On Thu, Aug 28, 2025 at 12:01:36AM +0200, David Hildenbrand wrote:
> It's no longer required to use nth_page() when iterating pages within a
> single SG entry, so let's drop the nth_page() usage.
>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> crypto/ahash.c | 4 ++--
> crypto/scompress.c | 8 ++++----
> include/crypto/scatterwalk.h | 4 ++--
> 3 files changed, 8 insertions(+), 8 deletions(-)
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 108+ messages in thread
[parent not found: <20250827220141.262669-6-david@redhat.com>]
* Re: [PATCH v1 05/36] wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config
[not found] ` <20250827220141.262669-6-david@redhat.com>
@ 2025-08-28 14:26 ` Lorenzo Stoakes
2025-08-29 0:29 ` Liam R. Howlett
1 sibling, 0 replies; 108+ messages in thread
From: Lorenzo Stoakes @ 2025-08-28 14:26 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Mike Rapoport (Microsoft), Jason A. Donenfeld,
Shuah Khan, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Liam R. Howlett,
Linus Torvalds, linux-arm-kernel, linux-arm-kernel, linux-crypto,
linux-ide, linux-kselftest, linux-mips, linux-mmc, linux-mm,
linux-riscv, linux-s390, linux-scsi, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
On Thu, Aug 28, 2025 at 12:01:09AM +0200, David Hildenbrand wrote:
> It's no longer user-selectable (and the default was already "y"), so
> let's just drop it.
>
> It was never really relevant to the wireguard selftests either way.
>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> tools/testing/selftests/wireguard/qemu/kernel.config | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/tools/testing/selftests/wireguard/qemu/kernel.config b/tools/testing/selftests/wireguard/qemu/kernel.config
> index 0a5381717e9f4..1149289f4b30f 100644
> --- a/tools/testing/selftests/wireguard/qemu/kernel.config
> +++ b/tools/testing/selftests/wireguard/qemu/kernel.config
> @@ -48,7 +48,6 @@ CONFIG_JUMP_LABEL=y
> CONFIG_FUTEX=y
> CONFIG_SHMEM=y
> CONFIG_SLUB=y
> -CONFIG_SPARSEMEM_VMEMMAP=y
> CONFIG_SMP=y
> CONFIG_SCHED_SMT=y
> CONFIG_SCHED_MC=y
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 108+ messages in thread
* Re: [PATCH v1 05/36] wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config
[not found] ` <20250827220141.262669-6-david@redhat.com>
2025-08-28 14:26 ` [PATCH v1 05/36] wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config Lorenzo Stoakes
@ 2025-08-29 0:29 ` Liam R. Howlett
1 sibling, 0 replies; 108+ messages in thread
From: Liam R. Howlett @ 2025-08-29 0:29 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, Mike Rapoport (Microsoft), Jason A. Donenfeld,
Shuah Khan, Alexander Potapenko, Andrew Morton, Brendan Jackman,
Christoph Lameter, Dennis Zhou, Dmitry Vyukov, dri-devel,
intel-gfx, iommu, io-uring, Jason Gunthorpe, Jens Axboe,
Johannes Weiner, John Hubbard, kasan-dev, kvm, Linus Torvalds,
linux-arm-kernel, linux-arm-kernel, linux-crypto, linux-ide,
linux-kselftest, linux-mips, linux-mmc, linux-mm, linux-riscv,
linux-s390, linux-scsi, Lorenzo Stoakes, Marco Elver,
Marek Szyprowski, Michal Hocko, Muchun Song, netdev,
Oscar Salvador, Peter Xu, Robin Murphy, Suren Baghdasaryan,
Tejun Heo, virtualization, Vlastimil Babka, wireguard, x86,
Zi Yan
* David Hildenbrand <david@redhat.com> [250827 18:04]:
> It's no longer user-selectable (and the default was already "y"), so
> let's just drop it.
>
> It was never really relevant to the wireguard selftests either way.
>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> tools/testing/selftests/wireguard/qemu/kernel.config | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/tools/testing/selftests/wireguard/qemu/kernel.config b/tools/testing/selftests/wireguard/qemu/kernel.config
> index 0a5381717e9f4..1149289f4b30f 100644
> --- a/tools/testing/selftests/wireguard/qemu/kernel.config
> +++ b/tools/testing/selftests/wireguard/qemu/kernel.config
> @@ -48,7 +48,6 @@ CONFIG_JUMP_LABEL=y
> CONFIG_FUTEX=y
> CONFIG_SHMEM=y
> CONFIG_SLUB=y
> -CONFIG_SPARSEMEM_VMEMMAP=y
> CONFIG_SMP=y
> CONFIG_SCHED_SMT=y
> CONFIG_SCHED_MC=y
> --
> 2.50.1
>
>
^ permalink raw reply [flat|nested] 108+ messages in thread