Re: [PATCH v6 0/3] mm: Free contiguous order-0 pages efficiently

public inbox for linux-mm@kvack.org

From: Ryan Roberts <ryan.roberts@arm.com>
To: Muhammad Usama Anjum <usama.anjum@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Nick Terrell <terrelln@fb.com>, David Sterba <dsterba@suse.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, david.hildenbrand@arm.com
Subject: Re: [PATCH v6 0/3] mm: Free contiguous order-0 pages efficiently
Date: Wed, 22 Apr 2026 14:42:51 +0100
Message-ID: <0a8ba718-7ff2-4876-9154-9dd5d404df3e@arm.com>
In-Reply-To: <20260401101634.2868165-1-usama.anjum@arm.com>

Hi Andrew,

Small nudge on this series; I believe it has all appropriate A-b/R-b tags and no
outstanding actions. Is it ready to go into mm-new?

Thanks,
Ryan


On 01/04/2026 11:16, Muhammad Usama Anjum wrote:
> Hi All,
> 
> A recent change to vmalloc caused some performance benchmark regressions (see
> [1]). I'm attempting to fix that (and at the same time significantly improve
> beyond the baseline) by freeing a contiguous set of order-0 pages as a batch.
> 
> At the same time I observed that free_contig_range() was essentially doing the
> same thing as vfree(), so I've fixed it there too. While at it, optimize
> __free_contig_frozen_range() as well.
> 
> Check that the contiguous range falls within the same memory section. If
> sections aren't enabled, the if conditions get optimized out by the compiler,
> as memdesc_section() returns 0. See num_pages_contiguous() for more details.
> 
> [1] https://lore.kernel.org/all/66919a28-bc81-49c9-b68f-dd7c73395a0d@arm.com
> 
> v6.18      - Before the patch causing regression was added
> mm-new     - current latest code
> this series - v2 series of these patches
> 
> (>0 is faster, <0 is slower, (R)/(I) = statistically significant
> Regression/Improvement)
> 
> v6.18 vs mm-new
> +-----------------+----------------------------------------------------------+-------------------+-------------+
> | Benchmark       | Result Class                                             |   v6.18    (base) |    mm-new   |
> +=================+==========================================================+===================+=============+
> | micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |         653643.33 | (R) -50.92% |
> |                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |         366167.33 | (R) -11.96% |
> |                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |         489484.00 | (R) -35.21% |
> |                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |        1011250.33 | (R) -36.45% |
> |                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |        1086812.33 | (R) -31.83% |
> |                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |         657940.00 | (R) -38.62% |
> |                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |         765422.00 | (R) -24.84% |
> |                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |        2468585.00 | (R) -37.83% |
> |                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |        2815758.33 | (R) -26.32% |
> |                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |        4851969.00 | (R) -37.76% |
> |                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |        4496257.33 | (R) -31.15% |
> |                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |         570605.00 |      -8.97% |
> |                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         500866.00 |      -5.88% |
> |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         499733.00 |      -6.95% |
> |                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |        5266237.67 | (R) -40.19% |
> |                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |         490284.00 |      -2.10% |
> |                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |         850986.33 | (R) -48.03% |
> |                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |        2712106.00 | (R) -40.48% |
> |                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |         111151.33 |       3.52% |
> +-----------------+----------------------------------------------------------+-------------------+-------------+
> 
> v6.18 vs mm-new with patches
> +-----------------+----------------------------------------------------------+-------------------+--------------+
> | Benchmark       | Result Class                                             |   v6.18 (base)    |  this series |
> +=================+==========================================================+===================+==============+
> | micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |         653643.33 |      -14.02% |
> |                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |         366167.33 |       -7.23% |
> |                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |         489484.00 |       -1.57% |
> |                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |        1011250.33 |        1.57% |
> |                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |        1086812.33 |   (I) 15.75% |
> |                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |         657940.00 |    (I) 9.05% |
> |                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |         765422.00 |   (I) 38.45% |
> |                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |        2468585.00 |   (I) 12.56% |
> |                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |        2815758.33 |   (I) 38.61% |
> |                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |        4851969.00 |   (I) 13.43% |
> |                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |        4496257.33 |   (I) 49.21% |
> |                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |         570605.00 |       -8.47% |
> |                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         500866.00 |       -8.17% |
> |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         499733.00 |       -5.54% |
> |                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |        5266237.67 |    (I) 4.63% |
> |                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |         490284.00 |        1.53% |
> |                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |         850986.33 |       -0.00% |
> |                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |        2712106.00 |        1.22% |
> |                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |         111151.33 |    (I) 4.98% |
> +-----------------+----------------------------------------------------------+-------------------+--------------+
> 
> mm-new vs vmalloc_2 results are in patch 2/3.
> 
> So, on average, this series mitigates the regression: relative to v6.18, the
> results range from -14% to +49%.
> 
> Thanks,
> Muhammad Usama Anjum
> 
> ---
> Changes since v5:
> - Patch 1: Move page_to_pfn() outside the loop in free_prepared_contig_range()
> - Patch 2: Change subject of the patch
> 
> Changes since v4: (summary)
> - Patch 1: move can_free initialization inside the loop
> - Patch 1: Use pfn_to_page() for each pfn instead of page++
> - Patch 2: Use num_pages_contiguous() instead of raw loop
> 
> Changes since v3: (summary)
> - Introduce __free_contig_range_common() in first patch and use it in
>   3rd patch as well
> - Cosmetic changes related to comments and kerneldoc
> 
> Changes since v2: (summary)
> - Patch 1 and 3: Rework the loop to check for memory sections
> - Patch 2: Rework by removing the BUG_ON and adding helper free_pages_bulk()
> 
> Changes since v1:
> - Update description
> - Rebase on mm-new and rerun benchmarks/tests
> - Patch 1: move FPI_PREPARED check and add todo
> - Patch 2: Rework to cater for newer changes in vfree()
> - New Patch 3: optimizes __free_contig_frozen_range()
> 
> Muhammad Usama Anjum (1):
>   mm/page_alloc: Optimize __free_contig_frozen_range()
> 
> Ryan Roberts (2):
>   mm/page_alloc: Optimize free_contig_range()
>   vmalloc: Optimize vfree with free_pages_bulk()
> 
>  include/linux/gfp.h |   4 ++
>  mm/page_alloc.c     | 143 ++++++++++++++++++++++++++++++++++++++++++--
>  mm/vmalloc.c        |  16 ++---
>  3 files changed, 146 insertions(+), 17 deletions(-)
>

Thread overview: 8+ messages
2026-04-01 10:16 [PATCH v6 0/3] mm: Free contiguous order-0 pages efficiently Muhammad Usama Anjum
2026-04-01 10:16 ` [PATCH v6 1/3] mm/page_alloc: Optimize free_contig_range() Muhammad Usama Anjum
2026-04-01 10:16 ` [PATCH v6 2/3] vmalloc: Optimize vfree with free_pages_bulk() Muhammad Usama Anjum
2026-04-01 10:19   ` David Hildenbrand (Arm)
2026-04-01 15:13   ` Uladzislau Rezki
2026-04-01 10:16 ` [PATCH v6 3/3] mm/page_alloc: Optimize __free_contig_frozen_range() Muhammad Usama Anjum
2026-04-22 13:42 ` Ryan Roberts [this message]
2026-04-22 15:40   ` [PATCH v6 0/3] mm: Free contiguous order-0 pages efficiently David Hildenbrand (Arm)
