From mboxrd@z Thu Jan  1 00:00:00 1970
From: Uladzislau Rezki <urezki@gmail.com>
Date: Wed, 25 Mar 2026 17:16:36 +0100
To: Muhammad Usama Anjum
Cc: Uladzislau Rezki, Zi Yan, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner,
	Nick Terrell, David Sterba, Vishal Moola, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	Ryan.Roberts@arm.com, david.hildenbrand@arm.com
Subject: Re: [PATCH v3 2/3] vmalloc: Optimize vfree
References: <20260324133538.497616-1-usama.anjum@arm.com>
	<20260324133538.497616-3-usama.anjum@arm.com>
	<1D88CFF0-8A74-413F-9A6A-39E27B760AE1@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Wed, Mar 25, 2026 at 03:02:14PM +0000, Muhammad Usama Anjum wrote:
> On 25/03/2026 8:56 am, Uladzislau Rezki wrote:
> > On Tue, Mar 24, 2026 at 10:55:55AM -0400, Zi Yan wrote:
> >> On 24 Mar 2026, at 9:35, Muhammad Usama Anjum wrote:
> >>
> >>> From: Ryan Roberts
> >>>
> >>> Whenever vmalloc allocates high order pages (e.g. for a huge mapping) it
> >>> must immediately split_page() to order-0 so that it remains compatible
> >>> with users that want to access the underlying struct page.
> >>> Commit a06157804399 ("mm/vmalloc: request large order pages from buddy
> >>> allocator") recently made it much more likely for vmalloc to allocate
> >>> high order pages which are subsequently split to order-0.
> >>>
> >>> Unfortunately this had the side effect of causing performance
> >>> regressions for tight vmalloc/vfree loops (e.g. test_vmalloc.ko
> >>> benchmarks). See Closes: tag. This happens because the high order pages
> >>> must be gotten from the buddy, but then, because they are split to
> >>> order-0, when they are freed they are freed to the order-0 pcp.
> >>> Previously allocation was for order-0 pages so they were recycled from
> >>> the pcp.
> >>>
> >>> It would be preferable if, when vmalloc allocates an (e.g.) order-3
> >>> page, it also freed that order-3 page to the order-3 pcp; then the
> >>> regression could be removed.
> >>>
> >>> So let's do exactly that; use the new __free_contig_range() API to
> >>> batch-free contiguous ranges of pfns. This not only removes the
> >>> regression, but significantly improves performance of vfree beyond the
> >>> baseline.
> >>>
> >>> A selection of test_vmalloc benchmarks running on an arm64 server class
> >>> system. mm-new is the baseline. Commit a06157804399 ("mm/vmalloc:
> >>> request large order pages from buddy allocator") was added in v6.19-rc1,
> >>> where we see regressions. Then with this change performance is much
> >>> better. (>0 is faster, <0 is slower, (R)/(I) = statistically significant
> >>> Regression/Improvement):
> >>>
> >>> +-----------------+----------------------------------------------------------+-------------------+--------------------+
> >>> | Benchmark       | Result Class                                             | mm-new            | this series        |
> >>> +=================+==========================================================+===================+====================+
> >>> | micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          | 1331843.33        | (I) 67.17%         |
> >>> |                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           | 415907.33         | -5.14%             |
> >>> |                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           | 755448.00         | (I) 53.55%         |
> >>> |                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          | 1591331.33        | (I) 57.26%         |
> >>> |                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          | 1594345.67        | (I) 68.46%         |
> >>> |                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          | 1071826.00        | (I) 79.27%         |
> >>> |                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          | 1018385.00        | (I) 84.17%         |
> >>> |                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         | 3970899.67        | (I) 77.01%         |
> >>> |                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         | 3821788.67        | (I) 89.44%         |
> >>> |                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         | 7795968.00        | (I) 82.67%         |
> >>> |                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         | 6530169.67        | (I) 118.09%        |
> >>> |                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           | 626808.33         | -0.98%             |
> >>> |                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | 532145.67         | -1.68%             |
> >>> |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | 537032.67         | -0.96%             |
> >>> |                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     | 8805069.00        | (I) 74.58%         |
> >>> |                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               | 500824.67         | 4.35%              |
> >>> |                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  | 1637554.67        | (I) 76.99%         |
> >>> |                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        | 4556288.67        | (I) 72.23%         |
> >>> |                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               | 107371.00         | -0.70%             |
> >>> +-----------------+----------------------------------------------------------+-------------------+--------------------+
> >>>
> >>> Fixes: a06157804399 ("mm/vmalloc: request large order pages from buddy allocator")
> >>> Closes: https://lore.kernel.org/all/66919a28-bc81-49c9-b68f-dd7c73395a0d@arm.com/
> >>> Signed-off-by: Ryan Roberts
> >>> Co-developed-by: Muhammad Usama Anjum
> >>> Signed-off-by: Muhammad Usama Anjum
> >>> ---
> >>> Changes since v2:
> >>> - Remove the BUG_ON in favour of the simple implementation, as it has
> >>>   never been seen to trigger in the past
> >>> - Move the free loop to a separate function, free_pages_bulk()
> >>> - Update stats, lruvec_stat in a separate loop
> >>>
> >>> Changes since v1:
> >>> - Rebase on mm-new
> >>> - Rerun benchmarks
> >>>
> >>> Made-with: Cursor
> >>> ---
> >>>  include/linux/gfp.h |  2 ++
> >>>  mm/page_alloc.c     | 23 +++++++++++++++++++++++
> >>>  mm/vmalloc.c        | 16 +++++-----------
> >>>  3 files changed, 30 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> >>> index 7c1f9da7c8e56..71f9097ab99a0 100644
> >>> --- a/include/linux/gfp.h
> >>> +++ b/include/linux/gfp.h
> >>> @@ -239,6 +239,8 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
> >>>  				struct page **page_array);
> >>>  #define __alloc_pages_bulk(...)	alloc_hooks(alloc_pages_bulk_noprof(__VA_ARGS__))
> >>>
> >>> +void free_pages_bulk(struct page **page_array, unsigned long nr_pages);
> >>> +
> >>>  unsigned long alloc_pages_bulk_mempolicy_noprof(gfp_t gfp,
> >>>  				unsigned long nr_pages,
> >>>  				struct page **page_array);
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index eedce9a30eb7e..250cc07e547b8 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -5175,6 +5175,29 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(alloc_pages_bulk_noprof);
> >>>
> >>> +void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
> >>> +{
> >>> +	unsigned long start_pfn = 0, pfn;
> >>> +	unsigned long i, nr_contig = 0;
> >>> +
> >>> +	for (i = 0; i < nr_pages; i++) {
> >>> +		pfn = page_to_pfn(page_array[i]);
> >>> +		if (!nr_contig) {
> >>> +			start_pfn = pfn;
> >>> +			nr_contig = 1;
> >>> +		} else if (start_pfn + nr_contig != pfn) {
> >>> +			__free_contig_range(start_pfn, nr_contig);
> >>> +			start_pfn = pfn;
> >>> +			nr_contig = 1;
> >>> +			cond_resched();
> >>
> > It will cause a schedule-while-atomic. Have you checked whether
> > __free_contig_range() can also sleep? If so then we are aligned; if
> > not, probably we should remove it.
> Sorry, I didn't get it. How does having cond_resched() in this function
> affect __free_contig_range()?
>
It does not. What I am asking about is:

	spin_lock();
	free_pages_bulk()
	...

so this is not allowed, because there is a cond_resched() call. We can
remove it and make it possible to invoke free_pages_bulk() under a
spin-lock, __but__ only if, for example, the other calls do not sleep:

	__free_contig_range()
	  memdesc_section()
	  free_prepared_contig_range()
	  ...

> The current user of this function is only vfree(), which is sleepable.
>
I know. But this function can be used by others sooner or later.

Another option is to add a comment saying that it is only for sleepable
contexts.

--
Uladzislau Rezki