From: Dave Chinner <david@fromorbit.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: Yishai Hadas <yishaih@nvidia.com>, Jason Gunthorpe <jgg@ziepe.ca>,
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>,
Kevin Tian <kevin.tian@intel.com>,
Alex Williamson <alex.williamson@redhat.com>,
Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>, Gao Xiang <xiang@kernel.org>,
Chao Yu <chao@kernel.org>, Yue Hu <zbestahu@gmail.com>,
Jeffle Xu <jefflexu@linux.alibaba.com>,
Sandeep Dhavale <dhavale@google.com>,
Carlos Maiolino <cem@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
Jeff Layton <jlayton@kernel.org>, Neil Brown <neilb@suse.de>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
Luiz Capitulino <luizcap@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
kvm@vger.kernel.org, virtualization@lists.linux.dev,
linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org,
linux-erofs@lists.ozlabs.org, linux-xfs@vger.kernel.org,
linux-mm@kvack.org, netdev@vger.kernel.org,
linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating only NULL elements
Date: Tue, 4 Mar 2025 19:18:33 +1100 [thread overview]
Message-ID: <Z8a3WSOrlY4n5_37@dread.disaster.area> (raw)
In-Reply-To: <20250228094424.757465-1-linyunsheng@huawei.com>
On Fri, Feb 28, 2025 at 05:44:20PM +0800, Yunsheng Lin wrote:
> As mentioned in [1], it seems odd to check NULL elements in
> the middle of page bulk allocating, and it seems caller can
> do a better job of bulk allocating pages into a whole array
> sequentially without checking NULL elements first before
> doing the page bulk allocation for most of existing users.
>
> Through analyzing of bulk allocation API used in fs, it
> seems that the callers are depending on the assumption of
> populating only NULL elements in fs/btrfs/extent_io.c and
> net/sunrpc/svc_xprt.c while erofs and btrfs don't, see:
> commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator")
> commit d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()")
> commit c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers")
> commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>
> Change SUNRPC and btrfs to not depend on the assumption.
> Other existing callers seems to be passing all NULL elements
> via memset, kzalloc, etc.
>
> Remove assumption of populating only NULL elements and treat
> page_array as output parameter like kmem_cache_alloc_bulk().
> Remove the above assumption also enable the caller to not
> zero the array before calling the page bulk allocating API,
> which has about 1~2 ns performance improvement for the test
> case of time_bench_page_pool03_slow() for page_pool in a
> x86 vm system, this reduces some performance impact of
> fixing the DMA API misuse problem in [2], performance
> improves from 87.886 ns to 86.429 ns.
How much slower did you make btrfs and sunrpc by adding all the
defragmenting code there?
>
> 1. https://lore.kernel.org/all/bd8c2f5c-464d-44ab-b607-390a87ea4cd5@huawei.com/
> 2. https://lore.kernel.org/all/20250212092552.1779679-1-linyunsheng@huawei.com/
> CC: Jesper Dangaard Brouer <hawk@kernel.org>
> CC: Luiz Capitulino <luizcap@redhat.com>
> CC: Mel Gorman <mgorman@techsingularity.net>
> CC: Dave Chinner <david@fromorbit.com>
> CC: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> Acked-by: Jeff Layton <jlayton@kernel.org>
> ---
> V2:
> 1. Drop RFC tag and rebased on latest linux-next.
> 2. Fix a compile error for xfs.
And you still haven't tested the code changes to XFS, because
this patch is also broken.
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 5d560e9073f4..b4e95b2dd0f0 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -319,16 +319,17 @@ xfs_buf_alloc_pages(
> * least one extra page.
> */
> for (;;) {
> - long last = filled;
> + long alloc;
>
> - filled = alloc_pages_bulk(gfp_mask, bp->b_page_count,
> - bp->b_pages);
> + alloc = alloc_pages_bulk(gfp_mask, bp->b_page_count - filled,
> + bp->b_pages + filled);
> + filled += alloc;
> if (filled == bp->b_page_count) {
> XFS_STATS_INC(bp->b_mount, xb_page_found);
> break;
> }
>
> - if (filled != last)
> + if (alloc)
> continue;
alloc_pages_bulk() now returns the number of pages allocated in the
array. So if we ask for 4 pages, then get 2, filled is now 2. Then
we loop, ask for another 2 pages, get those two pages and it returns
4. Now filled is 6, and we continue.
Now we ask alloc_pages_bulk() for -2 pages, which returns 4 pages...
Worse behaviour: second time around, no page allocation succeeds
so it returns 2 pages. Filled is now 4, which is the number of pages
we need, so we break out of the loop with only 2 pages allocated.
There's about to be kernel crashes occur.....
Once is a mistake, twice is compeltely unacceptable. When XFS stops
using alloc_pages_bulk (probably 6.15) I won't care anymore. But
until then, please stop trying to change this code.
NACK.
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2025-03-04 8:18 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-28 9:44 [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating only NULL elements Yunsheng Lin
2025-02-28 9:44 ` Yunsheng Lin via Linux-erofs
2025-03-03 22:13 ` Chuck Lever
2025-03-03 22:13 ` Chuck Lever via Linux-erofs
2025-03-04 12:04 ` Yunsheng Lin
2025-03-04 8:18 ` Dave Chinner [this message]
2025-03-04 12:09 ` Yunsheng Lin
2025-03-08 6:43 ` Dave Chinner
2025-03-09 13:40 ` Yunsheng Lin
2025-03-10 0:32 ` Gao Xiang
2025-03-10 12:31 ` Yunsheng Lin
2025-03-10 12:59 ` Gao Xiang
2025-03-11 22:55 ` NeilBrown
2025-03-12 1:45 ` Gao Xiang
2025-03-12 12:05 ` Yunsheng Lin
2025-03-12 12:41 ` Gao Xiang
2025-03-04 9:17 ` Qu Wenruo
2025-03-05 12:17 ` Yunsheng Lin
2025-03-05 23:41 ` NeilBrown
2025-03-06 11:43 ` Yunsheng Lin
2025-03-06 21:14 ` NeilBrown
2025-03-07 9:23 ` Yunsheng Lin
2025-03-07 21:02 ` NeilBrown
2025-03-09 13:23 ` Yunsheng Lin
2025-03-10 0:10 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z8a3WSOrlY4n5_37@dread.disaster.area \
--to=david@fromorbit.com \
--cc=Dai.Ngo@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=anna@kernel.org \
--cc=cem@kernel.org \
--cc=chao@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=clm@fb.com \
--cc=davem@davemloft.net \
--cc=dhavale@google.com \
--cc=djwong@kernel.org \
--cc=dsterba@suse.com \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=jefflexu@linux.alibaba.com \
--cc=jgg@ziepe.ca \
--cc=jlayton@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kevin.tian@intel.com \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-erofs@lists.ozlabs.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=linyunsheng@huawei.com \
--cc=luizcap@redhat.com \
--cc=mgorman@techsingularity.net \
--cc=neilb@suse.de \
--cc=netdev@vger.kernel.org \
--cc=okorniev@redhat.com \
--cc=pabeni@redhat.com \
--cc=shameerali.kolothum.thodi@huawei.com \
--cc=tom@talpey.com \
--cc=trondmy@kernel.org \
--cc=virtualization@lists.linux.dev \
--cc=xiang@kernel.org \
--cc=yishaih@nvidia.com \
--cc=zbestahu@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.