Date: Tue, 18 Feb 2025 08:31:55 +1100
From: Dave Chinner
To: Yunsheng Lin
Cc: Yishai Hadas, Jason Gunthorpe, Shameer Kolothum, Kevin Tian,
	Alex Williamson, Chris Mason, Josef Bacik, David Sterba, Gao Xiang,
	Chao Yu, Yue Hu, Jeffle Xu, Sandeep Dhavale, Carlos Maiolino,
	"Darrick J. Wong", Andrew Morton, Jesper Dangaard Brouer,
	Ilias Apalodimas, "David S. Miller", Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Trond Myklebust, Anna Schumaker,
	Chuck Lever, Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Luiz Capitulino, Mel Gorman, kvm@vger.kernel.org,
	virtualization@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org, netdev@vger.kernel.org,
	linux-nfs@vger.kernel.org
Subject: Re: [RFC] mm: alloc_pages_bulk: remove assumption of populating only NULL elements
References: <20250217123127.3674033-1-linyunsheng@huawei.com>
In-Reply-To: <20250217123127.3674033-1-linyunsheng@huawei.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

On Mon, Feb 17, 2025 at 08:31:23PM +0800, Yunsheng Lin wrote:
> As mentioned in [1], it seems odd to check NULL elements in
> the middle of page bulk allocating, and it seems caller can
> do a better job of bulk allocating pages into a whole array
> sequentially without checking NULL elements first before
> doing the page bulk allocation.

....

IMO, the new API is a poor one, and you've demonstrated it
clearly in this patch.

.....

> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 15bb790359f8..9e1ce0ab9c35 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -377,16 +377,17 @@ xfs_buf_alloc_pages(
>  	 * least one extra page.
>  	 */
>  	for (;;) {
> -		long	last = filled;
> +		long	alloc;
>  
> -		filled = alloc_pages_bulk(gfp_mask, bp->b_page_count,
> -					  bp->b_pages);
> +		alloc = alloc_pages_bulk(gfp_mask, bp->b_page_count - refill,
> +					 bp->b_pages + refill);
> +		refill += alloc;
>  		if (filled == bp->b_page_count) {
>  			XFS_STATS_INC(bp->b_mount, xb_page_found);
>  			break;
>  		}
>  
> -		if (filled != last)
> +		if (alloc)
>  			continue;

You didn't even compile this code - refill is not defined
anywhere.

Even if it did compile, you clearly didn't test it. The logic is
broken (what updates filled?) and will result in the first
allocation attempt succeeding and then falling into an endless retry
loop.

i.e. you stepped on the API landmine of your own creation where
it is impossible to tell the difference between alloc_pages_bulk()
returning "memory allocation failed, you need to retry" and
it returning "array is full, nothing more to allocate". Both these
cases now return 0.

The existing code returns nr_populated in both cases, so it doesn't
matter why alloc_pages_bulk() returns with nr_populated != full, it
is very clear that we still need to allocate more memory to fill it.

The whole point of the existing API is to prevent callers from
making stupid, hard to spot logic mistakes like this. Forcing
callers to track both empty slots and how full the array is itself,
whilst also constraining where in the array empty slots can occur,
greatly reduces both the safety and functionality that
alloc_pages_bulk() provides. Anyone that has code that wants to
steal a random page from the array and then refill it now has a heap
more complex code to add to their allocator wrapper.

IOWs, you just demonstrated why the existing API is more desirable
than a highly constrained, slightly faster API that requires callers
to get every detail right. i.e. it's hard to get it wrong with the
existing API, yet it's so easy to make mistakes with the proposed
API that the patch proposing the change has serious bugs in it.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
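
To make the return-value ambiguity described above concrete, here is a
minimal userspace sketch. It is not kernel code and not taken from the
patch: struct mock_pool, bulk_fill_null(), bulk_fill_seq() and fill_all()
are hypothetical stand-ins invented for illustration. bulk_fill_null()
models the existing semantics (fill only NULL slots, return the total
number of populated slots), bulk_fill_seq() models the proposed semantics
(fill sequentially, return only the number newly allocated), and
fill_all() is a caller loop in the style of the pre-patch
xfs_buf_alloc_pages(), with a bounded stall count standing in for the
kernel's back-off and retry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the page allocator: a simple page budget. */
struct mock_pool {
	size_t avail;		/* pages the mock can still hand out */
};

static char mock_pages[64];	/* dummy objects to point at */
static size_t mock_next;

static void *mock_page(struct mock_pool *pool)
{
	assert(pool->avail > 0);
	pool->avail--;
	return &mock_pages[mock_next++ % sizeof(mock_pages)];
}

/*
 * Model of the existing semantics: populate only NULL slots, wherever
 * they are, and return the total number of populated slots.
 */
static size_t bulk_fill_null(struct mock_pool *pool, size_t nr, void **array)
{
	size_t populated = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!array[i] && pool->avail)
			array[i] = mock_page(pool);
		if (array[i])
			populated++;
	}
	return populated;
}

/*
 * Model of the proposed semantics: overwrite slots sequentially and
 * return only the number of pages allocated by this call.
 */
static size_t bulk_fill_seq(struct mock_pool *pool, size_t nr, void **array)
{
	size_t allocated = 0;

	while (allocated < nr && pool->avail)
		array[allocated++] = mock_page(pool);
	return allocated;
}

/*
 * Caller loop against the existing semantics. A single counter is
 * enough: anything short of nr means "keep going", whatever the reason.
 * max_stalls is an assumption of this sketch so that it terminates.
 */
static int fill_all(struct mock_pool *pool, size_t nr, void **array,
		    int max_stalls)
{
	size_t filled = 0;

	for (;;) {
		size_t last = filled;

		filled = bulk_fill_null(pool, nr, array);
		if (filled == nr)
			return 0;
		if (filled != last)
			continue;	/* made progress, try again */
		if (max_stalls-- <= 0)
			return -1;	/* the kernel would wait and retry */
	}
}
```

With the first model, a caller that NULLs out any slot and calls
bulk_fill_null() again gets the slot refilled with no extra bookkeeping.
With the second, bulk_fill_seq() returns 0 both when the pool is empty
and when the caller asks for zero pages because the array is already
full, so the caller has to carry its own fill offset to tell the two
apart.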