Date: Fri, 21 Feb 2025 17:34:22 +0800
From: Yunsheng Lin
Subject: Re: [RFC] mm: alloc_pages_bulk: remove assumption of populating only NULL elements
To: Chuck Lever, Yishai Hadas, Jason Gunthorpe, Shameer Kolothum, Kevin Tian,
 Alex Williamson, Chris Mason, Josef Bacik, David Sterba, Gao Xiang, Chao Yu,
 Yue Hu, Jeffle Xu, Sandeep Dhavale, Carlos Maiolino, "Darrick J. Wong",
 Andrew Morton, Jesper Dangaard Brouer, Ilias Apalodimas, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Trond Myklebust,
 Anna Schumaker, Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo,
 Tom Talpey
CC: Luiz Capitulino, Mel Gorman
X-Mailing-List: linux-btrfs@vger.kernel.org
References: <20250217123127.3674033-1-linyunsheng@huawei.com>
 <7b7492c0-a3a7-470b-b7aa-697ac790a94b@oracle.com>
In-Reply-To: <7b7492c0-a3a7-470b-b7aa-697ac790a94b@oracle.com>

On 2025/2/18 22:17, Chuck Lever wrote:
> On 2/18/25 4:16 AM, Yunsheng Lin wrote:
>> On 2025/2/17 22:20, Chuck Lever wrote:
>>> On 2/17/25 7:31 AM, Yunsheng Lin wrote:
>>>> As mentioned in [1], it seems odd to check NULL elements in
>>>> the middle of page bulk allocating,
>>>
>>> I think I requested that check to be added to the bulk page allocator.
>>>
>>> When sending an RPC reply, NFSD might release pages in the middle of
>>
>> It seems there is no usage of the page bulk allocation API in fs/nfsd/
>> or fs/nfs/, which specific fs the above 'NFSD' is referring to?
>
> NFSD is in fs/nfsd/, and it is the major consumer of
> net/sunrpc/svc_xprt.c.
>
>
>>> the rq_pages array, marking each of those array entries with a NULL
>>> pointer. We want to ensure that the array is refilled completely in this
>>> case.
>>>
>>
>> I did some researching, it seems you requested that in [1]?
>> It seems the 'holes are always at the start' for the case in that
>> discussion too, I am not sure if the case is referring to the caller
>> in net/sunrpc/svc_xprt.c? If yes, it seems caller can do a better
>> job of bulk allocating pages into a whole array sequentially without
>> checking NULL elements first before doing the page bulk allocation
>> as something below:
>>
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -663,9 +663,10 @@ static bool svc_alloc_arg(struct svc_rqst *rqstp)
>>                 pages = RPCSVC_MAXPAGES;
>>         }
>>
>> -       for (filled = 0; filled < pages; filled = ret) {
>> -               ret = alloc_pages_bulk(GFP_KERNEL, pages, rqstp->rq_pages);
>> -               if (ret > filled)
>> +       for (filled = 0; filled < pages; filled += ret) {
>> +               ret = alloc_pages_bulk(GFP_KERNEL, pages - filled,
>> +                                      rqstp->rq_pages + filled);
>> +               if (ret)
>>                         /* Made progress, don't sleep yet */
>>                         continue;
>>
>> @@ -674,7 +675,7 @@ static bool svc_alloc_arg(struct svc_rqst *rqstp)
>>                         set_current_state(TASK_RUNNING);
>>                         return false;
>>                 }
>> -               trace_svc_alloc_arg_err(pages, ret);
>> +               trace_svc_alloc_arg_err(pages, filled);
>>                 memalloc_retry_wait(GFP_KERNEL);
>>         }
>>         rqstp->rq_page_end = &rqstp->rq_pages[pages];
>>
>>
>> 1. https://lkml.iu.edu/hypermail/linux/kernel/2103.2/09060.html
>
> I still don't see what is broken about the current API.

As mentioned in [1], the page bulk alloc API before this patch may have some
room for improvement from a performance and ease-of-use perspective, as most
existing callers of the page bulk alloc API try to bulk allocate pages for
the whole array sequentially.

1.
https://lore.kernel.org/all/c9950a79-7bcb-41c2-a59e-af315dc6d7ff@huawei.com/

>
> Anyway, any changes in svc_alloc_arg() will need to be run through the
> upstream NFSD CI suite before they are merged.

Is there a web link pointing to the above NFSD CI suite, so that I can test
whether removing the assumption of populating only NULL elements is indeed
possible?

Looking more closely, it seems svc_rqst_release_pages()/svc_rdma_save_io_pages()
does set rqstp->rq_respages[i] to NULL based on rqstp->rq_next_page, and the
original code before the page bulk alloc API was introduced does seem to
allocate pages only for NULL elements, as can be seen from the patch below:

https://lore.kernel.org/all/20210325114228.27719-8-mgorman@techsingularity.net/T/#u

The clearing of rqstp->rq_respages[] to NULL does seem to be sequential. Is it
possible to pass only the NULL elements of rqstp->rq_respages[] to
alloc_pages_bulk(), so that the bulk alloc API does not have to do the NULL
checking and can use the array purely as an output parameter?