From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from szxga06-in.huawei.com (szxga06-in.huawei.com [45.249.212.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 198A9225DE; Fri, 5 Jan 2024 08:40:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.162.112]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4T5xjG2Wkkz1vrfH; Fri, 5 Jan 2024 16:40:02 +0800 (CST) Received: from dggpemm500005.china.huawei.com (unknown [7.185.36.74]) by mail.maildlp.com (Postfix) with ESMTPS id 298FF1400DD; Fri, 5 Jan 2024 16:40:16 +0800 (CST) Received: from [10.69.30.204] (10.69.30.204) by dggpemm500005.china.huawei.com (7.185.36.74) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Fri, 5 Jan 2024 16:40:15 +0800 Subject: Re: [RFC PATCH net-next v1 4/4] net: page_pool: use netmem_t instead of struct page in API To: Mina Almasry CC: Shakeel Butt , Jakub Kicinski , , , , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , , "H. Peter Anvin" , Greg Kroah-Hartman , "Rafael J. Wysocki" , Sumit Semwal , =?UTF-8?Q?Christian_K=c3=b6nig?= , Michael Chan , "David S. Miller" , Eric Dumazet , Paolo Abeni , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , Wei Fang , Shenwei Wang , Clark Wang , NXP Linux Team , Jeroen de Borst , Praveen Kaligineedi , Shailend Chand , Yisen Zhuang , Salil Mehta , Jesse Brandeburg , Tony Nguyen , Thomas Petazzoni , Marcin Wojtas , Russell King , Sunil Goutham , Geetha sowjanya , Subbaraya Sundeep , hariprasad , Felix Fietkau , John Crispin , Sean Wang , Mark Lee , Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno , Saeed Mahameed , Leon Romanovsky , Horatiu Vultur , , "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Jassi Brar , Ilias Apalodimas , Alexandre Torgue , Jose Abreu , Maxime Coquelin , Siddharth Vadapalli , Ravi Gunasekaran , Roger Quadros , Jiawen Wu , Mengyuan Lou , Ronak Doshi , VMware PV-Drivers Reviewers , Ryder Lee , Shayne Chen , Kalle Valo , Juergen Gross , Stefano Stabellini , Oleksandr Tyshchenko , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Stefan Hajnoczi , Stefano Garzarella , Shuah Khan , =?UTF-8?Q?Micka=c3=abl_Sala=c3=bcn?= , Nathan Chancellor , Nick Desaulniers , Bill Wendling , Justin Stitt , Jason Gunthorpe , Willem de Bruijn References: <20231214020530.2267499-1-almasrymina@google.com> <20231214020530.2267499-5-almasrymina@google.com> <20231215021114.ipvdx2bwtxckrfdg@google.com> <20231215190126.1040fa12@kernel.org> <54f226ef-df2d-9f32-fa3f-e846d6510758@huawei.com> <7c6d35e3-165f-5883-1c1b-fce82c976028@huawei.com> <99817ed2-8ba6-ef8f-3ccb-2a2ab284b4af@huawei.com> From: Yunsheng Lin Message-ID: <894177e8-6403-e31c-e246-d9234218626b@huawei.com> Date: Fri, 5 Jan 2024 16:40:15 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To dggpemm500005.china.huawei.com (7.185.36.74) On 2024/1/5 2:24, Mina Almasry wrote: > On Thu, Jan 4, 2024 at 12:48 AM Yunsheng Lin wrote: >> >> On 2024/1/4 2:38, Mina Almasry wrote: >>> On Wed, Jan 3, 2024 at 1:47 AM Yunsheng Lin wrote: >>>> >>>> On 2024/1/3 0:14, Mina Almasry wrote: >>>>> >>>>> The idea being that skb_frag_page() can return NULL if the frag is not >>>>> paged, and the relevant callers are modified to handle that. >>>> >>>> There are many existing drivers which are not expecting NULL returning for >>>> skb_frag_page() as those drivers are not supporting devmem, adding additionl >>>> checking overhead in skb_frag_page() for those drivers does not make much >>>> sense, IMHO, it may make more sense to introduce a new helper for the driver >>>> supporting devmem or networking core that needing dealing with both normal >>>> page and devmem. >>>> >>>> And we are also able to keep the old non-NULL returning semantic for >>>> skb_frag_page(). >>> >>> I think I'm seeing agreement that the direction we're heading into >>> here is that most net stack & drivers should use the abstract netmem >> >> As far as I see, at least for the drivers, I don't think we have a clear >> agreement if we should have a unified driver facing struct or API for both >> normal page and devmem yet. >> > > To be honest I definitely read that we have agreement that we should > have a unified driver facing struct from the responses in this thread > like this one: > > https://lore.kernel.org/netdev/20231215190126.1040fa12@kernel.org/ Which specific comment made you thinking as above? I think it definitely need clarifying here, as I read it differently as you did. > > But I'll let folks correct me if I'm wrong. > >>> type, and only specific code that needs a page or devmem (like >>> tcp_receive_zerocopy or tcp_recvmsg_dmabuf) will be the ones that >>> unpack the netmem and get the underlying page or devmem, using >>> skb_frag_page() or something like skb_frag_dmabuf(), etc. >>> >>> As Jason says repeatedly, I'm not allowed to blindly cast a netmem to >>> a page and assume netmem==page. Netmem can only be cast to a page >>> after checking the low bits and verifying the netmem is actually a >> >> I thought it would be best to avoid casting a netmem or devmem to a >> page in the driver, I think the main argument is that it is hard >> to audit very single driver doing a checking before doing the casting >> in the future? and we can do better auditting if the casting is limited >> to a few core functions in the networking core. >> > > Correct, the drivers should never cast directly, but helpers like > skb_frag_page() must check that the netmem is a page before doing a > cast. > >>> page. I think any suggestions that blindly cast a netmem to page >>> without the checks will get nacked by Jason & Christian, so the >>> checking in the specific cases where the code needs to know the >>> underlying memory type seems necessary. >>> >>> IMO I'm not sure the checking is expensive. With likely/unlikely & >>> static branches the checks should be very minimal or a straight no-op. >>> For example in RFC v2 where we were doing a lot of checks for devmem >>> (we don't do that anymore for RFCv5), I had run the page_pool perf >>> tests and proved there is little to no perf regression: >> >> For MAX_SKB_FRAGS being 17, it means we may have 17 additional checking >> overhead for the drivers not supporting devmem, not to mention we may >> have bigger value for MAX_SKB_FRAGS if BIG TCP is enable. >> > > With static branch the checks should be complete no-ops unless the > user's set up enabled devmem. What if the user does set up enabled devmem and still want to enable page_pool for normal page in the same system? Is there a reason I don't know, which stops you from keeping the old helper and introducing a new helper if it is needed for the new netmem thing? > >> Even there is no notiable performance degradation for a specific case, >> we should avoid the overhead as much as possible for the existing use >> case when supporting a new use case. >> >>> >>> https://lore.kernel.org/netdev/CAHS8izM4w2UETAwfnV7w+ZzTMxLkz+FKO+xTgRdtYKzV8RzqXw@mail.gmail.com/ >> >> The above test case does not even seems to be testing a code path calling >> skb_frag_page() as my understanding. >> >>> > > >