From: Yunsheng Lin <linyunsheng@huawei.com>
To: Mina Almasry <almasrymina@google.com>
Cc: "Shakeel Butt" <shakeelb@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
bpf@vger.kernel.org, "Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Borislav Petkov" <bp@alien8.de>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Christian König" <christian.koenig@amd.com>,
"Michael Chan" <michael.chan@broadcom.com>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Paolo Abeni" <pabeni@redhat.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
"Wei Fang" <wei.fang@nxp.com>,
"Shenwei Wang" <shenwei.wang@nxp.com>,
"Clark Wang" <xiaoning.wang@nxp.com>,
"NXP Linux Team" <linux-imx@nxp.com>,
"Jeroen de Borst" <jeroendb@google.com>,
"Praveen Kaligineedi" <pkaligineedi@google.com>,
"Shailend Chand" <shailend@google.com>,
"Yisen Zhuang" <yisen.zhuang@huawei.com>,
"Salil Mehta" <salil.mehta@huawei.com>,
"Jesse Brandeburg" <jesse.brandeburg@intel.com>,
"Tony Nguyen" <anthony.l.nguyen@intel.com>,
"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
"Marcin Wojtas" <mw@semihalf.com>,
"Russell King" <linux@armlinux.org.uk>,
"Sunil Goutham" <sgoutham@marvell.com>,
"Geetha sowjanya" <gakula@marvell.com>,
"Subbaraya Sundeep" <sbhatta@marvell.com>,
hariprasad <hkelam@marvell.com>, "Felix Fietkau" <nbd@nbd.name>,
"John Crispin" <john@phrozen.org>,
"Sean Wang" <sean.wang@mediatek.com>,
"Mark Lee" <Mark-MC.Lee@mediatek.com>,
"Lorenzo Bianconi" <lorenzo@kernel.org>,
"Matthias Brugger" <matthias.bgg@gmail.com>,
"AngeloGioacchino Del Regno"
<angelogioacchino.delregno@collabora.com>,
"Saeed Mahameed" <saeedm@nvidia.com>,
"Leon Romanovsky" <leon@kernel.org>,
"Horatiu Vultur" <horatiu.vultur@microchip.com>,
UNGLinuxDriver@microchip.com,
"K. Y. Srinivasan" <kys@microsoft.com>,
"Haiyang Zhang" <haiyangz@microsoft.com>,
"Wei Liu" <wei.liu@kernel.org>,
"Dexuan Cui" <decui@microsoft.com>,
"Jassi Brar" <jaswinder.singh@linaro.org>,
"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
"Alexandre Torgue" <alexandre.torgue@foss.st.com>,
"Jose Abreu" <joabreu@synopsys.com>,
"Maxime Coquelin" <mcoquelin.stm32@gmail.com>,
"Siddharth Vadapalli" <s-vadapalli@ti.com>,
"Ravi Gunasekaran" <r-gunasekaran@ti.com>,
"Roger Quadros" <rogerq@kernel.org>,
"Jiawen Wu" <jiawenwu@trustnetic.com>,
"Mengyuan Lou" <mengyuanlou@net-swift.com>,
"Ronak Doshi" <doshir@vmware.com>,
"VMware PV-Drivers Reviewers" <pv-drivers@vmware.com>,
"Ryder Lee" <ryder.lee@mediatek.com>,
"Shayne Chen" <shayne.chen@mediatek.com>,
"Kalle Valo" <kvalo@kernel.org>,
"Juergen Gross" <jgross@suse.com>,
"Stefano Stabellini" <sstabellini@kernel.org>,
"Oleksandr Tyshchenko" <oleksandr_tyshchenko@epam.com>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"KP Singh" <kpsingh@kernel.org>,
"Stanislav Fomichev" <sdf@google.com>,
"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Stefano Garzarella" <sgarzare@redhat.com>,
"Shuah Khan" <shuah@kernel.org>,
"Mickaël Salaün" <mic@digikod.net>,
"Nathan Chancellor" <nathan@kernel.org>,
"Nick Desaulniers" <ndesaulniers@google.com>,
"Bill Wendling" <morbo@google.com>,
"Justin Stitt" <justinstitt@google.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>
Subject: Re: [RFC PATCH net-next v1 4/4] net: page_pool: use netmem_t instead of struct page in API
Date: Fri, 5 Jan 2024 16:40:15 +0800
Message-ID: <894177e8-6403-e31c-e246-d9234218626b@huawei.com>
In-Reply-To: <CAHS8izMMdmWoUHetA=GceJWVBgrCNAutn+B4ErMZFG=gmF5rww@mail.gmail.com>
On 2024/1/5 2:24, Mina Almasry wrote:
> On Thu, Jan 4, 2024 at 12:48 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2024/1/4 2:38, Mina Almasry wrote:
>>> On Wed, Jan 3, 2024 at 1:47 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>>>
>>>> On 2024/1/3 0:14, Mina Almasry wrote:
>>>>>
>>>>> The idea being that skb_frag_page() can return NULL if the frag is not
>>>>> paged, and the relevant callers are modified to handle that.
>>>>
>>>> There are many existing drivers which are not expecting NULL returning for
>>>> skb_frag_page() as those drivers are not supporting devmem, adding additional
>>>> checking overhead in skb_frag_page() for those drivers does not make much
>>>> sense, IMHO, it may make more sense to introduce a new helper for the driver
>>>> supporting devmem or networking core that needing dealing with both normal
>>>> page and devmem.
>>>>
>>>> And we are also able to keep the old non-NULL returning semantic for
>>>> skb_frag_page().
>>>
>>> I think I'm seeing agreement that the direction we're heading into
>>> here is that most net stack & drivers should use the abstract netmem
>>
>> As far as I see, at least for the drivers, I don't think we have a clear
>> agreement if we should have a unified driver facing struct or API for both
>> normal page and devmem yet.
>>
>
> To be honest I definitely read that we have agreement that we should
> have a unified driver facing struct from the responses in this thread
> like this one:
>
> https://lore.kernel.org/netdev/20231215190126.1040fa12@kernel.org/
Which specific comment in that message made you think so?
I think this definitely needs clarifying, as I read it differently than
you did.
>
> But I'll let folks correct me if I'm wrong.
>
>>> type, and only specific code that needs a page or devmem (like
>>> tcp_receive_zerocopy or tcp_recvmsg_dmabuf) will be the ones that
>>> unpack the netmem and get the underlying page or devmem, using
>>> skb_frag_page() or something like skb_frag_dmabuf(), etc.
>>>
>>> As Jason says repeatedly, I'm not allowed to blindly cast a netmem to
>>> a page and assume netmem==page. Netmem can only be cast to a page
>>> after checking the low bits and verifying the netmem is actually a
>>
>> I thought it would be best to avoid casting a netmem or devmem to a
>> page in the driver, I think the main argument is that it is hard
>> to audit every single driver doing a checking before doing the casting
>> in the future? and we can do better auditing if the casting is limited
>> to a few core functions in the networking core.
>>
>
> Correct, the drivers should never cast directly, but helpers like
> skb_frag_page() must check that the netmem is a page before doing a
> cast.
>
>>> page. I think any suggestions that blindly cast a netmem to page
>>> without the checks will get nacked by Jason & Christian, so the
>>> checking in the specific cases where the code needs to know the
>>> underlying memory type seems necessary.
>>>
>>> IMO I'm not sure the checking is expensive. With likely/unlikely &
>>> static branches the checks should be very minimal or a straight no-op.
>>> For example in RFC v2 where we were doing a lot of checks for devmem
>>> (we don't do that anymore for RFCv5), I had run the page_pool perf
>>> tests and proved there is little to no perf regression:
>>
>> For MAX_SKB_FRAGS being 17, it means we may have 17 additional checking
>> overhead for the drivers not supporting devmem, not to mention we may
>> have bigger value for MAX_SKB_FRAGS if BIG TCP is enabled.
>>
>
> With static branch the checks should be complete no-ops unless the
> user's set up enabled devmem.
What if the user does enable devmem and still wants to use page_pool
with normal pages on the same system?
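
To make the concern concrete, here is a rough userspace sketch, not kernel
code. It assumes the check is gated by one system-wide switch (a plain bool
standing in for a static key, which in the kernel would patch the branch at
runtime) and that a set low bit marks non-page memory. All sketch_* names
are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FRAGS 17	/* the MAX_SKB_FRAGS value discussed above */

/* Assumption: a single global switch, standing in for a static key. */
static bool sketch_devmem_enabled;
/* Counts how many low-bit checks were actually executed. */
static unsigned long sketch_checks_run;

struct sketch_page { int id; };

/* Hypothetical encoding: low bit set means "not a page". */
static inline bool sketch_is_devmem(uintptr_t netmem)
{
	sketch_checks_run++;
	return netmem & 1UL;
}

/* Checked lookup: the check is skipped only while the switch is off. */
static struct sketch_page *sketch_frag_page(uintptr_t netmem)
{
	if (sketch_devmem_enabled && sketch_is_devmem(netmem))
		return NULL;
	return (struct sketch_page *)netmem;
}

/* Walk a full set of page-only frags and report the checks executed. */
static unsigned long sketch_walk_frags(bool devmem_enabled)
{
	static struct sketch_page pages[SKETCH_MAX_FRAGS];
	size_t i;

	sketch_devmem_enabled = devmem_enabled;
	sketch_checks_run = 0;
	for (i = 0; i < SKETCH_MAX_FRAGS; i++) {
		struct sketch_page *p = sketch_frag_page((uintptr_t)&pages[i]);

		/* Every frag here is a normal page, so the lookup succeeds. */
		assert(p == &pages[i]);
	}
	return sketch_checks_run;
}
```

In this model, sketch_walk_frags(false) executes no checks, while
sketch_walk_frags(true) executes one check per frag even though every frag
is a normal page.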
Is there a reason I am not aware of that stops you from keeping the old
helper and introducing a new helper if one is needed for the new netmem
thing?
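
Something along these lines is what I have in mind. This is only a userspace
sketch under the assumption that netmem is a tagged pointer whose low bit
marks non-page memory; the sketch_* types and helper names are hypothetical
and are not taken from the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_page { int id; };

/* Hypothetical tagged handle: low bit set marks non-page (devmem) memory. */
typedef uintptr_t sketch_netmem_t;
#define SKETCH_NETMEM_DEVMEM 1UL

struct sketch_frag { sketch_netmem_t netmem; };

static inline sketch_netmem_t sketch_page_to_netmem(struct sketch_page *p)
{
	return (sketch_netmem_t)p;
}

static inline sketch_netmem_t sketch_devmem_to_netmem(void *devmem)
{
	/* The tag bit must be free, i.e. the pointer must be aligned. */
	assert(((uintptr_t)devmem & SKETCH_NETMEM_DEVMEM) == 0);
	return (sketch_netmem_t)devmem | SKETCH_NETMEM_DEVMEM;
}

/*
 * Old helper: keeps the non-NULL semantic and does no check.
 * Only valid for callers that never see devmem frags.
 */
static inline struct sketch_page *sketch_frag_page(const struct sketch_frag *f)
{
	return (struct sketch_page *)f->netmem;
}

/*
 * New helper: for the networking core or devmem-aware drivers.
 * Checks the low bit first and returns NULL for non-page memory.
 */
static inline struct sketch_page *
sketch_frag_page_or_null(const struct sketch_frag *f)
{
	if (f->netmem & SKETCH_NETMEM_DEVMEM)
		return NULL;
	return (struct sketch_page *)f->netmem;
}
```

Drivers not supporting devmem would keep calling the old helper with no
extra check, and the casting stays limited to these two functions, which
should be easier to audit.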
>
>> Even if there is no noticeable performance degradation for a specific case,
>> we should avoid the overhead as much as possible for the existing use
>> case when supporting a new use case.
>>
>>>
>>> https://lore.kernel.org/netdev/CAHS8izM4w2UETAwfnV7w+ZzTMxLkz+FKO+xTgRdtYKzV8RzqXw@mail.gmail.com/
>>
>> The above test case does not even seem to be testing a code path calling
>> skb_frag_page() as my understanding.
>>
>>>
>
>
>