public inbox for linux-kernel@vger.kernel.org
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Mina Almasry <almasrymina@google.com>
Cc: "Shakeel Butt" <shakeelb@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, "Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Michael Chan" <michael.chan@broadcom.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Wei Fang" <wei.fang@nxp.com>,
	"Shenwei Wang" <shenwei.wang@nxp.com>,
	"Clark Wang" <xiaoning.wang@nxp.com>,
	"NXP Linux Team" <linux-imx@nxp.com>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Shailend Chand" <shailend@google.com>,
	"Yisen Zhuang" <yisen.zhuang@huawei.com>,
	"Salil Mehta" <salil.mehta@huawei.com>,
	"Jesse Brandeburg" <jesse.brandeburg@intel.com>,
	"Tony Nguyen" <anthony.l.nguyen@intel.com>,
	"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
	"Marcin Wojtas" <mw@semihalf.com>,
	"Russell King" <linux@armlinux.org.uk>,
	"Sunil Goutham" <sgoutham@marvell.com>,
	"Geetha sowjanya" <gakula@marvell.com>,
	"Subbaraya Sundeep" <sbhatta@marvell.com>,
	hariprasad <hkelam@marvell.com>, "Felix Fietkau" <nbd@nbd.name>,
	"John Crispin" <john@phrozen.org>,
	"Sean Wang" <sean.wang@mediatek.com>,
	"Mark Lee" <Mark-MC.Lee@mediatek.com>,
	"Lorenzo Bianconi" <lorenzo@kernel.org>,
	"Matthias Brugger" <matthias.bgg@gmail.com>,
	"AngeloGioacchino Del Regno"
	<angelogioacchino.delregno@collabora.com>,
	"Saeed Mahameed" <saeedm@nvidia.com>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Horatiu Vultur" <horatiu.vultur@microchip.com>,
	UNGLinuxDriver@microchip.com,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Wei Liu" <wei.liu@kernel.org>,
	"Dexuan Cui" <decui@microsoft.com>,
	"Jassi Brar" <jaswinder.singh@linaro.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Alexandre Torgue" <alexandre.torgue@foss.st.com>,
	"Jose Abreu" <joabreu@synopsys.com>,
	"Maxime Coquelin" <mcoquelin.stm32@gmail.com>,
	"Siddharth Vadapalli" <s-vadapalli@ti.com>,
	"Ravi Gunasekaran" <r-gunasekaran@ti.com>,
	"Roger Quadros" <rogerq@kernel.org>,
	"Jiawen Wu" <jiawenwu@trustnetic.com>,
	"Mengyuan Lou" <mengyuanlou@net-swift.com>,
	"Ronak Doshi" <doshir@vmware.com>,
	"VMware PV-Drivers Reviewers" <pv-drivers@vmware.com>,
	"Ryder Lee" <ryder.lee@mediatek.com>,
	"Shayne Chen" <shayne.chen@mediatek.com>,
	"Kalle Valo" <kvalo@kernel.org>,
	"Juergen Gross" <jgross@suse.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Oleksandr Tyshchenko" <oleksandr_tyshchenko@epam.com>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"KP Singh" <kpsingh@kernel.org>,
	"Stanislav Fomichev" <sdf@google.com>,
	"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Mickaël Salaün" <mic@digikod.net>,
	"Nathan Chancellor" <nathan@kernel.org>,
	"Nick Desaulniers" <ndesaulniers@google.com>,
	"Bill Wendling" <morbo@google.com>,
	"Justin Stitt" <justinstitt@google.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>
Subject: Re: [RFC PATCH net-next v1 4/4] net: page_pool: use netmem_t instead of struct page in API
Date: Fri, 5 Jan 2024 16:40:15 +0800	[thread overview]
Message-ID: <894177e8-6403-e31c-e246-d9234218626b@huawei.com> (raw)
In-Reply-To: <CAHS8izMMdmWoUHetA=GceJWVBgrCNAutn+B4ErMZFG=gmF5rww@mail.gmail.com>

On 2024/1/5 2:24, Mina Almasry wrote:
> On Thu, Jan 4, 2024 at 12:48 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2024/1/4 2:38, Mina Almasry wrote:
>>> On Wed, Jan 3, 2024 at 1:47 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>>>
>>>> On 2024/1/3 0:14, Mina Almasry wrote:
>>>>>
>>>>> The idea being that skb_frag_page() can return NULL if the frag is not
>>>>> paged, and the relevant callers are modified to handle that.
>>>>
There are many existing drivers that do not expect skb_frag_page() to
return NULL, as those drivers do not support devmem. Adding additional
checking overhead in skb_frag_page() for those drivers does not make much
sense; IMHO, it makes more sense to introduce a new helper for drivers
supporting devmem, or for networking core code that needs to deal with
both normal pages and devmem.
>>>>
And that way we would also keep the old non-NULL-returning semantics of
skb_frag_page().
>>>
>>> I think I'm seeing agreement that the direction we're heading into
>>> here is that most net stack & drivers should use the abstract netmem
>>
>> As far as I can see, at least for the drivers, I don't think we have a
>> clear agreement yet on whether we should have a unified driver-facing
>> struct or API for both normal pages and devmem.
>>
> 
> To be honest, I definitely read from the responses in this thread that
> we have agreement on a unified driver-facing struct, for example this
> one:
> 
> https://lore.kernel.org/netdev/20231215190126.1040fa12@kernel.org/

Which specific comment made you think that? I think it definitely needs
clarifying here, as I read it differently than you did.

> 
> But I'll let folks correct me if I'm wrong.
> 
>>> type, and only specific code that needs a page or devmem (like
>>> tcp_receive_zerocopy or tcp_recvmsg_dmabuf) will be the ones that
>>> unpack the netmem and get the underlying page or devmem, using
>>> skb_frag_page() or something like skb_frag_dmabuf(), etc.
>>>
>>> As Jason says repeatedly, I'm not allowed to blindly cast a netmem to
>>> a page and assume netmem==page. Netmem can only be cast to a page
>>> after checking the low bits and verifying the netmem is actually a
>>
>> I thought it would be best to avoid casting a netmem or devmem to a
>> page in the driver. I think the main argument is that it is hard to
>> audit every single driver doing a check before casting in the future,
>> and we can do better auditing if the casting is limited to a few core
>> functions in the networking core.
>>
> 
> Correct, the drivers should never cast directly, but helpers like
> skb_frag_page() must check that the netmem is a page before doing a
> cast.
> 
>>> page. I think any suggestions that blindly cast a netmem to page
>>> without the checks will get nacked by Jason & Christian, so the
>>> checking in the specific cases where the code needs to know the
>>> underlying memory type seems necessary.
>>>
>>> IMO I'm not sure the checking is expensive. With likely/unlikely &
>>> static branches the checks should be very minimal or a straight no-op.
>>> For example in RFC v2 where we were doing a lot of checks for devmem
>>> (we don't do that anymore for RFCv5), I had run the page_pool perf
>>> tests and proved there is little to no perf regression:
>>
>> With MAX_SKB_FRAGS being 17, that means up to 17 additional checks of
>> overhead for drivers not supporting devmem, not to mention that
>> MAX_SKB_FRAGS may have a bigger value if BIG TCP is enabled.
>>
> 
> With static branches the checks should be complete no-ops unless the
> user's setup has enabled devmem.

What if the user does enable devmem and still wants to use page_pool for
normal pages in the same system?

Is there a reason I am not aware of that stops you from keeping the old
helper and introducing a new one where it is needed for the new netmem
thing?

> 
>> Even if there is no noticeable performance degradation for a specific
>> case, we should avoid the overhead as much as possible for the existing
>> use cases when supporting a new use case.
>>
>>>
>>> https://lore.kernel.org/netdev/CAHS8izM4w2UETAwfnV7w+ZzTMxLkz+FKO+xTgRdtYKzV8RzqXw@mail.gmail.com/
>>
>> As far as I understand, the above test case does not even seem to be
>> testing a code path that calls skb_frag_page().
>>
>>>
> 
> 
> 


Thread overview: 33+ messages
2023-12-14  2:05 [RFC PATCH net-next v1 0/4] Abstract page from net stack Mina Almasry
2023-12-14  2:05 ` [RFC PATCH net-next v1 1/4] vsock/virtio: use skb_frag_page() helper Mina Almasry
2023-12-14  6:27   ` David Ahern
2023-12-14  8:19   ` Stefano Garzarella
2023-12-14  2:05 ` [RFC PATCH net-next v1 2/4] net: introduce abstraction for network memory Mina Almasry
2023-12-14  6:34   ` David Ahern
2023-12-16  2:51   ` Jakub Kicinski
2023-12-16 22:10     ` Mina Almasry
2023-12-17  1:45       ` David Ahern
2023-12-17  8:14         ` Mina Almasry
2023-12-18 22:06           ` Jakub Kicinski
2023-12-18 22:38             ` Mina Almasry
2023-12-19 17:27               ` Shakeel Butt
2023-12-14  2:05 ` [RFC PATCH net-next v1 3/4] net: add netmem_t to skb_frag_t Mina Almasry
2023-12-14  2:05 ` [RFC PATCH net-next v1 4/4] net: page_pool: use netmem_t instead of struct page in API Mina Almasry
2023-12-14 12:05   ` Yunsheng Lin
2023-12-14 16:27     ` Mina Almasry
2023-12-15  2:11       ` Shakeel Butt
2023-12-15 11:04         ` Yunsheng Lin
2023-12-15 16:47           ` Shakeel Butt
2023-12-16  3:01         ` Jakub Kicinski
2023-12-16 19:46           ` Shakeel Butt
2023-12-16 22:06             ` Mina Almasry
2023-12-20  3:01               ` Mina Almasry
2023-12-21 11:32                 ` Yunsheng Lin
2023-12-21 21:22                   ` Mina Almasry
2023-12-22  6:42                     ` Yunsheng Lin
2024-01-02 16:14                       ` Mina Almasry
2024-01-03  9:47                         ` Yunsheng Lin
2024-01-03 18:38                           ` Mina Almasry
2024-01-04  8:48                             ` Yunsheng Lin
2024-01-04 18:24                               ` Mina Almasry
2024-01-05  8:40                                 ` Yunsheng Lin [this message]
