From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
Yunsheng Lin <linyunsheng@huawei.com>
Cc: brouer@redhat.com, Jakub Kicinski <kuba@kernel.org>,
davem@davemloft.net, pabeni@redhat.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org,
Lorenzo Bianconi <lorenzo@kernel.org>,
Yisen Zhuang <yisen.zhuang@huawei.com>,
Salil Mehta <salil.mehta@huawei.com>,
Eric Dumazet <edumazet@google.com>,
Sunil Goutham <sgoutham@marvell.com>,
Geetha sowjanya <gakula@marvell.com>,
Subbaraya Sundeep <sbhatta@marvell.com>,
hariprasad <hkelam@marvell.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>, Felix Fietkau <nbd@nbd.name>,
Ryder Lee <ryder.lee@mediatek.com>,
Shayne Chen <shayne.chen@mediatek.com>,
Sean Wang <sean.wang@mediatek.com>, Kalle Valo <kvalo@kernel.org>,
Matthias Brugger <matthias.bgg@gmail.com>,
AngeloGioacchino Del Regno
<angelogioacchino.delregno@collabora.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
linux-rdma@vger.kernel.org, linux-wireless@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-mediatek@lists.infradead.org
Subject: Re: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag
Date: Fri, 16 Jun 2023 20:59:12 +0200 [thread overview]
Message-ID: <72ccf224-7b45-76c5-5ca9-83e25112c9c6@redhat.com> (raw)
In-Reply-To: <CAKgT0UeZfbxDYaeUntrQpxHmwCh6zy0dEpjxghiCNxPxv=kdoQ@mail.gmail.com>
On 16/06/2023 17.01, Alexander Duyck wrote:
> On Fri, Jun 16, 2023 at 5:21 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2023/6/16 2:26, Alexander Duyck wrote:
>>> On Thu, Jun 15, 2023 at 9:51 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Thu, 15 Jun 2023 15:17:39 +0800 Yunsheng Lin wrote:
[...]
>>>>
>>>> I like your patches as they isolate the drivers from having to make the
>>>> fragmentation decisions based on the system page size (4k vs 64k but
>>>> we're hearing more and more about ARM w/ 16k pages). For that use case
>>>> this is great.
+1
[...]
>>>
>>> In the case of the standard page size being 4K a standard page would
>>> just have to take on the CPU overhead of the atomic_set and
>>> atomic_read for pp_ref_count (new name) which should be minimal as on
>>> most sane systems those just end up being a memory write and read.
>>
>> If I understand you correctly, I think what you are trying to do
>> may break some of Jesper's benchmarking :)
>>
>> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
>
> So? If it breaks an out-of-tree benchmark the benchmark can always be
> fixed.
It doesn't matter that this is out-of-tree (I should have upstreamed it
when AKPM asked me to).
The point is: don't break my page_pool fast-path!!! :-P
> The point is enabling a use case that can add value across the
> board instead of trying to force the community to support a niche use
> case.
I'm all for creating a new API, let's call it netmem, that takes care of
this use case.
I'm *not* okay with this new API slowing down the page_pool fast-path.
Why not multiplex on a MEM_TYPE, like enum xdp_mem_type is prepared for?!?
That way the caller can choose the correct API call
(and we stay away from adding code to the fast-path case).
See below for a copy-paste of code that shows what I mean by
multiplexing on a MEM_TYPE.
>
> Ideally we should get away from using the pages directly for most
> cases in page pool. In my mind the page pool should start operating
> more like __get_free_pages where what you get is a virtual address
> instead of the actual page. That way we could start abstracting it
> away and eventually get to something more like a true page_pool api
> instead of what feels like a set of add-ons for the page allocator.
Yes, I agree with Alex Duyck here.
When I looked at the proposed veth changes, it likewise felt like a
virtual address would be better than a page, e.g.:

  addr = netmem_alloc(rq->page_pool, &truesize);
> Although at the end of the day this still feels more like we are just
> reimplementing slab so it is hard for me to say this is necessarily
> the best solution either.
Yes, we have to be careful not to re-implement the MM layer in network
land ;-)
$ git show
commit fe38c642d629f8361f76b25aa8732e5e331d0925 (HEAD -> pp_rm_workqueue04)
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Fri Jun 16 20:54:08 2023 +0200

    page_pool: code exemplifying multiplexing on mem_type

    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

diff --git a/include/net/xdp.h b/include/net/xdp.h
index d1c5381fc95f..c02ac82a1d79 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -42,6 +42,7 @@ enum xdp_mem_type {
 	MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
 	MEM_TYPE_PAGE_POOL,
 	MEM_TYPE_XSK_BUFF_POOL,
+	MEM_TYPE_PP_NETMEM,
 	MEM_TYPE_MAX,
 };

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index d03448a4c411..68be76efef00 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -353,7 +353,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 				  struct page *page)
 {
 	page->pp = pool;
-	page->pp_magic |= PP_SIGNATURE;
+	page->pp_magic |= PP_SIGNATURE | (MEM_TYPE_PAGE_POOL << 8);
 	if (pool->p.init_callback)
 		pool->p.init_callback(page, pool->p.init_arg);
 }
@@ -981,6 +981,7 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
 	struct napi_struct *napi;
 	struct page_pool *pp;
 	bool allow_direct;
+	int mem_type;

 	page = compound_head(page);

@@ -991,9 +992,10 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
 	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
 	 * to avoid recycling the pfmemalloc page.
 	 */
-	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
+	if (unlikely((page->pp_magic & ~0xF03UL) != PP_SIGNATURE))
 		return false;

+	mem_type = (page->pp_magic & 0xF00) >> 8;
 	pp = page->pp;

 	/* Allow direct recycle if we have reasons to believe that we are
@@ -1009,7 +1011,10 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
 	 * The page will be returned to the pool here regardless of the
 	 * 'flipped' fragment being in use or not.
 	 */
-	page_pool_put_full_page(pp, page, allow_direct);
+	if (mem_type == MEM_TYPE_PP_NETMEM)
+		pp_netmem_put_page(pp, page, allow_direct);
+	else
+		page_pool_put_full_page(pp, page, allow_direct);

 	return true;
 }

diff --git a/net/core/xdp.c b/net/core/xdp.c
index 41e5ca8643ec..dc4bfbe8f002 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -380,6 +380,11 @@ void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 	struct page *page;

 	switch (mem->type) {
+	case MEM_TYPE_PP_NETMEM:
+		if (napi_direct && xdp_return_frame_no_direct())
+			napi_direct = false;
+		pp_netmem_put(page->pp, data, napi_direct);
+		break;
 	case MEM_TYPE_PAGE_POOL:
 		page = virt_to_head_page(data);
 		if (napi_direct && xdp_return_frame_no_direct())