* [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Lorenzo Bianconi @ 2023-05-12 13:08 UTC
To: netdev
Cc: lorenzo.bianconi, bpf, davem, edumazet, kuba, pabeni, ast, daniel,
hawk, john.fastabend, linyunsheng
In order to reduce page_pool memory footprint, rely on
page_pool_dev_alloc_frag routine and reduce buffer size
(VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
(XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
supported MTU is now reduced to 36350B.
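For reference, the 36350B figure follows from the new constants (a quick
check, assuming NET_IP_ALIGN == 2 and SKB_DATA_ALIGN(sizeof(struct
skb_shared_info)) == 320 bytes):
  max_head_size = SKB_WITH_OVERHEAD(2048 - (192 + 2)) = 1854 - 320 = 1534
  max skb->len  = 2048 * MAX_SKB_FRAGS + max_head_size = 2048 * 17 + 1534 = 36350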
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
1 file changed, 25 insertions(+), 14 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 614f3e3efab0..0e648703cccf 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -31,9 +31,12 @@
#define DRV_NAME "veth"
#define DRV_VERSION "1.0"
-#define VETH_XDP_FLAG BIT(0)
-#define VETH_RING_SIZE 256
-#define VETH_XDP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN)
+#define VETH_XDP_FLAG BIT(0)
+#define VETH_RING_SIZE 256
+#define VETH_XDP_PACKET_HEADROOM 192
+#define VETH_XDP_HEADROOM (VETH_XDP_PACKET_HEADROOM + \
+ NET_IP_ALIGN)
+#define VETH_PAGE_POOL_FRAG_SIZE 2048
#define VETH_XDP_TX_BULK_SIZE 16
#define VETH_XDP_BATCH 16
@@ -736,7 +739,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
if (skb_shared(skb) || skb_head_is_locked(skb) ||
skb_shinfo(skb)->nr_frags ||
skb_headroom(skb) < XDP_PACKET_HEADROOM) {
- u32 size, len, max_head_size, off;
+ u32 size, len, max_head_size, off, pp_off;
struct sk_buff *nskb;
struct page *page;
int i, head_off;
@@ -747,17 +750,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
*
* Make sure we have enough space for linear and paged area
*/
- max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE -
+ max_head_size = SKB_WITH_OVERHEAD(VETH_PAGE_POOL_FRAG_SIZE -
VETH_XDP_HEADROOM);
- if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
+ if (skb->len >
+ VETH_PAGE_POOL_FRAG_SIZE * MAX_SKB_FRAGS + max_head_size)
goto drop;
/* Allocate skb head */
- page = page_pool_dev_alloc_pages(rq->page_pool);
+ page = page_pool_dev_alloc_frag(rq->page_pool, &pp_off,
+ VETH_PAGE_POOL_FRAG_SIZE);
if (!page)
goto drop;
- nskb = napi_build_skb(page_address(page), PAGE_SIZE);
+ nskb = napi_build_skb(page_address(page) + pp_off,
+ VETH_PAGE_POOL_FRAG_SIZE);
if (!nskb) {
page_pool_put_full_page(rq->page_pool, page, true);
goto drop;
@@ -782,15 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
len = skb->len - off;
for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
- page = page_pool_dev_alloc_pages(rq->page_pool);
+ page = page_pool_dev_alloc_frag(rq->page_pool, &pp_off,
+ VETH_PAGE_POOL_FRAG_SIZE);
if (!page) {
consume_skb(nskb);
goto drop;
}
- size = min_t(u32, len, PAGE_SIZE);
- skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
- if (skb_copy_bits(skb, off, page_address(page),
+ size = min_t(u32, len, VETH_PAGE_POOL_FRAG_SIZE);
+ skb_add_rx_frag(nskb, i, page, pp_off, size,
+ VETH_PAGE_POOL_FRAG_SIZE);
+ if (skb_copy_bits(skb, off,
+ page_address(page) + pp_off,
size)) {
consume_skb(nskb);
goto drop;
@@ -1035,6 +1044,7 @@ static int veth_create_page_pool(struct veth_rq *rq)
struct page_pool_params pp_params = {
.order = 0,
.pool_size = VETH_RING_SIZE,
+ .flags = PP_FLAG_PAGE_FRAG,
.nid = NUMA_NO_NODE,
.dev = &rq->dev->dev,
};
@@ -1634,13 +1644,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
goto err;
}
- max_mtu = SKB_WITH_OVERHEAD(PAGE_SIZE - VETH_XDP_HEADROOM) -
+ max_mtu = SKB_WITH_OVERHEAD(VETH_PAGE_POOL_FRAG_SIZE -
+ VETH_XDP_HEADROOM) -
peer->hard_header_len;
/* Allow increasing the max_mtu if the program supports
* XDP fragments.
*/
if (prog->aux->xdp_has_frags)
- max_mtu += PAGE_SIZE * MAX_SKB_FRAGS;
+ max_mtu += VETH_PAGE_POOL_FRAG_SIZE * MAX_SKB_FRAGS;
if (peer->mtu > max_mtu) {
NL_SET_ERR_MSG_MOD(extack, "Peer MTU is too large to set XDP");
--
2.40.1
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Alexander Lobakin @ 2023-05-12 13:43 UTC
To: Lorenzo Bianconi
Cc: netdev, lorenzo.bianconi, bpf, davem, edumazet, kuba, pabeni, ast,
daniel, hawk, john.fastabend, linyunsheng
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Fri, 12 May 2023 15:08:13 +0200
> In order to reduce page_pool memory footprint, rely on
> page_pool_dev_alloc_frag routine and reduce buffer size
> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> supported MTU is now reduced to 36350B.
I thought we're stepping away from page splitting bit by bit O_o
Primarily for the reasons you mentioned / worked around here: it creates
several significant limitations and at least on 64-bit systems it
doesn't scale anymore. 192 bytes of headroom is less than what XDP
expects (isn't it? Isn't 256 standard-standard, so that skb XDP path
reallocates heads only to have 256+ there?), 320 bytes of shinfo can
change anytime and even now page split simply blocks you from increasing
MAX_SKB_FRAGS even by one. Not speaking of MTU limitations etc.
BTW Intel drivers suffer from the very same things due solely to page
split (and I'm almost done with converting at least some of them to Page
Pool and the 1-page-per-buffer model); I don't recommend deliberately
falling into that pit =\ :D
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
> 1 file changed, 25 insertions(+), 14 deletions(-)
[...]
Thanks,
Olek
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Lorenzo Bianconi @ 2023-05-12 14:14 UTC
To: Alexander Lobakin
Cc: netdev, lorenzo.bianconi, bpf, davem, edumazet, kuba, pabeni, ast,
daniel, hawk, john.fastabend, linyunsheng
> From: Lorenzo Bianconi <lorenzo@kernel.org>
> Date: Fri, 12 May 2023 15:08:13 +0200
>
> > In order to reduce page_pool memory footprint, rely on
> > page_pool_dev_alloc_frag routine and reduce buffer size
> > (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> > for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> > (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> > Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> > supported MTU is now reduced to 36350B.
>
> I thought we're stepping away from page splitting bit by bit O_o
do you mean driver-private page_split implementations? AFAIK we are not
stepping away from the page_pool page split implementation (or maybe I missed it :))
> Primarily for the reasons you mentioned / worked around here: it creates
> several significant limitations and at least on 64-bit systems it
> doesn't scale anymore. 192 bytes of headroom is less than what XDP
> expects (isn't it? Isn't 256 standard-standard, so that skb XDP path
> reallocates heads only to have 256+ there?), 320 bytes of shinfo can
> change anytime and even now page split simply blocks you from increasing
> MAX_SKB_FRAGS even by one. Not speaking of MTU limitations etc.
> BTW Intel drivers suffer from the very same things due solely to page
> split (and I'm almost done with converting at least some of them to Page
> Pool and 1 page per buffer model), I don't recommend deliberately
> falling into that pit =\ :D
I am not sure about the 192 vs 256 bytes of headroom (this is why I sent this
patch as an RFC; my main goal is to discuss this requirement). In the
previous discussion [0] we deferred this implementation since, if we do not
reduce the requested xdp headroom, we will not be able to fit two 1500B frames
into a single page (because of the skb_shared_info size [1]) and we would
introduce a performance penalty.
Regards,
Lorenzo
[0] https://lore.kernel.org/netdev/6298f73f7cc7391c7c4a52a6a89b1ae21488bda1.1682188837.git.lorenzo@kernel.org/
[1] $ pahole -C skb_shared_info vmlinux.o
struct skb_shared_info {
__u8 flags; /* 0 1 */
__u8 meta_len; /* 1 1 */
__u8 nr_frags; /* 2 1 */
__u8 tx_flags; /* 3 1 */
unsigned short gso_size; /* 4 2 */
unsigned short gso_segs; /* 6 2 */
struct sk_buff * frag_list; /* 8 8 */
struct skb_shared_hwtstamps hwtstamps; /* 16 8 */
unsigned int gso_type; /* 24 4 */
u32 tskey; /* 28 4 */
atomic_t dataref; /* 32 4 */
unsigned int xdp_frags_size; /* 36 4 */
void * destructor_arg; /* 40 8 */
skb_frag_t frags[17]; /* 48 272 */
/* size: 320, cachelines: 5, members: 14 */
};
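With the 320B skb_shared_info above, the per-buffer budget on a 4K page works
out as follows (assuming NET_IP_ALIGN == 2 and a 1514B frame, i.e. 1500B MTU
plus the 14B Ethernet header):
  256 + 2 headroom + 1514 data + 320 shinfo = 2092 > 2048: two buffers do not fit
  192 + 2 headroom + 1514 data + 320 shinfo = 2028 <= 2048: two buffers fit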
>
> >
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> > drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
> > 1 file changed, 25 insertions(+), 14 deletions(-)
> [...]
>
> Thanks,
> Olek
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Yunsheng Lin @ 2023-05-15 11:10 UTC
To: Lorenzo Bianconi, netdev
Cc: lorenzo.bianconi, bpf, davem, edumazet, kuba, pabeni, ast, daniel,
hawk, john.fastabend
On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> In order to reduce page_pool memory footprint, rely on
> page_pool_dev_alloc_frag routine and reduce buffer size
> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
Is there any performance improvement besides the memory saving? As it
should reduce TLB misses, I wonder if the TLB miss reduction can even
out the cost of the extra frag reference count handling for the
frag support?
> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> supported MTU is now reduced to 36350B.
Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
and could use a different frag size depending on the MTU or packet size?
Perhaps page_pool_dev_alloc_frag() could also be improved to return a non-frag
page if the requested frag size is larger than a specified size; see the
sketch below. I will try to implement it if the above idea makes sense.
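A minimal sketch of that fallback (max_frag_size here is hypothetical, it is
not an existing page_pool_params field):
	if (size > pool->p.max_frag_size)	/* e.g. PAGE_SIZE / 2 */
		return page_pool_alloc_pages(pool, gfp);
	return page_pool_alloc_frag(pool, offset, size, gfp);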
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Lorenzo Bianconi @ 2023-05-15 11:24 UTC
To: Yunsheng Lin
Cc: Lorenzo Bianconi, netdev, bpf, davem, edumazet, kuba, pabeni, ast,
daniel, hawk, john.fastabend
> On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> > In order to reduce page_pool memory footprint, rely on
> > page_pool_dev_alloc_frag routine and reduce buffer size
> > (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
>
> Is there any performance improvement beside the memory saving? As it
> should reduce TLB miss, I wonder if the TLB miss reducing can even
> out the cost of the extra frag reference count handling for the
> frag support?
reducing the requested headroom to 192 (from 256), we get a nice improvement in
the 1500B frame case, while it is mostly the same in the case of paged skbs
(e.g. MTU 8000B).
>
> > for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> > (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> > Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> > supported MTU is now reduced to 36350B.
>
> Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
> and use different frag size depending on the mtu or packet size?
>
> Perhaps the page_pool_dev_alloc_frag() can be improved to return non-frag
> page if the requested frag size is larger than a specified size too.
> I will try to implement it if the above idea makes sense.
>
since there are no significant differences between the full page and fragmented page
implementations when the MTU is over the page boundary, is it worth doing so?
(at least for the veth use-case).
Regards,
Lorenzo
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Maciej Fijalkowski @ 2023-05-15 13:11 UTC
To: Lorenzo Bianconi
Cc: Yunsheng Lin, Lorenzo Bianconi, netdev, bpf, davem, edumazet,
kuba, pabeni, ast, daniel, hawk, john.fastabend
On Mon, May 15, 2023 at 01:24:20PM +0200, Lorenzo Bianconi wrote:
> > On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> > > In order to reduce page_pool memory footprint, rely on
> > > page_pool_dev_alloc_frag routine and reduce buffer size
> > > (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> >
> > Is there any performance improvement beside the memory saving? As it
> > should reduce TLB miss, I wonder if the TLB miss reducing can even
> > out the cost of the extra frag reference count handling for the
> > frag support?
>
> reducing the requested headroom to 192 (from 256) we have a nice improvement in
> the 1500B frame case while it is mostly the same in the case of paged skb
> (e.g. MTU 8000B).
Can you define 'nice improvement'? ;)
Show us numbers or improvement in %.
>
> >
> > > for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> > > (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> > > Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> > > supported MTU is now reduced to 36350B.
> >
> > Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
> > and use different frag size depending on the mtu or packet size?
> >
> > Perhaps the page_pool_dev_alloc_frag() can be improved to return non-frag
> > page if the requested frag size is larger than a specified size too.
> > I will try to implement it if the above idea makes sense.
> >
>
> since there are no significant differences between the full page and fragmented page
> implementations when the MTU is over the page boundary, is it worth doing so?
> (at least for the veth use-case).
>
> Regards,
> Lorenzo
>
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Alexander Lobakin @ 2023-05-15 16:36 UTC
To: Lorenzo Bianconi
Cc: netdev, lorenzo.bianconi, bpf, davem, edumazet, kuba, pabeni, ast,
daniel, hawk, john.fastabend, linyunsheng
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Fri, 12 May 2023 16:14:48 +0200
>> From: Lorenzo Bianconi <lorenzo@kernel.org>
>> Date: Fri, 12 May 2023 15:08:13 +0200
>>
>>> In order to reduce page_pool memory footprint, rely on
>>> page_pool_dev_alloc_frag routine and reduce buffer size
>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
>>> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
>>> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
>>> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
>>> supported MTU is now reduced to 36350B.
>>
>> I thought we're stepping away from page splitting bit by bit O_o
>
> do you mean to driver private page_split implementation? AFAIK we are not
> stepping away from page_pool page split implementation (or maybe I missed it :))
Page split in general. Since early-to-mid 2021, Page Pool with 1 page per
frame has shown results comparable to drivers doing page split, but it
doesn't have such MTU / headroom / ... limitations.
>
>> Primarily for the reasons you mentioned / worked around here: it creates
>> several significant limitations and at least on 64-bit systems it
>> doesn't scale anymore. 192 bytes of headroom is less than what XDP
>> expects (isn't it? Isn't 256 standard-standard, so that skb XDP path
>> reallocates heads only to have 256+ there?), 320 bytes of shinfo can
>> change anytime and even now page split simply blocks you from increasing
>> MAX_SKB_FRAGS even by one. Not speaking of MTU limitations etc.
>> BTW Intel drivers suffer from the very same things due solely to page
>> split (and I'm almost done with converting at least some of them to Page
>> Pool and 1 page per buffer model), I don't recommend deliberately
>> falling into that pit =\ :D
>
> I am not sure about the 192 vs 256 bytes of headroom (this is why I sent this
> patch as RFC, my main goal is to discuss about this requirement). In the
> previous discussion [0] we deferred this implementation since if we do not
> reduce requested xdp headroom, we will not be able to fit two 1500B frames
> into a single page (for skb_shared_info size [1]) and we introduce a performance
> penalty.
Yes, that's what I'm talking about. In order to fit two 1500-byte frames
onto one page on x86_64, you need to have at most 192 bytes of headroom
(192 + 320 of shinfo = 512 of overhead per frame, 1024 per page), but XDP
requires 256. And then you have one more problem from the other side, I mean
the shinfo size. It can change anytime because it's not UAPI or ABI or
whatever and nobody can say "hey, don't touch it, you break my page
split"; at the same time, with page splitting you're not able to increase
MAX_SKB_FRAGS.
And for MTU > 1536 this is all worthless, just a waste of cycles. With 1
page per frame you can have up to 3.5k per fragment.
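Spelling the budgets out, assuming shinfo stays at 320 bytes:
  half-page split: 2048 - 192 headroom - 320 shinfo = 1536 bytes of data
  full page:       4096 - 256 headroom - 320 shinfo = 3520 bytes (~3.5k)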
You mentioned memory footprint. Do you have any exact numbers to show
this can help significantly?
Because I have PP on my home router with 16k-sized pages and 128 MB of
RAM and there's no memory shortage there :D I realize it doesn't mean
anything and I'm mostly joking mentioning this, but still.
>
> Regards,
> Lorenzo
>
> [0] https://lore.kernel.org/netdev/6298f73f7cc7391c7c4a52a6a89b1ae21488bda1.1682188837.git.lorenzo@kernel.org/
> [1] $ pahole -C skb_shared_info vmlinux.o
> struct skb_shared_info {
> __u8 flags; /* 0 1 */
> __u8 meta_len; /* 1 1 */
> __u8 nr_frags; /* 2 1 */
> __u8 tx_flags; /* 3 1 */
> unsigned short gso_size; /* 4 2 */
> unsigned short gso_segs; /* 6 2 */
> struct sk_buff * frag_list; /* 8 8 */
> struct skb_shared_hwtstamps hwtstamps; /* 16 8 */
> unsigned int gso_type; /* 24 4 */
> u32 tskey; /* 28 4 */
> atomic_t dataref; /* 32 4 */
> unsigned int xdp_frags_size; /* 36 4 */
> void * destructor_arg; /* 40 8 */
> skb_frag_t frags[17]; /* 48 272 */
>
> /* size: 320, cachelines: 5, members: 14 */
> };
>
>>
>>>
>>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>>> ---
>>> drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
>>> 1 file changed, 25 insertions(+), 14 deletions(-)
>> [...]
>>
>> Thanks,
>> Olek
Thanks,
Olek
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Yunsheng Lin @ 2023-05-16 12:55 UTC
To: Lorenzo Bianconi
Cc: Lorenzo Bianconi, netdev, bpf, davem, edumazet, kuba, pabeni, ast,
daniel, hawk, john.fastabend
On 2023/5/15 19:24, Lorenzo Bianconi wrote:
>> On 2023/5/12 21:08, Lorenzo Bianconi wrote:
>>> In order to reduce page_pool memory footprint, rely on
>>> page_pool_dev_alloc_frag routine and reduce buffer size
>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
>>
>> Is there any performance improvement beside the memory saving? As it
>> should reduce TLB miss, I wonder if the TLB miss reducing can even
>> out the cost of the extra frag reference count handling for the
>> frag support?
>
> reducing the requested headroom to 192 (from 256) we have a nice improvement in
> the 1500B frame case while it is mostly the same in the case of paged skb
> (e.g. MTU 8000B).
>
>>
>>> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
>>> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
>>> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
>>> supported MTU is now reduced to 36350B.
>>
>> Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
>> and use different frag size depending on the mtu or packet size?
>>
>> Perhaps the page_pool_dev_alloc_frag() can be improved to return non-frag
>> page if the requested frag size is larger than a specified size too.
>> I will try to implement it if the above idea makes sense.
>>
>
> since there are no significant differences between the full page and fragmented page
> implementations when the MTU is over the page boundary, is it worth doing so?
> (at least for the veth use-case).
Yes, as there are no significant differences between the full page and fragmented page
implementations, unifying the interface is trivial; I have sent an RFC for that.
Using that interface may at least solve the reduced supported MTU problem.
>
> Regards,
> Lorenzo
>
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Jesper Dangaard Brouer @ 2023-05-16 16:11 UTC
To: Lorenzo Bianconi, netdev
Cc: brouer, lorenzo.bianconi, bpf, davem, edumazet, kuba, pabeni, ast,
daniel, hawk, john.fastabend, linyunsheng, Alexander Duyck,
Eric Dumazet
Cc. Alex Duyck + Eric, please criticize my idea below.
On 12/05/2023 15.08, Lorenzo Bianconi wrote:
> In order to reduce page_pool memory footprint, rely on
> page_pool_dev_alloc_frag routine and reduce buffer size
> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> supported MTU is now reduced to 36350B.
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
> 1 file changed, 25 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 614f3e3efab0..0e648703cccf 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -31,9 +31,12 @@
> #define DRV_NAME "veth"
> #define DRV_VERSION "1.0"
>
> -#define VETH_XDP_FLAG BIT(0)
> -#define VETH_RING_SIZE 256
> -#define VETH_XDP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN)
> +#define VETH_XDP_FLAG BIT(0)
> +#define VETH_RING_SIZE 256
> +#define VETH_XDP_PACKET_HEADROOM 192
> +#define VETH_XDP_HEADROOM (VETH_XDP_PACKET_HEADROOM + \
> + NET_IP_ALIGN)
> +#define VETH_PAGE_POOL_FRAG_SIZE 2048
>
> #define VETH_XDP_TX_BULK_SIZE 16
> #define VETH_XDP_BATCH 16
> @@ -736,7 +739,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> if (skb_shared(skb) || skb_head_is_locked(skb) ||
> skb_shinfo(skb)->nr_frags ||
> skb_headroom(skb) < XDP_PACKET_HEADROOM) {
> - u32 size, len, max_head_size, off;
> + u32 size, len, max_head_size, off, pp_off;
> struct sk_buff *nskb;
> struct page *page;
> int i, head_off;
> @@ -747,17 +750,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> *
> * Make sure we have enough space for linear and paged area
> */
> - max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE -
> + max_head_size = SKB_WITH_OVERHEAD(VETH_PAGE_POOL_FRAG_SIZE -
> VETH_XDP_HEADROOM);
> - if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
> + if (skb->len >
> + VETH_PAGE_POOL_FRAG_SIZE * MAX_SKB_FRAGS + max_head_size)
> goto drop;
>
> /* Allocate skb head */
> - page = page_pool_dev_alloc_pages(rq->page_pool);
It seems wasteful to allocate a full PAGE_SIZE page.
> + page = page_pool_dev_alloc_frag(rq->page_pool, &pp_off,
> + VETH_PAGE_POOL_FRAG_SIZE);
Allocating PAGE_SIZE/2 isn't much better.
At this point we already know skb->len (and skb_headlen).
Why don't we allocate the size that we need?
See the page_frag_alloc() system invented by Eric and Duyck.
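Something along these lines, perhaps (only a sketch against the
page_pool_dev_alloc_frag() signature used in this patch; the exact alignment
is a detail):
	/* allocate only what this skb's head will actually hold */
	size = min_t(u32, skb->len, max_head_size);
	size = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + size) +
	       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	page = page_pool_dev_alloc_frag(rq->page_pool, &pp_off, size);
	...
	nskb = napi_build_skb(page_address(page) + pp_off, size);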
> if (!page)
> goto drop;
>
> - nskb = napi_build_skb(page_address(page), PAGE_SIZE);
> + nskb = napi_build_skb(page_address(page) + pp_off,
> + VETH_PAGE_POOL_FRAG_SIZE);
> if (!nskb) {
> page_pool_put_full_page(rq->page_pool, page, true);
> goto drop;
> @@ -782,15 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> len = skb->len - off;
>
> for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
> - page = page_pool_dev_alloc_pages(rq->page_pool);
> + page = page_pool_dev_alloc_frag(rq->page_pool, &pp_off,
> + VETH_PAGE_POOL_FRAG_SIZE);
> if (!page) {
> consume_skb(nskb);
> goto drop;
> }
>
> - size = min_t(u32, len, PAGE_SIZE);
> - skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
> - if (skb_copy_bits(skb, off, page_address(page),
> + size = min_t(u32, len, VETH_PAGE_POOL_FRAG_SIZE);
> + skb_add_rx_frag(nskb, i, page, pp_off, size,
> + VETH_PAGE_POOL_FRAG_SIZE);
> + if (skb_copy_bits(skb, off,
> + page_address(page) + pp_off,
> size)) {
> consume_skb(nskb);
> goto drop;
[...]
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Lorenzo Bianconi @ 2023-05-16 22:52 UTC
To: Maciej Fijalkowski
Cc: Lorenzo Bianconi, Yunsheng Lin, netdev, bpf, davem, edumazet,
kuba, pabeni, ast, daniel, hawk, john.fastabend
> On Mon, May 15, 2023 at 01:24:20PM +0200, Lorenzo Bianconi wrote:
> > > On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> > > > In order to reduce page_pool memory footprint, rely on
> > > > page_pool_dev_alloc_frag routine and reduce buffer size
> > > > (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> > >
> > > Is there any performance improvement beside the memory saving? As it
> > > should reduce TLB miss, I wonder if the TLB miss reducing can even
> > > out the cost of the extra frag reference count handling for the
> > > frag support?
> >
> > reducing the requested headroom to 192 (from 256) we have a nice improvement in
> > the 1500B frame case while it is mostly the same in the case of paged skb
> > (e.g. MTU 8000B).
>
> Can you define 'nice improvement' ? ;)
> Show us numbers or improvement in %.
I am testing this RFC patch in the scenario reported below:
iperf tcp tx --> veth0 --> veth1 (xdp_pass) --> iperf tcp rx
- 6.4.0-rc1 net-next:
MTU 1500B: ~ 7.07 Gbps
MTU 8000B: ~ 14.7 Gbps
- 6.4.0-rc1 net-next + page_pool frag support in veth:
MTU 1500B: ~ 8.57 Gbps
MTU 8000B: ~ 14.5 Gbps
side note: it seems there is a regression between 6.2.15 and 6.4.0-rc1 net-next
(even without latest veth page_pool patches) in the throughput I can get in the
scenario above, but I have not looked into it yet.
- 6.2.15:
MTU 1500B: ~ 7.91 Gbps
MTU 8000B: ~ 14.1 Gbps
- 6.4.0-rc1 net-next w/o commits [0],[1],[2]
MTU 1500B: ~ 6.38 Gbps
MTU 8000B: ~ 13.2 Gbps
Regards,
Lorenzo
[0] 0ebab78cbcbf net: veth: add page_pool for page recycling
[1] 4fc418053ec7 net: veth: add page_pool stats
[2] 9d142ed484a3 net: veth: rely on napi_build_skb in veth_convert_skb_to_xdp_buff
>
> >
> > >
> > > > for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> > > > (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> > > > Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> > > > supported MTU is now reduced to 36350B.
> > >
> > > Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
> > > and use different frag size depending on the mtu or packet size?
> > >
> > > Perhaps the page_pool_dev_alloc_frag() can be improved to return non-frag
> > > page if the requested frag size is larger than a specified size too.
> > > I will try to implement it if the above idea makes sense.
> > >
> >
> > since there are no significant differences between the full page and fragmented page
> > implementations when the MTU is over the page boundary, is it worth doing so?
> > (at least for the veth use-case).
> >
> > Regards,
> > Lorenzo
> >
>
>
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Yunsheng Lin @ 2023-05-17 9:41 UTC
To: Lorenzo Bianconi, Maciej Fijalkowski
Cc: Lorenzo Bianconi, netdev, bpf, davem, edumazet, kuba, pabeni, ast,
daniel, hawk, john.fastabend
On 2023/5/17 6:52, Lorenzo Bianconi wrote:
>> On Mon, May 15, 2023 at 01:24:20PM +0200, Lorenzo Bianconi wrote:
>>>> On 2023/5/12 21:08, Lorenzo Bianconi wrote:
>>>>> In order to reduce page_pool memory footprint, rely on
>>>>> page_pool_dev_alloc_frag routine and reduce buffer size
>>>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
>>>>
>>>> Is there any performance improvement beside the memory saving? As it
>>>> should reduce TLB miss, I wonder if the TLB miss reducing can even
>>>> out the cost of the extra frag reference count handling for the
>>>> frag support?
>>>
>>> reducing the requested headroom to 192 (from 256) we have a nice improvement in
>>> the 1500B frame case while it is mostly the same in the case of paged skb
>>> (e.g. MTU 8000B).
>>
>> Can you define 'nice improvement' ? ;)
>> Show us numbers or improvement in %.
>
> I am testing this RFC patch in the scenario reported below:
>
> iperf tcp tx --> veth0 --> veth1 (xdp_pass) --> iperf tcp rx
>
> - 6.4.0-rc1 net-next:
> MTU 1500B: ~ 7.07 Gbps
> MTU 8000B: ~ 14.7 Gbps
>
> - 6.4.0-rc1 net-next + page_pool frag support in veth:
> MTU 1500B: ~ 8.57 Gbps
> MTU 8000B: ~ 14.5 Gbps
>
Thanks for sharing the data.
Maybe using the new frag interface introduced in [1] brings
back the performance for the MTU 8000B case.
1. https://patchwork.kernel.org/project/netdevbpf/cover/20230516124801.2465-1-linyunsheng@huawei.com/
I drafted a patch for veth to use the new frag interface; maybe that
will show how veth can make use of it. Would you give it a try to see
if there is any performance improvement for the MTU 8000B case? Thanks.
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -737,8 +737,8 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
skb_shinfo(skb)->nr_frags ||
skb_headroom(skb) < XDP_PACKET_HEADROOM) {
u32 size, len, max_head_size, off;
+ struct page_pool_frag *pp_frag;
struct sk_buff *nskb;
- struct page *page;
int i, head_off;
/* We need a private copy of the skb and data buffers since
@@ -752,14 +752,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
goto drop;
+ size = min_t(u32, skb->len, max_head_size);
+ size += VETH_XDP_HEADROOM;
+ size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
/* Allocate skb head */
- page = page_pool_dev_alloc_pages(rq->page_pool);
- if (!page)
+ pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
+ if (!pp_frag)
goto drop;
- nskb = napi_build_skb(page_address(page), PAGE_SIZE);
+ nskb = napi_build_skb(page_address(pp_frag->page) + pp_frag->offset,
+ pp_frag->truesize);
if (!nskb) {
- page_pool_put_full_page(rq->page_pool, page, true);
+ page_pool_put_full_page(rq->page_pool, pp_frag->page,
+ true);
goto drop;
}
@@ -782,16 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
len = skb->len - off;
for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
- page = page_pool_dev_alloc_pages(rq->page_pool);
- if (!page) {
+ size = min_t(u32, len, PAGE_SIZE);
+
+ pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
+ if (!pp_frag) {
consume_skb(nskb);
goto drop;
}
- size = min_t(u32, len, PAGE_SIZE);
- skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
- if (skb_copy_bits(skb, off, page_address(page),
- size)) {
+ skb_add_rx_frag(nskb, i, pp_frag->page, pp_frag->offset,
+ size, pp_frag->truesize);
+ if (skb_copy_bits(skb, off, page_address(pp_frag->page) +
+ pp_frag->offset, size)) {
consume_skb(nskb);
goto drop;
}
@@ -1047,6 +1055,8 @@ static int veth_create_page_pool(struct veth_rq *rq)
return err;
}
+ page_pool_set_max_frag_size(rq->page_pool, PAGE_SIZE / 2);
+
return 0;
}
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Lorenzo Bianconi @ 2023-05-17 14:17 UTC
To: Yunsheng Lin
Cc: Lorenzo Bianconi, Maciej Fijalkowski, netdev, bpf, davem,
edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend
> On 2023/5/17 6:52, Lorenzo Bianconi wrote:
> >> On Mon, May 15, 2023 at 01:24:20PM +0200, Lorenzo Bianconi wrote:
> >>>> On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> >>>>> In order to reduce page_pool memory footprint, rely on
> >>>>> page_pool_dev_alloc_frag routine and reduce buffer size
> >>>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> >>>>
> >>>> Is there any performance improvement beside the memory saving? As it
> >>>> should reduce TLB miss, I wonder if the TLB miss reducing can even
> >>>> out the cost of the extra frag reference count handling for the
> >>>> frag support?
> >>>
> >>> reducing the requested headroom to 192 (from 256) we have a nice improvement in
> >>> the 1500B frame case while it is mostly the same in the case of paged skb
> >>> (e.g. MTU 8000B).
> >>
> >> Can you define 'nice improvement' ? ;)
> >> Show us numbers or improvement in %.
> >
> > I am testing this RFC patch in the scenario reported below:
> >
> > iperf tcp tx --> veth0 --> veth1 (xdp_pass) --> iperf tcp rx
> >
> > - 6.4.0-rc1 net-next:
> > MTU 1500B: ~ 7.07 Gbps
> > MTU 8000B: ~ 14.7 Gbps
> >
> > - 6.4.0-rc1 net-next + page_pool frag support in veth:
> > MTU 1500B: ~ 8.57 Gbps
> > MTU 8000B: ~ 14.5 Gbps
> >
>
> Thanks for sharing the data.
> Maybe using the new frag interface introduced in [1] brings
> back the performance for the MTU 8000B case.
>
> 1. https://patchwork.kernel.org/project/netdevbpf/cover/20230516124801.2465-1-linyunsheng@huawei.com/
>
>
> I drafted a patch for veth to use the new frag interface, maybe that
> will show how veth can make use of it. Would you give it a try to see
> if there is any performance improvement for the MTU 8000B case? Thanks.
>
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -737,8 +737,8 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> skb_shinfo(skb)->nr_frags ||
> skb_headroom(skb) < XDP_PACKET_HEADROOM) {
> u32 size, len, max_head_size, off;
> + struct page_pool_frag *pp_frag;
> struct sk_buff *nskb;
> - struct page *page;
> int i, head_off;
>
> /* We need a private copy of the skb and data buffers since
> @@ -752,14 +752,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
> goto drop;
>
> + size = min_t(u32, skb->len, max_head_size);
> + size += VETH_XDP_HEADROOM;
> + size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +
> /* Allocate skb head */
> - page = page_pool_dev_alloc_pages(rq->page_pool);
> - if (!page)
> + pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
> + if (!pp_frag)
> goto drop;
>
> - nskb = napi_build_skb(page_address(page), PAGE_SIZE);
> + nskb = napi_build_skb(page_address(pp_frag->page) + pp_frag->offset,
> + pp_frag->truesize);
> if (!nskb) {
> - page_pool_put_full_page(rq->page_pool, page, true);
> + page_pool_put_full_page(rq->page_pool, pp_frag->page,
> + true);
> goto drop;
> }
>
> @@ -782,16 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> len = skb->len - off;
>
> for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
> - page = page_pool_dev_alloc_pages(rq->page_pool);
> - if (!page) {
> + size = min_t(u32, len, PAGE_SIZE);
> +
> + pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
> + if (!pp_frag) {
> consume_skb(nskb);
> goto drop;
> }
>
> - size = min_t(u32, len, PAGE_SIZE);
> - skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
> - if (skb_copy_bits(skb, off, page_address(page),
> - size)) {
> + skb_add_rx_frag(nskb, i, pp_frag->page, pp_frag->offset,
> + size, pp_frag->truesize);
> + if (skb_copy_bits(skb, off, page_address(pp_frag->page) +
> + pp_frag->offset, size)) {
> consume_skb(nskb);
> goto drop;
> }
> @@ -1047,6 +1055,8 @@ static int veth_create_page_pool(struct veth_rq *rq)
> return err;
> }
IIUC the code here is using a variable length for the linear part (at most one page)
while we will always use a full page (except for the last fragment) for the paged
area, correct? I have not tested it yet, but I do not think we will get a significant
improvement, since if we set the MTU to 8000B in my tests we get mostly the same
throughput (14.5 Gbps vs 14.7 Gbps) whether we use page_pool fragments or full pages.
Am I missing something?
What we are discussing with Jesper is trying to allocate an order-3 page from the pool
and relying on page_pool fragments, similar to what page_frag_cache is doing; see the
sketch below. I will look into it if there are no strong 'red flags'.
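Roughly, the order-3 variant would only change the pool params, e.g. (a
sketch based on the existing veth_create_page_pool() parameters; whether
PP_FLAG_PAGE_FRAG behaves well at order 3 is exactly what needs checking):
	struct page_pool_params pp_params = {
		.order = 3,	/* 32KB chunks, fragmented by the pool */
		.pool_size = VETH_RING_SIZE,
		.flags = PP_FLAG_PAGE_FRAG,
		.nid = NUMA_NO_NODE,
		.dev = &rq->dev->dev,
	};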
Regards,
Lorenzo
>
> + page_pool_set_max_frag_size(rq->page_pool, PAGE_SIZE / 2);
> +
> return 0;
> }
>
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Jakub Kicinski @ 2023-05-17 14:58 UTC
To: Lorenzo Bianconi
Cc: Maciej Fijalkowski, Lorenzo Bianconi, Yunsheng Lin, netdev, bpf,
davem, edumazet, pabeni, ast, daniel, hawk, john.fastabend
On Wed, 17 May 2023 00:52:25 +0200 Lorenzo Bianconi wrote:
> I am testing this RFC patch in the scenario reported below:
>
> iperf tcp tx --> veth0 --> veth1 (xdp_pass) --> iperf tcp rx
>
> - 6.4.0-rc1 net-next:
> MTU 1500B: ~ 7.07 Gbps
> MTU 8000B: ~ 14.7 Gbps
>
> - 6.4.0-rc1 net-next + page_pool frag support in veth:
> MTU 1500B: ~ 8.57 Gbps
> MTU 8000B: ~ 14.5 Gbps
>
> side note: it seems there is a regression between 6.2.15 and 6.4.0-rc1 net-next
> (even without latest veth page_pool patches) in the throughput I can get in the
> scenario above, but I have not looked into it yet.
>
> - 6.2.15:
> MTU 1500B: ~ 7.91 Gbps
> MTU 8000B: ~ 14.1 Gbps
>
> - 6.4.0-rc1 net-next w/o commits [0],[1],[2]
> MTU 1500B: ~ 6.38 Gbps
> MTU 8000B: ~ 13.2 Gbps
If the benchmark is iperf, wouldn't working towards preserving GSO
status across XDP (assuming prog is multi-buf-capable) be the most
beneficial optimization?
* Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
From: Yunsheng Lin @ 2023-05-18 1:16 UTC
To: Lorenzo Bianconi
Cc: Lorenzo Bianconi, Maciej Fijalkowski, netdev, bpf, davem,
edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend
On 2023/5/17 22:17, Lorenzo Bianconi wrote:
>> Maybe using the new frag interface introduced in [1] brings
>> back the performance for the MTU 8000B case.
>>
>> 1. https://patchwork.kernel.org/project/netdevbpf/cover/20230516124801.2465-1-linyunsheng@huawei.com/
>>
>>
>> I drafted a patch for veth to use the new frag interface, maybe that
>> will show how veth can make use of it. Would you give it a try to see
>> if there is any performance improvement for the MTU 8000B case? Thanks.
>>
>> --- a/drivers/net/veth.c
>> +++ b/drivers/net/veth.c
>> @@ -737,8 +737,8 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>> skb_shinfo(skb)->nr_frags ||
>> skb_headroom(skb) < XDP_PACKET_HEADROOM) {
>> u32 size, len, max_head_size, off;
>> + struct page_pool_frag *pp_frag;
>> struct sk_buff *nskb;
>> - struct page *page;
>> int i, head_off;
>>
>> /* We need a private copy of the skb and data buffers since
>> @@ -752,14 +752,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>> if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
>> goto drop;
>>
>> + size = min_t(u32, skb->len, max_head_size);
>> + size += VETH_XDP_HEADROOM;
>> + size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +
>> /* Allocate skb head */
>> - page = page_pool_dev_alloc_pages(rq->page_pool);
>> - if (!page)
>> + pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
>> + if (!pp_frag)
>> goto drop;
>>
>> - nskb = napi_build_skb(page_address(page), PAGE_SIZE);
>> + nskb = napi_build_skb(page_address(pp_frag->page) + pp_frag->offset,
>> + pp_frag->truesize);
>> if (!nskb) {
>> - page_pool_put_full_page(rq->page_pool, page, true);
>> + page_pool_put_full_page(rq->page_pool, pp_frag->page,
>> + true);
>> goto drop;
>> }
>>
>> @@ -782,16 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>> len = skb->len - off;
>>
>> for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
>> - page = page_pool_dev_alloc_pages(rq->page_pool);
>> - if (!page) {
>> + size = min_t(u32, len, PAGE_SIZE);
>> +
>> + pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
>> + if (!pp_frag) {
>> consume_skb(nskb);
>> goto drop;
>> }
>>
>> - size = min_t(u32, len, PAGE_SIZE);
>> - skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
>> - if (skb_copy_bits(skb, off, page_address(page),
>> - size)) {
>> + skb_add_rx_frag(nskb, i, pp_frag->page, pp_frag->offset,
>> + size, pp_frag->truesize);
>> + if (skb_copy_bits(skb, off, page_address(pp_frag->page) +
>> + pp_frag->offset, size)) {
>> consume_skb(nskb);
>> goto drop;
>> }
>> @@ -1047,6 +1055,8 @@ static int veth_create_page_pool(struct veth_rq *rq)
>> return err;
>> }
>
> IIUC the code here we are using a variable length for linear part (at most one page)
> while we will always use a full page (except for the last fragment) for the paged
More correctly, it does not care whether the data is in the linear part or in the paged
area. We copy the data to the new skb using the fewest possible fragments and the most
memory saving, depending on the head/tail room size and the page size/order, as
skb_copy_bits() hides the data layout difference from its caller.
> area, correct? I have not tested it yet but I do not think we will get a significant
> improvement since if we set MTU to 8000B in my tests we get mostly the same throughput
> (14.5 Gbps vs 14.7 Gbps) if we use page_pool fragment or page_pool full page.
> Am I missing something?
I don't expect a significant improvement either, but I do expect a 'nice improvement' in
performance and memory saving, depending on how you view 'nice improvement' :)
> What we are discussing with Jesper is try to allocate a order 3 page from the pool and
> rely page_pool fragment, similar to page_frag_cache is doing. I will look into it if
> there are no strong 'red flags'.
Thanks.
Yes, if we do not really care about memory usage, using an order-3 page should give more
performance improvement.
In my understanding, the improvement mentioned above also applies to order-3 pages.
>
> Regards,
> Lorenzo
>
>>
>> + page_pool_set_max_frag_size(rq->page_pool, PAGE_SIZE / 2);
>> +
>> return 0;
>> }
>>