* [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc
2024-10-14 3:12 [PATCH 0/5] virtio_net: enable premapped mode by default Xuan Zhuo
@ 2024-10-14 3:12 ` Xuan Zhuo
2024-10-17 13:42 ` Paolo Abeni
2024-10-18 7:41 ` Jason Wang
2024-10-14 3:12 ` [PATCH 2/5] virtio_net: introduce vi->mode Xuan Zhuo
` (4 subsequent siblings)
5 siblings, 2 replies; 21+ messages in thread
From: Xuan Zhuo @ 2024-10-14 3:12 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
virtualization, Si-Wei Liu
When the frag only gets a single page, a regression may occur in the VM.
Especially if the sysctl net.core.high_order_alloc_disable value is 1,
the frag always gets a single page on refill, which can cause reliable
crashes or scp failures (scp of a 100M file to the VM):
The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
of a new frag. When the frag size is larger than PAGE_SIZE,
everything is fine. However, if the frag is only one page and the
total size of the buffer and virtnet_rq_dma is larger than one page, an
overflow may occur.
Fix this by reducing the buffer length when the frag size is not large
enough.
Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f8131f92a392..59a99bbaf852 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -926,9 +926,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
void *buf, *head;
dma_addr_t addr;
- if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
- return NULL;
-
head = page_address(alloc_frag->page);
if (rq->do_dma) {
@@ -2423,6 +2420,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
len = SKB_DATA_ALIGN(len) +
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
+ return -ENOMEM;
+
buf = virtnet_rq_alloc(rq, len, gfp);
if (unlikely(!buf))
return -ENOMEM;
@@ -2525,6 +2525,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
*/
len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
+ if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
+ return -ENOMEM;
+
+ if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
+ len -= sizeof(struct virtnet_rq_dma);
+
buf = virtnet_rq_alloc(rq, len + room, gfp);
if (unlikely(!buf))
return -ENOMEM;
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc
2024-10-14 3:12 ` [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc Xuan Zhuo
@ 2024-10-17 13:42 ` Paolo Abeni
2024-10-19 16:34 ` Michael S. Tsirkin
2024-10-25 2:35 ` Xuan Zhuo
2024-10-18 7:41 ` Jason Wang
1 sibling, 2 replies; 21+ messages in thread
From: Paolo Abeni @ 2024-10-17 13:42 UTC (permalink / raw)
To: Xuan Zhuo, netdev
Cc: Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, virtualization,
Si-Wei Liu
On 10/14/24 05:12, Xuan Zhuo wrote:
> When the frag just got a page, then may lead to regression on VM.
> Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> then the frag always get a page when do refill.
>
> Which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur.
>
> Here, when the frag size is not enough, we reduce the buffer len to fix
> this problem.
>
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
This looks like a fix that should target the net tree, but the following
patches look like net-next material. Any special reason to bundle them
together?
Also, please explicitly include the target tree in the subject on future
submissions, thanks!
Paolo
* Re: [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc
2024-10-17 13:42 ` Paolo Abeni
@ 2024-10-19 16:34 ` Michael S. Tsirkin
2024-10-25 2:35 ` Xuan Zhuo
1 sibling, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2024-10-19 16:34 UTC (permalink / raw)
To: Paolo Abeni
Cc: Xuan Zhuo, netdev, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, virtualization,
Si-Wei Liu
On Thu, Oct 17, 2024 at 03:42:59PM +0200, Paolo Abeni wrote:
>
>
> On 10/14/24 05:12, Xuan Zhuo wrote:
> > When the frag just got a page, then may lead to regression on VM.
> > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > then the frag always get a page when do refill.
> >
> > Which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur.
> >
> > Here, when the frag size is not enough, we reduce the buffer len to fix
> > this problem.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> This looks like a fix that should target the net tree, but the following
> patches looks like net-next material. Any special reason to bundle them
> together?
>
> Also, please explicitly include the the target tree in the subj on next
> submissions, thanks!
>
> Paolo
The issue is that net-next is not rebased on net, so if patches 2-5 land
in next and 1 only in net, next will be broken. I think the simplest fix
is just to have 1 on net-next, backport it to net too, and git will
figure it out. Right?
--
MST
* Re: [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc
2024-10-17 13:42 ` Paolo Abeni
2024-10-19 16:34 ` Michael S. Tsirkin
@ 2024-10-25 2:35 ` Xuan Zhuo
2024-10-25 13:22 ` Simon Horman
1 sibling, 1 reply; 21+ messages in thread
From: Xuan Zhuo @ 2024-10-25 2:35 UTC (permalink / raw)
To: Paolo Abeni
Cc: Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, virtualization,
Si-Wei Liu, netdev
On Thu, 17 Oct 2024 15:42:59 +0200, Paolo Abeni <pabeni@redhat.com> wrote:
>
>
> On 10/14/24 05:12, Xuan Zhuo wrote:
> > When the frag just got a page, then may lead to regression on VM.
> > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > then the frag always get a page when do refill.
> >
> > Which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur.
> >
> > Here, when the frag size is not enough, we reduce the buffer len to fix
> > this problem.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> This looks like a fix that should target the net tree, but the following
> patches looks like net-next material. Any special reason to bundle them
> together?
Sorry, I forgot to add net-next as the target tree.
This may look like a fix, but the feature was disabled in the last Linux
release, so the bug cannot be triggered and there is no need to push this
to the net tree.
Thanks.
>
> Also, please explicitly include the the target tree in the subj on next
> submissions, thanks!
>
> Paolo
>
>
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc
2024-10-25 2:35 ` Xuan Zhuo
@ 2024-10-25 13:22 ` Simon Horman
0 siblings, 0 replies; 21+ messages in thread
From: Simon Horman @ 2024-10-25 13:22 UTC (permalink / raw)
To: Xuan Zhuo
Cc: Paolo Abeni, Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, virtualization,
Si-Wei Liu, netdev
On Fri, Oct 25, 2024 at 10:35:53AM +0800, Xuan Zhuo wrote:
> On Thu, 17 Oct 2024 15:42:59 +0200, Paolo Abeni <pabeni@redhat.com> wrote:
> >
> >
> > On 10/14/24 05:12, Xuan Zhuo wrote:
> > > When the frag just got a page, then may lead to regression on VM.
> > > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > > then the frag always get a page when do refill.
> > >
> > > Which could see reliable crashes or scp failure (scp a file 100M in size
> > > to VM):
> > >
> > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > everything is fine. However, if the frag is only one page and the
> > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > overflow may occur.
> > >
> > > Here, when the frag size is not enough, we reduce the buffer len to fix
> > > this problem.
> > >
> > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> > This looks like a fix that should target the net tree, but the following
> > patches looks like net-next material. Any special reason to bundle them
> > together?
>
> Sorry, I forgot to add net-next as a target tree.
>
> This may look like a fix. But the feature was disabled in the last Linux
> version. So the bug cannot be triggered, so we don't need to push to the net
> tree.
I think it would be useful to be clear in the commit message, use of tags,
and target tree regarding fixes and non-fixes.
Please describe in the commit message why this is not fixing a bug, as you
have described above. And please do not include Fixes tags in patches that
are not bug fixes, which seems to be the case here.
If you want to refer to the patch that introduced the problem, you can use
the following syntax, in the commit message, before the tags. Unlike Fixes
tags, this may be line wrapped.
This problem is not a bug fix for net because... It was introduced by
commit f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api").
Reported-by: ...
...
* Re: [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc
2024-10-14 3:12 ` [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc Xuan Zhuo
2024-10-17 13:42 ` Paolo Abeni
@ 2024-10-18 7:41 ` Jason Wang
2024-10-25 2:40 ` Xuan Zhuo
1 sibling, 1 reply; 21+ messages in thread
From: Jason Wang @ 2024-10-18 7:41 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization,
Si-Wei Liu
On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> When the frag just got a page, then may lead to regression on VM.
> Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> then the frag always get a page when do refill.
>
> Which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur.
>
> Here, when the frag size is not enough, we reduce the buffer len to fix
> this problem.
>
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Though this may fix the problem and we need it now, I would like to
have some tweaks in the future, basically because of the security
implications of sharing driver metadata with the device (most IOMMUs
can only do protection at page granularity), so we may end up
with device-triggerable behaviour. For safety, we should use
driver-private memory for DMA metadata.
> ---
> drivers/net/virtio_net.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f8131f92a392..59a99bbaf852 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -926,9 +926,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> void *buf, *head;
> dma_addr_t addr;
>
> - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> - return NULL;
> -
> head = page_address(alloc_frag->page);
>
> if (rq->do_dma) {
> @@ -2423,6 +2420,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> len = SKB_DATA_ALIGN(len) +
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> + return -ENOMEM;
> +
> buf = virtnet_rq_alloc(rq, len, gfp);
> if (unlikely(!buf))
> return -ENOMEM;
> @@ -2525,6 +2525,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> */
> len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>
> + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> + return -ENOMEM;
This makes me think of another question: how can we guarantee that len +
room is less than or equal to PAGE_SIZE, especially considering that we
need extra headroom and tailroom for XDP? If we can't, it is a bug:
"""
/**
* skb_page_frag_refill - check that a page_frag contains enough room
* @sz: minimum size of the fragment we want to get
* @pfrag: pointer to page_frag
* @gfp: priority for memory allocation
*
* Note: While this allocator tries to use high order pages, there is
* no guarantee that allocations succeed. Therefore, @sz MUST be
* less or equal than PAGE_SIZE.
*/
"""
> +
> + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> + len -= sizeof(struct virtnet_rq_dma);
Any reason we need to check alloc_frag->offset?
> +
> buf = virtnet_rq_alloc(rq, len + room, gfp);
Btw, as pointed out in previous review, we should have a consistent API:
1) hide the alloc_frag like virtnet_rq_alloc()
or
2) pass the alloc_frag to virtnet_rq_alloc()
> if (unlikely(!buf))
> return -ENOMEM;
> --
> 2.32.0.3.g01195cf9f
>
Thanks
* Re: [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc
2024-10-18 7:41 ` Jason Wang
@ 2024-10-25 2:40 ` Xuan Zhuo
0 siblings, 0 replies; 21+ messages in thread
From: Xuan Zhuo @ 2024-10-25 2:40 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization,
Si-Wei Liu
On Fri, 18 Oct 2024 15:41:41 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > When the frag just got a page, then may lead to regression on VM.
> > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > then the frag always get a page when do refill.
> >
> > Which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur.
> >
> > Here, when the frag size is not enough, we reduce the buffer len to fix
> > this problem.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> Though this may fix the problem and we need it now, I would like to
> have some tweaks in the future. Basically because of the security
> implication for sharing driver metadata with the device (most IOMMU
> can only do protection at the granule at the page). So we may end up
> with device-triggerable behaviour. For safety, we should use driver
> private memory for DMA metadata.
>
> > ---
> > drivers/net/virtio_net.c | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f8131f92a392..59a99bbaf852 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -926,9 +926,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > void *buf, *head;
> > dma_addr_t addr;
> >
> > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > - return NULL;
> > -
> > head = page_address(alloc_frag->page);
> >
> > if (rq->do_dma) {
> > @@ -2423,6 +2420,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > len = SKB_DATA_ALIGN(len) +
> > SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >
> > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > + return -ENOMEM;
> > +
> > buf = virtnet_rq_alloc(rq, len, gfp);
> > if (unlikely(!buf))
> > return -ENOMEM;
> > @@ -2525,6 +2525,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > */
> > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> >
> > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > + return -ENOMEM;
>
> This makes me think of another question, how could we guarantee len +
> room is less or equal to PAGE_SIZE. Especially considering we need
> extra head and tailroom for XDP? If we can't it is a bug:
get_mergeable_buf_len() does this.
>
> """
> /**
> * skb_page_frag_refill - check that a page_frag contains enough room
> * @sz: minimum size of the fragment we want to get
> * @pfrag: pointer to page_frag
> * @gfp: priority for memory allocation
> *
> * Note: While this allocator tries to use high order pages, there is
> * no guarantee that allocations succeed. Therefore, @sz MUST be
> * less or equal than PAGE_SIZE.
> */
> """
>
> > +
> > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > + len -= sizeof(struct virtnet_rq_dma);
>
> Any reason we need to check alloc_frag->offset?
We only need to check when the page of the alloc frag is new; in that
case, we must allocate space at its head to store the DMA metadata.
If the offset is > 0, the metadata was already stored when the page was
fresh, so no extra space is needed and it is safe.
>
> > +
> > buf = virtnet_rq_alloc(rq, len + room, gfp);
>
> Btw, as pointed out in previous review, we should have a consistent API:
>
> 1) hide the alloc_frag like virtnet_rq_alloc()
>
> or
>
> 2) pass the alloc_frag to virtnet_rq_alloc()
Now we need to adjust len between skb_page_frag_refill() and the buffer
allocation, so we must move skb_page_frag_refill() outside
virtnet_rq_alloc().
Thanks
>
> > if (unlikely(!buf))
> > return -ENOMEM;
> > --
> > 2.32.0.3.g01195cf9f
> >
>
> Thanks
>
* [PATCH 2/5] virtio_net: introduce vi->mode
2024-10-14 3:12 [PATCH 0/5] virtio_net: enable premapped mode by default Xuan Zhuo
2024-10-14 3:12 ` [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc Xuan Zhuo
@ 2024-10-14 3:12 ` Xuan Zhuo
2024-10-18 7:48 ` Jason Wang
2024-10-14 3:12 ` [PATCH 3/5] virtio_net: big mode skip the unmap check Xuan Zhuo
` (3 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Xuan Zhuo @ 2024-10-14 3:12 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
virtualization
Currently, if we want to determine the rx work mode, we have to use
checks like these:
1. merge mode: vi->mergeable_rx_bufs
2. big mode: vi->big_packets && !vi->mergeable_rx_bufs
3. small: !vi->big_packets && !vi->mergeable_rx_bufs
This is inconvenient and unintuitive, and we also have this use case:
if (vi->mergeable_rx_bufs)
....
else if (vi->big_packets)
....
else
For this case, I think a switch-case is the better choice.
So introduce vi->mode to record the virtio-net work mode. That makes it
easier to determine the work mode and choose the right branch.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 61 +++++++++++++++++++++++++++++++---------
1 file changed, 47 insertions(+), 14 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 59a99bbaf852..14809b614d62 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -385,6 +385,12 @@ struct control_buf {
virtio_net_ctrl_ack status;
};
+enum virtnet_mode {
+ VIRTNET_MODE_SMALL,
+ VIRTNET_MODE_MERGE,
+ VIRTNET_MODE_BIG
+};
+
struct virtnet_info {
struct virtio_device *vdev;
struct virtqueue *cvq;
@@ -414,6 +420,8 @@ struct virtnet_info {
/* Host will merge rx buffers for big packets (shake it! shake it!) */
bool mergeable_rx_bufs;
+ enum virtnet_mode mode;
+
/* Host supports rss and/or hash report */
bool has_rss;
bool has_rss_hash_report;
@@ -643,12 +651,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
static void virtnet_rq_free_buf(struct virtnet_info *vi,
struct receive_queue *rq, void *buf)
{
- if (vi->mergeable_rx_bufs)
+ switch (vi->mode) {
+ case VIRTNET_MODE_SMALL:
+ case VIRTNET_MODE_MERGE:
put_page(virt_to_head_page(buf));
- else if (vi->big_packets)
+ break;
+ case VIRTNET_MODE_BIG:
give_pages(rq, buf);
- else
- put_page(virt_to_head_page(buf));
+ break;
+ }
}
static void enable_delayed_refill(struct virtnet_info *vi)
@@ -1315,7 +1326,8 @@ static void virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queu
flags = ((struct virtio_net_common_hdr *)(xdp->data - vi->hdr_len))->hdr.flags;
- if (!vi->mergeable_rx_bufs)
+ /* We only support small and merge mode. */
+ if (vi->mode == VIRTNET_MODE_SMALL)
skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
else
skb = virtnet_receive_xsk_merge(dev, vi, rq, xdp, xdp_xmit, stats);
@@ -2389,13 +2401,20 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
*/
flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
- if (vi->mergeable_rx_bufs)
+ switch (vi->mode) {
+ case VIRTNET_MODE_MERGE:
skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
stats);
- else if (vi->big_packets)
+ break;
+
+ case VIRTNET_MODE_BIG:
skb = receive_big(dev, vi, rq, buf, len, stats);
- else
+ break;
+
+ case VIRTNET_MODE_SMALL:
skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
+ break;
+ }
if (unlikely(!skb))
return;
@@ -2580,12 +2599,19 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
}
do {
- if (vi->mergeable_rx_bufs)
+ switch (vi->mode) {
+ case VIRTNET_MODE_MERGE:
err = add_recvbuf_mergeable(vi, rq, gfp);
- else if (vi->big_packets)
+ break;
+
+ case VIRTNET_MODE_BIG:
err = add_recvbuf_big(vi, rq, gfp);
- else
+ break;
+
+ case VIRTNET_MODE_SMALL:
err = add_recvbuf_small(vi, rq, gfp);
+ break;
+ }
if (err)
break;
@@ -2703,7 +2729,7 @@ static int virtnet_receive_packets(struct virtnet_info *vi,
int packets = 0;
void *buf;
- if (!vi->big_packets || vi->mergeable_rx_bufs) {
+ if (vi->mode != VIRTNET_MODE_BIG) {
void *ctx;
while (packets < budget &&
(buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
@@ -5510,7 +5536,7 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
/* In big_packets mode, xdp cannot work, so there is no need to
* initialize xsk of rq.
*/
- if (vi->big_packets && !vi->mergeable_rx_bufs)
+ if (vi->mode == VIRTNET_MODE_BIG)
return -ENOENT;
if (qid >= vi->curr_queue_pairs)
@@ -6007,7 +6033,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
vqs_info = kcalloc(total_vqs, sizeof(*vqs_info), GFP_KERNEL);
if (!vqs_info)
goto err_vqs_info;
- if (!vi->big_packets || vi->mergeable_rx_bufs) {
+ if (vi->mode != VIRTNET_MODE_BIG) {
ctx = kcalloc(total_vqs, sizeof(*ctx), GFP_KERNEL);
if (!ctx)
goto err_ctx;
@@ -6480,6 +6506,13 @@ static int virtnet_probe(struct virtio_device *vdev)
virtnet_set_big_packets(vi, mtu);
+ if (vi->mergeable_rx_bufs)
+ vi->mode = VIRTNET_MODE_MERGE;
+ else if (vi->big_packets)
+ vi->mode = VIRTNET_MODE_BIG;
+ else
+ vi->mode = VIRTNET_MODE_SMALL;
+
if (vi->any_header_sg)
dev->needed_headroom = vi->hdr_len;
--
2.32.0.3.g01195cf9f
* Re: [PATCH 2/5] virtio_net: introduce vi->mode
2024-10-14 3:12 ` [PATCH 2/5] virtio_net: introduce vi->mode Xuan Zhuo
@ 2024-10-18 7:48 ` Jason Wang
2024-10-25 2:47 ` Xuan Zhuo
0 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2024-10-18 7:48 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization
On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now, if we want to judge the rx work mode, we have to use such codes:
>
> 1. merge mode: vi->mergeable_rx_bufs
> 2. big mode: vi->big_packets && !vi->mergeable_rx_bufs
> 3. small: !vi->big_packets && !vi->mergeable_rx_bufs
>
> This is inconvenient and abstract, and we also have this use case:
>
> if (vi->mergeable_rx_bufs)
> ....
> else if (vi->big_packets)
> ....
> else
>
> For this case, I think switch-case is the better choice.
>
> So here I introduce vi->mode to record the virtio-net work mode.
> That is helpful to judge the work mode and choose the branches.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/net/virtio_net.c | 61 +++++++++++++++++++++++++++++++---------
> 1 file changed, 47 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 59a99bbaf852..14809b614d62 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -385,6 +385,12 @@ struct control_buf {
> virtio_net_ctrl_ack status;
> };
>
> +enum virtnet_mode {
> + VIRTNET_MODE_SMALL,
> + VIRTNET_MODE_MERGE,
> + VIRTNET_MODE_BIG
> +};
I'm not sure whether this makes things easier or not.
[...]
> + if (vi->mergeable_rx_bufs)
> + vi->mode = VIRTNET_MODE_MERGE;
> + else if (vi->big_packets)
> + vi->mode = VIRTNET_MODE_BIG;
Maybe we can just say big_packets doesn't mean big mode.
> + else
> + vi->mode = VIRTNET_MODE_SMALL;
> +
> if (vi->any_header_sg)
> dev->needed_headroom = vi->hdr_len;
Anyhow this seems not a fix so it should be a separate series than patch 1?
Thanks
>
> --
> 2.32.0.3.g01195cf9f
>
* Re: [PATCH 2/5] virtio_net: introduce vi->mode
2024-10-18 7:48 ` Jason Wang
@ 2024-10-25 2:47 ` Xuan Zhuo
0 siblings, 0 replies; 21+ messages in thread
From: Xuan Zhuo @ 2024-10-25 2:47 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization
On Fri, 18 Oct 2024 15:48:38 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Now, if we want to judge the rx work mode, we have to use such codes:
> >
> > 1. merge mode: vi->mergeable_rx_bufs
> > 2. big mode: vi->big_packets && !vi->mergeable_rx_bufs
> > 3. small: !vi->big_packets && !vi->mergeable_rx_bufs
> >
> > This is inconvenient and abstract, and we also have this use case:
> >
> > if (vi->mergeable_rx_bufs)
> > ....
> > else if (vi->big_packets)
> > ....
> > else
> >
> > For this case, I think switch-case is the better choice.
> >
> > So here I introduce vi->mode to record the virtio-net work mode.
> > That is helpful to judge the work mode and choose the branches.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 61 +++++++++++++++++++++++++++++++---------
> > 1 file changed, 47 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 59a99bbaf852..14809b614d62 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -385,6 +385,12 @@ struct control_buf {
> > virtio_net_ctrl_ack status;
> > };
> >
> > +enum virtnet_mode {
> > + VIRTNET_MODE_SMALL,
> > + VIRTNET_MODE_MERGE,
> > + VIRTNET_MODE_BIG
> > +};
>
> I'm not sure if this can ease or not.
>
> [...]
>
> > + if (vi->mergeable_rx_bufs)
> > + vi->mode = VIRTNET_MODE_MERGE;
> > + else if (vi->big_packets)
> > + vi->mode = VIRTNET_MODE_BIG;
>
> Maybe we can just say big_packets doesn't mean big mode.
>
> > + else
> > + vi->mode = VIRTNET_MODE_SMALL;
> > +
> > if (vi->any_header_sg)
> > dev->needed_headroom = vi->hdr_len;
>
> Anyhow this seems not a fix so it should be a separate series than patch 1?
I think this series is not about fixing the problem; the feature was
disabled in the last Linux release, so this series is about turning the
rx premapped mode back on.
This commit tries to make things easier: the current code is very
unintuitive when we try to switch on or check the virtio-net rx mode.
Thanks.
>
> Thanks
>
> >
> > --
> > 2.32.0.3.g01195cf9f
> >
>
* [PATCH 3/5] virtio_net: big mode skip the unmap check
2024-10-14 3:12 [PATCH 0/5] virtio_net: enable premapped mode by default Xuan Zhuo
2024-10-14 3:12 ` [PATCH 1/5] virtio-net: fix overflow inside virtnet_rq_alloc Xuan Zhuo
2024-10-14 3:12 ` [PATCH 2/5] virtio_net: introduce vi->mode Xuan Zhuo
@ 2024-10-14 3:12 ` Xuan Zhuo
2024-10-14 3:12 ` [PATCH 4/5] virtio_net: enable premapped mode for merge and small by default Xuan Zhuo
` (2 subsequent siblings)
5 siblings, 0 replies; 21+ messages in thread
From: Xuan Zhuo @ 2024-10-14 3:12 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
virtualization
The virtio-net big mode did not enable premapped mode, so there was no
need to check for unmapping there. A subsequent commit will remove the
failover code for the case where enabling premapped mode fails for merge
and small mode, so remove the do_dma check from the big mode path.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 14809b614d62..cd90e77881df 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -998,7 +998,7 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
return;
}
- if (rq->do_dma)
+ if (vi->mode != VIRTNET_MODE_BIG)
virtnet_rq_unmap(rq, buf, 0);
virtnet_rq_free_buf(vi, rq, buf);
@@ -2738,7 +2738,7 @@ static int virtnet_receive_packets(struct virtnet_info *vi,
}
} else {
while (packets < budget &&
- (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
+ (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
receive_buf(vi, rq, buf, len, NULL, xdp_xmit, stats);
packets++;
}
--
2.32.0.3.g01195cf9f
* [PATCH 4/5] virtio_net: enable premapped mode for merge and small by default
2024-10-14 3:12 [PATCH 0/5] virtio_net: enable premapped mode by default Xuan Zhuo
` (2 preceding siblings ...)
2024-10-14 3:12 ` [PATCH 3/5] virtio_net: big mode skip the unmap check Xuan Zhuo
@ 2024-10-14 3:12 ` Xuan Zhuo
2024-10-18 8:00 ` Jason Wang
2024-10-14 3:12 ` [PATCH 5/5] virtio_net: rx remove premapped failover code Xuan Zhuo
2024-10-14 4:57 ` [PATCH 0/5] virtio_net: enable premapped mode by default Michael S. Tsirkin
5 siblings, 1 reply; 21+ messages in thread
From: Xuan Zhuo @ 2024-10-14 3:12 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
virtualization
Currently, the virtio core performs a DMA map operation for each
buffer, even though the same page may be mapped multiple times.
In premapped mode, we can perform only one DMA map operation for the
pages of the allocation frag. This is beneficial for IOMMU devices.
kernel command line: intel_iommu=on iommu.passthrough=0
| strict=0 | strict=1
Before | 775496pps | 428614pps
After | 1109316pps | 742853pps
In 6.11, this feature was disabled because of a regression [1].
Now that the problem is fixed, re-enable it.
[1]: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cd90e77881df..8cf24b7b58bd 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -6133,6 +6133,21 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
return -ENOMEM;
}
+static void virtnet_rq_set_premapped(struct virtnet_info *vi)
+{
+ int i;
+
+ /* disable for big mode */
+ if (vi->mode == VIRTNET_MODE_BIG)
+ return;
+
+ for (i = 0; i < vi->max_queue_pairs; i++) {
+ /* error should never happen */
+ BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));
+ vi->rq[i].do_dma = true;
+ }
+}
+
static int init_vqs(struct virtnet_info *vi)
{
int ret;
@@ -6146,6 +6161,8 @@ static int init_vqs(struct virtnet_info *vi)
if (ret)
goto err_free;
+ virtnet_rq_set_premapped(vi);
+
cpus_read_lock();
virtnet_set_affinity(vi);
cpus_read_unlock();
--
2.32.0.3.g01195cf9f
* Re: [PATCH 4/5] virtio_net: enable premapped mode for merge and small by default
2024-10-14 3:12 ` [PATCH 4/5] virtio_net: enable premapped mode for merge and small by default Xuan Zhuo
@ 2024-10-18 8:00 ` Jason Wang
2024-10-25 2:53 ` Xuan Zhuo
0 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2024-10-18 8:00 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization
On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Currently, the virtio core will perform a dma operation for each
> buffer. Although, the same page may be operated multiple times.
>
> In premapped mod, we can perform only one dma operation for the pages of
> the alloc frag. This is beneficial for the iommu device.
>
> kernel command line: intel_iommu=on iommu.passthrough=0
>
> | strict=0 | strict=1
> Before | 775496pps | 428614pps
> After | 1109316pps | 742853pps
>
> In the 6.11, we disabled this feature because a regress [1].
>
> Now, we fix the problem and re-enable it.
>
> [1]: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/net/virtio_net.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index cd90e77881df..8cf24b7b58bd 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -6133,6 +6133,21 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
> return -ENOMEM;
> }
>
> +static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> +{
> + int i;
> +
> + /* disable for big mode */
> + if (vi->mode == VIRTNET_MODE_BIG)
> + return;
Nitpick: I would like such a check to be done at the caller.
But anyhow the patch looks good
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
> +
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + /* error should never happen */
> + BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));
> + vi->rq[i].do_dma = true;
> + }
> +}
> +
> static int init_vqs(struct virtnet_info *vi)
> {
> int ret;
> @@ -6146,6 +6161,8 @@ static int init_vqs(struct virtnet_info *vi)
> if (ret)
> goto err_free;
>
> + virtnet_rq_set_premapped(vi);
> +
> cpus_read_lock();
> virtnet_set_affinity(vi);
> cpus_read_unlock();
> --
> 2.32.0.3.g01195cf9f
>
* Re: [PATCH 4/5] virtio_net: enable premapped mode for merge and small by default
2024-10-18 8:00 ` Jason Wang
@ 2024-10-25 2:53 ` Xuan Zhuo
0 siblings, 0 replies; 21+ messages in thread
From: Xuan Zhuo @ 2024-10-25 2:53 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization
On Fri, 18 Oct 2024 16:00:07 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Currently, the virtio core will perform a dma operation for each
> > buffer. Although, the same page may be operated multiple times.
> >
> > In premapped mod, we can perform only one dma operation for the pages of
> > the alloc frag. This is beneficial for the iommu device.
> >
> > kernel command line: intel_iommu=on iommu.passthrough=0
> >
> > | strict=0 | strict=1
> > Before | 775496pps | 428614pps
> > After | 1109316pps | 742853pps
> >
> > In the 6.11, we disabled this feature because a regress [1].
> >
> > Now, we fix the problem and re-enable it.
> >
> > [1]: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index cd90e77881df..8cf24b7b58bd 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -6133,6 +6133,21 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
> > return -ENOMEM;
> > }
> >
> > +static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > +{
> > + int i;
> > +
> > + /* disable for big mode */
> > + if (vi->mode == VIRTNET_MODE_BIG)
> > + return;
>
> Nitpick: I would like such a check to be done at the caller.
I am ok, if you like.
Thanks.
>
> But anyhow the patch looks good
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
> Thanks
>
> > +
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + /* error should never happen */
> > + BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));
> > + vi->rq[i].do_dma = true;
> > + }
> > +}
> > +
> > static int init_vqs(struct virtnet_info *vi)
> > {
> > int ret;
> > @@ -6146,6 +6161,8 @@ static int init_vqs(struct virtnet_info *vi)
> > if (ret)
> > goto err_free;
> >
> > + virtnet_rq_set_premapped(vi);
> > +
> > cpus_read_lock();
> > virtnet_set_affinity(vi);
> > cpus_read_unlock();
> > --
> > 2.32.0.3.g01195cf9f
> >
>
* [PATCH 5/5] virtio_net: rx remove premapped failover code
2024-10-14 3:12 [PATCH 0/5] virtio_net: enable premapped mode by default Xuan Zhuo
` (3 preceding siblings ...)
2024-10-14 3:12 ` [PATCH 4/5] virtio_net: enable premapped mode for merge and small by default Xuan Zhuo
@ 2024-10-14 3:12 ` Xuan Zhuo
2024-10-18 8:01 ` Jason Wang
2024-10-14 4:57 ` [PATCH 0/5] virtio_net: enable premapped mode by default Michael S. Tsirkin
5 siblings, 1 reply; 21+ messages in thread
From: Xuan Zhuo @ 2024-10-14 3:12 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
virtualization
Now, the premapped mode can be enabled unconditionally,
so we can remove the failover code for the merge and small modes.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 80 +++++++++++++++++-----------------------
1 file changed, 33 insertions(+), 47 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8cf24b7b58bd..4d3e35b02478 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -356,9 +356,6 @@ struct receive_queue {
struct xdp_rxq_info xsk_rxq_info;
struct xdp_buff **xsk_buffs;
-
- /* Do dma by self */
- bool do_dma;
};
/* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -899,7 +896,7 @@ static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
void *buf;
buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
- if (buf && rq->do_dma)
+ if (buf)
virtnet_rq_unmap(rq, buf, *len);
return buf;
@@ -912,11 +909,6 @@ static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
u32 offset;
void *head;
- if (!rq->do_dma) {
- sg_init_one(rq->sg, buf, len);
- return;
- }
-
head = page_address(rq->alloc_frag.page);
offset = buf - head;
@@ -939,44 +931,42 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
head = page_address(alloc_frag->page);
- if (rq->do_dma) {
- dma = head;
-
- /* new pages */
- if (!alloc_frag->offset) {
- if (rq->last_dma) {
- /* Now, the new page is allocated, the last dma
- * will not be used. So the dma can be unmapped
- * if the ref is 0.
- */
- virtnet_rq_unmap(rq, rq->last_dma, 0);
- rq->last_dma = NULL;
- }
+ dma = head;
- dma->len = alloc_frag->size - sizeof(*dma);
+ /* new pages */
+ if (!alloc_frag->offset) {
+ if (rq->last_dma) {
+ /* Now, the new page is allocated, the last dma
+ * will not be used. So the dma can be unmapped
+ * if the ref is 0.
+ */
+ virtnet_rq_unmap(rq, rq->last_dma, 0);
+ rq->last_dma = NULL;
+ }
- addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
- dma->len, DMA_FROM_DEVICE, 0);
- if (virtqueue_dma_mapping_error(rq->vq, addr))
- return NULL;
+ dma->len = alloc_frag->size - sizeof(*dma);
- dma->addr = addr;
- dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
+ addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
+ dma->len, DMA_FROM_DEVICE, 0);
+ if (virtqueue_dma_mapping_error(rq->vq, addr))
+ return NULL;
- /* Add a reference to dma to prevent the entire dma from
- * being released during error handling. This reference
- * will be freed after the pages are no longer used.
- */
- get_page(alloc_frag->page);
- dma->ref = 1;
- alloc_frag->offset = sizeof(*dma);
+ dma->addr = addr;
+ dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
- rq->last_dma = dma;
- }
+ /* Add a reference to dma to prevent the entire dma from
+ * being released during error handling. This reference
+ * will be freed after the pages are no longer used.
+ */
+ get_page(alloc_frag->page);
+ dma->ref = 1;
+ alloc_frag->offset = sizeof(*dma);
- ++dma->ref;
+ rq->last_dma = dma;
}
+ ++dma->ref;
+
buf = head + alloc_frag->offset;
get_page(alloc_frag->page);
@@ -2452,8 +2442,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
if (err < 0) {
- if (rq->do_dma)
- virtnet_rq_unmap(rq, buf, 0);
+ virtnet_rq_unmap(rq, buf, 0);
put_page(virt_to_head_page(buf));
}
@@ -2573,8 +2562,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
ctx = mergeable_len_to_ctx(len + room, headroom);
err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
if (err < 0) {
- if (rq->do_dma)
- virtnet_rq_unmap(rq, buf, 0);
+ virtnet_rq_unmap(rq, buf, 0);
put_page(virt_to_head_page(buf));
}
@@ -5948,7 +5936,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
int i;
for (i = 0; i < vi->max_queue_pairs; i++)
if (vi->rq[i].alloc_frag.page) {
- if (vi->rq[i].do_dma && vi->rq[i].last_dma)
+ if (vi->rq[i].last_dma)
virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
put_page(vi->rq[i].alloc_frag.page);
}
@@ -6141,11 +6129,9 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
if (vi->mode == VIRTNET_MODE_BIG)
return;
- for (i = 0; i < vi->max_queue_pairs; i++) {
+ for (i = 0; i < vi->max_queue_pairs; i++)
/* error should never happen */
BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));
- vi->rq[i].do_dma = true;
- }
}
static int init_vqs(struct virtnet_info *vi)
--
2.32.0.3.g01195cf9f
* Re: [PATCH 5/5] virtio_net: rx remove premapped failover code
2024-10-14 3:12 ` [PATCH 5/5] virtio_net: rx remove premapped failover code Xuan Zhuo
@ 2024-10-18 8:01 ` Jason Wang
0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2024-10-18 8:01 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization
On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now, the premapped mode can be enabled unconditionally.
>
> So we can remove the failover code for merge and small mode.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Let's be more verbose here. For example, the virtnet_rq_xxx()
helpers are only used when the mode uses premapping.
And it might be worth adding a check to prevent misuse of the API.
Others look good to me.
Thanks
* Re: [PATCH 0/5] virtio_net: enable premapped mode by default
2024-10-14 3:12 [PATCH 0/5] virtio_net: enable premapped mode by default Xuan Zhuo
` (4 preceding siblings ...)
2024-10-14 3:12 ` [PATCH 5/5] virtio_net: rx remove premapped failover code Xuan Zhuo
@ 2024-10-14 4:57 ` Michael S. Tsirkin
2024-10-16 7:55 ` Darren Kenny
5 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2024-10-14 4:57 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Jason Wang, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization,
Darren Kenny, Si-Wei Liu
On Mon, Oct 14, 2024 at 11:12:29AM +0800, Xuan Zhuo wrote:
> In the last linux version, we disabled this feature to fix the
> regress[1].
>
> The patch set is try to fix the problem and re-enable it.
>
> More info: http://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com
>
> Thanks.
>
> [1]: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
Darren, you previously reported crashes with a patch very similar to 1/5.
Can you please test this patchset and report whether they
are still observed?
If yes, any data on how to reproduce would be very beneficial for Xuan
Zhuo.
> Xuan Zhuo (5):
> virtio-net: fix overflow inside virtnet_rq_alloc
> virtio_net: introduce vi->mode
> virtio_net: big mode skip the unmap check
> virtio_net: enable premapped mode for merge and small by default
> virtio_net: rx remove premapped failover code
>
> drivers/net/virtio_net.c | 168 ++++++++++++++++++++++++---------------
> 1 file changed, 105 insertions(+), 63 deletions(-)
>
> --
> 2.32.0.3.g01195cf9f
* Re: [PATCH 0/5] virtio_net: enable premapped mode by default
2024-10-14 4:57 ` [PATCH 0/5] virtio_net: enable premapped mode by default Michael S. Tsirkin
@ 2024-10-16 7:55 ` Darren Kenny
2024-10-18 14:59 ` Darren Kenny
0 siblings, 1 reply; 21+ messages in thread
From: Darren Kenny @ 2024-10-16 7:55 UTC (permalink / raw)
To: Michael S. Tsirkin, Xuan Zhuo
Cc: netdev, Jason Wang, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization,
Si-Wei Liu
Hi Michael,
On Monday, 2024-10-14 at 00:57:41 -04, Michael S. Tsirkin wrote:
> On Mon, Oct 14, 2024 at 11:12:29AM +0800, Xuan Zhuo wrote:
>> In the last linux version, we disabled this feature to fix the
>> regress[1].
>>
>> The patch set is try to fix the problem and re-enable it.
>>
>> More info: http://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com
>>
>> Thanks.
>>
>> [1]: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
>
> Darren, you previously reported crashes with a patch very similar to 1/5.
> Can you please test this patchset and report whether they
> are still observed?
> If yes, any data on how to reproduce would be very benefitial for Xuan
> Zhuo.
>
I aim to get to this in the next week, but I don't currently have
access to a system to test it; it will take at least a few days before
I can get one.
Thanks,
Darren.
>
>> Xuan Zhuo (5):
>> virtio-net: fix overflow inside virtnet_rq_alloc
>> virtio_net: introduce vi->mode
>> virtio_net: big mode skip the unmap check
>> virtio_net: enable premapped mode for merge and small by default
>> virtio_net: rx remove premapped failover code
>>
>> drivers/net/virtio_net.c | 168 ++++++++++++++++++++++++---------------
>> 1 file changed, 105 insertions(+), 63 deletions(-)
>>
>> --
>> 2.32.0.3.g01195cf9f
* Re: [PATCH 0/5] virtio_net: enable premapped mode by default
2024-10-16 7:55 ` Darren Kenny
@ 2024-10-18 14:59 ` Darren Kenny
2024-10-19 16:32 ` Michael S. Tsirkin
0 siblings, 1 reply; 21+ messages in thread
From: Darren Kenny @ 2024-10-18 14:59 UTC (permalink / raw)
To: Michael S. Tsirkin, Xuan Zhuo
Cc: netdev, Jason Wang, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization,
Si-Wei Liu
Hi Michael / Xuan Zhuo,
On Wednesday, 2024-10-16 at 08:55:21 +01, Darren Kenny wrote:
> Hi Michael,
>
> On Monday, 2024-10-14 at 00:57:41 -04, Michael S. Tsirkin wrote:
>> On Mon, Oct 14, 2024 at 11:12:29AM +0800, Xuan Zhuo wrote:
>>> In the last linux version, we disabled this feature to fix the
>>> regress[1].
>>>
>>> The patch set is try to fix the problem and re-enable it.
>>>
>>> More info: http://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com
>>>
>>> Thanks.
>>>
>>> [1]: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
>>
>> Darren, you previously reported crashes with a patch very similar to 1/5.
>> Can you please test this patchset and report whether they
>> are still observed?
>> If yes, any data on how to reproduce would be very benefitial for Xuan
>> Zhuo.
>>
>
> I aim to get to this in the next week, but I don't currently have
> access to a system to test it, it will take a few days at least before I
> can get one.
I finally managed to get access to a system to test this on, and it
looks like things are working with this patch set. So...
Tested-by: Darren Kenny <darren.kenny@oracle.com>
Thanks,
Darren.
* Re: [PATCH 0/5] virtio_net: enable premapped mode by default
2024-10-18 14:59 ` Darren Kenny
@ 2024-10-19 16:32 ` Michael S. Tsirkin
0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2024-10-19 16:32 UTC (permalink / raw)
To: Darren Kenny
Cc: Xuan Zhuo, netdev, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
virtualization, Si-Wei Liu
On Fri, Oct 18, 2024 at 03:59:14PM +0100, Darren Kenny wrote:
> Hi Michael / Xuan Zhuo,
>
> On Wednesday, 2024-10-16 at 08:55:21 +01, Darren Kenny wrote:
> > Hi Michael,
> >
> > On Monday, 2024-10-14 at 00:57:41 -04, Michael S. Tsirkin wrote:
> >> On Mon, Oct 14, 2024 at 11:12:29AM +0800, Xuan Zhuo wrote:
> >>> In the last linux version, we disabled this feature to fix the
> >>> regress[1].
> >>>
> >>> The patch set is try to fix the problem and re-enable it.
> >>>
> >>> More info: http://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com
> >>>
> >>> Thanks.
> >>>
> >>> [1]: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> >>
> >> Darren, you previously reported crashes with a patch very similar to 1/5.
> >> Can you please test this patchset and report whether they
> >> are still observed?
> >> If yes, any data on how to reproduce would be very benefitial for Xuan
> >> Zhuo.
> >>
> >
> > I aim to get to this in the next week, but I don't currently have
> > access to a system to test it, it will take a few days at least before I
> > can get one.
>
> I finally a managed to get access to a system to test this on, and it
> looks like things are working with this patch-set. So...
>
> Tested-by: Darren Kenny <darren.kenny@oracle.com>
>
> Thanks,
>
> Darren.
Thanks a lot Darren!
--
MST