From: "Michael S. Tsirkin" <mst@redhat.com>
To: Si-Wei Liu <si-wei.liu@oracle.com>
Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Jason Wang" <jasowang@redhat.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Paolo Abeni" <pabeni@redhat.com>,
virtualization@lists.linux.dev,
"Darren Kenny" <darren.kenny@oracle.com>,
netdev@vger.kernel.org, "Jakub Kicinski" <kuba@kernel.org>,
"Boris Ostrovsky" <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH net] virtio-net: fix overflow inside virtnet_rq_alloc
Date: Tue, 20 Aug 2024 16:09:06 -0400 [thread overview]
Message-ID: <20240820160342-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <4df66dea-ee7d-640d-0e25-5e27a5ec8899@oracle.com>
On Tue, Aug 20, 2024 at 12:44:46PM -0700, Si-Wei Liu wrote:
>
>
> On 8/20/2024 12:19 AM, Xuan Zhuo wrote:
> > leads to regression on VM with the sysctl value of:
> >
> > - net.core.high_order_alloc_disable=1
> >
> > which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur. In this case, if an overflow is possible, I adjust
> > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > the first buffer of the frag is affected.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c6af18948092..e5286a6da863 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > void *buf, *head;
> > dma_addr_t addr;
> > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > - return NULL;
> > -
> > head = page_address(alloc_frag->page);
> > dma = head;
> > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > len = SKB_DATA_ALIGN(len) +
> > SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > + return -ENOMEM;
> > +
> Do you want to document the assumption that small packet case won't end up
> crossing the page frag boundary unlike the mergeable case? Add a comment
> block to explain or a WARN_ON() check against potential overflow would work
> with me.
>
> > buf = virtnet_rq_alloc(rq, len, gfp);
> > if (unlikely(!buf))
> > return -ENOMEM;
> > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > */
> > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > + return -ENOMEM;
> > +
> > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > + len -= sizeof(struct virtnet_rq_dma);
> > +
> This could address my previous concern for possibly regressing every buffer
> size for the mergeable case, thanks. Though I still don't get why carving up
> a small chunk from page_frag for storing the virtnet_rq_dma metadata, this
> would cause perf regression on certain MTU size
4Kbyte MTU exactly?
> that happens to end up with
> one more base page (and an extra descriptor as well) to be allocated
> compared to the previous code without the extra virtnet_rq_dma content. How
> hard would it be to allocate a dedicated struct to store the related
> information without affecting the (size of) datapath pages?
>
> FWIW, out of the code review perspective, I've looked up the past
> conversations but didn't see comprehensive benchmark was done before
> removing the old code and making premap the sole default mode. Granted this
> would reduce the footprint of additional code and the associated maintaining
> cost immediately, but I would assume at least there should have been
> thorough performance runs upfront to guarantee no regression is seen with
> every possible use case, or the negative effect is comparatively negligible
> even though there's slight regression in some limited case. If that kind of
> perf measurement hadn't been done before getting accepted/merged, I think at
> least it should allow both modes to coexist for a while such that every user
> could gauge the performance effect.
>
> Thanks,
> -Siwei
>
> > buf = virtnet_rq_alloc(rq, len + room, gfp);
> > if (unlikely(!buf))
> > return -ENOMEM;
next prev parent reply other threads:[~2024-08-20 20:09 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-20 7:19 [PATCH net] virtio-net: fix overflow inside virtnet_rq_alloc Xuan Zhuo
2024-08-20 15:12 ` Alexander Lobakin
2024-08-20 16:50 ` Michael S. Tsirkin
2024-08-21 16:47 ` Darren Kenny
2024-08-28 19:57 ` Si-Wei Liu
2024-08-29 6:18 ` Xuan Zhuo
2024-08-30 0:54 ` Xuan Zhuo
2024-08-20 19:44 ` Si-Wei Liu
2024-08-20 20:09 ` Michael S. Tsirkin [this message]
2024-08-20 23:05 ` Si-Wei Liu
2024-08-20 20:11 ` Michael S. Tsirkin
2024-08-27 3:38 ` Jason Wang
2024-08-28 11:11 ` Xuan Zhuo
2024-08-29 4:51 ` Jason Wang
2024-08-29 6:21 ` Xuan Zhuo
2024-08-29 7:26 ` Xuan Zhuo
2024-08-29 7:35 ` Michael S. Tsirkin
2024-08-29 7:38 ` Xuan Zhuo
2024-08-29 7:48 ` Michael S. Tsirkin
2024-09-06 8:43 ` Michael S. Tsirkin
2024-09-06 8:53 ` Xuan Zhuo
2024-09-06 9:08 ` Michael S. Tsirkin
2024-09-06 9:25 ` Xuan Zhuo
2024-09-06 9:44 ` Michael S. Tsirkin
2024-09-06 9:46 ` Xuan Zhuo
2024-09-06 9:53 ` Michael S. Tsirkin
2024-09-07 3:16 ` Takero Funaki
2024-09-08 10:18 ` Michael S. Tsirkin
2024-09-08 17:56 ` Takero Funaki
2024-09-09 9:05 ` Darren Kenny
2024-09-09 8:38 ` Jason Wang
2024-09-09 8:43 ` Xuan Zhuo
2024-09-10 6:18 ` Jason Wang
2024-09-10 7:20 ` Xuan Zhuo
2024-09-08 19:40 ` Michael S. Tsirkin
2024-09-09 3:08 ` Xuan Zhuo
2024-09-09 8:47 ` Jason Wang
2024-09-09 8:51 ` Xuan Zhuo
2024-09-10 6:17 ` Jason Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240820160342-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=boris.ostrovsky@oracle.com \
--cc=darren.kenny@oracle.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=eperezma@redhat.com \
--cc=jasowang@redhat.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=si-wei.liu@oracle.com \
--cc=virtualization@lists.linux.dev \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.