From: "Michael S. Tsirkin" <mst@redhat.com>
To: Wencheng Yang <east.moutain.yang@gmail.com>
Cc: qemu-devel@nongnu.org, jasowang@redhat.com, qemu-stable@nongnu.org
Subject: Re: [PATCH v4] virtio-net: Notify the guest with the latest available idx
Date: Tue, 25 Jun 2024 06:17:24 -0400 [thread overview]
Message-ID: <20240625061013-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20240625094851.5252-1-east.moutain.yang@gmail.com>
On Tue, Jun 25, 2024 at 05:48:51PM +0800, Wencheng Yang wrote:
> From: thomas <east.moutain.yang@gmail.com>
>
> Patch 06b12970174 ("virtio-net: fix network stall under load")
> added double-check to test whether the available buffer size
> can satisfy the request or not, in case the guest has added
> some buffers to the avail ring simultaneously after the first
> check.
>
> It will be lucky if the available buffer size becomes okay
> after the double-check, then the host can send the packet to
> the guest. If the buffer size still can't satisfy the request,
> even if the guest has added some buffers, notify the guest
> with the latest available idx seen by the host, similiar to
> the action taken by the host after the first check, else
> viritio-net would stall at the host side forever.
>
> The case below can reproduce the stall.
>
> Guest 0
> +--------+
> | iperf |
> ---------------> | server |
> Host | +--------+
> +--------+ | ...
> | iperf |----
> | client |---- Guest n
> +--------+ | +--------+
> | | iperf |
> ---------------> | server |
> +--------+
>
> Boot many guests from qemu with virtio network:
> qemu ... -netdev tap,id=net_x \
> -device virtio-net-pci-non-transitional,\
> iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
>
> Each guest acts as iperf server with commands below:
> iperf3 -s -D -i 10 -p 8001
> iperf3 -s -D -i 10 -p 8002
>
> The host as iperf client:
> iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
> iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000
>
> After some time, the host loses connection to the guest,
> the guest can send packet to the host, but can't receive
> packet from host.
>
> It's more likely to happen if SWIOTLB is enabled in the guest,
> allocating and freeing bounce buffer takes some CPU ticks,
> copying from/to bounce buffer takes more CPU ticks, compared
> with that there is no bounce buffer in the guest.
> Once the rate of producing packets from the host approximates
> the rate of receiveing packets in the guest, the guest would
> loop in NAPI.
>
> receive packets ---
> | |
> v |
> free buf virtnet_poll
> | |
> v |
> add buf to avail ring ---
> |
> | need kick the host?
> | NAPI continues
> v
> receive packets ---
> | |
> v |
> free buf virtnet_poll
> | |
> v |
> add buf to avail ring ---
> |
> v
> ... ...
>
> On the other hand, the host fetches free buf from avail
> ring, if the buf in the avail ring is not enough, the
> host notifies the guest the event by writing the avail
> idx read from avail ring to the event idx of used ring,
> then the host goes to sleep, waiting for the kick signal
> from the guest.
>
> Once the guest finds the host is waiting for kick singal
> (in virtqueue_kick_prepare_split()), it kicks the host.
>
> The host may stall forever at the sequences below:
>
> Host Guest
> ------------ -----------
> fetch buf, send packet receive packet ---
> ... ... |
> fetch buf, send packet add buf |
> ... add buf virtnet_poll
> buf not enough avail idx-> add buf |
> read avail idx add buf |
> add buf ---
> receive packet ---
> write event idx ... |
> waiting for kick add buf virtnet_poll
> ... |
> ---
> no more packet, exit NAPI
>
> In the first loop of NAPI above, indicated in the range of
> virtnet_poll above, the host is sending packets while the
> guest is receiving packets and adding buf.
> step 1: The buf is not enough, for example, a big packet
> needs 5 buf, but the available buf count is 3.
> The host read current avail idx.
> step 2: The guest adds some buf, then checks whether the
> host is waiting for kick signal, not at this time.
> The used ring is not empty, the guest continues
> the second loop of NAPI.
> step 3: The host write the avail idx readed from avail
> ring to used ring as event idx via
> virtio_queue_set_notification(q->rx_vq, 1).
> step 4: At the end of the second loop of NAPI, recheck
> whether kick is needed, as the event idx in the
> used ring written by the host is beyound the
> range of kick condition, the guest will not
> send kick signal to the host.
>
> The patch notifies the guest with the latest avail idx seen
> by the host in the double-check, which increases the
> probability of hitting kick condition, but logically
> speaking, it can't resolve the issue. It's kind of
> optimization agianst patch 06b12970174.
So let's try to resolve the issue instead.
> Changelog:
> v4:
> - Correct spelling mistake in the subject
> - Describe the issue that virtio-net is blocked at host side
>
> v3:
> - Add virtio-net tag in the subject
> - Refine commit log
>
> v2:
> - Add SOB tag at the end of the commit message
> - Place Fixes tag at the end of the commit message
>
> v1:
> - Initial patch
>
> Fixes: 06b12970174 ("virtio-net: fix network stall under load")
> Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
> ---
> hw/net/virtio-net.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9c7e85caea..23c6c8c898 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1654,6 +1654,7 @@ static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
> if (virtio_queue_empty(q->rx_vq) ||
> (n->mergeable_rx_bufs &&
> !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> + virtio_queue_set_notification(q->rx_vq, 1);
> return 0;
> }
Fundamentally, this is why e.g. vhost_enable_notify is a bool
and callers do things like:
... if (unlikely(vhost_enable_notify(&net->dev, vq))) {
vhost_disable_notify(&net->dev, vq);
continue;
}
break;
> }
> --
> 2.39.0
prev parent reply other threads:[~2024-06-25 10:17 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-25 9:48 [PATCH v4] virtio-net: Notify the guest with the latest available idx Wencheng Yang
2024-06-25 10:17 ` Michael S. Tsirkin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240625061013-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=east.moutain.yang@gmail.com \
--cc=jasowang@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-stable@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).