From: Eugenio Perez Martin <eperezma@redhat.com>
To: Hawkins Jiawei <yin31149@gmail.com>
Cc: jasowang@redhat.com, mst@redhat.com, qemu-devel@nongnu.org,
18801353760@163.com
Subject: Re: [PATCH v3 6/7] vdpa: Avoid forwarding large CVQ command failures
Date: Thu, 17 Aug 2023 13:08:52 +0200
Message-ID: <CAJaqyWeb5Xu0JaPCg_S_Ay_92NpwsSZR808rTYWcuU1_Aw1EeA@mail.gmail.com>
In-Reply-To: <267e15e4eed2d7aeb9887f193da99a13d22a2f1d.1688743107.git.yin31149@gmail.com>
On Fri, Jul 7, 2023 at 5:28 PM Hawkins Jiawei <yin31149@gmail.com> wrote:
>
> Due to the size limitation of the out buffer sent to the vdpa device,
> which is determined by vhost_vdpa_net_cvq_cmd_len(), an overly long CVQ
> command is truncated in QEMU. As a result, the vdpa device rejects
> this flawed CVQ command.
>
> However, the problem is that, the VIRTIO_NET_CTRL_MAC_TABLE_SET
> CVQ command has a variable length, which may exceed
> vhost_vdpa_net_cvq_cmd_len() if the guest sets more than
> `MAC_TABLE_ENTRIES` MAC addresses for the filter table.
>
> This patch solves this problem with the following steps:
>
> * Increase the out buffer size to vhost_vdpa_net_cvq_cmd_page_len(),
> which represents the size of the buffer that is allocated and mmaped.
> This ensures that everything works correctly as long as the guest
> sets fewer than `(vhost_vdpa_net_cvq_cmd_page_len() -
> sizeof(struct virtio_net_ctrl_hdr)
> - 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses.
> Considering that it is highly unlikely for the guest to set
> more than that number of MAC addresses for the filter table, this
> should work fine for the majority of cases.
>
> * If the CVQ command exceeds vhost_vdpa_net_cvq_cmd_page_len(),
> instead of directly sending this CVQ command, QEMU should send
> a VIRTIO_NET_CTRL_RX_PROMISC CVQ command to the vdpa device. Additionally,
> a fake VIRTIO_NET_CTRL_MAC_TABLE_SET command including
> (`MAC_TABLE_ENTRIES` + 1) non-multicast MAC addresses and
> (`MAC_TABLE_ENTRIES` + 1) multicast MAC addresses should be provided
> to the device model.
> By doing so, the vdpa device turns promiscuous mode on, aligning
> with the VirtIO standard. The device model marks
> `n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
> which aligns with the state of the vdpa device.
>
> Note that the bug cannot be triggered at the moment, since the
> VIRTIO_NET_F_CTRL_RX feature is not enabled for SVQ.
>
> Fixes: 7a7f87e94c ("vdpa: Move command buffers map to start of net device")
> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> net/vhost-vdpa.c | 162 ++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 161 insertions(+), 1 deletion(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index a84eb088a0..a4ff6c52b7 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -916,6 +916,148 @@ static NetClientInfo net_vhost_vdpa_cvq_info = {
> .check_peer_type = vhost_vdpa_check_peer_type,
> };
>
> +/*
> + * Forward the excessive VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command to
> + * vdpa device.
> + *
> + * Considering that QEMU cannot send the entire filter table to the
> + * vdpa device, it should send the VIRTIO_NET_CTRL_RX_PROMISC CVQ
> + * command to enable promiscuous mode to receive all packets,
> + * according to VirtIO standard, "Since there are no guarantees,
> + * it can use a hash filter or silently switch to allmulti or
> + * promiscuous mode if it is given too many addresses.".
> + *
> + * Since QEMU ignores MAC addresses beyond `MAC_TABLE_ENTRIES` and
> + * marks `n->mac_table.x_overflow` accordingly, it should have
> + * the same effect on the device model to receive
> + * (`MAC_TABLE_ENTRIES` + 1) or more non-multicast MAC addresses.
> + * The same applies to multicast MAC addresses.
> + *
> + * Therefore, QEMU can provide the device model with a fake
> + * VIRTIO_NET_CTRL_MAC_TABLE_SET command with (`MAC_TABLE_ENTRIES` + 1)
> + * non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1) multicast
> + * MAC addresses. This ensures that the device model marks
> + * `n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
> + * allowing all packets to be received, which aligns with the
> + * state of the vdpa device.
> + */
> +static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s,
> + VirtQueueElement *elem,
> + struct iovec *out)
> +{
> + struct virtio_net_ctrl_mac mac_data, *mac_ptr;
> + struct virtio_net_ctrl_hdr *hdr_ptr;
> + uint32_t cursor;
> + ssize_t r;
> +
> + /* parse the non-multicast MAC address entries from CVQ command */
> + cursor = sizeof(*hdr_ptr);
> + r = iov_to_buf(elem->out_sg, elem->out_num, cursor,
> + &mac_data, sizeof(mac_data));
> + if (unlikely(r != sizeof(mac_data))) {
> + /*
> + * If the CVQ command is invalid, simulate the vdpa device
> + * rejecting the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
> + */
> + *s->status = VIRTIO_NET_ERR;
> + return sizeof(*s->status);
> + }
> + cursor += sizeof(mac_data) + le32_to_cpu(mac_data.entries) * ETH_ALEN;
> +
> + /* parse the multicast MAC address entries from CVQ command */
> + r = iov_to_buf(elem->out_sg, elem->out_num, cursor,
> + &mac_data, sizeof(mac_data));
> + if (unlikely(r != sizeof(mac_data))) {
> + /*
> + * If the CVQ command is invalid, simulate the vdpa device
> + * rejecting the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
> + */
> + *s->status = VIRTIO_NET_ERR;
> + return sizeof(*s->status);
> + }
> + cursor += sizeof(mac_data) + le32_to_cpu(mac_data.entries) * ETH_ALEN;
> +
> + /* validate the CVQ command */
> + if (iov_size(elem->out_sg, elem->out_num) != cursor) {
> + /*
> + * If the CVQ command is invalid, simulate the vdpa device
> + * rejecting the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
> + */
> + *s->status = VIRTIO_NET_ERR;
> + return sizeof(*s->status);
> + }
> +
> + /*
> + * According to VirtIO standard, "Since there are no guarantees,
> + * it can use a hash filter or silently switch to allmulti or
> + * promiscuous mode if it is given too many addresses.".
> + *
> + * Therefore, considering that QEMU is unable to send the entire
> + * filter table to the vdpa device, it should send the
> + * VIRTIO_NET_CTRL_RX_PROMISC CVQ command to enable promiscuous mode
> + */
> + r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, 1);
> + if (unlikely(r < 0)) {
> + return r;
> + }
> + if (*s->status != VIRTIO_NET_OK) {
> + return sizeof(*s->status);
> + }
> +
> + /*
> + * QEMU should also send a fake VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ
> + * command to the device model, including (`MAC_TABLE_ENTRIES` + 1)
> + * non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1)
> + * multicast MAC addresses.
> + *
> + * By doing so, the device model can mark `n->mac_table.uni_overflow`
> + * and `n->mac_table.multi_overflow`, enabling all packets to be
> + * received, which aligns with the state of the vdpa device.
> + */
> + cursor = 0;
> + uint32_t fake_uni_entries = MAC_TABLE_ENTRIES + 1,
> + fake_mul_entries = MAC_TABLE_ENTRIES + 1,
> + fake_cvq_size = sizeof(struct virtio_net_ctrl_hdr) +
> + sizeof(mac_data) + fake_uni_entries * ETH_ALEN +
> + sizeof(mac_data) + fake_mul_entries * ETH_ALEN;
> +
> + assert(fake_cvq_size < vhost_vdpa_net_cvq_cmd_page_len());
> + out->iov_len = fake_cvq_size;
> +
> + /* pack the header for fake CVQ command */
> + hdr_ptr = out->iov_base + cursor;
> + hdr_ptr->class = VIRTIO_NET_CTRL_MAC;
> + hdr_ptr->cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
> + cursor += sizeof(*hdr_ptr);
> +
> + /*
> + * Pack the non-multicast MAC addresses part for fake CVQ command.
> + *
> + * According to virtio_net_handle_mac(), QEMU doesn't verify the MAC
> + * addresses provided in the CVQ command. Therefore, only the entries
> + * field needs to be prepared in the CVQ command.
> + */
> + mac_ptr = out->iov_base + cursor;
> + mac_ptr->entries = cpu_to_le32(fake_uni_entries);
> + cursor += sizeof(*mac_ptr) + fake_uni_entries * ETH_ALEN;
> +
> + /*
> + * Pack the multicast MAC addresses part for fake CVQ command.
> + *
> + * According to virtio_net_handle_mac(), QEMU doesn't verify the MAC
> + * addresses provided in the CVQ command. Therefore, only the entries
> + * field needs to be prepared in the CVQ command.
> + */
> + mac_ptr = out->iov_base + cursor;
> + mac_ptr->entries = cpu_to_le32(fake_mul_entries);
> +
> + /*
> + * Simulate QEMU polling a used buffer from the vdpa device
> + * for the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
> + */
> + return sizeof(*s->status);
> +}
> +
> /**
> * Validate and copy control virtqueue commands.
> *
> @@ -943,7 +1085,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
>
> out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
> s->cvq_cmd_out_buffer,
> - vhost_vdpa_net_cvq_cmd_len());
> + vhost_vdpa_net_cvq_cmd_page_len());
>
> ctrl = s->cvq_cmd_out_buffer;
> if (ctrl->class == VIRTIO_NET_CTRL_ANNOUNCE) {
> @@ -953,6 +1095,24 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
> */
> dev_written = sizeof(status);
> *s->status = VIRTIO_NET_OK;
> + } else if (unlikely(ctrl->class == VIRTIO_NET_CTRL_MAC &&
> + ctrl->cmd == VIRTIO_NET_CTRL_MAC_TABLE_SET &&
> + iov_size(elem->out_sg, elem->out_num) > out.iov_len)) {
> + /*
> + * Due to the size limitation of the out buffer sent to the vdpa device,
> + * which is determined by vhost_vdpa_net_cvq_cmd_page_len(), excessive
> + * MAC addresses set by the driver for the filter table can cause
> + * truncation of the CVQ command in QEMU. As a result, the vdpa device
> + * rejects the flawed CVQ command.
> + *
> + * Therefore, QEMU must handle this situation instead of sending
> + * the CVQ command directly.
> + */
> + dev_written = vhost_vdpa_net_excessive_mac_filter_cvq_add(s, elem,
> + &out);
> + if (unlikely(dev_written < 0)) {
> + goto out;
> + }
> } else {
> dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
> if (unlikely(dev_written < 0)) {
> --
> 2.25.1
>