From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org
Cc: Peter Maydell <peter.maydell@linaro.org>,
Hawkins Jiawei <yin31149@gmail.com>,
Jason Wang <jasowang@redhat.com>
Subject: [PULL 63/66] vdpa: Avoid forwarding large CVQ command failures
Date: Mon, 10 Jul 2023 19:05:24 -0400 [thread overview]
Message-ID: <fee364e4b1ad82a78929f3eed114321f77b6f916.1689030052.git.mst@redhat.com> (raw)
In-Reply-To: <cover.1689030052.git.mst@redhat.com>
From: Hawkins Jiawei <yin31149@gmail.com>
Due to the size limitation of the out buffer sent to the vdpa device,
which is determined by vhost_vdpa_net_cvq_cmd_len(), excessive CVQ
command is truncated in QEMU. As a result, the vdpa device rejects
this flawd CVQ command.
However, the problem is that, the VIRTIO_NET_CTRL_MAC_TABLE_SET
CVQ command has a variable length, which may exceed
vhost_vdpa_net_cvq_cmd_len() if the guest sets more than
`MAC_TABLE_ENTRIES` MAC addresses for the filter table.
This patch solves this problem by following steps:
* Increase the out buffer size to vhost_vdpa_net_cvq_cmd_page_len(),
which represents the size of the buffer that is allocated and mmaped.
This ensures that everything works correctly as long as the guest
sets fewer than `(vhost_vdpa_net_cvq_cmd_page_len() -
sizeof(struct virtio_net_ctrl_hdr)
- 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses.
Considering the highly unlikely scenario for the guest setting
more than that number of MAC addresses for the filter table, this
should work fine for the majority of cases.
* If the CVQ command exceeds vhost_vdpa_net_cvq_cmd_page_len(),
instead of directly sending this CVQ command, QEMU should send
a VIRTIO_NET_CTRL_RX_PROMISC CVQ command to vdpa device. Addtionally,
a fake VIRTIO_NET_CTRL_MAC_TABLE_SET command including
(`MAC_TABLE_ENTRIES` + 1) non-multicast MAC addresses and
(`MAC_TABLE_ENTRIES` + 1) multicast MAC addresses should be provided
to the device model.
By doing so, the vdpa device turns promiscuous mode on, aligning
with the VirtIO standard. The device model marks
`n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
which aligns with the state of the vdpa device.
Note that the bug cannot be triggered at the moment, since
VIRTIO_NET_F_CTRL_RX feature is not enabled for SVQ.
Fixes: 7a7f87e94c ("vdpa: Move command buffers map to start of net device")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <267e15e4eed2d7aeb9887f193da99a13d22a2f1d.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
net/vhost-vdpa.c | 162 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 161 insertions(+), 1 deletion(-)
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index eee4b2a09e..7610589a60 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -922,6 +922,148 @@ static NetClientInfo net_vhost_vdpa_cvq_info = {
.check_peer_type = vhost_vdpa_check_peer_type,
};
+/*
+ * Forward the excessive VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command to
+ * vdpa device.
+ *
+ * Considering that QEMU cannot send the entire filter table to the
+ * vdpa device, it should send the VIRTIO_NET_CTRL_RX_PROMISC CVQ
+ * command to enable promiscuous mode to receive all packets,
+ * according to VirtIO standard, "Since there are no guarantees,
+ * it can use a hash filter or silently switch to allmulti or
+ * promiscuous mode if it is given too many addresses.".
+ *
+ * Since QEMU ignores MAC addresses beyond `MAC_TABLE_ENTRIES` and
+ * marks `n->mac_table.x_overflow` accordingly, it should have
+ * the same effect on the device model to receive
+ * (`MAC_TABLE_ENTRIES` + 1) or more non-multicast MAC addresses.
+ * The same applies to multicast MAC addresses.
+ *
+ * Therefore, QEMU can provide the device model with a fake
+ * VIRTIO_NET_CTRL_MAC_TABLE_SET command with (`MAC_TABLE_ENTRIES` + 1)
+ * non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1) multicast
+ * MAC addresses. This ensures that the device model marks
+ * `n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
+ * allowing all packets to be received, which aligns with the
+ * state of the vdpa device.
+ */
+static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s,
+ VirtQueueElement *elem,
+ struct iovec *out)
+{
+ struct virtio_net_ctrl_mac mac_data, *mac_ptr;
+ struct virtio_net_ctrl_hdr *hdr_ptr;
+ uint32_t cursor;
+ ssize_t r;
+
+ /* parse the non-multicast MAC address entries from CVQ command */
+ cursor = sizeof(*hdr_ptr);
+ r = iov_to_buf(elem->out_sg, elem->out_num, cursor,
+ &mac_data, sizeof(mac_data));
+ if (unlikely(r != sizeof(mac_data))) {
+ /*
+ * If the CVQ command is invalid, we should simulate the vdpa device
+ * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
+ */
+ *s->status = VIRTIO_NET_ERR;
+ return sizeof(*s->status);
+ }
+ cursor += sizeof(mac_data) + le32_to_cpu(mac_data.entries) * ETH_ALEN;
+
+ /* parse the multicast MAC address entries from CVQ command */
+ r = iov_to_buf(elem->out_sg, elem->out_num, cursor,
+ &mac_data, sizeof(mac_data));
+ if (r != sizeof(mac_data)) {
+ /*
+ * If the CVQ command is invalid, we should simulate the vdpa device
+ * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
+ */
+ *s->status = VIRTIO_NET_ERR;
+ return sizeof(*s->status);
+ }
+ cursor += sizeof(mac_data) + le32_to_cpu(mac_data.entries) * ETH_ALEN;
+
+ /* validate the CVQ command */
+ if (iov_size(elem->out_sg, elem->out_num) != cursor) {
+ /*
+ * If the CVQ command is invalid, we should simulate the vdpa device
+ * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
+ */
+ *s->status = VIRTIO_NET_ERR;
+ return sizeof(*s->status);
+ }
+
+ /*
+ * According to VirtIO standard, "Since there are no guarantees,
+ * it can use a hash filter or silently switch to allmulti or
+ * promiscuous mode if it is given too many addresses.".
+ *
+ * Therefore, considering that QEMU is unable to send the entire
+ * filter table to the vdpa device, it should send the
+ * VIRTIO_NET_CTRL_RX_PROMISC CVQ command to enable promiscuous mode
+ */
+ r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, 1);
+ if (unlikely(r < 0)) {
+ return r;
+ }
+ if (*s->status != VIRTIO_NET_OK) {
+ return sizeof(*s->status);
+ }
+
+ /*
+ * QEMU should also send a fake VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ
+ * command to the device model, including (`MAC_TABLE_ENTRIES` + 1)
+ * non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1)
+ * multicast MAC addresses.
+ *
+ * By doing so, the device model can mark `n->mac_table.uni_overflow`
+ * and `n->mac_table.multi_overflow`, enabling all packets to be
+ * received, which aligns with the state of the vdpa device.
+ */
+ cursor = 0;
+ uint32_t fake_uni_entries = MAC_TABLE_ENTRIES + 1,
+ fake_mul_entries = MAC_TABLE_ENTRIES + 1,
+ fake_cvq_size = sizeof(struct virtio_net_ctrl_hdr) +
+ sizeof(mac_data) + fake_uni_entries * ETH_ALEN +
+ sizeof(mac_data) + fake_mul_entries * ETH_ALEN;
+
+ assert(fake_cvq_size < vhost_vdpa_net_cvq_cmd_page_len());
+ out->iov_len = fake_cvq_size;
+
+ /* pack the header for fake CVQ command */
+ hdr_ptr = out->iov_base + cursor;
+ hdr_ptr->class = VIRTIO_NET_CTRL_MAC;
+ hdr_ptr->cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
+ cursor += sizeof(*hdr_ptr);
+
+ /*
+ * Pack the non-multicast MAC addresses part for fake CVQ command.
+ *
+ * According to virtio_net_handle_mac(), QEMU doesn't verify the MAC
+ * addresses provieded in CVQ command. Therefore, only the entries
+ * field need to be prepared in the CVQ command.
+ */
+ mac_ptr = out->iov_base + cursor;
+ mac_ptr->entries = cpu_to_le32(fake_uni_entries);
+ cursor += sizeof(*mac_ptr) + fake_uni_entries * ETH_ALEN;
+
+ /*
+ * Pack the multicast MAC addresses part for fake CVQ command.
+ *
+ * According to virtio_net_handle_mac(), QEMU doesn't verify the MAC
+ * addresses provieded in CVQ command. Therefore, only the entries
+ * field need to be prepared in the CVQ command.
+ */
+ mac_ptr = out->iov_base + cursor;
+ mac_ptr->entries = cpu_to_le32(fake_mul_entries);
+
+ /*
+ * Simulating QEMU poll a vdpa device used buffer
+ * for VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
+ */
+ return sizeof(*s->status);
+}
+
/**
* Validate and copy control virtqueue commands.
*
@@ -949,7 +1091,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
s->cvq_cmd_out_buffer,
- vhost_vdpa_net_cvq_cmd_len());
+ vhost_vdpa_net_cvq_cmd_page_len());
ctrl = s->cvq_cmd_out_buffer;
if (ctrl->class == VIRTIO_NET_CTRL_ANNOUNCE) {
@@ -959,6 +1101,24 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
*/
dev_written = sizeof(status);
*s->status = VIRTIO_NET_OK;
+ } else if (unlikely(ctrl->class == VIRTIO_NET_CTRL_MAC &&
+ ctrl->cmd == VIRTIO_NET_CTRL_MAC_TABLE_SET &&
+ iov_size(elem->out_sg, elem->out_num) > out.iov_len)) {
+ /*
+ * Due to the size limitation of the out buffer sent to the vdpa device,
+ * which is determined by vhost_vdpa_net_cvq_cmd_page_len(), excessive
+ * MAC addresses set by the driver for the filter table can cause
+ * truncation of the CVQ command in QEMU. As a result, the vdpa device
+ * rejects the flawed CVQ command.
+ *
+ * Therefore, QEMU must handle this situation instead of sending
+ * the CVQ command direclty.
+ */
+ dev_written = vhost_vdpa_net_excessive_mac_filter_cvq_add(s, elem,
+ &out);
+ if (unlikely(dev_written < 0)) {
+ goto out;
+ }
} else {
dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
if (unlikely(dev_written < 0)) {
--
MST
next prev parent reply other threads:[~2023-07-10 23:09 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-10 23:02 [PULL 00/66] pc,pci,virtio: cleanups, fixes, features Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 01/66] vdpa: Remove status in reset tracing Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 02/66] vhost: register and change IOMMU flag depending on Device-TLB state Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 03/66] virtio-net: pass Device-TLB enable/disable events to vhost Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 04/66] virtio-gpu: refactor generate_edid function to virtio_gpu_base Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 05/66] docs: vhost-user-gpu: add protocol changes for EDID Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 06/66] contrib/vhost-user-gpu: implement get_edid feature Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 07/66] vhost-user-gpu: implement get_edid frontend feature Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 08/66] hw/virtio: Add boilerplate for vhost-user-scmi device Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 09/66] hw/virtio: Add vhost-user-scmi-pci boilerplate Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 10/66] tests/qtest: enable tests for virtio-scmi Michael S. Tsirkin
2023-07-18 12:42 ` Thomas Huth
2023-07-18 12:55 ` Milan Zamazal
2023-07-19 13:02 ` Thomas Huth
2023-07-19 19:56 ` Milan Zamazal
2023-07-20 8:40 ` Thomas Huth
2023-07-20 9:25 ` Milan Zamazal
2023-07-19 20:51 ` Fabiano Rosas
2023-07-20 8:34 ` Milan Zamazal
2023-07-10 23:02 ` [PULL 11/66] machine: Add helpers to get cores/threads per socket Michael S. Tsirkin
2023-07-10 23:02 ` [PULL 12/66] hw/smbios: Fix smbios_smp_sockets caculation Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 13/66] hw/smbios: Fix thread count in type4 Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 14/66] hw/smbios: Fix core " Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 15/66] vhost-user: Change one_time to per_device request Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 16/66] vhost-user: Make RESET_DEVICE a per device message Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 17/66] hw/i386/pc_q35: Resolve redundant q35_host variable Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 18/66] hw/pci-host/q35: Fix double, contradicting .endianness assignment Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 19/66] hw/pci-host/q35: Initialize PCMachineState::bus in board code Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 20/66] hw/pci/pci_host: Introduce PCI_HOST_BYPASS_IOMMU macro Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 21/66] hw/pci-host/q35: Initialize PCI_HOST_BYPASS_IOMMU property from board code Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 22/66] hw/pci-host/q35: Make some property name macros reusable by i440fx Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 23/66] hw/i386/pc_piix: Turn some local variables into initializers Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 24/66] hw/pci-host/i440fx: Add "i440fx" child property in board code Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 25/66] hw/pci-host/i440fx: Replace magic values by existing constants Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 26/66] hw/pci-host/i440fx: Have common names for some local variables Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 27/66] hw/pci-host/i440fx: Move i440fx_realize() into PCII440FXState section Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 28/66] hw/pci-host/i440fx: Make MemoryRegion pointers accessible as properties Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 29/66] hw/pci-host/i440fx: Add PCI_HOST_PROP_IO_MEM property Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 30/66] hw/pci-host/i440fx: Add PCI_HOST_{ABOVE, BELOW}_4G_MEM_SIZE properties Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 31/66] hw/pci-host/i440fx: Add I440FX_HOST_PROP_PCI_TYPE property Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 32/66] hw/pci-host/i440fx: Resolve i440fx_init() Michael S. Tsirkin
2023-07-10 23:03 ` [PULL 33/66] hw/i386/pc_piix: Move i440fx' realize near its qdev_new() Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 34/66] hw/pci/pci: Remove multifunction parameter from pci_create_simple_multifunction() Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 35/66] hw/pci/pci: Remove multifunction parameter from pci_new_multifunction() Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 36/66] pcie: Release references of virtual functions Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 37/66] vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac() Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 38/66] vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq() Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 39/66] vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads() Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 40/66] vhost-vdpa: mute unaligned memory error report Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 41/66] tests/acpi: allow changes in DSDT.noacpihp table blob Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 42/66] tests/acpi/bios-tables-test: use the correct slot on the pcie-root-port Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 43/66] tests/acpi/bios-tables-test: update acpi blob q35/DSDT.noacpihp Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 44/66] tests/qtest/hd-geo-test: fix incorrect pcie-root-port usage and simplify test Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 45/66] hw/pci: warn when PCIe device is plugged into non-zero slot of downstream port Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 46/66] virtio-iommu: Fix 64kB host page size VFIO device assignment Michael S. Tsirkin
2023-07-17 10:50 ` Peter Maydell
2023-07-17 11:51 ` Michael S. Tsirkin
2023-07-17 16:57 ` Eric Auger
2023-07-17 16:56 ` Eric Auger
2023-07-17 17:07 ` Peter Maydell
2023-07-17 17:20 ` Eric Auger
2023-07-10 23:04 ` [PULL 47/66] virtio-iommu: Rework the traces in virtio_iommu_set_page_size_mask() Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 48/66] pcie: Add hotplug detect state register to cmask Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 49/66] vdpa: Fix possible use-after-free for VirtQueueElement Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 50/66] include: attempt to document device_class_set_props Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 51/66] include/hw: document the device_class_set_parent_* fns Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 52/66] hw/virtio: fix typo in VIRTIO_CONFIG_IRQ_IDX comments Michael S. Tsirkin
2023-07-10 23:04 ` [PULL 53/66] include/hw/virtio: document virtio_notify_config Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 54/66] include/hw/virtio: add kerneldoc for virtio_init Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 55/66] include/hw/virtio: document some more usage of notifiers Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 56/66] pcie: Use common ARI next function number Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 57/66] pcie: Specify 0 for ARI next function numbers Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 58/66] vdpa: Use iovec for vhost_vdpa_net_load_cmd() Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 59/66] vdpa: Restore MAC address filtering state Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 60/66] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX feature Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 61/66] vhost: Fix false positive out-of-bounds Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 62/66] vdpa: Accessing CVQ header through its structure Michael S. Tsirkin
2023-07-10 23:05 ` Michael S. Tsirkin [this message]
2023-07-10 23:05 ` [PULL 64/66] vdpa: Allow VIRTIO_NET_F_CTRL_RX in SVQ Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 65/66] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature Michael S. Tsirkin
2023-07-10 23:05 ` [PULL 66/66] vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ Michael S. Tsirkin
2023-07-11 10:07 ` [PULL 00/66] pc,pci,virtio: cleanups, fixes, features Richard Henderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fee364e4b1ad82a78929f3eed114321f77b6f916.1689030052.git.mst@redhat.com \
--to=mst@redhat.com \
--cc=jasowang@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=yin31149@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).