public inbox for virtio-comment@lists.linux.dev
 help / color / mirror / Atom feed
From: Demi Marie Obenour <demiobenour@gmail.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>
Subject: Re: Should there be a mode in which the virtqueue -> MSI mapping is fixed?
Date: Sun, 5 Apr 2026 18:28:56 -0400	[thread overview]
Message-ID: <005c6635-9861-48fd-a2f7-225a743f9f10@gmail.com> (raw)
In-Reply-To: <20260405174833-mutt-send-email-mst@kernel.org>


[-- Attachment #1.1.1: Type: text/plain, Size: 6733 bytes --]

On 4/5/26 17:50, Michael S. Tsirkin wrote:
> On Sun, Apr 05, 2026 at 05:47:19PM -0400, Demi Marie Obenour wrote:
>> On 4/5/26 17:09, Michael S. Tsirkin wrote:
>>> On Sun, Apr 05, 2026 at 04:58:39PM -0400, Demi Marie Obenour wrote:
>>>> On 4/5/26 16:15, Michael S. Tsirkin wrote:
>>>>> On Sun, Apr 05, 2026 at 01:50:25PM -0400, Demi Marie Obenour wrote:
>>>>>> On 4/4/26 20:56, Michael S. Tsirkin wrote:
>>>>>>> On Sat, Apr 04, 2026 at 05:19:41PM -0400, Demi Marie Obenour wrote:
>>>>>>>> Cloud Hypervisor's vhost-user frontend does not implement MSI-X
>>>>>>>> properly [1].  Specifically:
>>>>>>>>
>>>>>>>> 1. Reads from the Pending Bit Array (PBA) always return 0.
>>>>>>>> 2. Changes to the MSI associated with a virtqueue after the device
>>>>>>>>    is activated are ignored.
>>>>>>>>    
>>>>>>>> Amazingly, there have not been any reports of this causing breakage.
>>>>>>>> I have a fix for the first [2], which actually decreases the amount
>>>>>>>> of code.  However, the second is trickier and I'm tempted to not
>>>>>>>> bother unless it causes real-world problems.
>>>>>>>>
>>>>>>>> Are there real-world drivers that will run into either of the above
>>>>>>>> bugs?  Linux seems to only choose anything else as a fallback, which
>>>>>>>> presumably is not triggered.
>>>>>>>>
>>>>>>>> [1]: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/7813
>>>>>>>> [2]: https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7963
>>>>>>>
>>>>>>> It will sometimes trigger.
>>>>>>
>>>>>> Would it be possible to provide an example?  A reproducible test
>>>>>> case would be ideal, but conditions under which this will trigger
>>>>>> are also sufficient.
>>>>>
>>>>>
>>>>> I am not sure what does "is activated" mean.
>>>>> For example, on latest Linux:
>>>>>
>>>>>         vq = vp_find_one_vq_msix(vdev, avq->vq_index, vp_modern_avq_done,
>>>>>                                  avq->name, false, true, &allocated_vectors,
>>>>>                                  vector_policy, &vp_dev->admin_vq.info);
>>>>>         if (IS_ERR(vq)) {
>>>>>                 err = PTR_ERR(vq);
>>>>>                 goto error_find;
>>>>>         }
>>>>>
>>>>>         return 0;
>>>>>
>>>>> error_find:
>>>>>         vp_del_vqs(vdev);
>>>>>         return err;
>>>>> }
>>>>>
>>>>>
>>>>> And 
>>>>>
>>>>> static void del_vq(struct virtio_pci_vq_info *info)
>>>>> {
>>>>>         struct virtqueue *vq = info->vq;
>>>>>         struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
>>>>>         struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
>>>>>
>>>>>         if (vp_dev->msix_enabled)
>>>>>                 vp_modern_queue_vector(mdev, vq->index,
>>>>>                                        VIRTIO_MSI_NO_VECTOR);
>>>>>
>>>>>         if (!mdev->notify_base)
>>>>>                 pci_iounmap(mdev->pci_dev, (void __force __iomem *)vq->priv);
>>>>>
>>>>>         vring_del_virtqueue(vq);
>>>>> }
>>>>>
>>>>>
>>>>> and this happens after feature negotiation.
>>>>>
>>>>> It's before device_ready, however.
>>>>
>>>> Are there drivers that will change virtqueue => MSI-X vector mappings
>>>> after DRIVER_OK without an intervening reset?  Cloud Hypervisor
>>>> supports this for devices it implements internally, but it ignores
>>>> such changes for vhost-user devices.  Is this going to cause problems
>>>> in practice?
>>>
>>>
>>> Also yes.
>>>
>>> For example, dpdk uses this during cleanup to block interrupts:
>>> drivers/net/virtio/virtio_ethdev.c
>>>
>>>
>>> int
>>> virtio_dev_close(struct rte_eth_dev *dev)
>>> {       
>>>         struct virtio_hw *hw = dev->data->dev_private;
>>>         struct rte_eth_intr_conf *intr_conf = &dev->data->dev_conf.intr_conf;
>>>                                              
>>>         PMD_INIT_LOG(DEBUG, "virtio_dev_close");
>>>         if (rte_eal_process_type() != RTE_PROC_PRIMARY)
>>>                 return 0;
>>>
>>>         if (!hw->opened)
>>>                 return 0;
>>>         hw->opened = 0;
>>>
>>>         /* reset the NIC */
>>>         if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
>>>                 VIRTIO_OPS(hw)->set_config_irq(hw, VIRTIO_MSI_NO_VECTOR);
>>>         if (intr_conf->rxq)
>>>                 virtio_queues_unbind_intr(dev);
>>>
>>>         if (intr_conf->lsc || intr_conf->rxq) {
>>>                 virtio_intr_disable(dev);
>>>                 rte_intr_efd_disable(dev->intr_handle);
>>>                 rte_intr_vec_list_free(dev->intr_handle);
>>>         }
>>>
>>>         virtio_reset(hw);
>>>         virtio_dev_free_mbufs(dev);
>>>         virtio_free_queues(hw);
>>>         virtio_free_rss(hw);
>>>
>>>         return VIRTIO_OPS(hw)->dev_close(hw);
>>> }
>>
>> What would happen if DPDK received a spurious MSI-X interrupt during
>> this time?
> 
> Could be a UAF?

Indeed it could, provided that DPDK is actually able to get the interrupt.

>> I know that this is non-compliant with the virtio specification.
>> However, a problem that will trigger in practice, or which can be
>> used for an exploit, is far more severe (and thus far more important
>> to fix) than one that does not have practical consequences.
>> -- 
>> Sincerely,
>> Demi Marie Obenour (she/her/hers)
> 
> 
> 
> You will have to read the code, and lots of old versions of it, to
> check.

Makes sense.

Right now, I'm working a spec for a new device (vhost-guest, formally
virtio-vhost-user).  It's a vhost-user server, so it is a virtio
device used by one guest VM (backend) to implement a virtio device
for use by another guest VM (frontend).  Yes, it's confusing!

The spec I am using as a basis is
<https://stefanha.github.io/virtio/vhost-user-slave.html>.  Both its
terminology and the spec itself are outdated and I'm going to be
heavily changing it before submission.  The current (unpublished)
version states that a vhost-guest device has 4 virtqueues in two pairs.
One pair is for requests from frontend to backend and responses
from backend to frontend.  The other is for requests from backend
to frontend and responses from frontend to backend.  However, the
device that the driver is implementing *also* has its own virtqueues,
and those need extra notifications.

The spec I am using as reference states that these notifications are
treated similar to existing virtqueue notifications.  In particular,
the MSI-X vector associated with them can be changed at will.
How important is it to support this in the version of the spec
I submit?  How important is it to allow the same MSI-X vector to be
used for multiple notifications?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 7253 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2026-04-05 22:29 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-04 21:19 Should there be a mode in which the virtqueue -> MSI mapping is fixed? Demi Marie Obenour
2026-04-05  0:56 ` Michael S. Tsirkin
2026-04-05 17:50   ` Demi Marie Obenour
2026-04-05 20:15     ` Michael S. Tsirkin
2026-04-05 20:58       ` Demi Marie Obenour
2026-04-05 21:09         ` Michael S. Tsirkin
2026-04-05 21:47           ` Demi Marie Obenour
2026-04-05 21:50             ` Michael S. Tsirkin
2026-04-05 22:28               ` Demi Marie Obenour [this message]
2026-04-06  9:25                 ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=005c6635-9861-48fd-a2f7-225a743f9f10@gmail.com \
    --to=demiobenour@gmail.com \
    --cc=mst@redhat.com \
    --cc=virtio-comment@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox