Re: Should there be a mode in which the virtqueue -> MSI mapping is fixed?

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Demi Marie Obenour <demiobenour@gmail.com>
Cc: "virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>
Subject: Re: Should there be a mode in which the virtqueue -> MSI mapping is fixed?
Date: Sun, 5 Apr 2026 17:09:53 -0400	[thread overview]
Message-ID: <20260405170510-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <700f6f7c-53d6-4023-9a5f-1296eacfff37@gmail.com>

On Sun, Apr 05, 2026 at 04:58:39PM -0400, Demi Marie Obenour wrote:
> On 4/5/26 16:15, Michael S. Tsirkin wrote:
> > On Sun, Apr 05, 2026 at 01:50:25PM -0400, Demi Marie Obenour wrote:
> >> On 4/4/26 20:56, Michael S. Tsirkin wrote:
> >>> On Sat, Apr 04, 2026 at 05:19:41PM -0400, Demi Marie Obenour wrote:
> >>>> Cloud Hypervisor's vhost-user frontend does not implement MSI-X
> >>>> properly [1].  Specifically:
> >>>>
> >>>> 1. Reads from the Pending Bit Array (PBA) always return 0.
> >>>> 2. Changes to the MSI associated with a virtqueue after the device
> >>>>    is activated are ignored.
> >>>>    
> >>>> Amazingly, there have not been any reports of this causing breakage.
> >>>> I have a fix for the first [2], which actually decreases the amount
> >>>> of code.  However, the second is trickier and I'm tempted to not
> >>>> bother unless it causes real-world problems.
> >>>>
> >>>> Are there real-world drivers that will run into either of the above
> >>>> bugs?  Linux seems to only choose anything else as a fallback, which
> >>>> presumably is not triggered.
> >>>> 
> >>>> [1]: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/7813
> >>>> [2]: https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7963
> >>>
> >>> It will sometimes trigger.
> >>
> >> Would it be possible to provide an example?  A reproducible test
> >> case would be ideal, but conditions under which this will trigger
> >> are also sufficient.
> > 
> > 
> > I am not sure what does "is activated" mean.
> > For example, on latest Linux:
> > 
> >         vq = vp_find_one_vq_msix(vdev, avq->vq_index, vp_modern_avq_done,
> >                                  avq->name, false, true, &allocated_vectors,
> >                                  vector_policy, &vp_dev->admin_vq.info);
> >         if (IS_ERR(vq)) {
> >                 err = PTR_ERR(vq);
> >                 goto error_find;
> >         }
> > 
> >         return 0;
> > 
> > error_find:
> >         vp_del_vqs(vdev);
> >         return err;
> > }
> > 
> > 
> > And 
> > 
> > static void del_vq(struct virtio_pci_vq_info *info)
> > {
> >         struct virtqueue *vq = info->vq;
> >         struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
> >         struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
> > 
> >         if (vp_dev->msix_enabled)
> >                 vp_modern_queue_vector(mdev, vq->index,
> >                                        VIRTIO_MSI_NO_VECTOR);
> > 
> >         if (!mdev->notify_base)
> >                 pci_iounmap(mdev->pci_dev, (void __force __iomem *)vq->priv);
> > 
> >         vring_del_virtqueue(vq);
> > }
> > 
> > 
> > and this happens after feature negotiation.
> > 
> > It's before device_ready, however.
> 
> Are there drivers that will change virtqueue => MSI-X vector mappings
> after DRIVER_OK without an intervening reset?  Cloud Hypervisor
> supports this for devices it implements internally, but it ignores
> such changes for vhost-user devices.  Is this going to cause problems
> in practice?


Also yes.

For example, dpdk uses this during cleanup to block interrupts:
drivers/net/virtio/virtio_ethdev.c


int
virtio_dev_close(struct rte_eth_dev *dev)
{       
        struct virtio_hw *hw = dev->data->dev_private;
        struct rte_eth_intr_conf *intr_conf = &dev->data->dev_conf.intr_conf;
                                             
        PMD_INIT_LOG(DEBUG, "virtio_dev_close");
        if (rte_eal_process_type() != RTE_PROC_PRIMARY)
                return 0;

        if (!hw->opened)
                return 0;
        hw->opened = 0;

        /* reset the NIC */
        if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
                VIRTIO_OPS(hw)->set_config_irq(hw, VIRTIO_MSI_NO_VECTOR);
        if (intr_conf->rxq)
                virtio_queues_unbind_intr(dev);

        if (intr_conf->lsc || intr_conf->rxq) {
                virtio_intr_disable(dev);
                rte_intr_efd_disable(dev->intr_handle);
                rte_intr_vec_list_free(dev->intr_handle);
        }

        virtio_reset(hw);
        virtio_dev_free_mbufs(dev);
        virtio_free_queues(hw);
        virtio_free_rss(hw);

        return VIRTIO_OPS(hw)->dev_close(hw);
}


> 
> If device_ready is what sets DRIVER_OK, and the virtqueue => MSI-X
> vector mappings subsequently stay static until reset, then everything
> should work fine unless I misunderstood the Cloud Hypervisor code.





> >>>> One reason I am asking is that I am working on an updated
> >>>> virtio-vhost-user spec, which I've renamed vhost-guest.  A vhost-guest
> >>>> device implements a vhost-user server, and requires one MSI for each
> >>>> virtqueue of _the device being implemented_.  The existing spec
> >>>> allows the guest to select which MSIs are used, but that seems to
> >>>> be pointless additional complexity.  A simpler option would be to
> >>>> hard-code the MSI assignments:
> >>>>
> >>>> - 0: Configuration change interrupt.
> >>>>
> >>>> - 1..N (inclusive): Queue interrupts for the N virtqueues provided
> >>>>   by the vhost-guest device.
> >>>>
> >>>> - N+1..N+M (inclusive): Buffer availability interrupts for each of
> >>>>   the M virtqueues that the driver is implementing.
> >>>
> >>>
> >>> you can do this, and imply ask drivers to share msi vector values.
> >>
> >> Are there drivers that cannot do this?  Is this the reason that the
> >> virtio spec allows using the same MSI for multiple virtqueues?
> >>
> > 
> > No, it's because of the devices:
> > 1. it's easier for device to detect sharing when it's explicit,
> >    rather than matching msi vectors which can change at any time at all.
> > 2. we thought it might be more portable to non pci transports.
> 
> Makes sense!  Thank you so much for your time and effort!
> -- 
> Sincerely,
> Demi Marie Obenour (she/her/hers)

next prev parent reply	other threads:[~2026-04-05 21:10 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-04 21:19 Should there be a mode in which the virtqueue -> MSI mapping is fixed? Demi Marie Obenour
2026-04-05  0:56 ` Michael S. Tsirkin
2026-04-05 17:50   ` Demi Marie Obenour
2026-04-05 20:15     ` Michael S. Tsirkin
2026-04-05 20:58       ` Demi Marie Obenour
2026-04-05 21:09         ` Michael S. Tsirkin [this message]
2026-04-05 21:47           ` Demi Marie Obenour
2026-04-05 21:50             ` Michael S. Tsirkin
2026-04-05 22:28               ` Demi Marie Obenour
2026-04-06  9:25                 ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260405170510-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=demiobenour@gmail.com \
    --cc=virtio-comment@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.