All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>,
	"Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
	"eric.auger.pro@gmail.com" <eric.auger.pro@gmail.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"mst@redhat.com" <mst@redhat.com>,
	"sgarzare@redhat.com" <sgarzare@redhat.com>,
	"lvivier@redhat.com" <lvivier@redhat.com>
Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
Date: Thu, 30 Jan 2025 12:35:18 -0500	[thread overview]
Message-ID: <Z5u4VmyEohMUmCPM@x1.local> (raw)
In-Reply-To: <CACGkMEsJXaPgHU0+YB+NKnDPoOg25RMNiWf-LCOqX1jP9REE8w@mail.gmail.com>

On Mon, Jan 27, 2025 at 08:44:50AM +0800, Jason Wang wrote:
> On Sun, Jan 26, 2025 at 3:56 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Sat, Jan 25, 2025 at 12:42 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Hello, Jason, Eric,
> > >
> > > On Fri, Jan 24, 2025 at 11:30:56AM +0800, Jason Wang wrote:
> > > > It might be because neither virtio bus nor virtio-net provides a
> > > > shutdown method.
> > > >
> > > > There used to be requests to provide those to unbreak the kexec.
> > > >
> > > > A quick try might be to provide a .driver.shutdown to
> > > > virtio_net_driver structure and reset the device there as a start.
> > >
> > > I didn't check virtio driver path, but if that's missing it's reasonable to
> > > support it indeed.
> > >
> > > OTOH, even with that, vhost can still hit such DMA issue if it's a
> > > hard-reset, am I right?  IOW, when using QMP command "system-reset".  If my
> > > memory is correct, that's the problem I was working on the VFIO series,
> > > rather than a clean reboot.  And that won't give guest driver chance to run
> > > anything, IIUC.
> >
> > Yes.
> >
> > >
> > > I am wildly suspecting a VT-d write to GCMD to disable it can also appear
> > > if there's a hard reset, then when bootloading the VM the bios (or whatever
> > > firmware at early stage) may want to make sure the VT-d device is
> > > completely off by writting to GCMD. But that's a pure guess.. and that may
> > > or may not matter much on how we fix this problem.
> > >
> > > IOW, I suspect we need to fix both of them,
> > >
> > >   (a) for soft-reset, by making sure drivers properly quiesce DMAs
> > >   proactively when VM gracefully shuts down.
> > >
> > >   (b) for hard-reset, by making sure QEMU reset in proper order.
> > >
> > > One thing to mention is for problem (b) VFIO used to have an extra
> > > challenge on !FLR devices, I discussed it in patch 4's comment there.
> > > Quotting from patch 4 of series:
> > >
> > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com
> > >
> > >      * (1) Device depth-first reset hierachy doesn't yet work for vIOMMUs
> > >      *     (reference: resettable_cold_reset_fn())
> > >      *
> > >      *     Currently, vIOMMU devices are created as normal '-device'
> > >      *     cmdlines.  It means in many ways it has the same attributes with
> > >      *     most of the rest devices, even if the rest devices should
> > >      *     logically be under control of the vIOMMU unit.
> > >      *
> > >      *     One side effect of it is vIOMMU devices will be currently put
> > >      *     randomly under qdev tree hierarchy, which is the source of
> > >      *     device reset ordering in current QEMU (depth-first traversal).
> > >      *     It means vIOMMU now can be reset before some devices.  For fully
> > >      *     emulated devices that's not a problem, because the traversal
> > >      *     holds BQL for the whole process.  However it is a problem if DMA
> > >      *     can happen without BQL, like VFIO, vDPA or remote device process.
> > >      *
> > >      *     TODO: one ideal solution can be that we make vIOMMU the parent
> > >      *     of the whole pci host bridge.  Hence vIOMMU can be reset after
> > >      *     all the devices are reset and quiesced.
> > >      *
> > >      * (2) Some devices register its own reset functions
> > >      *
> > >      *     Even if above issue solved, if devices register its own reset
> > >      *     functions for some reason via QEMU reset hooks, vIOMMU can still
> > >      *     be reset before the device. One example is vfio_reset_handler()
> > >      *     where FLR is not supported on the device.
> > >      *
> > >      *     TODO: merge relevant reset functions into the device tree reset
> > >      *     framework.
> > >
> > > So maybe vhost doesn't have problem (2) listed above, and maybe it means
> > > it's still worthwhile thinking more about problem (1), which is to change
> > > the QOM tree to provide a correct topology representation when vIOMMU is
> > > present: so far it should be still a pretty much orphaned object there.. if
> > > QEMU relies on QOM tree topology for reset order, we may need to move it to
> > > the right place sooner or later.
> >
> > Sounds like a non-trivial task, so for a hard reset, maybe we can
> > proceed with Eric's proposal to deal with the reset before the device
> > stops.

The major challenge when I was working on that (as far as I can still
remember..): some devices are created at early stage of QEMU startup, which
can happen before the vIOMMU is created and realized.  Then it can be
challenging to re-parent those devices to be childrens of the vIOMMU, or we
may need a way to create vIOMMU always earlier than those..

> 
> Btw, I actually meant to break the assumption that vhost needs to be
> enabled/disabled after/before vIOMMU. This only works for virtio-net /
> vhost. From the view of vhost, it would work similar to _F_LOG_ALL
> (where there's no assumption on the order of enabling/disabling dirty
> page tracking and device start/stop).

Yes, we can go for a lightweight solution.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2025-01-30 17:36 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-20 17:33 [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled Eric Auger
2025-01-21  3:27 ` Jason Wang
2025-01-21  7:15   ` Eric Auger
2025-01-21 16:25   ` Eric Auger
2025-01-22  7:17     ` Jason Wang
2025-01-22  7:55       ` Eric Auger
2025-01-23  1:34         ` Jason Wang
2025-01-23  8:31           ` Eric Auger
2025-01-24  1:48             ` Jason Wang
2025-01-24  2:44             ` Duan, Zhenzhong
2025-01-24  3:30               ` Jason Wang
2025-01-24  3:41                 ` Jason Wang
2025-01-24  4:00                   ` Duan, Zhenzhong
2025-01-24  9:20                     ` Jason Wang
2025-01-24  9:50                       ` Duan, Zhenzhong
2025-01-24 17:56                   ` Eric Auger
2025-01-24 15:18                 ` Peter Xu
2025-01-26  7:56                   ` Jason Wang
2025-01-27  0:44                     ` Jason Wang
2025-01-30 17:35                       ` Peter Xu [this message]
2025-01-24 17:47               ` Eric Auger
2025-01-26  7:09                 ` Duan, Zhenzhong
2025-01-31  9:55               ` Eric Auger
2025-02-20 15:25                 ` Michael S. Tsirkin
2025-02-20 15:57                   ` Eric Auger
2025-02-20 23:27                     ` Michael S. Tsirkin
2025-01-21  8:31 ` Laurent Vivier
2025-01-21  8:45   ` Stefano Garzarella
2025-01-21  8:49     ` Eric Auger
2025-01-21  8:48   ` Eric Auger
2025-01-21 16:32   ` Eric Auger
2025-01-21  9:18 ` Duan, Zhenzhong
2025-01-21 10:34   ` Eric Auger
2025-01-21 10:43     ` Duan, Zhenzhong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5u4VmyEohMUmCPM@x1.local \
    --to=peterx@redhat.com \
    --cc=eric.auger.pro@gmail.com \
    --cc=eric.auger@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=sgarzare@redhat.com \
    --cc=zhenzhong.duan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.