devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Claire Chang" <tientzu@chromium.org>,
	"Rob Herring" <robh+dt@kernel.org>,
	mpe@ellerman.id.au, "Joerg Roedel" <joro@8bytes.org>,
	"Will Deacon" <will@kernel.org>,
	"Frank Rowand" <frowand.list@gmail.com>,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	boris.ostrovsky@oracle.com, jgross@suse.com,
	"Christoph Hellwig" <hch@lst.de>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	heikki.krogerus@linux.intel.com, peterz@infradead.org,
	benh@kernel.crashing.org, grant.likely@arm.com, paulus@samba.org,
	mingo@kernel.org, sstabellini@kernel.org,
	"Saravana Kannan" <saravanak@google.com>,
	xypron.glpk@gmx.de,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	"Bartosz Golaszewski" <bgolaszewski@baylibre.com>,
	xen-devel@lists.xenproject.org,
	"Thierry Reding" <treding@nvidia.com>,
	linux-devicetree <devicetree@vger.kernel.org>,
	linuxppc-dev@lists.ozlabs.org,
	"Nicolas Boichat" <drinkcat@chromium.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	lkml <linux-kernel@vger.kernel.org>,
	"list@263.net:IOMMU DRIVERS" <iommu@lists.linux-foundation.org>,
	"Jim Quinlan" <james.quinlan@broadcom.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	hch@infradead.org, "Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	virtualization@lists.linux.dev, graf@amazon.de
Subject: Re: Using Restricted DMA for virtio-pci
Date: Sun, 30 Mar 2025 16:07:56 +0100	[thread overview]
Message-ID: <09fc164ebcfd893ffd67d1b224d6e1c5e5772ee0.camel@infradead.org> (raw)
In-Reply-To: <20250330093532-mutt-send-email-mst@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 7731 bytes --]

On Sun, 2025-03-30 at 09:42 -0400, Michael S. Tsirkin wrote:
> On Fri, Mar 28, 2025 at 05:40:41PM +0000, David Woodhouse wrote:
> > On Fri, 2025-03-21 at 18:42 +0000, David Woodhouse wrote:
> > > > 
> > > > I don't mind as such (though I don't understand completely), but since
> > > > this is changing the device anyway, I am a bit confused why you can't
> > > > just set the VIRTIO_F_ACCESS_PLATFORM feature bit?  This forces DMA API
> > > > which will DTRT for you, will it not?
> > > 
> > > That would be necessary but not sufficient. ...
> 
> could you explain pls?

There was more to that in the previous email which I elided for this
followup.

https://lore.kernel.org/all/d1382a6ee959f22dc5f6628d8648af77f4702418.camel@infradead.org/

> > My first cut at a proposed spec change looks something like this. I'll
> > post it to the virtio-comment list once I've done some corporate
> > bureaucracy and when the list stops sending me python tracebacks in
> > response to my subscribe request.
> 
> the linux foundation one does this? maybe poke at the admins.
> 
> > In the meantime I'll hack up some QEMU and guest Linux driver support
> > to match.
> > 
> > diff --git a/content.tex b/content.tex
> > index c17ffa6..1e6e1d6 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -773,6 +773,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  Currently these device-independent feature bits are defined:
> >  
> >  \begin{description}
> > +  \item[VIRTIO_F_SWIOTLB (27)] This feature indicates that the device
> > +  provides a memory region which is to be used for bounce buffering,
> > +  rather than permitting direct memory access to system memory.
> >    \item[VIRTIO_F_INDIRECT_DESC (28)] Negotiating this feature indicates
> >    that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> >    flag set, as described in \ref{sec:Basic Facilities of a Virtio
> > @@ -885,6 +888,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only physical
> >  addresses to the device.
> >  
> > +A driver SHOULD accept VIRTIO_F_SWIOTLB if it is offered, and it MUST
> > +then pass only addresses within the Software IOTLB bounce buffer to the
> > +device.
> > +
> >  A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
> >  
> >  A driver SHOULD accept VIRTIO_F_ORDER_PLATFORM if it is offered.
> > @@ -921,6 +928,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  A device MAY fail to operate further if VIRTIO_F_ACCESS_PLATFORM is not
> >  accepted.
> >  
> > +A device MUST NOT offer VIRTIO_F_SWIOTLB if its transport does not
> > +provide a Software IOTLB bounce buffer.
> > +A device MAY fail to operate further if VIRTIO_F_SWIOTLB is not accepted.
> > +
> >  If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> >  buffers in the same order in which they have been available.
> >  
> > diff --git a/transport-pci.tex b/transport-pci.tex
> > index a5c6719..23e0d57 100644
> > --- a/transport-pci.tex
> > +++ b/transport-pci.tex
> > @@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> >  \item ISR Status
> >  \item Device-specific configuration (optional)
> >  \item PCI configuration access
> > +\item SWIOTLB bounce buffer
> >  \end{itemize}
> >  
> >  Each structure can be mapped by a Base Address register (BAR) belonging to
> > @@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> >  #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
> >  /* Vendor-specific data */
> >  #define VIRTIO_PCI_CAP_VENDOR_CFG        9
> > +/* Software IOTLB bounce buffer */
> > +#define VIRTIO_PCI_CAP_SWIOTLB           10
> >  \end{lstlisting}
> >  
> >          Any other value is reserved for future use.
> > @@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
> >  The driver MUST qualify the \field{vendor_id} before
> >  interpreting or writing into the Vendor data capability.
> >  
> > +\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +The optional Software IOTLB bounce buffer capability allows the
> > +device to provide a memory region which can be used by the driver
> > +driver for bounce buffering. This allows a device on the PCI
> > +transport to operate without DMA access to system memory addresses.
> > +
> > +The Software IOTLB region is referenced by the
> > +VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
> > +range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
> > +capability, if negotiated.
> 
> 
> why not? an optimization?
> A mix of swiotlb and system memory might be very challenging from POV
> of ordering.

Conceptually, these addresses are *on* the PCI device. If the device is
accessing addresses which are local to it, they aren't subject to IOMMU
translation/filtering because they never even make it to the PCI bus as
memory transactions.

> 
> > +
> > +\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +Devices which present the Software IOTLB bounce buffer capability
> > +SHOULD also offer the VIRTIO_F_SWIOTLB feature.
> > +
> > +\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +The driver SHOULD use the offered buffer in preference to passing system
> > +memory addresses to the device.
> 
> Even if not using VIRTIO_F_SWIOTLB? Is that really necessary?

That part isn't strictly necessary, but I think it makes sense, for
cases where the SWIOTLB support is an *optimisation* even if it isn't
strictly necessary.

Why might it be an "optimisation"? Well... if we're thinking of a model
like pKVM where the VMM can't just arbitrarily access guest memory,
using the SWIOTLB is a simple way to avoid that (by using the on-board
memory instead, which *can* be shared with the VMM).

But if we want to go to extra lengths to support unenlightened guests,
an implementation might choose to just *disable* the memory protection
if the guest doesn't negotiate VIRTIO_F_SWIOTLB, instead of breaking
that guest.

Or it might have a complicated emulation/snooping of virtqueues in the
trusted part of the hypervisor so that it knows which addresses the
guest has truly *asked* the VMM to access. (And yes, of course that's
what an IOMMU is for, but when have you seen hardware companies design
a two-stage IOMMU which supports actual PCI passthrough *and* get it
right for the hypervisor to 'snoop' on the stage1 page tables to
support emulated devices too....)

Ultimately I think it was natural to advertise the location of the
buffer with the VIRTIO_PCI_CAP_SWIOTLB capability and then to have the
separate VIRTIO_F_SWIOTLB for negotiation... leaving the obvious
question of what a device should do if it sees one but *not* the other.

Obviously you can't have VIRTIO_F_SWIOTLB *without* there actually
being a buffer advertised with VIRTIO_PCI_CAP_SWIOTLB (or its
equivalent for other transports). But the converse seemed reasonable as
a *hint* even if the use of the SWIOTLB isn't mandatory.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

  reply	other threads:[~2025-03-30 15:08 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-09  6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
2021-02-09  6:21 ` [PATCH v4 01/14] swiotlb: Remove external access to io_tlb_start Claire Chang
2021-02-09  8:40   ` Christoph Hellwig
2021-02-09  6:21 ` [PATCH v4 02/14] swiotlb: Move is_swiotlb_buffer() to swiotlb.c Claire Chang
2021-02-09  6:21 ` [PATCH v4 03/14] swiotlb: Add struct swiotlb Claire Chang
2021-02-09  6:21 ` [PATCH v4 04/14] swiotlb: Refactor swiotlb_late_init_with_tbl Claire Chang
2021-02-09  6:21 ` [PATCH v4 05/14] swiotlb: Add DMA_RESTRICTED_POOL Claire Chang
2021-02-09  6:21 ` [PATCH v4 06/14] swiotlb: Add restricted DMA pool Claire Chang
2021-02-09  6:21 ` [PATCH v4 07/14] swiotlb: Update swiotlb API to gain a struct device argument Claire Chang
2021-02-09  6:21 ` [PATCH v4 08/14] swiotlb: Use restricted DMA pool if available Claire Chang
2021-02-09  6:21 ` [PATCH v4 09/14] swiotlb: Refactor swiotlb_tbl_{map,unmap}_single Claire Chang
2021-02-09  6:21 ` [PATCH v4 10/14] dma-direct: Add a new wrapper __dma_direct_free_pages() Claire Chang
2021-02-09  6:21 ` [PATCH v4 11/14] swiotlb: Add is_dev_swiotlb_force() Claire Chang
2021-02-09  6:21 ` [PATCH v4 12/14] swiotlb: Add restricted DMA alloc/free support Claire Chang
2021-02-26  4:17   ` Claire Chang
2021-02-26  5:17     ` Christoph Hellwig
2021-02-26  9:35       ` Claire Chang
2021-02-09  6:21 ` [PATCH v4 13/14] dt-bindings: of: Add restricted DMA pool Claire Chang
2021-03-10 16:07   ` Will Deacon
2021-03-10 21:40     ` Rob Herring
2021-03-11  5:04       ` Florian Fainelli
2021-02-09  6:21 ` [PATCH v4 14/14] of: Add plumbing for " Claire Chang
2021-04-22  8:20 ` [PATCH v4 00/14] Restricted DMA Claire Chang
2025-03-21 15:38 ` Using Restricted DMA for virtio-pci David Woodhouse
2025-03-21 18:32   ` Michael S. Tsirkin
2025-03-21 18:42     ` David Woodhouse
2025-03-28 17:40       ` David Woodhouse
2025-03-30 13:42         ` Michael S. Tsirkin
2025-03-30 15:07           ` David Woodhouse [this message]
2025-03-30 16:59             ` Michael S. Tsirkin
2025-03-30 20:07               ` David Woodhouse
2025-03-30 17:06       ` Michael S. Tsirkin
2025-03-30 21:27         ` David Woodhouse
2025-03-30 21:48           ` Michael S. Tsirkin
2025-03-31  9:42             ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=09fc164ebcfd893ffd67d1b224d6e1c5e5772ee0.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=benh@kernel.crashing.org \
    --cc=bgolaszewski@baylibre.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=dan.j.williams@intel.com \
    --cc=devicetree@vger.kernel.org \
    --cc=drinkcat@chromium.org \
    --cc=eperezma@redhat.com \
    --cc=frowand.list@gmail.com \
    --cc=graf@amazon.de \
    --cc=grant.likely@arm.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=heikki.krogerus@linux.intel.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=james.quinlan@broadcom.com \
    --cc=jasowang@redhat.com \
    --cc=jgross@suse.com \
    --cc=joro@8bytes.org \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=m.szyprowski@samsung.com \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=mst@redhat.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rdunlap@infradead.org \
    --cc=robh+dt@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=saravanak@google.com \
    --cc=sstabellini@kernel.org \
    --cc=tientzu@chromium.org \
    --cc=treding@nvidia.com \
    --cc=virtualization@lists.linux.dev \
    --cc=will@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    --cc=xuanzhuo@linux.alibaba.com \
    --cc=xypron.glpk@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).