From: Cornelia Huck <cohuck@redhat.com>
To: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: alex.williamson@redhat.com, schnelle@linux.ibm.com,
	pmorel@linux.ibm.com, borntraeger@de.ibm.com, hca@linux.ibm.com,
	gor@linux.ibm.com, gerald.schaefer@linux.ibm.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 0/4] vfio-pci/zdev: Fixing s390 vfio-pci ISM support
Date: Fri, 11 Dec 2020 15:35:01 +0100
Message-ID: <20201211153501.7767a603.cohuck@redhat.com>
In-Reply-To: <ce9d4ef2-2629-59b7-99ed-4c8212cb004f@linux.ibm.com>

On Thu, 10 Dec 2020 10:51:23 -0500
Matthew Rosato <mjrosato@linux.ibm.com> wrote:

> On 12/10/20 7:33 AM, Cornelia Huck wrote:
> > On Wed,  9 Dec 2020 15:27:46 -0500
> > Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> >   
> >> Today, ISM devices are completely disallowed for vfio-pci passthrough as
> >> QEMU will reject the device due to an (inappropriate) MSI-X check.
> >> However, in an effort to enable ISM device passthrough, I realized that the
> >> manner in which ISM performs block write operations is highly incompatible
> >> with the way that QEMU s390 PCI instruction interception and
> >> vfio_pci_bar_rw break up I/O operations into 8B and 4B operations -- ISM
> >> devices have particular requirements in regards to the alignment, size and
> >> order of writes performed.  Furthermore, they require that legacy/non-MIO
> >> s390 PCI instructions are used, which is also not guaranteed when the I/O
> >> is passed through the typical userspace channels.  
> > 
> > The part about the non-MIO instructions confuses me. How can MIO
> > instructions be generated with the current code, and why does changing  
> 
> So to be clear, they are not being generated at all in the guest as the 
> necessary facility is reported as unavailable.
> 
> Let's talk about Linux in LPAR / the host kernel:  When hardware that 
> supports MIO instructions is available, all userspace I/O traffic is 
> going to be routed through the MIO variants of the s390 PCI 
> instructions.  This is working well for other device types, but does not 
> work for ISM which does not support these variants.  However, the ISM 
> driver also does not invoke the userspace I/O routines for the kernel, 
> it invokes the s390 PCI layer directly, which in turn ensures the proper 
> PCI instructions are used -- This approach falls apart when the guest 
> ISM driver invokes those routines in the guest -- we (qemu) pass those 
> non-MIO instructions from the guest as memory operations through 
> vfio-pci, traversing through the vfio I/O layer in the host 
> (vfio_pci_bar_rw and friends), where we then arrive in the host s390 PCI 
> layer -- where the MIO variant is used because the facility is available.
> 
> Per conversations with Niklas (on CC), it's not trivial to decide by the 
> time we reach the s390 PCI I/O layer to switch gears and use the non-MIO 
> instruction set.
> 
> > the write pattern help?  
> 
> The write pattern is a separate issue from non-MIO instruction 
> requirements...  Certain address spaces require specific instructions to 
> be used (so, no substituting PCISTG for PCISTB - that happens too by 
> default for any writes coming into the host s390 PCI layer that are 
> <=8B, and they all are when the PCISTB is broken up into 8B memory 
> operations that travel through vfio_pci_bar_rw, which further breaks 
> those up into 4B operations).  There's also a requirement for some 
> writes that the data, if broken up, be written in a certain order in 
> order to properly trigger events. :(  The ability to pass the entire 
> PCISTB payload vs breaking it into 8B chunks is also significantly faster.

Let me summarize this to make sure I understand this new region
correctly:

- some devices may have relaxed alignment/length requirements for
  pcistb (and friends?)
- some devices may actually require writes to be done in a large chunk
  instead of being broken up (is that a strict subset of the devices
  above?)
- some devices do not support the new MIO instructions (is that a
  subset of the relaxed alignment devices? I'm not familiar with the
  MIO instructions)
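
To make the "broken up" case concrete for myself, here is a small
userspace sketch (not kernel or QEMU code) that counts how many separate
writes a device would see when a payload is split into 8B pieces and each
piece is split again into 4B pieces, as described above. The helper name
split_write_count and the 128-byte payload used below are my own
assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of individual writes produced when a
 * payload of `len` bytes is first cut into pieces of at most `outer`
 * bytes and each piece is then cut into pieces of at most `inner`
 * bytes. A partial tail piece still counts as one write.
 */
static size_t split_write_count(size_t len, size_t outer, size_t inner)
{
	size_t count = 0;

	assert(outer > 0 && inner > 0);

	while (len > 0) {
		size_t piece = len < outer ? len : outer;

		count += (piece + inner - 1) / inner;	/* ceil(piece / inner) */
		len -= piece;
	}
	return count;
}
```

With an assumed 128-byte payload, split_write_count(128, 8, 4) gives 32
writes, while split_write_count(128, 128, 128) gives a single write. The
sketch only counts operations; it says nothing about the ordering or
alignment rules a particular device may impose.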

The patchsets introduce a new region that (a) is used by QEMU to submit
writes in one go, and (b) makes sure to call into the non-MIO
instructions directly; it's basically killing two birds with one stone
for ISM devices. Are these two requirements (large writes and non-MIO)
always going hand-in-hand, or is ISM just an odd device?

If there's an expectation that the new region will always use the
non-MIO instructions (in addition to the changed write handling), it
should be noted in the description for the region as well.
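
As a way of writing down my understanding, the instruction selection
described in this thread could be modeled roughly as below. This is a
sketch under assumptions, not the actual s390 PCI code: the enum, the
struct and select_store_insn are hypothetical names, and I am assuming
that the new region always ends up in a non-MIO block store.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the store instruction variants discussed here. */
enum zpci_store_insn {
	INSN_PCISTG,		/* legacy store, up to 8 bytes */
	INSN_PCISTB,		/* legacy store block */
	INSN_PCISTG_MIO,	/* MIO variant of the store */
	INSN_PCISTB_MIO,	/* MIO variant of the store block */
};

/* Hypothetical description of how a write reached the host PCI layer. */
struct write_path {
	bool mio_available;	/* host reports the MIO facility */
	bool dedicated_region;	/* write arrived via the new region (assumed) */
};

static enum zpci_store_insn select_store_insn(const struct write_path *p,
					      size_t len)
{
	bool block = len > 8;	/* writes of <= 8 bytes use the non-block store */

	/* Assumption: the new region forwards the whole payload, non-MIO. */
	if (p->dedicated_region)
		return INSN_PCISTB;

	if (p->mio_available)
		return block ? INSN_PCISTB_MIO : INSN_PCISTG_MIO;

	return block ? INSN_PCISTB : INSN_PCISTG;
}
```

In this model, a 4B write arriving through the regular path on an
MIO-capable host selects the MIO store, which is the combination ISM
cannot handle according to the explanation above.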
