From: David Gibson <david@gibson.dropbear.id.au>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
kvm@vger.kernel.org, Eric Auger <eric.auger@redhat.com>
Subject: Re: [RFC PATCH kernel] vfio-pci: Allow write combining
Date: Thu, 30 Nov 2017 15:20:03 +1100
Message-ID: <20171130042003.GW3023@umbus.fritz.box>
In-Reply-To: <20171129114746.45d18a09@t450s.home>
On Wed, Nov 29, 2017 at 11:47:46AM -0700, Alex Williamson wrote:
> On Fri, 24 Nov 2017 15:58:09 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
> > On 15/11/17 03:28, Alex Williamson wrote:
> > > On Tue, 14 Nov 2017 13:29:02 +1100
> > > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > >
> > >> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:
> > >>>>>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
> > >>>
> > >>> We have a new plan on this - I'll discuss it over IRC.
> > >>>
> > >>>>>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> > >>>>>> special and may simply ignore mapping flags in some configs but PPC radix
> > >>>>>> guests still rely on this)
> > >>>
> > >>> AIUI this isn't for radix, but for DPDK things that we need this. Ben
> > >>> talked about it a bit, but I don't know what the outcome was.
> > >>
> > >> So this is not a powerpc specific issue. Other archs similarly want to
> > >> be able to do write combine mappings.
> > >>
> > >> The way sysfs does it is that for prefetchable BARs, it exposes both
> > >> a resourceN and a resourceN_wc file.
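For reference, a minimal sketch of that sysfs behaviour as seen from userspace, in C. The helper names pci_resource_path() and map_pci_bar() are made up for illustration. The resourceN_wc file is only created for prefetchable BARs, and only on architectures that support write-combined PCI mmap, so the open() may simply fail and the caller should be ready to fall back to plain resourceN:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Build "/sys/bus/pci/devices/<bdf>/resource<N>" or ".../resource<N>_wc".
 * Returns 0 on success, -1 if the path does not fit in buf.
 * (Illustrative helper, not part of any library.)
 */
int pci_resource_path(char *buf, size_t len, const char *bdf, int bar, int wc)
{
    int n;

    assert(bar >= 0 && bar <= 5);   /* a PCI function has at most 6 BARs */
    n = snprintf(buf, len, "/sys/bus/pci/devices/%s/resource%d%s",
                 bdf, bar, wc ? "_wc" : "");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Map `size` bytes of a BAR through sysfs. With wc != 0 the mapping goes
 * through the resourceN_wc file and is write-combined if the kernel offers it.
 * Returns MAP_FAILED on any error.
 */
void *map_pci_bar(const char *bdf, int bar, int wc, size_t size)
{
    char path[128];
    void *p;
    int fd;

    if (pci_resource_path(path, sizeof(path), bdf, bar, wc) < 0)
        return MAP_FAILED;

    fd = open(path, O_RDWR);
    if (fd < 0) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return MAP_FAILED;
    }

    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping outlives the fd */
    return p;
}
```

The point is that in sysfs the mapping attribute is selected by which file you open, not by anything passed to mmap(), which is exactly what does not carry over to a single VFIO device fd.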
> > >>
> > >> For VFIO it's a bit more tricky, maybe we need to game the offset using
> > >> some of it as flags but that's very fishy, or maybe we do some kind of
> > >> ioctl that selects the attributes used for that fd instance for
> > >> subsequent mappings...
> > >>
> > >> I'll let Alex choose what he feels most appropriate here.
> > >
> > > My order of preference would be something like:
> > >
> > > - mmap flags provide some way for the user to specify a wc mapping
> > > within existing regions
> >
> > There are plenty of flags but none really matches, checked with Paul.
>
> Is MAP_NONBLOCK off the table? Why?
>
> > > - some other mechanism of using the existing regions
> >
> > I can only think of madvise but it does not have appropriate flags either.
>
> Is it worth the process to define something that is appropriate? Would
> either of the above be the obvious architectural/implementation choice
> if we could define a flag for it?
>
> > > - additional regions provided for use exclusively with wc attributes
> > > (generalizing PCI BAR wc regions within device specific regions)
> >
> >
> > Adding VFIO_PCI_BAR0_WC_REGION_INDEX for VFIO_PCI_BAR0_REGION_INDEX (and so
> > on for other BARs) seems a viable option.
> >
> > However the comment for VFIO_PCI_xxx_REGION_INDEX says:
> >
> > VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
> > /* device specific cap to define content. */
> >
> >
> > which limits me in where I can add new indexes, I cannot just add new _WC
> > indexes to that enum, can I? I cannot see any existing regions above 9 yet
> > though.
>
> The comment explains how to do this, you'd add a device specific region
> with the type identifying it as a PCI MMIO WC region and the sub-type
> probably defining the BAR index.
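A rough sketch of what the userspace side of that could look like, in C, assuming the region info buffer has already been filled in by VFIO_DEVICE_GET_REGION_INFO. The capability chain structures and VFIO_REGION_INFO_CAP_TYPE come from <linux/vfio.h>; the region type value EXAMPLE_REGION_TYPE_PCI_MMIO_WC and the function names are hypothetical, nothing like them is defined in the header:

```c
#include <assert.h>
#include <linux/vfio.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type value for a "PCI MMIO WC" region; NOT in linux/vfio.h. */
#define EXAMPLE_REGION_TYPE_PCI_MMIO_WC 0x7fff0001u

/*
 * Walk the capability chain attached to a region info buffer and return the
 * type capability, or NULL if there is none. Capability offsets are relative
 * to the start of the info structure and a `next` of 0 ends the chain.
 */
const struct vfio_region_info_cap_type *
find_region_type_cap(const struct vfio_region_info *info)
{
    const uint8_t *base = (const uint8_t *)info;
    uint32_t off, budget;

    assert(info != NULL);
    if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
        return NULL;

    /* Bound the walk so a malformed chain cannot loop forever. */
    budget = info->argsz / sizeof(struct vfio_info_cap_header);

    for (off = info->cap_offset; off != 0 && budget > 0; budget--) {
        const struct vfio_info_cap_header *hdr;

        if (off > info->argsz || info->argsz - off < sizeof(*hdr))
            return NULL;
        hdr = (const struct vfio_info_cap_header *)(base + off);

        if (hdr->id == VFIO_REGION_INFO_CAP_TYPE) {
            if (info->argsz - off < sizeof(struct vfio_region_info_cap_type))
                return NULL;
            return (const struct vfio_region_info_cap_type *)hdr;
        }
        off = hdr->next;
    }
    return NULL;
}

/* Returns 1 if this region is the (hypothetical) WC alias of BAR `bar`. */
int region_is_wc_bar(const struct vfio_region_info *info, uint32_t bar)
{
    const struct vfio_region_info_cap_type *cap = find_region_type_cap(info);

    return cap != NULL &&
           cap->type == EXAMPLE_REGION_TYPE_PCI_MMIO_WC &&
           cap->subtype == bar;
}
```

Userspace would iterate over the region indexes past the fixed ones, query each with VFIO_DEVICE_GET_REGION_INFO (retrying with a larger buffer when the kernel reports a bigger argsz), and mmap() the region for which region_is_wc_bar() matches the wanted BAR.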
>
> > > - additional file descriptors provided for wc access
> >
> > It could be a capability + ioctl(VFIO_DEVICE_GET_WC_RESOURCE) which would
> > take a BAR index, check if the BAR is prefetchable and if so - return an fd
> > which the userspace then could mmap(). This won't break that ABI with 9
> > regions but it is the least favourable in the list...
>
> Do the kernel mechanics require it to be a separate file descriptor? A
> separate fd is my last choice as well, but the interfaces you were
> attempting to use previously seemed to have fd granularity.
>
> > > This isn't at the top of my priority list to figure out the solution,
> > > so whoever implements it will need to provide justification as they
> > > move down the list from more to less preferred solutions. Thanks,
> >
> > I am trying... I was really counting on you guys having this discussed in
> > Prague :(
>
> Should have been there to push your agenda... Thanks,

We discussed it briefly. IIRC, BenH seemed to think there wasn't a big
difficulty, which is why we didn't spend much time on this (compared
to the other issues). So, talk to him.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson