public inbox for kvm@vger.kernel.org
From: David Gibson <david@gibson.dropbear.id.au>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	kvm@vger.kernel.org, Eric Auger <eric.auger@redhat.com>
Subject: Re: [RFC PATCH kernel] vfio-pci: Allow write combining
Date: Thu, 30 Nov 2017 15:20:03 +1100
Message-ID: <20171130042003.GW3023@umbus.fritz.box>
In-Reply-To: <20171129114746.45d18a09@t450s.home>

On Wed, Nov 29, 2017 at 11:47:46AM -0700, Alex Williamson wrote:
> On Fri, 24 Nov 2017 15:58:09 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
> > On 15/11/17 03:28, Alex Williamson wrote:
> > > On Tue, 14 Nov 2017 13:29:02 +1100
> > > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > >   
> > >> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:  
> > >>>>>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)    
> > >>>
> > >>> We have a new plan on this - I'll discuss it over IRC.
> > >>>     
> > >>>>>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> > >>>>>> special and may simply ignore mapping flags in some configs but PPC radix
> > >>>>>> guests still rely on this)    
> > >>>
> > >>> AIUI we need this not for radix, but for DPDK things.  Ben
> > >>> talked about it a bit, but I don't know what the outcome was.    
> > >>
> > >> So this is not a powerpc specific issue. Other archs similarly want to
> > >> be able to do write combine mappings.
> > >>
> > >> The way sysfs does it is that for prefetchable BARs, it exposes both
> > >> a resourceN and a resourceN_wc file.
> > >>
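
For reference, a minimal sketch of how userspace can pick up the sysfs
write-combining file mentioned above.  The helper names
(wc_resource_path, map_bar_wc) are made up for illustration; the only
thing taken from sysfs itself is the
/sys/bus/pci/devices/<BDF>/resourceN_wc layout.  It assumes the BAR is
prefetchable and the arch supports WC mappings, otherwise the file is
simply absent and open() fails.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build the sysfs path of the write-combining resource file for one BAR.
 * Returns 0 on success, -1 if bdf is empty or the buffer is too small. */
static int wc_resource_path(char *buf, size_t len, const char *bdf, int bar)
{
	int n;

	if (strlen(bdf) == 0)
		return -1;
	n = snprintf(buf, len, "/sys/bus/pci/devices/%s/resource%d_wc", bdf, bar);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Map a whole BAR through resourceN_wc.  Returns MAP_FAILED on any error,
 * e.g. when the file does not exist for this BAR. */
static void *map_bar_wc(const char *bdf, int bar, size_t *size)
{
	char path[256];
	struct stat st;
	void *p;
	int fd;

	assert(size != NULL);
	if (wc_resource_path(path, sizeof(path), bdf, bar) < 0)
		return MAP_FAILED;
	fd = open(path, O_RDWR);
	if (fd < 0)
		return MAP_FAILED;
	/* sysfs reports the BAR length as the file size */
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return MAP_FAILED;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
		 MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close */
	if (p != MAP_FAILED)
		*size = (size_t)st.st_size;
	return p;
}
```

Here the caching attribute is chosen by which file gets opened, not by
anything passed to mmap().
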
> > >> For VFIO it's a bit more tricky, maybe we need to game the offset using
> > >> some of it as flags but that's very fishy, or maybe we do some kind of
> > >> ioctl that selects the attributes used for that fd instance for
> > >> subsequent mappings...
> > >>
> > >> I'll let Alex choose what he feels most appropriate here.  
> > > 
> > > My order of preference would be something like:
> > > 
> > >  - mmap flags provide some way for the user to specify a wc mapping
> > >    within existing regions  
> > 
> > There are plenty of flags but none really matches, checked with Paul.
> 
> Is MAP_NONBLOCK off the table?  Why?
>  
> > >  - some other mechanism of using the existing regions  
> > 
> > I can only think of madvise but it does not have appropriate flags either.
> 
> Is it worth the process to define something that is appropriate?  Would
> either of the above be the obvious architectural/implementation choice
> if we could define a flag for it?
> 
> > >  - additional regions provided for use exclusively with wc attributes
> > >    (generalizing PCI BAR wc regions within device specific regions)  
> > 
> > 
> > Adding VFIO_PCI_BAR0_WC_REGION_INDEX for VFIO_PCI_BAR0_REGION_INDEX (and so
> > on for other BARs) seems a viable option.
> > 
> > However the comment for VFIO_PCI_xxx_REGION_INDEX says:
> > 
> >   VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
> >                            /* device specific cap to define content. */
> > 
> > 
> > which limits me in where I can add new indexes, I cannot just add new _WC
> > indexes to that enum, can I? I cannot see any existing regions above 9 yet
> > though.
> 
> The comment explains how to do this, you'd add a device specific region
> with the type identifying it as a PCI MMIO WC region and the sub-type
> probably defining the BAR index.
> 
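
To make the device specific region idea concrete, here is a rough
sketch of how userspace might look for such a region by walking the
capability chain returned by VFIO_DEVICE_GET_REGION_INFO.  Assumptions:
the structs below are local mirrors of the capability header and type
capability layout in <linux/vfio.h>, so the sketch builds without that
header; HYP_TYPE_PCI_WC is a made-up value, no WC region type is
defined anywhere; region_has_type is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uapi capability layout (see <linux/vfio.h>). */
struct cap_header { uint16_t id; uint16_t version; uint32_t next; };
struct cap_type   { struct cap_header header; uint32_t type; uint32_t subtype; };

#define CAP_ID_TYPE      2            /* value of VFIO_REGION_INFO_CAP_TYPE */
#define HYP_TYPE_PCI_WC  0x7fff0001u  /* HYPOTHETICAL, not a real region type */

/*
 * info: buffer filled by VFIO_DEVICE_GET_REGION_INFO, len: its argsz,
 * off: its cap_offset (offsets count from the start of info, 0 ends
 * the chain).  Returns 1 if a type capability matches (type, subtype),
 * 0 otherwise.
 */
static int region_has_type(const uint8_t *info, size_t len, uint32_t off,
			   uint32_t type, uint32_t subtype)
{
	while (off != 0 && off <= len &&
	       len - off >= sizeof(struct cap_header)) {
		struct cap_header hdr;

		memcpy(&hdr, info + off, sizeof(hdr));
		if (hdr.id == CAP_ID_TYPE &&
		    len - off >= sizeof(struct cap_type)) {
			struct cap_type cap;

			memcpy(&cap, info + off, sizeof(cap));
			if (cap.type == type && cap.subtype == subtype)
				return 1;
		}
		if (hdr.next <= off)	/* only follow forward links */
			break;
		off = hdr.next;
	}
	return 0;
}
```

With the sub-type carrying the BAR index as suggested, a caller would
scan region indexes from VFIO_PCI_NUM_REGIONS upwards and test each one
with something like region_has_type(buf, argsz, cap_offset,
HYP_TYPE_PCI_WC, bar).
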
> > >  - additional file descriptors provided for wc access  
> > 
> > It could be a capability + ioctl(VFIO_DEVICE_GET_WC_RESOURCE) which would
> > take a BAR index, check if the BAR is prefetchable and if so - return an fd
> > which the userspace then could mmap(). This won't break that ABI with 9
> > regions but it is the least favourable in the list...
> 
> Do the kernel mechanics require it to be a separate file descriptor?  A
> separate fd is my last choice as well, but the interfaces you were
> attempting to use previously seemed to have fd granularity.
> 
> > > This isn't at the top of my priority list to figure out the solution,
> > > so whoever implements it will need to provide justification as they
> > > move down the list from more to less preferred solutions.  Thanks,  
> > 
> > I am trying... I was really counting on you guys having this discussed in
> > Prague :(
> 
> Should have been there to push your agenda...  Thanks,

We discussed it briefly; BenH seemed to think there wasn't a big
difficulty, IIRC, which is why we didn't spend much time on this
(compared to the other issues).  So, talk to him.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

Thread overview: 23+ messages
2017-10-09  2:50 [RFC PATCH kernel] vfio-pci: Allow write combining Alexey Kardashevskiy
2017-10-10 21:55 ` Alex Williamson
2017-10-11  2:05   ` Alexey Kardashevskiy
2017-10-11  2:42     ` Alex Williamson
2017-10-11  2:56       ` Alexey Kardashevskiy
2017-10-11 15:35         ` Benjamin Herrenschmidt
2017-10-16  5:54           ` Alexey Kardashevskiy
2017-10-16  6:00             ` David Gibson
2017-10-16  7:36               ` Alexey Kardashevskiy
2017-10-16  8:01                 ` David Gibson
2017-11-06  5:44                   ` Alexey Kardashevskiy
2017-11-14  2:23                     ` David Gibson
2017-11-14  2:29                       ` Benjamin Herrenschmidt
2017-11-14 16:28                         ` Alex Williamson
2017-11-24  4:58                           ` Alexey Kardashevskiy
2017-11-29 18:47                             ` Alex Williamson
2017-11-30  4:20                               ` David Gibson [this message]
2017-11-30 20:06                                 ` Benjamin Herrenschmidt
2017-10-16  8:38                 ` Benjamin Herrenschmidt
2017-10-16 11:11                   ` Alexey Kardashevskiy
2017-10-18  7:33                     ` Benjamin Herrenschmidt
2017-10-18  9:00                       ` Alexey Kardashevskiy
2017-10-18 14:21                         ` Benjamin Herrenschmidt
