linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Joerg Roedel <joerg.roedel@amd.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] iommu: making IOMMU sysfs nodes API public
Date: Tue, 19 Feb 2013 18:38:53 +1100	[thread overview]
Message-ID: <20130219073853.GS21067@truffula.fritz.box> (raw)
In-Reply-To: <1361251440.2801.142.camel@bling.home>

[-- Attachment #1: Type: text/plain, Size: 5488 bytes --]

On Mon, Feb 18, 2013 at 10:24:00PM -0700, Alex Williamson wrote:
> On Mon, 2013-02-18 at 17:15 +1100, Alexey Kardashevskiy wrote:
> > On 13/02/13 04:15, Alex Williamson wrote:
> > > On Wed, 2013-02-13 at 01:42 +1100, Alexey Kardashevskiy wrote:
> > >> On 12/02/13 16:07, Alex Williamson wrote:
> > >>> On Tue, 2013-02-12 at 15:06 +1100, Alexey Kardashevskiy wrote:
> > >>>> Having this patch in a tree, adding new nodes in sysfs
> > >>>> for IOMMU groups is going to be easier.
> > >>>>
> > >>>> The first candidate for this change is a "dma-window-size"
> > >>>> property which tells a size of a DMA window of the specific
> > >>>> IOMMU group which can be used later for locked pages accounting.
> > >>>
> > >>> I'm still churning on this one; I'm nervous this would basically creat
> > >>> a /proc free-for-all under /sys/kernel/iommu_group/$GROUP/ where any
> > >>> iommu driver can add random attributes.  That can get ugly for
> > >>> userspace.
> > >>
> > >> Is not it exactly what sysfs is for (unlike /proc)? :)
> > >
> > > Um, I hope it's a little more thought out than /proc.
> > >
> > >>> On the other hand, for the application of userspace knowing how much
> > >>> memory to lock for vfio use of a group, it's an appealing location to
> > >>> get that information.  Something like libvirt would already be poking
> > >>> around here to figure out which devices to bind.  Page limits need to be
> > >>> setup prior to use through vfio, so sysfs is more convenient than
> > >>> through vfio ioctls.
> > >>
> > >> True. DMA window properties do not change since boot so sysfs is the right
> > >> place to expose them.
> > >>
> > >>> But then is dma-window-size just a vfio requirement leaking over into
> > >>> iommu groups?  Can we allow iommu driver based attributes without giving
> > >>> up control of the namespace?  Thanks,
> > >>
> > >> Who are you asking these questions? :)
> > >
> > > Anyone, including you.  Rather than dropping misc files in sysfs to
> > > describe things about the group, I think the better solution in your
> > > case might be a link from the group to an existing sysfs directory
> > > describing the PE.  I believe your PE is rooted in a PCI bridge, so that
> > > presumably already has a representation in sysfs.  Can the aperture size
> > > be determined from something in sysfs for that bridge already?  I'm just
> > > not ready to create a grab bag of sysfs entries for a group yet.
> > > Thanks,
> > 
> > 
> > At the moment there is no information neither in sysfs nor 
> > /proc/device-tree about the dma-window. And adding a sysfs entry per PE 
> > (powerpc partitionable end-point which is often a PHB but not always) just 
> > for VFIO is quite heavy.
> 
> How do you learn the window size and PE extents in the host kernel?
> 
> > We could add a ppc64 subfolder under /sys/kernel/iommu/xxx/ and put the 
> > "dma-window" property there. And replace it with a symlink when and if we 
> > add something for PE later. Would work?

Fwiw, I'd suggest a subfolder named for the type of IOMMU, rather than
"ppc64".

> To be clear, you're suggesting /sys/kernel/iommu_groups/$GROUP/xxx/,
> right?  A subfolder really only limits the scope of the mess, so it's
> not much improvement.  What does the interface look like to make those
> subfolders?
> 
> The problem we're trying to solve is this call flow:
> 
> containerfd = open("/dev/vfio/vfio");
> ioctl(containerfd, VFIO_GET_API_VERSION);
> ioctl(containerfd, VFIO_CHECK_EXTENSION, ...);
> groupfd = open("/dev/vfio/$GROUP");
> ioctl(groupfd, VFIO_GROUP_GET_STATUS);
> ioctl(groupfd, VFIO_GROUP_SET_CONTAINER, &containerfd);
> 
> You wanted to lock all the memory for the DMA window here, before we can
> call VFIO_IOMMU_GET_INFO, but does it need to happen there?  We still
> have a MAP_DMA hook.  We could do it all on the first mapping.

MAP_DMA isn't quite enough, since the guest can also directly cause
mappings using hypercalls directly implemented in KVM.  I think it
would be feasible to lock on the first mapping (either via MAP_DMA, or
H_PUT_TCE) though it would be a bit ugly and require that the first
H_PUT_TCE always bounce out to virtual mode (Alexey, correct me if I'm
wrong here).  IIRC there is also a call to bind the vfio container to
a (qemu assigned) LIOBN, before the guest can use H_PUT_TCE directly,
so that might be another place we could do the lock.

>  It also
> has a flags field that could augment the behavior to trigger page
> locking.

I don't see how the flags help us - we can't have userspace choose to
skip the locked memory accounting.  Or are you suggesting a flag to
open the container in some sort of dummy mode where only GET_INFO is
possible, then re-open with the full locking?

>  Adding the window size to sysfs seems more readily convenient,
> but is it so hard for userspace to open the files and call a couple
> ioctls to get far enough to call IOMMU_GET_INFO?  I'm unconvinced the
> clutter in sysfs more than just a quick fix.  Thanks,

And finally, as Alexey points out, isn't the point here so we know how
much rlimit to give qemu?  Using ioctls we'd need a special tool just
to check the dma window sizes, which seems a bit hideous.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

  parent reply	other threads:[~2013-02-19  7:38 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-11 11:54 [PATCH 0/2] vfio on power Alexey Kardashevskiy
2013-02-11 11:54 ` [PATCH 1/2] vfio powerpc: enabled on powernv platform Alexey Kardashevskiy
2013-02-11 22:16   ` Alex Williamson
2013-02-11 23:19     ` Alexey Kardashevskiy
2013-02-12  0:01       ` Alex Williamson
2013-02-25  2:21     ` Paul Mackerras
2013-02-11 11:54 ` [PATCH 2/2] vfio powerpc: implemented IOMMU driver for VFIO Alexey Kardashevskiy
2013-02-11 22:17   ` Alex Williamson
2013-02-11 23:45     ` Alexey Kardashevskiy
2013-02-12  0:25       ` Alex Williamson
2013-02-12  4:06         ` [PATCH] iommu: making IOMMU sysfs nodes API public Alexey Kardashevskiy
2013-02-12  5:07           ` Alex Williamson
2013-02-12 14:42             ` Alexey Kardashevskiy
2013-02-12 17:15               ` Alex Williamson
2013-02-18  6:15                 ` Alexey Kardashevskiy
2013-02-19  5:24                   ` Alex Williamson
2013-02-19  5:48                     ` Alexey Kardashevskiy
2013-02-19 19:53                       ` Alex Williamson
2013-02-19  7:38                     ` David Gibson [this message]
2013-02-19 20:11                       ` Alex Williamson
2013-02-20  2:31                         ` Alexey Kardashevskiy
2013-02-20  3:47                           ` Alex Williamson
2013-02-20  4:20                             ` Alexey Kardashevskiy
2013-02-20  4:33                               ` Alex Williamson
2013-03-18  3:53                                 ` Alexey Kardashevskiy
2013-03-19  2:40                                   ` Alex Williamson
2013-03-19  7:08                                     ` [PATCH] vfio powerpc: implement IOMMU driver for VFIO Alexey Kardashevskiy
2013-03-20 20:45                                       ` Alex Williamson
2013-03-21  0:57                                         ` Alexey Kardashevskiy
2013-03-21  1:29                                           ` Alex Williamson
2013-03-21  1:55                                         ` David Gibson
2013-03-21  3:16                                           ` Alex Williamson
2013-03-25  2:12                                             ` David Gibson
2013-02-22  0:04                         ` [PATCH] iommu: making IOMMU sysfs nodes API public David Gibson
2013-02-22  4:11                           ` Alex Williamson
2013-02-25  0:03                             ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130219073853.GS21067@truffula.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=benh@kernel.crashing.org \
    --cc=joerg.roedel@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).