From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org,
Alex Williamson <alex.williamson@redhat.com>,
Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible
Date: Tue, 5 May 2015 21:50:25 +1000 [thread overview]
Message-ID: <20150505115025.GJ14090@voom.redhat.com> (raw)
In-Reply-To: <554317A4.3090300@ozlabs.ru>
[-- Attachment #1: Type: text/plain, Size: 5628 bytes --]
On Fri, May 01, 2015 at 04:05:24PM +1000, Alexey Kardashevskiy wrote:
> On 05/01/2015 02:33 PM, David Gibson wrote:
> >On Thu, Apr 30, 2015 at 07:33:09PM +1000, Alexey Kardashevskiy wrote:
> >>On 04/30/2015 05:22 PM, David Gibson wrote:
> >>>On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> >>>>At the moment only one group per container is supported.
> >>>>POWER8 CPUs have more flexible design and allows naving 2 TCE tables per
> >>>>IOMMU group so we can relax this limitation and support multiple groups
> >>>>per container.
> >>>
> >>>It's not obvious why allowing multiple TCE tables per PE has any
> >>>pearing on allowing multiple groups per container.
> >>
> >>
> >>This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
> >>outcomes:
> >>1. reusing the same IOMMU table for multiple groups - patch 31;
> >>2. allowing dynamic create/remove of IOMMU tables - patch 32.
> >>
> >>I can remove this one from the patchset and post it separately later but
> >>since 1..30 aim to support both 1) and 2), I'd think I better keep them all
> >>together (might explain some of changes I do in 1..30).
> >
> >The combined patchset is fine. My comment is because your commit
> >message says that multiple groups are possible *because* 2 TCE tables
> >per group are allowed, and it's not at all clear why one follows from
> >the other.
>
>
> Ah. That's wrong indeed, I'll fix it.
>
>
> >>>>This adds TCE table descriptors to a container and uses iommu_table_group_ops
> >>>>to create/set DMA windows on IOMMU groups so the same TCE tables will be
> >>>>shared between several IOMMU groups.
> >>>>
> >>>>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>[aw: for the vfio related changes]
> >>>>Acked-by: Alex Williamson <alex.williamson@redhat.com>
> >>>>---
> >>>>Changes:
> >>>>v7:
> >>>>* updated doc
> >>>>---
> >>>> Documentation/vfio.txt | 8 +-
> >>>> drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++++++++++++++++++++++++++----------
> >>>> 2 files changed, 199 insertions(+), 77 deletions(-)
> >>>>
> >>>>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> >>>>index 94328c8..7dcf2b5 100644
> >>>>--- a/Documentation/vfio.txt
> >>>>+++ b/Documentation/vfio.txt
> >>>>@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
> >>>>
> >>>> This implementation has some specifics:
> >>>>
> >>>>-1) Only one IOMMU group per container is supported as an IOMMU group
> >>>>-represents the minimal entity which isolation can be guaranteed for and
> >>>>-groups are allocated statically, one per a Partitionable Endpoint (PE)
> >>>>+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> >>>>+container is supported as an IOMMU table is allocated at the boot time,
> >>>>+one table per a IOMMU group which is a Partitionable Endpoint (PE)
> >>>> (PE is often a PCI domain but not always).
> >>>
> >>>I thought the more fundamental problem was that different PEs tended
> >>>to use disjoint bus address ranges, so even by duplicating put_tce
> >>>across PEs you couldn't have a common address space.
> >>
> >>
> >>Sorry, I am not following you here.
> >>
> >>By duplicating put_tce, I can have multiple IOMMU groups on the same virtual
> >>PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple groups
> >>per container" does this, the address ranges will the same.
> >
> >Oh, ok. For some reason I thought that (at least on the older
> >machines) the different PEs used different and not easily changeable
> >DMA windows in bus addresses space.
>
>
> They do use different tables (which VFIO does not get to remove/create and
> uses these old helpers - iommu_take/release_ownership), correct. But all
> these windows are mapped at zero on a PE's PCI bus and nothing prevents me
> from updating all these tables with the same TCE values when handling
> H_PUT_TCE. Yes it is slow but it works (bit more details below).
Um.. I'm pretty sure that contradicts what Ben was saying on the
thread.
> >>What I cannot do on p5ioc2 is programming the same table to multiple
> >>physical PHBs (or I could but it is very different than IODA2 and pretty
> >>ugly and might not always be possible because I would have to allocate these
> >>pages from some common pool and face problems like fragmentation).
> >
> >So allowing multiple groups per container should be possible (at the
> >kernel rather than qemu level) by writing the same value to multiple
> >TCE tables. I guess its not worth doing for just the almost-obsolete
> >IOMMUs though.
>
>
> It is done at QEMU level though. As it works now, QEMU opens a group, walks
> through all existing containers and tries attaching a new group there. If it
> succeeded (x86 always; POWER8 after this patch), a TCE table is shared. If
> it failed, QEMU creates another container, attaches it to the same VFIO/PHB
> address space and attaches a group there.
>
> Then the only thing left is repeating ioctl() in vfio_container_ioctl() for
> every container in the VFIO address space; this is what that QEMU patch does
> (the first version of that patch called ioctl() only for the first container
> in the address space).
>
> From the kernel prospective there are 2 isolated containers; I'd like to
> keep it this way.
>
> btw thanks for the detailed review :)
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
next prev parent reply other threads:[~2015-05-05 13:09 UTC|newest]
Thread overview: 110+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-25 12:14 [PATCH kernel v9 00/32] powerpc/iommu/vfio: Enable Dynamic DMA windows Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 01/32] powerpc/iommu: Split iommu_free_table into 2 helpers Alexey Kardashevskiy
2015-04-29 2:03 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 02/32] Revert "powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically" Alexey Kardashevskiy
2015-04-27 21:05 ` Alex Williamson
2015-04-29 2:05 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 03/32] vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 04/32] vfio: powerpc/spapr: Check that IOMMU page is fully contained by system page Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 05/32] vfio: powerpc/spapr: Use it_page_size Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 06/32] vfio: powerpc/spapr: Move locked_vm accounting to helpers Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 07/32] vfio: powerpc/spapr: Disable DMA mappings on disabled container Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 08/32] vfio: powerpc/spapr: Moving pinning/unpinning to helpers Alexey Kardashevskiy
2015-04-29 2:14 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 09/32] vfio: powerpc/spapr: Rework groups attaching Alexey Kardashevskiy
2015-04-29 2:16 ` David Gibson
2015-04-30 2:29 ` Alexey Kardashevskiy
2015-04-30 4:05 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 10/32] powerpc/powernv: Do not set "read" flag if direction==DMA_NONE Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 11/32] powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 12/32] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group Alexey Kardashevskiy
2015-04-29 2:49 ` David Gibson
2015-04-30 2:30 ` Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 13/32] vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control Alexey Kardashevskiy
2015-04-29 3:02 ` David Gibson
2015-04-29 9:19 ` Alexey Kardashevskiy
2015-04-30 4:08 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 14/32] powerpc/iommu: Fix IOMMU ownership control functions Alexey Kardashevskiy
2015-04-29 3:08 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 15/32] powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free() Alexey Kardashevskiy
2015-04-29 3:18 ` David Gibson
2015-04-30 2:58 ` Alexey Kardashevskiy
2015-04-30 4:16 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 16/32] powerpc/powernv/ioda: Move TCE kill register address to PE Alexey Kardashevskiy
2015-04-27 21:05 ` Alex Williamson
2015-04-29 3:25 ` David Gibson
2015-04-29 9:00 ` Alexey Kardashevskiy
2015-04-30 4:18 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 17/32] powerpc/powernv: Implement accessor to TCE entry Alexey Kardashevskiy
2015-04-29 4:04 ` David Gibson
2015-04-29 9:02 ` Alexey Kardashevskiy
2015-04-30 0:13 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE Alexey Kardashevskiy
2015-04-29 4:18 ` David Gibson
2015-04-29 9:51 ` Alexey Kardashevskiy
2015-04-30 4:21 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 19/32] powerpc/powernv/ioda2: Rework iommu_table creation Alexey Kardashevskiy
2015-04-29 4:27 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 20/32] powerpc/powernv/ioda2: Introduce pnv_pci_create_table/pnv_pci_free_table Alexey Kardashevskiy
2015-04-29 4:39 ` David Gibson
2015-04-29 9:12 ` Alexey Kardashevskiy
2015-04-30 4:24 ` David Gibson
2015-05-01 10:13 ` Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 21/32] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window Alexey Kardashevskiy
2015-04-29 4:45 ` David Gibson
2015-04-29 9:26 ` Alexey Kardashevskiy
2015-04-30 4:32 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 22/32] powerpc/powernv: Implement multilevel TCE tables Alexey Kardashevskiy
2015-04-29 5:04 ` David Gibson
2015-05-01 9:48 ` Alexey Kardashevskiy
2015-05-05 12:05 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks Alexey Kardashevskiy
2015-04-29 5:30 ` David Gibson
2015-04-29 9:44 ` Alexey Kardashevskiy
2015-04-30 4:37 ` David Gibson
2015-04-30 9:56 ` Alexey Kardashevskiy
2015-05-01 3:36 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 24/32] powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE release Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 25/32] vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework ownership Alexey Kardashevskiy
2015-04-29 5:39 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table Alexey Kardashevskiy
2015-04-29 6:31 ` David Gibson
2015-05-01 4:01 ` Alexey Kardashevskiy
2015-05-01 4:23 ` David Gibson
2015-05-01 7:12 ` Alexey Kardashevskiy
2015-05-05 12:02 ` David Gibson
2015-05-11 2:11 ` Alexey Kardashevskiy
2015-05-11 4:52 ` Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table Alexey Kardashevskiy
2015-04-29 6:40 ` David Gibson
2015-05-01 4:10 ` Alexey Kardashevskiy
2015-05-01 5:12 ` David Gibson
2015-05-01 6:53 ` Alexey Kardashevskiy
2015-05-05 11:58 ` David Gibson
2015-05-11 2:24 ` Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache Alexey Kardashevskiy
2015-04-29 7:01 ` David Gibson
2015-05-01 11:26 ` Alexey Kardashevskiy
2015-05-05 12:12 ` David Gibson
2015-04-30 6:34 ` David Gibson
2015-04-30 8:25 ` Paul Mackerras
2015-05-01 3:39 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2 Alexey Kardashevskiy
2015-04-30 6:55 ` David Gibson
2015-05-01 4:35 ` Alexey Kardashevskiy
2015-05-01 5:23 ` David Gibson
2015-05-01 6:27 ` Alexey Kardashevskiy
2015-05-05 11:53 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 30/32] vfio: powerpc/spapr: Use 32bit DMA window properties from table_group Alexey Kardashevskiy
2015-04-27 22:18 ` Alex Williamson
2015-04-30 6:58 ` David Gibson
2015-04-25 12:14 ` [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible Alexey Kardashevskiy
2015-04-30 7:22 ` David Gibson
2015-04-30 9:33 ` Alexey Kardashevskiy
2015-05-01 0:46 ` Benjamin Herrenschmidt
2015-05-01 4:44 ` David Gibson
2015-05-01 4:33 ` David Gibson
2015-05-01 6:05 ` Alexey Kardashevskiy
2015-05-05 11:50 ` David Gibson [this message]
2015-05-11 2:26 ` Alexey Kardashevskiy
2015-04-25 12:14 ` [PATCH kernel v9 32/32] vfio: powerpc/spapr: Support Dynamic DMA windows Alexey Kardashevskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150505115025.GJ14090@voom.redhat.com \
--to=david@gibson.dropbear.id.au \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=gwshan@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=paulus@samba.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).