From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Wei Yang <weiyang@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org,
Alex Williamson <alex.williamson@redhat.com>,
Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v10 02/34] powerpc/iommu/powernv: Get rid of set_iommu_table_base_and_group
Date: Wed, 13 May 2015 17:26:02 +1000 [thread overview]
Message-ID: <5552FC8A.5080401@ozlabs.ru> (raw)
In-Reply-To: <20150513051831.GA18992@gwshan>
On 05/13/2015 03:18 PM, Gavin Shan wrote:
> On Tue, May 12, 2015 at 01:38:51AM +1000, Alexey Kardashevskiy wrote:
>> The set_iommu_table_base_and_group() name suggests that the function
>> sets table base and add a device to an IOMMU group. However actual
>> table base setting happens in pnv_pci_ioda_dma_dev_setup().
>>
>
> On PHB3, the DMA32 IOMMU table is created during PHB fixup time
> in ppc_md.pcibios_fixup(), which is invoked at end of PCI enumeration.
> The IOMMU table of PCI devices are initialized at same time.
> pnv_pci_ioda_dma_dev_setup() is called when adding PCI device
> or fixing up PCI bus at PCI enumeration time. So the commit logs
> here isn't accurate enough.
Right. I'll remove the second sentence.
> Basically, set_iommu_table_base_and_group() which does two things
> in one function, which is nice. I guess you don't need this function
> any more after DDW is supported and it's the reason to remove it?
Ideally iommu_add_device() should not need any iommu_table pointer at all,
it need an iommu_table_group which is IOMMU which is one pe PE. Not a table
(we can have more than just one or even none).
set_iommu_table_base() sets the table for later use by the platform DMA
code (first half of arch/powerpc/kernel/iommu.c, all these
iommu_alloc/iommu_free). If no device driver was loaded for a device in a
group - calling set_iommu_table_base() is not needed.
KVM/VFIO do not call get_iommu_table_base() and do not use the table_base
(except iommu_add_device()) and there is no point for VFIO-related bits of
IOMMU code to rely on whether set_iommu_table_base() was called or not. It
could make sense when table's life time was exactly the same as PE's
lifetime but with DDW it is not the case anymore.
What we have now is a workaround - device-to-group assignment does not have
to do anything with the iommu_table itself, it just needs it_table_group
pointer. In general, I would like to use common iommu_group_get_for_dev(),
it just won't work for us right now but we can try fixing that code. Or we
can add root devices into groups when we configure PEs
(pnv_pci_ioda2_setup_dma_pe()) and later when we add actual PCI functions
(physical or SRIOV) from tce_iommu_bus_notifier - we can walk up PCI bus
hierarchy till we find a first parent with a group set and add the child to
this group. We can do this later.
So the function might be nice but it was confusing me (and I added it at
the first place). It is easy to miss iommu_add_device() part while reading
the code because it was disguised by "set_iommu_table_base" prefix.
>> The actual purpose for table base setting is to put some reference
>> into a device so later iommu_add_device() can get the IOMMU group
>> reference and the device to the group.
>>
>> At the moment a group cannot be explicitly passed to iommu_add_device()
>> as we want it to work from the bus notifier, we can fix it later and
>> remove confusing calls of set_iommu_table_base().
>>
>> This replaces set_iommu_table_base_and_group() with a couple of
>> set_iommu_table_base() + iommu_add_device() which makes reading the code
>> easier.
>>
>> This adds few comments why set_iommu_table_base() and iommu_add_device()
>> are called where they are called.
>>
>> For IODA1/2, this essentially removes iommu_add_device() call from
>> the pnv_pci_ioda_dma_dev_setup() as it will always fail at this particular
>> place:
>> - for physical PE, the device is already attached by iommu_add_device()
>> in pnv_pci_ioda_setup_dma_pe();
>> - for virtual PE, the sysfs entries are not ready to create all symlinks
>> so actual adding is happening in tce_iommu_bus_notifier.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>> ---
>> Changes:
>> v10:
>> * new to the series
>> ---
>> arch/powerpc/include/asm/iommu.h | 7 -------
>> arch/powerpc/platforms/powernv/pci-ioda.c | 27 +++++++++++++++++++++++----
>> arch/powerpc/platforms/powernv/pci-p5ioc2.c | 3 ++-
>> arch/powerpc/platforms/pseries/iommu.c | 15 ++++++++-------
>> 4 files changed, 33 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>> index 1e27d63..8353c86 100644
>> --- a/arch/powerpc/include/asm/iommu.h
>> +++ b/arch/powerpc/include/asm/iommu.h
>> @@ -140,13 +140,6 @@ static inline int __init tce_iommu_bus_notifier_init(void)
>> }
>> #endif /* !CONFIG_IOMMU_API */
>>
>> -static inline void set_iommu_table_base_and_group(struct device *dev,
>> - void *base)
>> -{
>> - set_iommu_table_base(dev, base);
>> - iommu_add_device(dev);
>> -}
>> -
>> extern int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
>> struct scatterlist *sglist, int nelems,
>> unsigned long mask,
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 2f092bb..9a77f3c 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1598,7 +1598,13 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
>>
>> pe = &phb->ioda.pe_array[pdn->pe_number];
>> WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
>> - set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
>> + set_iommu_table_base(&pdev->dev, pe->tce32_table);
>> + /*
>> + * Note: iommu_add_device() will fail here as
>> + * for physical PE: the device is already added by now;
>> + * for virtual PE: sysfs entries are not ready yet and
>> + * tce_iommu_bus_notifier will add the device to a group later.
>> + */
>
> I didn't figure out how the IOMMU table is initialized for PCI device in this
> function during bootup time. At system bootup time, the function is only called
> when applying fixup to PCI bus in pcibios_fixup_bus(). At that time, we don't
> have PE# yet, which is allocated at PHB fixup time (ppc_md.pcibios_fixup_phb).
My understanding is:
pnv_pci_ioda_dma_dev_setup() is called when
1. the device is just discovered
2. PE is configured
3. driver called set_dma_mask
The actual table gets initialized (or allocated + initialized after this
patchset) at 2). So passing anything (NULL or a pointer to uninitialized
iommu_table) to set_iommu_table_base() at 1) does not make any difference -
iommu_alloc() won't be used and won't work by then anyway.
But since 1), 2), 3) all call pnv_pci_ioda_dma_dev_setup() and we do not
know which case it is - we just call set_iommu_table_base() from there in a
hope that iommu_alloc() happens after set_iommu_table_base() was lucky
enough to be called with initialized iommu_table pointer.
--
Alexey
next prev parent reply other threads:[~2015-05-13 7:26 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-11 15:38 [PATCH kernel v10 00/34] powerpc/iommu/vfio: Enable Dynamic DMA windows Alexey Kardashevskiy
2015-05-11 15:38 ` [PATCH kernel v10 01/34] powerpc/eeh/ioda2: Use device::iommu_group to check IOMMU group Alexey Kardashevskiy
2015-05-12 1:51 ` Gavin Shan
2015-05-11 15:38 ` [PATCH kernel v10 02/34] powerpc/iommu/powernv: Get rid of set_iommu_table_base_and_group Alexey Kardashevskiy
2015-05-13 5:18 ` Gavin Shan
2015-05-13 7:26 ` Alexey Kardashevskiy [this message]
2015-05-11 15:38 ` [PATCH kernel v10 03/34] powerpc/powernv/ioda: Clean up IOMMU group registration Alexey Kardashevskiy
2015-05-13 5:21 ` Gavin Shan
2015-05-11 15:38 ` [PATCH kernel v10 04/34] powerpc/iommu: Put IOMMU group explicitly Alexey Kardashevskiy
2015-05-13 5:27 ` Gavin Shan
2015-05-11 15:38 ` [PATCH kernel v10 05/34] powerpc/iommu: Always release iommu_table in iommu_free_table() Alexey Kardashevskiy
2015-05-13 5:33 ` Gavin Shan
2015-05-13 6:30 ` Alexey Kardashevskiy
2015-05-13 12:51 ` Thomas Huth
2015-05-13 23:27 ` Gavin Shan
2015-05-14 2:34 ` Alexey Kardashevskiy
2015-05-14 2:53 ` Alex Williamson
2015-05-14 6:29 ` Alexey Kardashevskiy
2015-05-11 15:38 ` [PATCH kernel v10 06/34] vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver Alexey Kardashevskiy
2015-05-13 5:58 ` Gavin Shan
2015-05-13 6:32 ` Alexey Kardashevskiy
2015-05-11 15:38 ` [PATCH kernel v10 07/34] vfio: powerpc/spapr: Check that IOMMU page is fully contained by system page Alexey Kardashevskiy
2015-05-13 6:06 ` Gavin Shan
2015-05-11 15:38 ` [PATCH kernel v10 08/34] vfio: powerpc/spapr: Use it_page_size Alexey Kardashevskiy
2015-05-13 6:12 ` Gavin Shan
2015-05-11 15:38 ` [PATCH kernel v10 09/34] vfio: powerpc/spapr: Move locked_vm accounting to helpers Alexey Kardashevskiy
2015-05-13 6:18 ` Gavin Shan
2015-05-11 15:38 ` [PATCH kernel v10 10/34] vfio: powerpc/spapr: Disable DMA mappings on disabled container Alexey Kardashevskiy
2015-05-13 6:20 ` Gavin Shan
2015-05-11 15:39 ` [PATCH kernel v10 11/34] vfio: powerpc/spapr: Moving pinning/unpinning to helpers Alexey Kardashevskiy
2015-05-13 6:32 ` Gavin Shan
2015-05-13 7:30 ` Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 12/34] vfio: powerpc/spapr: Rework groups attaching Alexey Kardashevskiy
2015-05-13 23:35 ` Gavin Shan
2015-05-11 15:39 ` [PATCH kernel v10 13/34] powerpc/powernv: Do not set "read" flag if direction==DMA_NONE Alexey Kardashevskiy
2015-05-14 0:00 ` Gavin Shan
2015-05-14 2:51 ` Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 14/34] powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table Alexey Kardashevskiy
2015-05-14 0:23 ` Gavin Shan
2015-05-14 3:07 ` Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 15/34] powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free() Alexey Kardashevskiy
2015-05-14 0:48 ` Gavin Shan
2015-05-14 3:19 ` Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 16/34] powerpc/spapr: vfio: Replace iommu_table with iommu_table_group Alexey Kardashevskiy
2015-05-13 21:30 ` Alex Williamson
2015-05-14 1:21 ` Gavin Shan
2015-05-14 3:31 ` Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 17/34] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group Alexey Kardashevskiy
2015-05-14 1:52 ` Gavin Shan
2015-05-11 15:39 ` [PATCH kernel v10 18/34] vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control Alexey Kardashevskiy
2015-05-14 2:01 ` Gavin Shan
2015-05-11 15:39 ` [PATCH kernel v10 19/34] powerpc/iommu: Fix IOMMU ownership control functions Alexey Kardashevskiy
2015-05-14 3:36 ` Gavin Shan
2015-05-11 15:39 ` [PATCH kernel v10 20/34] powerpc/powernv/ioda2: Move TCE kill register address to PE Alexey Kardashevskiy
2015-05-14 2:10 ` Gavin Shan
2015-05-14 3:39 ` Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 21/34] powerpc/powernv/ioda2: Add TCE invalidation for all attached groups Alexey Kardashevskiy
2015-05-14 2:22 ` Gavin Shan
2015-05-14 3:50 ` Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 22/34] powerpc/powernv: Implement accessor to TCE entry Alexey Kardashevskiy
2015-05-14 2:34 ` Gavin Shan
2015-05-11 15:39 ` [PATCH kernel v10 23/34] powerpc/iommu/powernv: Release replaced TCE Alexey Kardashevskiy
2015-05-13 15:00 ` Thomas Huth
2015-05-14 3:53 ` Alexey Kardashevskiy
2015-05-15 8:09 ` Thomas Huth
2015-05-11 15:39 ` [PATCH kernel v10 24/34] powerpc/powernv/ioda2: Rework iommu_table creation Alexey Kardashevskiy
2015-05-14 4:14 ` Gavin Shan
2015-05-11 15:39 ` [PATCH kernel v10 25/34] powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages Alexey Kardashevskiy
2015-05-14 4:31 ` Gavin Shan
2015-05-11 15:39 ` [PATCH kernel v10 26/34] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window Alexey Kardashevskiy
2015-05-14 5:01 ` Gavin Shan
2015-05-11 15:39 ` [PATCH kernel v10 27/34] powerpc/powernv: Implement multilevel TCE tables Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 28/34] vfio: powerpc/spapr: powerpc/powernv/ioda: Define and implement DMA windows API Alexey Kardashevskiy
2015-05-13 21:30 ` Alex Williamson
2015-05-11 15:39 ` [PATCH kernel v10 29/34] powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE release Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 30/34] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 31/34] vfio: powerpc/spapr: powerpc/powernv/ioda2: Use DMA windows API in ownership control Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 32/34] powerpc/mmu: Add userspace-to-physical addresses translation cache Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 33/34] vfio: powerpc/spapr: Register memory and define IOMMU v2 Alexey Kardashevskiy
2015-05-13 21:30 ` Alex Williamson
2015-05-14 6:08 ` Alexey Kardashevskiy
2015-05-11 15:39 ` [PATCH kernel v10 34/34] vfio: powerpc/spapr: Support Dynamic DMA windows Alexey Kardashevskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5552FC8A.5080401@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=gwshan@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=paulus@samba.org \
--cc=weiyang@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).