From: Jason Gunthorpe <jgg@nvidia.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Joerg Roedel <jroedel@suse.de>,
kvm@vger.kernel.org, Fabiano Rosas <farosas@linux.ibm.com>,
linuxppc-dev@lists.ozlabs.org,
Daniel Henrique Barboza <danielhb413@gmail.com>,
Nicholas Piggin <npiggin@gmail.com>,
Murilo Opsfelder Araujo <muriloo@linux.ibm.com>,
kvm-ppc@vger.kernel.org,
Alex Williamson <alex.williamson@redhat.com>,
Oliver O'Halloran <oohall@gmail.com>,
Joel Stanley <joel@jms.id.au>,
Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
Date: Thu, 7 Jul 2022 12:10:02 -0300 [thread overview]
Message-ID: <20220707151002.GB1705032@nvidia.com> (raw)
In-Reply-To: <20220707135552.3688927-1-aik@ozlabs.ru>
On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> the Type1 VFIO driver. Recent development though has added a coherency
> capability check to the generic part of VFIO and essentially disabled
> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
>
> This adds an iommu_ops stub which reports support for cache
> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
> this provides minimum code for the probing to not crash.
>
> Because now we have to set iommu_ops to the system (bus_set_iommu() or
> iommu_device_register()), this requires the POWERNV PCI setup to happen
> after bus_register(&pci_bus_type) which is postcore_initcall
> TODO: check if it still works, read sha1, for more details:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
>
> Because setting the ops triggers probing, this does not work well with
> iommu_group_add_device(), hence the move to iommu_probe_device().
>
> Because iommu_probe_device() does not take the group (which is why
> we had the helper in the first place), this adds
> pci_controller_ops::device_group.
>
> So, basically there is one iommu_device per PHB and devices are added to
> groups indirectly via series of calls inside the IOMMU code.
>
> pSeries is out of scope here (a minor fix needed for barely supported
> platform in regard to VFIO).
>
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
I think this is basically OK, for what it is. It looks like there is
more some-day opportunity to make use of the core infrastructure though.
> does it make sense to have this many callbacks, or
> the generic IOMMU code can safely operate without some
> (given I add some more checks for !NULL)? thanks,
I wouldn't worry about it..
> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
> pr_debug("%s: Adding %s to iommu group %d\n",
> __func__, dev_name(dev), iommu_group_id(table_group->group));
>
> - return iommu_group_add_device(table_group->group, dev);
> + ret = iommu_probe_device(dev);
> + dev_info(dev, "probed with %d\n", ret);
For another day, but it seems a bit strange to call iommu_probe_device() like this?
Shouldn't one of the existing call sites cover this? The one in
of_iommu.c perhaps?
> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
> +{
> + return false;
> +}
I think you can NULL this op:
static bool iommu_is_attach_deferred(struct device *dev)
{
const struct iommu_ops *ops = dev_iommu_ops(dev);
if (ops->is_attach_deferred)
return ops->is_attach_deferred(dev);
return false;
}
> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> + struct pci_controller *hose;
> + struct pci_dev *pdev;
> +
> + /* Weirdly iommu_device_register() assigns the same ops to all buses */
> + if (!dev_is_pci(dev))
> + return ERR_PTR(-EPERM);
> +
> + pdev = to_pci_dev(dev);
> + hose = pdev->bus->sysdata;
> +
> + if (!hose->controller_ops.device_group)
> + return ERR_PTR(-ENOENT);
> +
> + return hose->controller_ops.device_group(hose, pdev);
> +}
Is this missing a refcount get on the group?
> +
> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> + struct device *dev)
> +{
> + return 0;
> +}
It is important when this returns that the iommu translation is all
emptied. There should be no left over translations from the DMA API at
this point. I have no idea how power works in this regard, but it
should be explained why this is safe in a comment at a minimum.
It will turn into a security problem to allow kernel mappings to leak
past this point.
Jason
next prev parent reply other threads:[~2022-07-07 15:11 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-07 13:55 [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Alexey Kardashevskiy
2022-07-07 15:10 ` Jason Gunthorpe [this message]
2022-07-08 5:00 ` Alexey Kardashevskiy
2022-07-08 6:34 ` Alexey Kardashevskiy
2022-07-08 7:32 ` Tian, Kevin
2022-07-08 9:45 ` Alexey Kardashevskiy
2022-07-08 10:18 ` Tian, Kevin
2022-07-29 2:21 ` Alexey Kardashevskiy
2022-07-29 2:53 ` Oliver O'Halloran
2022-07-29 3:10 ` Tian, Kevin
2022-07-29 3:50 ` Alexey Kardashevskiy
2022-07-29 4:24 ` Tian, Kevin
2022-07-29 12:09 ` Jason Gunthorpe
2022-07-08 11:55 ` Jason Gunthorpe
2022-07-08 13:10 ` Alexey Kardashevskiy
2022-07-08 13:19 ` Jason Gunthorpe
2022-07-08 13:32 ` Alexey Kardashevskiy
2022-07-08 13:59 ` Jason Gunthorpe
2022-07-09 2:58 ` Alexey Kardashevskiy
2022-07-10 6:29 ` Jason Gunthorpe
2022-07-10 12:32 ` Alexey Kardashevskiy
2022-07-11 13:24 ` Alexey Kardashevskiy
2022-07-11 18:46 ` Jason Gunthorpe
2022-07-12 2:27 ` Alexey Kardashevskiy
2022-07-12 5:44 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220707151002.GB1705032@nvidia.com \
--to=jgg@nvidia.com \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=danielhb413@gmail.com \
--cc=farosas@linux.ibm.com \
--cc=joel@jms.id.au \
--cc=jroedel@suse.de \
--cc=kvm-ppc@vger.kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=muriloo@linux.ibm.com \
--cc=npiggin@gmail.com \
--cc=oohall@gmail.com \
--cc=robin.murphy@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).