All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org,
	Robin Murphy <robin.murphy@arm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Joerg Roedel <jroedel@suse.de>, Joel Stanley <joel@jms.id.au>,
	Alex Williamson <alex.williamson@redhat.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	Daniel Henrique Barboza <danielhb413@gmail.com>,
	Fabiano Rosas <farosas@linux.ibm.com>,
	Murilo Opsfelder Araujo <muriloo@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>
Subject: Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
Date: Thu, 07 Jul 2022 15:10:02 +0000	[thread overview]
Message-ID: <20220707151002.GB1705032@nvidia.com> (raw)
In-Reply-To: <20220707135552.3688927-1-aik@ozlabs.ru>

On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> the Type1 VFIO driver. Recent development though has added a coherency
> capability check to the generic part of VFIO and essentially disabled
> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
> 
> This adds an iommu_ops stub which reports support for cache
> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
> this provides minimum code for the probing to not crash.
> 
> Because now we have to set iommu_ops to the system (bus_set_iommu() or
> iommu_device_register()), this requires the POWERNV PCI setup to happen
> after bus_register(&pci_bus_type) which is postcore_initcall
> TODO: check if it still works, read sha1, for more details:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?idU37fcb319d016ce387
> 
> Because setting the ops triggers probing, this does not work well with
> iommu_group_add_device(), hence the move to iommu_probe_device().
> 
> Because iommu_probe_device() does not take the group (which is why
> we had the helper in the first place), this adds
> pci_controller_ops::device_group.
> 
> So, basically there is one iommu_device per PHB and devices are added to
> groups indirectly via series of calls inside the IOMMU code.
> 
> pSeries is out of scope here (a minor fix needed for barely supported
> platform in regard to VFIO).
> 
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

I think this is basically OK, for what it is. It looks like there is
more some-day opportunity to make use of the core infrastructure though.

> does it make sense to have this many callbacks, or
> the generic IOMMU code can safely operate without some
> (given I add some more checks for !NULL)? thanks,

I wouldn't worry about it..

> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>  	pr_debug("%s: Adding %s to iommu group %d\n",
>  		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>  
> -	return iommu_group_add_device(table_group->group, dev);
> +	ret = iommu_probe_device(dev);
> +	dev_info(dev, "probed with %d\n", ret);

For another day, but it seems a bit strange to call iommu_probe_device() like this?
Shouldn't one of the existing call sites cover this? The one in
of_iommu.c perhaps?

> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
> +{
> +       return false;
> +}

I think you can NULL this op:

static bool iommu_is_attach_deferred(struct device *dev)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);

	if (ops->is_attach_deferred)
		return ops->is_attach_deferred(dev);

	return false;
}

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> +	struct pci_controller *hose;
> +	struct pci_dev *pdev;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);
> +
> +	pdev = to_pci_dev(dev);
> +	hose = pdev->bus->sysdata;
> +
> +	if (!hose->controller_ops.device_group)
> +		return ERR_PTR(-ENOENT);
> +
> +	return hose->controller_ops.device_group(hose, pdev);
> +}

Is this missing a refcount get on the group?

> +
> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> +				      struct device *dev)
> +{
> +	return 0;
> +}

It is important when this returns that the iommu translation is all
emptied. There should be no left over translations from the DMA API at
this point. I have no idea how power works in this regard, but it
should be explained why this is safe in a comment at a minimum.

It will turn into a security problem to allow kernel mappings to leak
past this point.

Jason

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Joerg Roedel <jroedel@suse.de>,
	kvm@vger.kernel.org, Fabiano Rosas <farosas@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org,
	Daniel Henrique Barboza <danielhb413@gmail.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Murilo Opsfelder Araujo <muriloo@linux.ibm.com>,
	kvm-ppc@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	Joel Stanley <joel@jms.id.au>,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
Date: Thu, 7 Jul 2022 12:10:02 -0300	[thread overview]
Message-ID: <20220707151002.GB1705032@nvidia.com> (raw)
In-Reply-To: <20220707135552.3688927-1-aik@ozlabs.ru>

On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> the Type1 VFIO driver. Recent development though has added a coherency
> capability check to the generic part of VFIO and essentially disabled
> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
> 
> This adds an iommu_ops stub which reports support for cache
> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
> this provides minimum code for the probing to not crash.
> 
> Because now we have to set iommu_ops to the system (bus_set_iommu() or
> iommu_device_register()), this requires the POWERNV PCI setup to happen
> after bus_register(&pci_bus_type) which is postcore_initcall
> TODO: check if it still works, read sha1, for more details:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
> 
> Because setting the ops triggers probing, this does not work well with
> iommu_group_add_device(), hence the move to iommu_probe_device().
> 
> Because iommu_probe_device() does not take the group (which is why
> we had the helper in the first place), this adds
> pci_controller_ops::device_group.
> 
> So, basically there is one iommu_device per PHB and devices are added to
> groups indirectly via series of calls inside the IOMMU code.
> 
> pSeries is out of scope here (a minor fix needed for barely supported
> platform in regard to VFIO).
> 
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

I think this is basically OK, for what it is. It looks like there is
more some-day opportunity to make use of the core infrastructure though.

> does it make sense to have this many callbacks, or
> the generic IOMMU code can safely operate without some
> (given I add some more checks for !NULL)? thanks,

I wouldn't worry about it..

> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>  	pr_debug("%s: Adding %s to iommu group %d\n",
>  		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>  
> -	return iommu_group_add_device(table_group->group, dev);
> +	ret = iommu_probe_device(dev);
> +	dev_info(dev, "probed with %d\n", ret);

For another day, but it seems a bit strange to call iommu_probe_device() like this?
Shouldn't one of the existing call sites cover this? The one in
of_iommu.c perhaps?

> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
> +{
> +       return false;
> +}

I think you can NULL this op:

static bool iommu_is_attach_deferred(struct device *dev)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);

	if (ops->is_attach_deferred)
		return ops->is_attach_deferred(dev);

	return false;
}

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> +	struct pci_controller *hose;
> +	struct pci_dev *pdev;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);
> +
> +	pdev = to_pci_dev(dev);
> +	hose = pdev->bus->sysdata;
> +
> +	if (!hose->controller_ops.device_group)
> +		return ERR_PTR(-ENOENT);
> +
> +	return hose->controller_ops.device_group(hose, pdev);
> +}

Is this missing a refcount get on the group?

> +
> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> +				      struct device *dev)
> +{
> +	return 0;
> +}

It is important when this returns that the iommu translation is all
emptied. There should be no left over translations from the DMA API at
this point. I have no idea how power works in this regard, but it
should be explained why this is safe in a comment at a minimum.

It will turn into a security problem to allow kernel mappings to leak
past this point.

Jason

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org,
	Robin Murphy <robin.murphy@arm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Joerg Roedel <jroedel@suse.de>, Joel Stanley <joel@jms.id.au>,
	Alex Williamson <alex.williamson@redhat.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	Daniel Henrique Barboza <danielhb413@gmail.com>,
	Fabiano Rosas <farosas@linux.ibm.com>,
	Murilo Opsfelder Araujo <muriloo@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>
Subject: Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
Date: Thu, 7 Jul 2022 12:10:02 -0300	[thread overview]
Message-ID: <20220707151002.GB1705032@nvidia.com> (raw)
In-Reply-To: <20220707135552.3688927-1-aik@ozlabs.ru>

On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> the Type1 VFIO driver. Recent development though has added a coherency
> capability check to the generic part of VFIO and essentially disabled
> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
> 
> This adds an iommu_ops stub which reports support for cache
> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
> this provides minimum code for the probing to not crash.
> 
> Because now we have to set iommu_ops to the system (bus_set_iommu() or
> iommu_device_register()), this requires the POWERNV PCI setup to happen
> after bus_register(&pci_bus_type) which is postcore_initcall
> TODO: check if it still works, read sha1, for more details:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
> 
> Because setting the ops triggers probing, this does not work well with
> iommu_group_add_device(), hence the move to iommu_probe_device().
> 
> Because iommu_probe_device() does not take the group (which is why
> we had the helper in the first place), this adds
> pci_controller_ops::device_group.
> 
> So, basically there is one iommu_device per PHB and devices are added to
> groups indirectly via series of calls inside the IOMMU code.
> 
> pSeries is out of scope here (a minor fix needed for barely supported
> platform in regard to VFIO).
> 
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

I think this is basically OK, for what it is. It looks like there is
more some-day opportunity to make use of the core infrastructure though.

> does it make sense to have this many callbacks, or
> the generic IOMMU code can safely operate without some
> (given I add some more checks for !NULL)? thanks,

I wouldn't worry about it..

> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>  	pr_debug("%s: Adding %s to iommu group %d\n",
>  		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>  
> -	return iommu_group_add_device(table_group->group, dev);
> +	ret = iommu_probe_device(dev);
> +	dev_info(dev, "probed with %d\n", ret);

For another day, but it seems a bit strange to call iommu_probe_device() like this?
Shouldn't one of the existing call sites cover this? The one in
of_iommu.c perhaps?

> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
> +{
> +       return false;
> +}

I think you can NULL this op:

static bool iommu_is_attach_deferred(struct device *dev)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);

	if (ops->is_attach_deferred)
		return ops->is_attach_deferred(dev);

	return false;
}

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> +	struct pci_controller *hose;
> +	struct pci_dev *pdev;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);
> +
> +	pdev = to_pci_dev(dev);
> +	hose = pdev->bus->sysdata;
> +
> +	if (!hose->controller_ops.device_group)
> +		return ERR_PTR(-ENOENT);
> +
> +	return hose->controller_ops.device_group(hose, pdev);
> +}

Is this missing a refcount get on the group?

> +
> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> +				      struct device *dev)
> +{
> +	return 0;
> +}

It is important when this returns that the iommu translation is all
emptied. There should be no left over translations from the DMA API at
this point. I have no idea how power works in this regard, but it
should be explained why this is safe in a comment at a minimum.

It will turn into a security problem to allow kernel mappings to leak
past this point.

Jason

  reply	other threads:[~2022-07-07 15:10 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-07 13:55 [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Alexey Kardashevskiy
2022-07-07 13:55 ` Alexey Kardashevskiy
2022-07-07 13:55 ` Alexey Kardashevskiy
2022-07-07 15:10 ` Jason Gunthorpe [this message]
2022-07-07 15:10   ` Jason Gunthorpe
2022-07-07 15:10   ` Jason Gunthorpe
2022-07-08  4:58   ` Alexey Kardashevskiy
2022-07-08  5:00     ` Alexey Kardashevskiy
2022-07-08  5:00     ` Alexey Kardashevskiy
2022-07-08  6:34     ` Alexey Kardashevskiy
2022-07-08  6:34       ` Alexey Kardashevskiy
2022-07-08  6:34       ` Alexey Kardashevskiy
2022-07-08  7:32       ` Tian, Kevin
2022-07-08  7:32         ` Tian, Kevin
2022-07-08  7:32         ` Tian, Kevin
2022-07-08  9:45         ` Alexey Kardashevskiy
2022-07-08  9:45           ` Alexey Kardashevskiy
2022-07-08  9:45           ` Alexey Kardashevskiy
2022-07-08 10:18           ` Tian, Kevin
2022-07-08 10:18             ` Tian, Kevin
2022-07-08 10:18             ` Tian, Kevin
2022-07-29  2:21         ` Alexey Kardashevskiy
2022-07-29  2:21           ` Alexey Kardashevskiy
2022-07-29  2:21           ` Alexey Kardashevskiy
2022-07-29  2:53           ` Oliver O'Halloran
2022-07-29  2:53             ` Oliver O'Halloran
2022-07-29  2:53             ` Oliver O'Halloran
2022-07-29  3:10             ` Tian, Kevin
2022-07-29  3:10               ` Tian, Kevin
2022-07-29  3:10               ` Tian, Kevin
2022-07-29  3:50               ` Alexey Kardashevskiy
2022-07-29  3:50                 ` Alexey Kardashevskiy
2022-07-29  3:50                 ` Alexey Kardashevskiy
2022-07-29  4:24                 ` Tian, Kevin
2022-07-29  4:24                   ` Tian, Kevin
2022-07-29  4:24                   ` Tian, Kevin
2022-07-29 12:09                   ` Jason Gunthorpe
2022-07-29 12:09                     ` Jason Gunthorpe
2022-07-29 12:09                     ` Jason Gunthorpe
2022-07-08 11:55       ` Jason Gunthorpe
2022-07-08 11:55         ` Jason Gunthorpe
2022-07-08 11:55         ` Jason Gunthorpe
2022-07-08 13:10         ` Alexey Kardashevskiy
2022-07-08 13:10           ` Alexey Kardashevskiy
2022-07-08 13:10           ` Alexey Kardashevskiy
2022-07-08 13:19           ` Jason Gunthorpe
2022-07-08 13:19             ` Jason Gunthorpe
2022-07-08 13:19             ` Jason Gunthorpe
2022-07-08 13:32             ` Alexey Kardashevskiy
2022-07-08 13:32               ` Alexey Kardashevskiy
2022-07-08 13:32               ` Alexey Kardashevskiy
2022-07-08 13:59               ` Jason Gunthorpe
2022-07-08 13:59                 ` Jason Gunthorpe
2022-07-08 13:59                 ` Jason Gunthorpe
2022-07-09  2:58         ` Alexey Kardashevskiy
2022-07-09  2:58           ` Alexey Kardashevskiy
2022-07-09  2:58           ` Alexey Kardashevskiy
2022-07-10  6:29           ` Jason Gunthorpe
2022-07-10  6:29             ` Jason Gunthorpe
2022-07-10  6:29             ` Jason Gunthorpe
2022-07-10 12:32             ` Alexey Kardashevskiy
2022-07-10 12:32               ` Alexey Kardashevskiy
2022-07-10 12:32               ` Alexey Kardashevskiy
2022-07-11 13:24               ` Alexey Kardashevskiy
2022-07-11 13:24                 ` Alexey Kardashevskiy
2022-07-11 13:24                 ` Alexey Kardashevskiy
2022-07-11 18:46                 ` Jason Gunthorpe
2022-07-11 18:46                   ` Jason Gunthorpe
2022-07-11 18:46                   ` Jason Gunthorpe
2022-07-12  2:27                   ` Alexey Kardashevskiy
2022-07-12  2:27                     ` Alexey Kardashevskiy
2022-07-12  2:27                     ` Alexey Kardashevskiy
2022-07-12  5:44                     ` Jason Gunthorpe
2022-07-12  5:44                       ` Jason Gunthorpe
2022-07-12  5:44                       ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220707151002.GB1705032@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=danielhb413@gmail.com \
    --cc=farosas@linux.ibm.com \
    --cc=joel@jms.id.au \
    --cc=jroedel@suse.de \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=muriloo@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=oohall@gmail.com \
    --cc=robin.murphy@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.