linuxppc-dev.lists.ozlabs.org archive mirror
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Joerg Roedel <jroedel@suse.de>,
	kvm@vger.kernel.org, Fabiano Rosas <farosas@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org,
	Daniel Henrique Barboza <danielhb413@gmail.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Murilo Opsfelder Araujo <muriloo@linux.ibm.com>,
	kvm-ppc@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	Joel Stanley <joel@jms.id.au>,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
Date: Fri, 8 Jul 2022 16:34:55 +1000
Message-ID: <bbe29694-66a3-275b-5a79-71237ad7388f@ozlabs.ru>
In-Reply-To: <bb8f4c93-6cbc-0106-d4c1-1f3c0751fbba@ozlabs.ru>



On 7/8/22 15:00, Alexey Kardashevskiy wrote:
> 
> 
> On 7/8/22 01:10, Jason Gunthorpe wrote:
>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>> the Type1 VFIO driver. Recent development though has added a coherency
>>> capability check to the generic part of VFIO and essentially disabled
>>> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
>>>
>>> This adds an iommu_ops stub which reports support for cache
>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
>>> this provides minimum code for the probing to not crash.
>>>
>>> Because now we have to set iommu_ops to the system (bus_set_iommu() or
>>> iommu_device_register()), this requires the POWERNV PCI setup to happen
>>> after bus_register(&pci_bus_type) which is postcore_initcall
>>> TODO: check if it still works, read sha1, for more details:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
>>>
>>> Because setting the ops triggers probing, this does not work well with
>>> iommu_group_add_device(), hence the move to iommu_probe_device().
>>>
>>> Because iommu_probe_device() does not take the group (which is why
>>> we had the helper in the first place), this adds
>>> pci_controller_ops::device_group.
>>>
>>> So, basically there is one iommu_device per PHB and devices are added to
>>> groups indirectly via series of calls inside the IOMMU code.
>>>
>>> pSeries is out of scope here (a minor fix needed for barely supported
>>> platform in regard to VFIO).
>>>
>>> The previous discussion is here:
>>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
>>
>> I think this is basically OK, for what it is. It looks like there is
>> more some-day opportunity to make use of the core infrastructure though.
>>
>>> does it make sense to have this many callbacks, or
>>> the generic IOMMU code can safely operate without some
>>> (given I add some more checks for !NULL)? thanks,
>>
>> I wouldn't worry about it..
>>
>>> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>>>       pr_debug("%s: Adding %s to iommu group %d\n",
>>>            __func__, dev_name(dev), iommu_group_id(table_group->group));
>>> -    return iommu_group_add_device(table_group->group, dev);
>>> +    ret = iommu_probe_device(dev);
>>> +    dev_info(dev, "probed with %d\n", ret);
>>
>> For another day, but it seems a bit strange to call iommu_probe_device()
>> like this?
>> Shouldn't one of the existing call sites cover this? The one in
>> of_iommu.c perhaps?
> 
> 
> It looks to me like of_iommu.c expects the IOMMU setup to happen before
> Linux starts, as Linux looks for #iommu-cells or iommu-map properties in
> the device tree. The powernv firmware (aka skiboot) does not do this, and
> it is Linux which manages IOMMU groups.
> 
> 
>>> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
>>> +{
>>> +       return false;
>>> +}
>>
>> I think you can NULL this op:
>>
>> static bool iommu_is_attach_deferred(struct device *dev)
>> {
>>     const struct iommu_ops *ops = dev_iommu_ops(dev);
>>
>>     if (ops->is_attach_deferred)
>>         return ops->is_attach_deferred(dev);
>>
>>     return false;
>> }
>>
>>> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
>>> +{
>>> +    struct pci_controller *hose;
>>> +    struct pci_dev *pdev;
>>> +
>>> +    /* Weirdly iommu_device_register() assigns the same ops to all buses */
>>> +    if (!dev_is_pci(dev))
>>> +        return ERR_PTR(-EPERM);
>>> +
>>> +    pdev = to_pci_dev(dev);
>>> +    hose = pdev->bus->sysdata;
>>> +
>>> +    if (!hose->controller_ops.device_group)
>>> +        return ERR_PTR(-ENOENT);
>>> +
>>> +    return hose->controller_ops.device_group(hose, pdev);
>>> +}
>>
>> Is this missing a refcount get on the group?
>>
>>> +
>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>> +                      struct device *dev)
>>> +{
>>> +    return 0;
>>> +}
>>
>> It is important when this returns that the iommu translation is all
>> emptied. There should be no left over translations from the DMA API at
>> this point. I have no idea how power works in this regard, but it
>> should be explained why this is safe in a comment at a minimum.
>>
>> It will turn into a security problem to allow kernel mappings to leak
>> past this point.
>>
> 
> For v2 I've added a check that there are no valid mappings for a device
> (or, more precisely, in the associated iommu_group); the domain itself
> does not need checking, right?


Uff, not that simple. It looks like once a device is in a group, its
dma_ops is set to iommu_dma_ops and the IOMMU code owns DMA. I guess there
is then a way to set those to NULL, or do something similar, so that
dma_map_direct() from kernel/dma/mapping.c returns "true", isn't there?
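
To make the question concrete, here is a rough userspace model of the
decision I have in mind. It is only a sketch: the struct layouts and the
model_dma_map_direct() name are made up for illustration, and the real
helper in kernel/dma/mapping.c has additional cases (such as the
dma_ops_bypass handling) that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only what the model needs. */
struct dma_map_ops {
	int unused;
};

struct device {
	/* NULL models "no IOMMU-backed DMA ops installed for this device" */
	const struct dma_map_ops *dma_ops;
};

/*
 * Simplified model: with no ops installed the mapping goes direct,
 * otherwise the installed ops own DMA. The kernel helper is richer
 * than this and is not reproduced here.
 */
static bool model_dma_map_direct(const struct device *dev)
{
	return dev->dma_ops == NULL;
}
```

In this model, clearing dma_ops is what flips a device back to the direct
path; whether the same is safe to do on powernv is exactly what I am
asking.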

For now I'll add a comment in spapr_tce_iommu_attach_dev() saying that it
is fine to do nothing there, as tce_iommu_take_ownership() and
tce_iommu_take_ownership_ddw() make sure there are no active DMA
mappings. Thanks,
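
Roughly what I have in mind for the comment, as a sketch only: the struct
definitions below are placeholder stand-ins so the fragment compiles
outside the kernel tree, and the final wording in v2 may differ.

```c
#include <stddef.h>

/* Placeholder stand-ins for the kernel types (assumption, not kernel code). */
struct device {
	const char *name;
};

struct iommu_domain {
	unsigned int type;
};

/*
 * It is fine to do nothing here: tce_iommu_take_ownership() and
 * tce_iommu_take_ownership_ddw() take care of not having active DMA
 * mappings, so no translations are expected to be left to tear down
 * in this callback.
 */
static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
				      struct device *dev)
{
	(void)dom;	/* unused in this stub */
	(void)dev;	/* unused in this stub */
	return 0;
}
```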


> 
> In general, is "domain" something from hardware or is it a software
> concept? Thanks,
> 
> 
>> Jason
> 

-- 
Alexey
