From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
Jonathan Corbet <corbet@lwn.net>,
David Woodhouse <dwmw2@infradead.org>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
Joerg Roedel <joro@8bytes.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Miguel Ojeda <ojeda@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Shuah Khan <shuah@kernel.org>,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
Tom Rix <trix@redhat.com>, Will Deacon <will@kernel.org>,
Alex Williamson <alex.williamson@redhat.com>,
Chaitanya Kulkarni <chaitanyak@nvidia.com>,
Cornelia Huck <cohuck@redhat.com>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
David Gibson <david@gibson.dropbear.id.au>,
Eric Auger <eric.auger@redhat.com>,
Eric Farman <farman@linux.ibm.com>,
Jason Wang <jasowang@redhat.com>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
"Martins, Joao" <joao.m.martins@oracle.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Matthew Rosato <mjrosato@linux.ibm.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Nicolin Chen <nicolinc@nvidia.com>,
Niklas Schnelle <schnelle@linux.ibm.com>,
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
Keqian Zhu <zhukeqian1@huawei.com>
Subject: Re: [PATCH v3 12/15] iommufd: Add kAPI toward external drivers for physical devices
Date: Mon, 7 Nov 2022 13:54:42 -0400 [thread overview]
Message-ID: <Y2lGYkO9X0neQTP5@nvidia.com> (raw)
In-Reply-To: <BN9PR11MB527617D35DE59BD0EF10F7098C3A9@BN9PR11MB5276.namprd11.prod.outlook.com>
On Sat, Nov 05, 2022 at 07:17:38AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, October 26, 2022 2:12 AM
> >
> > +/**
> > + * iommufd_device_bind - Bind a physical device to an iommu fd
> > + * @ictx: iommufd file descriptor
> > + * @dev: Pointer to a physical PCI device struct
> > + * @id: Output ID number to return to userspace for this device
> > + *
> > + * A successful bind establishes an ownership over the device and returns
> > + * struct iommufd_device pointer, otherwise returns error pointer.
> > + *
> > + * A driver using this API must set driver_managed_dma and must not touch
> > + * the device until this routine succeeds and establishes ownership.
> > + *
> > + * Binding a PCI device places the entire RID under iommufd control.
> > + *
>
> Then what about non-PCI device? clearer to say that calling this interface
> just places the entire physical device under iommufd control
We don't know what other bus types will do with this API. The above is
guidance specifically for PCI, other bus types should add their own
guidance as it evolves.
> > + * FIXME: This is conceptually broken for iommufd since we want to
> > allow
> > + * userspace to change the domains, eg switch from an identity IOAS
> > to a
> > + * DMA IOAs. There is currently no way to create a MSI window that
>
> IOAs -> IOAS
Done
>
> > + rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start);
> > + if (rc && rc != -ENODEV)
> > + return rc;
>
> I know this is copied from VFIO but a comment is appreciated why
> -ENODEV is considered sane to move forward.
Huh. I actually don't know. It looks like it is here to detect that
the static inline for !CONFIG_IOMMU_DMA returns it.
However, if we have !CONFIG_IOMMU_DMA then I think we also don't get
interrupts at all. Ie iommu_dma_prepare_msi() becomes a NOP and
nothing will map the ITS page into the iomm_domain.
So let's drop the ENODEV check, it looks wrong.
> > + /*
> > + * Otherwise the platform has a MSI window that is not isolated. For
> > + * historical compat with VFIO allow a module parameter to ignore
> > the
> > + * insecurity.
> > + */
> > + if (!(flags &
> > IOMMUFD_ATTACH_FLAGS_ALLOW_UNSAFE_INTERRUPT))
> > + return -EPERM;
>
> Throw out a warning similar to vfio.
Ah... That is kind of gross.. But OK, we can fix it later if someone
comes up with a driver that can safely use
IOMMUFD_ATTACH_FLAGS_ALLOW_UNSAFE_INTERRUPT.
@@ -167,6 +167,11 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
*/
if (!(flags & IOMMUFD_ATTACH_FLAGS_ALLOW_UNSAFE_INTERRUPT))
return -EPERM;
+ else
+ dev_warn(
+ idev->dev,
+ "Device interrupts cannot be isolated by the IOMMU, this platform in insecure. Use an \"allow_unsafe_interrupts\" module >
+
return 0;
> > + rc = iopt_table_enforce_group_resv_regions(&hwpt->ioas->iopt,
> > idev->dev,
> > + idev->group,
> > &sw_msi_start);
> > + if (rc)
> > + goto out_iova;
>
> goto out_unlock since the function internally already called
> __iopt_remove_reserved_iova() upon error.
OK
> > + /*
> > + * There is no differentiation when domains are allocated, so any
> > domain
> > + * that is willing to attach to the device is interchangeable with any
> > + * other.
> > + */
> > + mutex_lock(&ioas->mutex);
> > + list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) {
> > + if (!hwpt->auto_domain ||
> > + !refcount_inc_not_zero(&hwpt->obj.users))
>
> users cannot be zero at this point.
>
> a new hwpt is added to the list with users==1 in attach.
>
> detach first removes the hwpt and then decrement users.
>
> Both are conducted by holding ioas->mutex.
> > + continue;
> > +
> > + rc = iommufd_device_do_attach(idev, hwpt, flags);
> > + refcount_dec(&hwpt->obj.users);
>
> with above I also wonder whether refcount_inc/dec are just
> redundant here...
Yes, it seems so (in ealier versions this was not true, but it is now):
@@ -267,12 +272,11 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
*/
mutex_lock(&ioas->mutex);
list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) {
- if (!hwpt->auto_domain ||
- !refcount_inc_not_zero(&hwpt->obj.users))
+ if (!hwpt->auto_domain)
continue;
rc = iommufd_device_do_attach(idev, hwpt, flags);
- refcount_dec(&hwpt->obj.users);
+
> > +int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id,
> > + unsigned int flags)
> > +{
> > + struct iommufd_object *pt_obj;
> > + int rc;
> > +
> > + pt_obj = iommufd_get_object(idev->ictx, *pt_id,
> > IOMMUFD_OBJ_ANY);
> > + if (IS_ERR(pt_obj))
> > + return PTR_ERR(pt_obj);
>
> Is there a reason why get_object() is not required for idev?
Caller has to hold a legal reference because caller is responsible for
the pointer.
> concurrent attach/unbind could lead to use-after-free here given users
> for idev is added only in the end of the function.
Caller bug. It is not allowed to continue to use idev while it is
destroying it by calling iommufd_device_unbind()
> > + default:
> > + rc = -EINVAL;
> > + goto out_put_pt_obj;
> > + }
> > +
> > + refcount_inc(&idev->obj.users);
> > + *pt_id = idev->hwpt->obj.id;
>
> What is the value of returning hwpt id of auto domain to user?
It allows the userspace to potentially query the auto domain, if we
decide on some query later.
> IMHO this will add more complexity to the life cycle of auto domain.
Not really, the usespace still controls the lifecylce, the hwpt id is
returned and is valid until userspace commands the device it just
bound to become unbound.
> now if allowing user to populate auto domain similar to user-created
> hwpt then detach also need get_object() to wait for concurrent
> ioctls similar to iommufd_destroy() does and more trick on destroying
> the object.
"autodomain" means the hwpt is destroyed when its device list becomes
empty. This is all protected by the ioas->mutex, so we don't need
additional refcounting or locking.
> > +void iommufd_device_detach(struct iommufd_device *idev)
> > +{
> > + struct iommufd_hw_pagetable *hwpt = idev->hwpt;
> > +
> > + mutex_lock(&hwpt->ioas->mutex);
> > + mutex_lock(&hwpt->devices_lock);
> > + list_del(&idev->devices_item);
> > + if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
> > + if (list_empty(&hwpt->devices)) {
> > + iopt_table_remove_domain(&hwpt->ioas->iopt,
> > + hwpt->domain);
> > + list_del(&hwpt->hwpt_item);
> > + }
> > + iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
>
> this is always required. not just for last device in a group.
Yep, good catch
Thanks,
Jason
next prev parent reply other threads:[~2022-11-07 17:57 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-25 18:12 [PATCH v3 00/15] IOMMUFD Generic interface Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 01/15] iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY Jason Gunthorpe
2022-10-26 12:45 ` Baolu Lu
2022-11-03 5:03 ` Tian, Kevin
2022-11-04 19:25 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 02/15] iommu: Add device-centric DMA ownership interfaces Jason Gunthorpe
2022-11-03 5:11 ` Tian, Kevin
2022-11-04 19:32 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 03/15] interval-tree: Add a utility to iterate over spans in an interval tree Jason Gunthorpe
2022-11-03 5:31 ` Tian, Kevin
2022-11-04 19:38 ` Jason Gunthorpe
2022-11-05 1:32 ` Tian, Kevin
2022-11-05 1:48 ` Matthew Wilcox
2022-11-07 14:38 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 04/15] iommufd: Overview documentation Jason Gunthorpe
2022-10-26 4:17 ` Bagas Sanjaya
2022-10-28 19:09 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 05/15] iommufd: File descriptor, context, kconfig and makefiles Jason Gunthorpe
2022-10-26 12:58 ` Baolu Lu
2022-10-26 17:14 ` Jason Gunthorpe
2022-10-29 3:43 ` Baolu Lu
2022-11-03 7:22 ` Tian, Kevin
2022-11-07 17:00 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 06/15] kernel/user: Allow user::locked_vm to be usable for iommufd Jason Gunthorpe
2022-11-03 7:23 ` Tian, Kevin
2022-10-25 18:12 ` [PATCH v3 07/15] iommufd: PFN handling for iopt_pages Jason Gunthorpe
2022-11-01 19:38 ` Nicolin Chen
2022-11-02 13:13 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 08/15] iommufd: Algorithms for PFN storage Jason Gunthorpe
2022-10-31 16:01 ` [PATCH v3 8/15] " Jason Gunthorpe
2022-11-01 16:09 ` Jason Gunthorpe
2022-11-03 20:08 ` Jason Gunthorpe
2022-11-04 16:26 ` Jason Gunthorpe
2022-11-04 16:04 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 09/15] iommufd: Data structure to provide IOVA to PFN mapping Jason Gunthorpe
2022-10-26 18:46 ` [PATCH v3 9/15] " Jason Gunthorpe
2022-10-27 11:37 ` Jason Gunthorpe
2022-10-27 13:35 ` Jason Gunthorpe
2022-10-28 18:52 ` Jason Gunthorpe
2022-11-01 19:17 ` [PATCH v3 09/15] " Nicolin Chen
2022-11-02 13:11 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 10/15] iommufd: IOCTLs for the io_pagetable Jason Gunthorpe
2022-10-26 17:01 ` Jason Gunthorpe
2022-10-26 23:17 ` Jason Gunthorpe
2022-10-29 7:25 ` Baolu Lu
2022-11-07 14:17 ` Jason Gunthorpe
2022-11-04 8:32 ` Tian, Kevin
2022-11-07 15:02 ` Jason Gunthorpe
2022-11-08 2:05 ` Tian, Kevin
2022-11-08 17:29 ` Jason Gunthorpe
2022-11-09 2:50 ` Tian, Kevin
2022-11-09 13:05 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 11/15] iommufd: Add a HW pagetable object Jason Gunthorpe
2022-11-04 10:00 ` Tian, Kevin
2022-10-25 18:12 ` [PATCH v3 12/15] iommufd: Add kAPI toward external drivers for physical devices Jason Gunthorpe
2022-10-29 7:19 ` Baolu Lu
2022-11-07 14:14 ` Jason Gunthorpe
2022-11-05 7:17 ` Tian, Kevin
2022-11-07 17:54 ` Jason Gunthorpe [this message]
2022-11-08 2:17 ` Tian, Kevin
2022-10-25 18:12 ` [PATCH v3 13/15] iommufd: Add kAPI toward external drivers for kernel access Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 14/15] iommufd: vfio container FD ioctl compatibility Jason Gunthorpe
2022-10-27 14:12 ` Jason Gunthorpe
2022-11-01 19:45 ` Nicolin Chen
2022-11-02 13:15 ` Jason Gunthorpe
2022-11-05 0:07 ` Baolu Lu
2022-11-07 14:23 ` Jason Gunthorpe
2022-11-05 9:31 ` Tian, Kevin
2022-11-07 17:08 ` Jason Gunthorpe
2022-11-07 23:53 ` Tian, Kevin
2022-11-08 0:09 ` Jason Gunthorpe
2022-11-08 0:13 ` Tian, Kevin
2022-11-08 0:17 ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 15/15] iommufd: Add a selftest Jason Gunthorpe
2022-11-01 20:32 ` Nicolin Chen
2022-11-02 13:17 ` Jason Gunthorpe
2022-11-02 18:49 ` Nathan Chancellor
2022-11-04 1:01 ` Jason Gunthorpe
2022-11-04 5:43 ` Tian, Kevin
2022-11-04 19:42 ` Jason Gunthorpe
2022-10-28 23:57 ` [PATCH v3 00/15] IOMMUFD Generic interface Nicolin Chen
2022-11-04 21:27 ` Alex Williamson
2022-11-04 22:03 ` Alex Williamson
2022-11-07 14:22 ` Jason Gunthorpe
2022-11-07 14:19 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y2lGYkO9X0neQTP5@nvidia.com \
--to=jgg@nvidia.com \
--cc=alex.williamson@redhat.com \
--cc=baolu.lu@linux.intel.com \
--cc=bpf@vger.kernel.org \
--cc=chaitanyak@nvidia.com \
--cc=cohuck@redhat.com \
--cc=corbet@lwn.net \
--cc=daniel.m.jordan@oracle.com \
--cc=david@gibson.dropbear.id.au \
--cc=dwmw2@infradead.org \
--cc=eric.auger@redhat.com \
--cc=farman@linux.ibm.com \
--cc=iommu@lists.linux.dev \
--cc=jasowang@redhat.com \
--cc=jean-philippe@linaro.org \
--cc=joao.m.martins@oracle.com \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=llvm@lists.linux.dev \
--cc=mjrosato@linux.ibm.com \
--cc=mst@redhat.com \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=nicolinc@nvidia.com \
--cc=ojeda@kernel.org \
--cc=robin.murphy@arm.com \
--cc=schnelle@linux.ibm.com \
--cc=shameerali.kolothum.thodi@huawei.com \
--cc=shuah@kernel.org \
--cc=suravee.suthikulpanit@amd.com \
--cc=trix@redhat.com \
--cc=will@kernel.org \
--cc=yi.l.liu@intel.com \
--cc=zhukeqian1@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).