From: Alex Williamson <alex.williamson@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>,
Chaitanya Kulkarni <chaitanyak@nvidia.com>,
Cornelia Huck <cohuck@redhat.com>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
David Gibson <david@gibson.dropbear.id.au>,
Eric Auger <eric.auger@redhat.com>,
iommu@lists.linux-foundation.org,
Jason Wang <jasowang@redhat.com>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
Joao Martins <joao.m.martins@oracle.com>,
Kevin Tian <kevin.tian@intel.com>,
kvm@vger.kernel.org, Matthew Rosato <mjrosato@linux.ibm.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Nicolin Chen <nicolinc@nvidia.com>,
Niklas Schnelle <schnelle@linux.ibm.com>,
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
Yi Liu <yi.l.liu@intel.com>, Keqian Zhu <zhukeqian1@huawei.com>
Subject: Re: [PATCH RFC 07/12] iommufd: Data structure to provide IOVA to PFN mapping
Date: Tue, 22 Mar 2022 16:15:44 -0600
Message-ID: <20220322161544.54fd459d.alex.williamson@redhat.com>
In-Reply-To: <7-v1-e79cd8d168e8+6-iommufd_jgg@nvidia.com>
On Fri, 18 Mar 2022 14:27:32 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> +/*
> + * The area takes a slice of the pages from start_bytes to start_byte + length
> + */
> +static struct iopt_area *
> +iopt_alloc_area(struct io_pagetable *iopt, struct iopt_pages *pages,
> + unsigned long iova, unsigned long start_byte,
> + unsigned long length, int iommu_prot, unsigned int flags)
> +{
> + struct iopt_area *area;
> + int rc;
> +
> + area = kzalloc(sizeof(*area), GFP_KERNEL);
> + if (!area)
> + return ERR_PTR(-ENOMEM);
> +
> + area->iopt = iopt;
> + area->iommu_prot = iommu_prot;
> + area->page_offset = start_byte % PAGE_SIZE;
> + area->pages_node.start = start_byte / PAGE_SIZE;
> + if (check_add_overflow(start_byte, length - 1, &area->pages_node.last))
> + return ERR_PTR(-EOVERFLOW);
> + area->pages_node.last = area->pages_node.last / PAGE_SIZE;
> + if (WARN_ON(area->pages_node.last >= pages->npages))
> + return ERR_PTR(-EOVERFLOW);
@area is leaked in both of the error returns above.
> +
> + down_write(&iopt->iova_rwsem);
> + if (flags & IOPT_ALLOC_IOVA) {
> + rc = iopt_alloc_iova(iopt, &iova,
> + (uintptr_t)pages->uptr + start_byte,
> + length);
> + if (rc)
> + goto out_unlock;
> + }
> +
> + if (check_add_overflow(iova, length - 1, &area->node.last)) {
> + rc = -EOVERFLOW;
> + goto out_unlock;
> + }
> +
> + if (!(flags & IOPT_ALLOC_IOVA)) {
> + if ((iova & (iopt->iova_alignment - 1)) ||
> + (length & (iopt->iova_alignment - 1)) || !length) {
> + rc = -EINVAL;
> + goto out_unlock;
> + }
> +
> + /* No reserved IOVA intersects the range */
> + if (interval_tree_iter_first(&iopt->reserved_iova_itree, iova,
> + area->node.last)) {
> + rc = -ENOENT;
> + goto out_unlock;
> + }
> +
> + /* Check that there is not already a mapping in the range */
> + if (iopt_area_iter_first(iopt, iova, area->node.last)) {
> + rc = -EADDRINUSE;
> + goto out_unlock;
> + }
> + }
> +
> + /*
> + * The area is inserted with a NULL pages indicating it is not fully
> + * initialized yet.
> + */
> + area->node.start = iova;
> + interval_tree_insert(&area->node, &area->iopt->area_itree);
> + up_write(&iopt->iova_rwsem);
> + return area;
> +
> +out_unlock:
> + up_write(&iopt->iova_rwsem);
> + kfree(area);
> + return ERR_PTR(rc);
> +}
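For illustration, the leak noted in iopt_alloc_area() above could be
closed by routing the two early error returns through a cleanup label
that frees @area. What follows is only a userspace sketch of that
pattern, not the patch itself: model_alloc_area(), struct model_area,
model_live_areas and MODEL_PAGE_SIZE are hypothetical names, calloc()
and free() stand in for kzalloc() and kfree(), and a manual comparison
stands in for check_add_overflow().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096UL /* assumed page size for the model */

/* Hypothetical stand-in for struct iopt_area; only the fields used here. */
struct model_area {
	unsigned long page_offset;
	unsigned long first_index;
	unsigned long last_index;
};

/* Number of areas allocated and not yet freed, so a leak is observable. */
static unsigned long model_live_areas;

/*
 * Returns 0 and stores the new area in *out, or a negative errno.
 * Every failure after the allocation exits through out_free.
 */
static int model_alloc_area(unsigned long npages, unsigned long start_byte,
			    unsigned long length, struct model_area **out)
{
	struct model_area *area;
	unsigned long last_byte;
	int rc;

	assert(out != NULL);

	area = calloc(1, sizeof(*area));
	if (!area)
		return -ENOMEM;
	model_live_areas++;

	area->page_offset = start_byte % MODEL_PAGE_SIZE;
	area->first_index = start_byte / MODEL_PAGE_SIZE;

	/* Portable equivalent of "start_byte + (length - 1)" overflowing. */
	if (length - 1 > ULONG_MAX - start_byte) {
		rc = -EOVERFLOW;
		goto out_free;
	}
	last_byte = start_byte + (length - 1);

	area->last_index = last_byte / MODEL_PAGE_SIZE;
	if (area->last_index >= npages) {
		rc = -EOVERFLOW;
		goto out_free;
	}

	*out = area;
	return 0;

out_free:
	free(area);
	model_live_areas--;
	return rc;
}

/* Releases an area returned by model_alloc_area(). */
static void model_free_area(struct model_area *area)
{
	free(area);
	model_live_areas--;
}
```

In the real function the same effect could presumably be had with a
second label placed after the up_write() in the existing out_unlock
path, since the rwsem is not yet held at these two returns.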
...
> +/**
> + * iopt_access_pages() - Return a list of pages under the iova
> + * @iopt: io_pagetable to act on
> + * @iova: Starting IOVA
> + * @length: Number of bytes to access
> + * @out_pages: Output page list
> + * @write: True if access is for writing
> + *
> + * Reads @npages starting at iova and returns the struct page * pointers. These
> + * can be kmap'd by the caller for CPU access.
> + *
> + * The caller must perform iopt_unaccess_pages() when done to balance this.
> + *
> + * iova can be unaligned from PAGE_SIZE. The first returned byte starts at
> + * page_to_phys(out_pages[0]) + (iova % PAGE_SIZE). The caller promises not to
> + * touch memory outside the requested iova slice.
> + *
> + * FIXME: callers that need a DMA mapping via a sgl should create another
> + * interface to build the SGL efficiently
> + */
> +int iopt_access_pages(struct io_pagetable *iopt, unsigned long iova,
> + unsigned long length, struct page **out_pages, bool write)
> +{
> + unsigned long cur_iova = iova;
> + unsigned long last_iova;
> + struct iopt_area *area;
> + int rc;
> +
> + if (!length)
> + return -EINVAL;
> + if (check_add_overflow(iova, length - 1, &last_iova))
> + return -EOVERFLOW;
> +
> + down_read(&iopt->iova_rwsem);
> + for (area = iopt_area_iter_first(iopt, iova, last_iova); area;
> + area = iopt_area_iter_next(area, iova, last_iova)) {
> + unsigned long last = min(last_iova, iopt_area_last_iova(area));
> + unsigned long last_index;
> + unsigned long index;
> +
> + /* Need contiguous areas in the access */
> + if (iopt_area_iova(area) < cur_iova || !area->pages) {
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Should this be (cur_iova != iova && iopt_area_iova(area) < cur_iova)?
I can't see how we'd require in-kernel page users to know the iopt_area
alignment from userspace, so I think this needs to skip the first
iteration. Thanks,
Alex
> + rc = -EINVAL;
> + goto out_remove;
> + }
> +
> + index = iopt_area_iova_to_index(area, cur_iova);
> + last_index = iopt_area_iova_to_index(area, last);
> + rc = iopt_pages_add_user(area->pages, index, last_index,
> + out_pages, write);
> + if (rc)
> + goto out_remove;
> + if (last == last_iova)
> + break;
> + /*
> + * Can't cross areas that are not aligned to the system page
> + * size with this API.
> + */
> + if (cur_iova % PAGE_SIZE) {
> + rc = -EINVAL;
> + goto out_remove;
> + }
> + cur_iova = last + 1;
> + out_pages += last_index - index;
> + atomic_inc(&area->num_users);
> + }
> +
> + up_read(&iopt->iova_rwsem);
> + return 0;
> +
> +out_remove:
> + if (cur_iova != iova)
> + iopt_unaccess_pages(iopt, iova, cur_iova - iova);
> + up_read(&iopt->iova_rwsem);
> + return rc;
> +}
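To make the question about the contiguity check in iopt_access_pages()
concrete, here is a small userspace sketch comparing the condition as
posted with the condition as suggested in the review. It is only a
model: struct model_span and both helper names are hypothetical, the
!area->pages half of the test is left out, and gap detection between
areas is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an area's IOVA span; not struct iopt_area. */
struct model_span {
	unsigned long start; /* what iopt_area_iova() would return */
	unsigned long last;  /* what iopt_area_last_iova() would return */
};

/* The check as posted: reject when the area begins below cur_iova. */
static bool posted_check_rejects(const struct model_span *area,
				 unsigned long cur_iova)
{
	return area->start < cur_iova;
}

/*
 * The check as suggested in the review: the same comparison, but
 * skipped on the first iteration, i.e. while cur_iova still equals
 * the iova the caller asked for.
 */
static bool suggested_check_rejects(const struct model_span *area,
				    unsigned long iova,
				    unsigned long cur_iova)
{
	assert(cur_iova >= iova);
	return cur_iova != iova && area->start < cur_iova;
}
```

With an area spanning 0x1000..0x2fff and an access that begins at
0x1800, the posted form rejects the very first area because it starts
below the requested iova, while the suggested form lets it through.
An access that begins exactly at 0x1000 passes both.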