From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
Joerg Roedel <joro@8bytes.org>,
Justin Stitt <justinstitt@google.com>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
Bill Wendling <morbo@google.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
Miguel Ojeda <ojeda@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Shuah Khan <shuah@kernel.org>,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
Will Deacon <will@kernel.org>, Alexey Kardashevskiy <aik@amd.com>,
Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
James Gowans <jgowans@amazon.com>,
Michael Roth <michael.roth@amd.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
"patches@lists.linux.dev" <patches@lists.linux.dev>
Subject: Re: [PATCH v6 07/15] iommupt: Add map_pages op
Date: Tue, 21 Oct 2025 14:19:10 -0300
Message-ID: <20251021171910.GH712833@nvidia.com>
In-Reply-To: <BN9PR11MB5276271302BF0AAAE2C7085A8CE9A@BN9PR11MB5276.namprd11.prod.outlook.com>
On Thu, Oct 16, 2025 at 06:52:09AM +0000, Tian, Kevin wrote:
> > +static __always_inline int __do_map_single_page(struct pt_range *range,
> > +						void *arg, unsigned int level,
> > +						struct pt_table_p *table,
> > +						pt_level_fn_t descend_fn)
> > +{
> > +	struct pt_state pts = pt_init(range, level, table);
> > +	struct pt_iommu_map_args *map = arg;
> > +
> > +	pts.type = pt_load_single_entry(&pts);
> > +	if (level == 0) {
> > +		if (pts.type != PT_ENTRY_EMPTY)
> > +			return -EADDRINUSE;
> > +		pt_install_leaf_entry(&pts, map->oa, PAGE_SHIFT,
> > +				      &map->attrs);
> > +		map->oa += PAGE_SIZE;
> > +		return 0;
> > +	}
> > +	if (pts.type != PT_ENTRY_TABLE)
> > +		return -EAGAIN;
>
> return -EADDRINUSE if PT_ENTRY_OA. No need to retry if it's not empty.
I wanted to keep it small; this also catches the case where the table
needs to be allocated. This is a bit clearer:
	if (pts.type == PT_ENTRY_TABLE)
		return pt_descend(&pts, arg, descend_fn);
	/* Something else, use the slow path */
	return -EAGAIN;
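Folding that in, the whole helper would read like so (just a sketch of
the reworked hunk, reusing the pt_* helpers quoted above):

static __always_inline int __do_map_single_page(struct pt_range *range,
						void *arg, unsigned int level,
						struct pt_table_p *table,
						pt_level_fn_t descend_fn)
{
	struct pt_state pts = pt_init(range, level, table);
	struct pt_iommu_map_args *map = arg;

	pts.type = pt_load_single_entry(&pts);

	/* At the leaf level only an empty entry can be mapped over */
	if (level == 0) {
		if (pts.type != PT_ENTRY_EMPTY)
			return -EADDRINUSE;
		pt_install_leaf_entry(&pts, map->oa, PAGE_SHIFT, &map->attrs);
		map->oa += PAGE_SIZE;
		return 0;
	}

	/* Keep descending while a next-level table is already present */
	if (pts.type == PT_ENTRY_TABLE)
		return pt_descend(&pts, arg, descend_fn);

	/* Something else (needs allocation, or an OA), use the slow path */
	return -EAGAIN;
}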
> > +/**
> > + * map_pages() - Install translation for an IOVA range
> > + * @domain: Domain to manipulate
> > + * @iova: IO virtual address to start
> > + * @paddr: Physical/Output address to start
> > + * @pgsize: Length of each page
> > + * @pgcount: Length of the range in pgsize units starting from @iova
> > + * @prot: A bitmap of IOMMU_READ/WRITE/CACHE/NOEXEC/MMIO
> > + * @gfp: GFP flags for any memory allocations
> > + * @mapped: Total bytes successfully mapped
> > + *
> > + * The range starting at IOVA will have paddr installed into it. The caller
> > + * must specify a valid pgsize and pgcount to segment the range into
> > + * compatible blocks.
> > + *
> > + * On error the caller will probably want to invoke unmap on the range from
> > + * iova up to the amount indicated by @mapped to return the table back to an
> > + * unchanged state.
> > + *
> > + * Context: The caller must hold a write range lock that includes the whole
> > + * range.
> > + *
> > + * Returns: -ERRNO on failure, 0 on success. The number of bytes of VA that
> > + * were mapped are added to @mapped; @mapped is not zeroed first.
>
> Hmm, drivers don't seem to be consistent on this, e.g. Intel and several
> others (virtio, rockchip, etc.) don't use addition, and there is no such
> guidance in iommu.h.
Yes. At least ARM64 does addition though.
> Existing callers, e.g. iommu_map_nosync(), initialize @mapped to 0,
> so this difference doesn't lead to an observable problem so far.
Yep
> But it may be good to unify drivers while at it.
As it doesn't cause any issue, I'm just going to leave it. The new
implementation will consistently use the addition mode, as ARM does,
and that is convenient for the kunit as well.
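
To make the addition contract concrete, the caller side looks roughly
like this (a sketch against the iommu_domain_ops map_pages signature;
the error handling follows the kernel-doc quoted above):

	size_t mapped = 0;
	int ret;

	/* The op adds to @mapped rather than storing into it, so the
	 * caller zeroes it once up front, as iommu_map_nosync() does. */
	ret = domain->ops->map_pages(domain, iova, paddr, pgsize, pgcount,
				     prot, GFP_KERNEL, &mapped);
	if (ret)
		/* Return the table to an unchanged state */
		iommu_unmap(domain, iova, mapped);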
> > +	/* Calculate target page size and level for the leaves */
> > +	if (pt_has_system_page_size(common) && pgsize == PAGE_SIZE &&
> > +	    pgcount == 1) {
> > +		PT_WARN_ON(!(pgsize_bitmap & PAGE_SIZE));
> > +		if (log2_mod(iova | paddr, PAGE_SHIFT))
> > +			return -ENXIO;
> > +		map.leaf_pgsize_lg2 = PAGE_SHIFT;
> > +		map.leaf_level = 0;
> > +		single_page = true;
> > +	} else {
> > +		map.leaf_pgsize_lg2 = pt_compute_best_pgsize(
> > +			pgsize_bitmap, range.va, range.last_va, paddr);
> > +		if (!map.leaf_pgsize_lg2)
> > +			return -ENXIO;
> > +		map.leaf_level =
> > +			pt_pgsz_lg2_to_level(common, map.leaf_pgsize_lg2);
> > +	}
> > +
> > +	ret = check_map_range(iommu_table, &range, &map);
> > +	if (ret)
> > +		return ret;
>
> single_page should always pass the check, so it could be skipped.
Even single_page needs this check; it verifies that the iova is within range.
> > +	PT_WARN_ON(map.leaf_level > range.top_level);
> > +
> > +	do {
> > +		if (single_page) {
> > +			ret = pt_walk_range(&range, __map_single_page, &map);
> > +			if (ret != -EAGAIN)
> > +				break;
> > +		}
> > +
> > +		if (map.leaf_level == range.top_level)
> > +			ret = pt_walk_range(&range, __map_range_leaf, &map);
> > +		else
> > +			ret = pt_walk_range(&range, __map_range, &map);
> > +	} while (false);
>
> what is the point of do...while(false) compared to 'goto'?
I didn't want to use goto for something other than an error exit.
Let's put it in a function:
static int do_map(struct pt_range *range, bool single_page,
		  struct pt_iommu_map_args *map)
{
	if (single_page) {
		int ret;

		ret = pt_walk_range(range, __map_single_page, map);
		if (ret != -EAGAIN)
			return ret;
		/* EAGAIN falls through to the full path */
	}

	if (map->leaf_level == range->top_level)
		return pt_walk_range(range, __map_range_leaf, map);
	return pt_walk_range(range, __map_range, map);
}
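
The call site in map_pages() then collapses to a plain call (sketch):

	ret = do_map(&range, single_page, &map);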
> > +/**
> > + * struct pt_iommu_flush_ops - HW IOTLB cache flushing operations
> > + *
> > + * The IOMMU driver should implement these using container_of(iommu_table)
> > + * to get to its iommu_domain derived structure. All ops can be called in
> > + * atomic contexts as they are buried under DMA API calls.
> > + */
> > +struct pt_iommu_flush_ops {
>
> I'm not sure it's appropriate to call it flush_ops. It's not purely about
> flushing, e.g. @change_top requires the driver to actually change the top.
Hmm, at one point it did have flush-related calls, but now it is just
about changing the top.
I'll change it to pt_iommu_driver_ops.
> > + * HW references to this domain with a new top address and
> > + * configuration. On return mappings placed in the new top must be
> > + * reachable by the HW.
> > + *
> > + * top_level encodes the level in IOMMU PT format, level 0 is the
> > + * smallest page size increasing from there. This has to be translated
> > + * to any HW specific format. During this call the new top will not be
> > + * visible to any other API.
> > + *
> > + * This op is only used by PT_FEAT_DYNAMIC_TOP, and is required if
> > + * enabled.
> > + */
> > +	void (*change_top)(struct pt_iommu *iommu_table, phys_addr_t top_paddr,
> > +			   unsigned int top_level);
> > +
> > + /**
> > + * @get_top_lock: Return a lock to hold when changing the table top
> > + * @iommu_table: Table to operate on
> > + *
>
> blank line
	/**
	 * @get_top_lock: Lock to hold when changing the table top
	 * @iommu_table: Table to operate on
	 *
	 * Return a lock to hold when changing which top page table is
	 * stored in HW. The lock will be held prior to calling
	 * change_top() and released once the top is fully visible.
	 *
	 * Typically this would be a lock that protects the iommu_domain's
	 * attachment list.
	 *
	 * This op is only used by PT_FEAT_DYNAMIC_TOP, and is required if
	 * enabled.
	 */
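
With the rename the struct would end up roughly like this (a sketch
only; the spinlock return type for get_top_lock is my assumption, not
something this thread settled):

struct pt_iommu_driver_ops {
	/* Make a new top table visible to HW, as documented above */
	void (*change_top)(struct pt_iommu *iommu_table, phys_addr_t top_paddr,
			   unsigned int top_level);
	/* Lock serializing change_top() against the HW attachment list;
	 * assumed here to be a spinlock */
	spinlock_t *(*get_top_lock)(struct pt_iommu *iommu_table);
};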
Thanks,
Jason