patches.lists.linux.dev archive mirror
From: Vasant Hegde <vasant.hegde@amd.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
	Jonathan Corbet <corbet@lwn.net>,
	iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	Justin Stitt <justinstitt@google.com>,
	Kevin Tian <kevin.tian@intel.com>,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	llvm@lists.linux.dev, Bill Wendling <morbo@google.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Shuah Khan <shuah@kernel.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Will Deacon <will@kernel.org>
Cc: Alexey Kardashevskiy <aik@amd.com>,
	Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
	James Gowans <jgowans@amazon.com>,
	Michael Roth <michael.roth@amd.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	patches@lists.linux.dev
Subject: Re: [PATCH v5 07/15] iommupt: Add map_pages op
Date: Tue, 7 Oct 2025 17:38:48 +0530
Message-ID: <b9b18a03-63a2-4065-a27e-d92dd5c860bc@amd.com>
In-Reply-To: <7-v5-116c4948af3d+68091-iommu_pt_jgg@nvidia.com>

Jason,

On 9/3/2025 11:16 PM, Jason Gunthorpe wrote:
> map is slightly complicated because it has to handle a number of special
> edge cases:
>  - Overmapping a previously shared table with an OA - requires validating
>    and freeing the possibly empty tables
>  - Doing the above across an entire to-be-created contiguous entry
>  - Installing a new shared table level concurrently with another thread
>  - Expanding the table by adding more top levels
> 
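The concurrent-install case follows the usual lock-free publish pattern. Below is a minimal userspace sketch with C11 atomics. It assumes a slot value of 0 means empty and anything else is a pointer to a lower table. The names pt_slot_t and get_or_install_table() are hypothetical; this is not the iommupt code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical slot model: 0 means empty, otherwise a lower table. */
typedef _Atomic uintptr_t pt_slot_t;

/*
 * Return the table installed in *slot, allocating one if the slot is
 * empty. If another thread installs first, free our copy and use theirs.
 */
static void *get_or_install_table(pt_slot_t *slot, size_t table_bytes)
{
	uintptr_t cur = atomic_load_explicit(slot, memory_order_acquire);
	void *mine;

	if (cur)
		return (void *)cur;

	mine = calloc(1, table_bytes);	/* new table starts out empty */
	if (!mine)
		return NULL;

	/* Publish only if the slot is still empty; on failure cur is
	 * updated to the value the winning thread stored. */
	if (atomic_compare_exchange_strong_explicit(slot, &cur,
						    (uintptr_t)mine,
						    memory_order_acq_rel,
						    memory_order_acquire))
		return mine;

	free(mine);	/* lost the race; nobody else ever saw our table */
	assert(cur != 0);
	return (void *)cur;
}
```

The loser frees its table without any further synchronization, because the failed compare-and-swap means the table was never reachable by another thread.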
> Table expansion is a unique feature of AMDv1; this version is quite
> similar, except that we handle racing concurrent lockless maps. The table
> top pointer and starting level are encoded in a single uintptr_t, which
> ensures we can READ_ONCE() it without tearing. Every op does the
> READ_ONCE() and uses that fixed point as its starting point. Concurrent
> expansion is handled with a table-global spinlock.
> 
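Packing the top pointer and level into one word works because table memory is aligned, which leaves the low bits of the pointer free. Below is a hypothetical sketch that assumes:

- tables are at least 8-byte aligned;
- the level fits in 3 bits.

The top_*() helpers are illustrative only. The C11 atomic load stands in for READ_ONCE(), and the lock that serializes expansion is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed encoding: low 3 bits of an 8-byte aligned pointer hold the level */
#define TOP_LEVEL_MASK ((uintptr_t)7)

static uintptr_t top_pack(void *table, unsigned int level)
{
	assert(((uintptr_t)table & TOP_LEVEL_MASK) == 0);
	assert(level <= TOP_LEVEL_MASK);
	return (uintptr_t)table | level;
}

static void *top_table(uintptr_t top)
{
	return (void *)(top & ~TOP_LEVEL_MASK);
}

static unsigned int top_level(uintptr_t top)
{
	return (unsigned int)(top & TOP_LEVEL_MASK);
}

/* One atomic load yields a consistent (table, level) pair, so a racing
 * expansion is never seen half-written. Acquire is a conservative
 * stand-in for READ_ONCE(). */
static uintptr_t top_snapshot(_Atomic uintptr_t *top)
{
	return atomic_load_explicit(top, memory_order_acquire);
}

/* Publish a new (table, level) pair in a single store. Expansion is
 * serialized by a spinlock in the patch; not modelled here. */
static void top_publish(_Atomic uintptr_t *top, void *table,
			unsigned int level)
{
	atomic_store_explicit(top, top_pack(table, level),
			      memory_order_release);
}
```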
> When inserting a new table entry, map checks that the entire portion of the
> table is empty. This includes freeing any empty lower tables that will be
> overwritten by an OA. A separate free list is used while checking and
> collecting all the empty lower tables so that writing the new entry is
> uninterrupted: either the new entry is fully written or nothing changes.
> 
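The check-then-commit ordering can be shown on a toy table. This sketch makes simplifying assumptions:

- struct toy_table and toy_overmap() are hypothetical;
- emptiness is checked only one level down;
- a plain flag stands in for an OA entry.

Nothing in the parent changes until every target slot has been validated. The collected tables are freed only after the new entry is written.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_ENTRIES 8

/* Toy table: each slot is empty, a child table, or an OA (has_leaf). */
struct toy_table {
	struct toy_table *child[NR_ENTRIES];
	bool has_leaf[NR_ENTRIES];
	struct toy_table *free_next;	/* link for the deferred free list */
};

static bool toy_table_empty(const struct toy_table *t)
{
	for (size_t i = 0; i != NR_ENTRIES; i++)
		if (t->child[i] || t->has_leaf[i])
			return false;
	return true;
}

/* Replace slots [first, first + n) of @t with one OA mapping. */
static int toy_overmap(struct toy_table *t, size_t first, size_t n)
{
	struct toy_table *free_list = NULL;
	size_t i;

	assert(first + n <= NR_ENTRIES);

	/* Phase 1: validate and collect; @t itself is not modified. */
	for (i = first; i != first + n; i++) {
		struct toy_table *c = t->child[i];

		if (t->has_leaf[i] || (c && !toy_table_empty(c)))
			return -EBUSY;	/* nothing in @t was changed */
		if (c) {
			c->free_next = free_list;
			free_list = c;
		}
	}

	/* Phase 2: commit the new entry in one uninterrupted pass. */
	for (i = first; i != first + n; i++) {
		t->child[i] = NULL;
		t->has_leaf[i] = true;
	}

	/* Phase 3: free the now-unreachable empty lower tables. */
	while (free_list) {
		struct toy_table *next = free_list->free_next;

		free(free_list);
		free_list = next;
	}
	return 0;
}
```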
> A special fast path for PAGE_SIZE is implemented that does a direct walk
> to the leaf level and installs a single entry. This gives a ~15% improvement
> for iommu_map() when mapping lists of single pages.
> 
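For a single page, the walk needs no page-size selection: the VA alone yields one index per level. This is a hypothetical sketch assuming a 4K leaf and 9 index bits per level. In this toy, a missing table or an occupied leaf simply fails with -1; it does not model what the real fast path does in those cases.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_LG2 12	/* assumed 4K leaf */
#define BITS_PER_LEVEL 9	/* assumed 512 entries per table */

struct toy_level {
	void *entry[1u << BITS_PER_LEVEL];	/* lower table or leaf OA */
};

/* Index into the table at @level (level 0 is the leaf level). */
static unsigned int level_index(uint64_t va, unsigned int level)
{
	unsigned int shift = PAGE_SHIFT_LG2 + level * BITS_PER_LEVEL;

	assert(shift < 64);
	return (unsigned int)((va >> shift) & ((1u << BITS_PER_LEVEL) - 1));
}

/* Descend straight to level 0 and install one leaf entry. */
static int map_one_page(struct toy_level *top, unsigned int top_level,
			uint64_t va, void *oa)
{
	struct toy_level *t = top;

	for (unsigned int lvl = top_level; lvl > 0; lvl--) {
		t = t->entry[level_index(va, lvl)];
		if (!t)
			return -1;	/* toy: no table allocation here */
	}
	if (t->entry[level_index(va, 0)])
		return -1;		/* toy: leaf slot already in use */
	t->entry[level_index(va, 0)] = oa;
	return 0;
}
```

For example, with va 0xfef80000 the level 2, 1 and 0 indices come out as 3, 0x1f7 and 0x180.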
> This version sits under the iommu_domain_ops as map_pages() but does not
> require the external page size calculation. The implementation is actually
> map_range() and can handle arbitrary ranges, internally handling all the
> validation and supporting any arrangement of page sizes. A future series
> can optimize iommu_map() to take advantage of this.
> 
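One way to split an arbitrary range into mixed page sizes is greedy. At each step, pick the largest supported size that fits in what is left and that both addresses are aligned to. This is a sketch of that greedy technique only, and it is not claimed to be how map_range() works internally. It assumes pgsz_bitmap has one bit set per supported size, and __builtin_clzll() assumes GCC or Clang. best_pgsize_lg2() and split_range() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest size in @pgsz_bitmap aligned for @va and @oa and <= @len. */
static int best_pgsize_lg2(uint64_t pgsz_bitmap, uint64_t va, uint64_t oa,
			   uint64_t len)
{
	uint64_t addr = va | oa;
	uint64_t mask = pgsz_bitmap;
	unsigned int len_hb;

	if (!len)
		return -1;
	/* Keep only sizes up to the highest set bit of len. */
	len_hb = 63 - __builtin_clzll(len);
	if (len_hb < 63)
		mask &= (UINT64_C(1) << (len_hb + 1)) - 1;
	/* Keep only sizes up to the lowest set bit of va | oa. */
	if (addr)
		mask &= (addr & -addr) | ((addr & -addr) - 1);
	if (!mask)
		return -1;
	return 63 - __builtin_clzll(mask);
}

/* Split [va, va + len) into steps, recording each chosen size. */
static size_t split_range(uint64_t pgsz_bitmap, uint64_t va, uint64_t oa,
			  uint64_t len, int *sizes_lg2, size_t max_steps)
{
	size_t n = 0;

	while (len) {
		int lg2 = best_pgsize_lg2(pgsz_bitmap, va, oa, len);

		assert(lg2 >= 0 && n < max_steps);
		sizes_lg2[n++] = lg2;
		va += UINT64_C(1) << lg2;
		oa += UINT64_C(1) << lg2;
		len -= UINT64_C(1) << lg2;
	}
	return n;
}
```

For example, suppose 4K and 2M are supported (bitmap 0x201000), va = oa = 0x1ff000, and len = 0x202000. The range then splits into 4K, 2M, 4K (log2 12, 21, 12).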
> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/generic_pt/iommu_pt.h | 481 ++++++++++++++++++++++++++++
>  include/linux/generic_pt/iommu.h    |  58 ++++
>  2 files changed, 539 insertions(+)
> 

.../...

> +static int __map_range_leaf(struct pt_range *range, void *arg,
> +			    unsigned int level, struct pt_table_p *table)
> +{
> +	struct pt_state pts = pt_init(range, level, table);
> +	struct pt_iommu_map_args *map = arg;
> +	unsigned int leaf_pgsize_lg2 = map->leaf_pgsize_lg2;
> +	unsigned int start_index;
> +	pt_oaddr_t oa = map->oa;
> +	unsigned int step;
> +	bool need_contig;
> +	int ret = 0;
> +
> +	PT_WARN_ON(map->leaf_level != level);
> +	PT_WARN_ON(!pt_can_have_leaf(&pts));
> +
> +	step = log2_to_int_t(unsigned int,
> +			     leaf_pgsize_lg2 - pt_table_item_lg2sz(&pts));
> +	need_contig = leaf_pgsize_lg2 != pt_table_item_lg2sz(&pts);
> +
> +	_pt_iter_first(&pts);
> +	start_index = pts.index;
> +	do {
> +		pts.type = pt_load_entry_raw(&pts);
> +		if (pts.type != PT_ENTRY_EMPTY || need_contig) {
> +			if (pts.index != start_index)
> +				pt_index_to_va(&pts);
> +			ret = clear_contig(&pts, map->iotlb_gather, step,
> +					   leaf_pgsize_lg2);
> +			if (ret)
> +				break;
> +		}
> +
> +		PT_WARN_ON(compute_best_pgsize(&pts, oa) != leaf_pgsize_lg2);


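If I enable CONFIG_DEBUG_GENERIC_PT=y and boot an AMD system with the v1 (host)
page table, we hit this warning in some cases: the debug print below shows
pgsize 0xd in __map_range_leaf against a leaf_pgsize of 0xc. The code path
looks OK to me, so maybe these warnings should be silenced?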


[   31.985383] pt_iommu_amdv1_map_pages : oa 0x208b95d000 va 0xfef80000 last_va 0xfef9ffff pgsz_lg 0xc pgsize 0x1000 pgcount 0x20
[   31.985384] __map_range_leaf oa 0x208b95e000 va 0xfef80000 last_va 0xfef9ffff pgsize 0xd leaf_pgsize 0xc possible_sz 0x1ff000
[   31.985391] ------------[ cut here ]------------
[   31.985392] WARNING: CPU: 359 PID: 2540 at drivers/iommu/generic_pt/fmt/../iommu_pt.h:493 __map_range_leaf+0x636/0x860
[   31.985399] Modules linked in:
[   31.985402] CPU: 359 UID: 0 PID: 2540 Comm: systemd-udevd Not tainted 6.17.0-rc3-genricpt+ #444 VOLUNTARY
[   31.985405] Hardware name: AMD Corporation Titanite_4G/Titanite_4G, BIOS RTI100EB 12/05/2024
[   31.985406] RIP: 0010:__map_range_leaf+0x636/0x860
[   31.985409] Code: 49 89 6e 18 48 8b 54 24 58 65 48 2b 15 6b 4d b8 01 0f 85 2a 02 00 00 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f e9 55 2e 67 ff <0f> 0b e9 07 fe ff ff 0f b6 48 21 e9 e5 fb ff ff 48 8b 7c 24 18 44
[   31.985411] RSP: 0018:ff78b42ad7063558 EFLAGS: 00010297
[   31.985413] RAX: 0000000000000000 RBX: ff453e2c423cdc08 RCX: 000000000000000d
[   31.985414] RDX: 0000000000000000 RSI: 0000000000002000 RDI: ffffff7fffffffff
[   31.985415] RBP: 000000208b95e000 R08: 00000000fef9ffff R09: 00000000fffeffff
[   31.985416] R10: 000000000000000c R11: ff453e6b4c696000 R12: 0000000000003000
[   31.985417] R13: ff78b42ad7063770 R14: ff78b42ad7063748 R15: 000000000000000c
[   31.985418] FS:  00007f46c7e888c0(0000) GS:ff453e6aabbc2000(0000) knlGS:0000000000000000
[   31.985420] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   31.985421] CR2: 00007f46c7e03000 CR3: 0000000141f6b002 CR4: 0000000000771ef0
[   31.985422] PKRU: 55555554
[   31.985423] Call Trace:
[   31.985424]  <TASK>
[   31.985426]  __map_range+0x399/0x5a0
[   31.985429]  ? down_trylock+0x20/0x30
[   31.985434]  __map_range+0x1af/0x5a0
[   31.985436]  ? _printk+0x52/0x70
[   31.985441]  pt_iommu_amdv1_map_pages+0x6e6/0xca0
[   31.985444]  ? srso_alias_return_thunk+0x5/0xfbef5
[   31.985448]  ? iommu_map_nosync+0x129/0x230
[   31.985451]  iommu_map_nosync+0x129/0x230
[   31.985454]  blk_rq_dma_map_iter_start+0x186/0x1c0
[   31.985458]  nvme_prep_rq+0x4ff/0x8b0
[   31.985461]  ? srso_alias_return_thunk+0x5/0xfbef5
[   31.985463]  nvme_queue_rqs+0xc0/0x1d0
[   31.985466]  blk_mq_dispatch_queue_requests+0xf2/0x140
[   31.985469]  blk_mq_flush_plug_list+0x71/0x170
[   31.985472]  __blk_flush_plug+0xcc/0x120
[   31.985476]  blk_finish_plug+0x1f/0x30
[   31.985478]  read_pages+0x1a8/0x260
[   31.985483]  ? filemap_add_folio+0xae/0xd0
[   31.985485]  page_cache_ra_unbounded+0x174/0x230
[   31.985488]  force_page_cache_ra+0x89/0xb0
[   31.985491]  filemap_get_pages+0x12a/0x720
[   31.985494]  filemap_read+0xda/0x3e0
[   31.985497]  ? srso_alias_return_thunk+0x5/0xfbef5
[   31.985499]  ? alloc_pages_mpol+0x76/0x140
[   31.985502]  ? srso_alias_return_thunk+0x5/0xfbef5
[   31.985504]  ? mod_memcg_lruvec_state+0x96/0x1a0
[   31.985507]  ? srso_alias_return_thunk+0x5/0xfbef5
[   31.985509]  ? __lruvec_stat_mod_folio+0x6d/0xa0
[   31.985511]  ? srso_alias_return_thunk+0x5/0xfbef5
[   31.985512]  ? srso_alias_return_thunk+0x5/0xfbef5
[   31.985514]  ? set_ptes.constprop.0+0x36/0x80
[   31.985517]  ? srso_alias_return_thunk+0x5/0xfbef5
[   31.985519]  ? __handle_mm_fault+0xa2c/0x14d0
[   31.985522]  blkdev_read_iter+0x6f/0x140
[   31.985525]  vfs_read+0x207/0x330
[   31.985528]  ksys_read+0x5c/0xd0
[   31.985530]  do_syscall_64+0x50/0x1e0
[   31.985533]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   31.985535] RIP: 0033:0x7f46c8576852
[   31.985537] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 1a b4 0c 00 e8 a5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[   31.985538] RSP: 002b:00007ffc06f9c638 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   31.985540] RAX: ffffffffffffffda RBX: 00007f46c7e02028 RCX: 00007f46c8576852
[   31.985541] RDX: 0000000000040000 RSI: 00007f46c7e02038 RDI: 000000000000000c
[   31.985542] RBP: 0000555f80925280 R08: 00007f46c7e02010 R09: 00007f46c7e02010
[   31.985543] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000040000
[   31.985544] R13: 0000000000040000 R14: 00007f46c7e02010 R15: 0000555f809252d0
[   31.985546]  </TASK>
[   31.985547] ---[ end trace 0000000000000000 ]---
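The values in the two debug lines above can be checked by hand. Below is a minimal sketch built on two assumptions:

- compute_best_pgsize() picks the largest size in possible_sz that divides both VA and OA and fits the remaining length;
- possible_sz is a bitmap of page sizes.

The real helper is not quoted in this mail, so best_pgsize_lg2() below is a hypothetical stand-in.

```c
#include <assert.h>
#include <stdint.h>

/* Largest size in @possible_sz that divides @va and @oa and is <= @len. */
static int best_pgsize_lg2(uint64_t possible_sz, uint64_t va, uint64_t oa,
			   uint64_t len)
{
	assert(possible_sz != 0);	/* at least one size must be supported */
	for (int lg2 = 63; lg2 >= 0; lg2--) {
		uint64_t sz = UINT64_C(1) << lg2;

		if (!(possible_sz & sz))
			continue;	/* size not supported */
		if ((va | oa) & (sz - 1))
			continue;	/* va or oa not aligned to sz */
		if (sz > len)
			continue;	/* would run past the range */
		return lg2;
	}
	return -1;
}
```

With the map_pages arguments (va 0xfef80000, oa 0x208b95d000, len 0x20000) it returns 0xc, matching leaf_pgsize. The pair printed by __map_range_leaf has the same va but an oa advanced by 0x1000 to 0x208b95e000, and for that pair it returns 0xd, consistent with the warning. Advancing va by the same 0x1000 (0xfef81000, len 0x1f000) gives 0xc again.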


-Vasant




Thread overview: 66+ messages
2025-09-03 17:46 [PATCH v5 00/15] Consolidate iommu page table implementations (AMD) Jason Gunthorpe
2025-09-03 17:46 ` [PATCH v5 01/15] genpt: Generic Page Table base API Jason Gunthorpe
2025-09-10  3:40   ` Nicolin Chen
2025-09-15 15:51     ` Jason Gunthorpe
2025-09-18  7:14       ` Nicolin Chen
2025-09-18 14:49         ` Jason Gunthorpe
2025-09-18 19:43           ` Nicolin Chen
2025-09-18  6:49   ` Tian, Kevin
2025-09-18 18:06     ` Jason Gunthorpe
2025-09-19  8:11       ` Tian, Kevin
2025-09-19 14:31         ` Jason Gunthorpe
2025-09-24  9:20           ` Tian, Kevin
2025-09-22 14:45   ` [External] : " ALOK TIWARI
2025-09-22 17:05     ` Jason Gunthorpe
2025-09-03 17:46 ` [PATCH v5 02/15] genpt: Add Documentation/ files Jason Gunthorpe
2025-09-11  4:23   ` Nicolin Chen
2025-09-15 15:42     ` Jason Gunthorpe
2025-09-18  6:55   ` Tian, Kevin
2025-09-19 14:42     ` Jason Gunthorpe
2025-09-24  9:21       ` Tian, Kevin
2025-09-03 17:46 ` [PATCH v5 03/15] iommupt: Add the basic structure of the iommu implementation Jason Gunthorpe
2025-09-11  5:38   ` Nicolin Chen
2025-09-15 15:36     ` Jason Gunthorpe
2025-09-18  6:58   ` Tian, Kevin
2025-09-19 15:26     ` Jason Gunthorpe
2025-09-24  9:22       ` Tian, Kevin
2025-09-03 17:46 ` [PATCH v5 04/15] iommupt: Add the AMD IOMMU v1 page table format Jason Gunthorpe
2025-09-18  7:05   ` Tian, Kevin
2025-09-19 18:19     ` Jason Gunthorpe
2025-09-24  9:23       ` Tian, Kevin
2025-10-07 12:28     ` Jason Gunthorpe
2025-10-08  9:43   ` Vasant Hegde
2025-10-08 13:08     ` Jason Gunthorpe
2025-10-09 11:44       ` Vasant Hegde
2025-09-03 17:46 ` [PATCH v5 05/15] iommupt: Add iova_to_phys op Jason Gunthorpe
2025-09-18  7:08   ` Tian, Kevin
2025-09-19 18:35     ` Jason Gunthorpe
2025-09-03 17:46 ` [PATCH v5 06/15] iommupt: Add unmap_pages op Jason Gunthorpe
2025-09-24  9:28   ` Tian, Kevin
2025-09-24 12:23     ` Jason Gunthorpe
2025-09-26  7:23       ` Tian, Kevin
2025-09-03 17:46 ` [PATCH v5 07/15] iommupt: Add map_pages op Jason Gunthorpe
2025-09-26  7:47   ` Tian, Kevin
2025-09-29 16:44     ` Jason Gunthorpe
2025-10-07 12:08   ` Vasant Hegde [this message]
2025-10-07 13:11     ` Jason Gunthorpe
2025-10-08  9:52       ` Vasant Hegde
2025-09-03 17:46 ` [PATCH v5 08/15] iommupt: Add read_and_clear_dirty op Jason Gunthorpe
2025-09-26  7:48   ` Tian, Kevin
2025-09-03 17:46 ` [PATCH v5 09/15] iommupt: Add a kunit test for Generic Page Table Jason Gunthorpe
2025-09-03 17:46 ` [PATCH v5 10/15] iommupt: Add a mock pagetable format for iommufd selftest to use Jason Gunthorpe
2025-09-26  7:50   ` Tian, Kevin
2025-09-03 17:46 ` [PATCH v5 11/15] iommufd: Change the selftest to use iommupt instead of xarray Jason Gunthorpe
2025-09-03 17:46 ` [PATCH v5 12/15] iommupt: Add the x86 64 bit page table format Jason Gunthorpe
2025-09-26  7:57   ` Tian, Kevin
2025-09-29 16:17     ` Jason Gunthorpe
2025-10-08 10:05   ` Vasant Hegde
2025-10-08 13:03     ` Jason Gunthorpe
2025-10-09 11:43       ` Vasant Hegde
2025-09-03 17:46 ` [PATCH v5 13/15] iommu/amd: Use the generic iommu page table Jason Gunthorpe
2025-09-25 12:07   ` Ankit Soni
2025-09-25 12:32     ` Jason Gunthorpe
2025-09-25 12:39       ` Ankit Soni
2025-10-08  9:47   ` Vasant Hegde
2025-09-03 17:46 ` [PATCH v5 14/15] iommu/amd: Remove AMD io_pgtable support Jason Gunthorpe
2025-09-03 17:46 ` [PATCH v5 15/15] iommupt: Add a kunit test for the IOMMU implementation Jason Gunthorpe
