From: Vasant Hegde <vasant.hegde@amd.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
Jonathan Corbet <corbet@lwn.net>,
iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
Justin Stitt <justinstitt@google.com>,
Kevin Tian <kevin.tian@intel.com>,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
llvm@lists.linux.dev, Bill Wendling <morbo@google.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
Miguel Ojeda <ojeda@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Shuah Khan <shuah@kernel.org>,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
Will Deacon <will@kernel.org>
Cc: Alexey Kardashevskiy <aik@amd.com>,
Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
James Gowans <jgowans@amazon.com>,
Michael Roth <michael.roth@amd.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
patches@lists.linux.dev
Subject: Re: [PATCH v5 07/15] iommupt: Add map_pages op
Date: Tue, 7 Oct 2025 17:38:48 +0530
Message-ID: <b9b18a03-63a2-4065-a27e-d92dd5c860bc@amd.com>
In-Reply-To: <7-v5-116c4948af3d+68091-iommu_pt_jgg@nvidia.com>
Jason,
On 9/3/2025 11:16 PM, Jason Gunthorpe wrote:
> map is slightly complicated because it has to handle a number of special
> edge cases:
> - Overmapping a previously shared table with an OA - requires validating
> and freeing the possibly empty tables
> - Doing the above across an entire to-be-created contiguous entry
> - Installing a new shared table level concurrently with another thread
> - Expanding the table by adding more top levels
>
> Table expansion is a unique feature of AMDv1, this version is quite
> similar except we handle racing concurrent lockless map. The table top
> pointer and starting level are encoded in a single uintptr_t which ensures
> we can READ_ONCE() without tearing. Any op will do the READ_ONCE() and use
> that fixed point as its starting point. Concurrent expansion is handled
> with a table global spinlock.
>
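A side sketch of the single-word encoding described above, for readers following along. It is not taken from the patch: the 3-bit level field, the struct, and every function name here are hypothetical, and C11 atomics stand in for the kernel's READ_ONCE(). The point is only that one load returns a matching (table, level) pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: the low 3 bits of an 8-byte-aligned pointer hold the level. */
#define TOP_LEVEL_MASK ((uintptr_t)0x7)

struct top_of_table {
	_Atomic uintptr_t packed;	/* table pointer | starting level */
};

/* Combine an aligned table pointer and a small level into one word. */
static uintptr_t pack_top(void *table, unsigned int level)
{
	assert(((uintptr_t)table & TOP_LEVEL_MASK) == 0);
	assert(level <= TOP_LEVEL_MASK);
	return (uintptr_t)table | level;
}

static void *top_table(uintptr_t packed)
{
	return (void *)(packed & ~TOP_LEVEL_MASK);
}

static unsigned int top_level(uintptr_t packed)
{
	return (unsigned int)(packed & TOP_LEVEL_MASK);
}

/* Reader side: a single load, so pointer and level cannot be torn apart. */
static uintptr_t read_top(struct top_of_table *top)
{
	return atomic_load_explicit(&top->packed, memory_order_acquire);
}

/* Writer side (table expansion); the caller is assumed to hold a lock. */
static void publish_top(struct top_of_table *top, void *table,
			unsigned int level)
{
	atomic_store_explicit(&top->packed, pack_top(table, level),
			      memory_order_release);
}
```

An operation would call read_top() once and then use top_table() and top_level() on that snapshot for the whole walk, so a concurrent publish_top() cannot change its starting point midway.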
> When inserting a new table entry map checks that the entire portion of the
> table is empty. This includes freeing any empty lower tables that will be
> overwritten by an OA. A separate free list is used while checking and
> collecting all the empty lower tables so that writing the new entry is
> uninterrupted, either the new entry fully writes or nothing changes.
>
> A special fast path for PAGE_SIZE is implemented that does a direct walk
> to the leaf level and installs a single entry. This gives ~15% improvement
> for iommu_map() when mapping lists of single pages.
>
> This version sits under the iommu_domain_ops as map_pages() but does not
> require the external page size calculation. The implementation is actually
> map_range() and can do arbitrary ranges, internally handling all the
> validation and supporting any arrangement of page sizes. A future series
> can optimize iommu_map() to take advantage of this.
>
> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/generic_pt/iommu_pt.h | 481 ++++++++++++++++++++++++++++
> include/linux/generic_pt/iommu.h | 58 ++++
> 2 files changed, 539 insertions(+)
>
.../...
> +static int __map_range_leaf(struct pt_range *range, void *arg,
> +			    unsigned int level, struct pt_table_p *table)
> +{
> +	struct pt_state pts = pt_init(range, level, table);
> +	struct pt_iommu_map_args *map = arg;
> +	unsigned int leaf_pgsize_lg2 = map->leaf_pgsize_lg2;
> +	unsigned int start_index;
> +	pt_oaddr_t oa = map->oa;
> +	unsigned int step;
> +	bool need_contig;
> +	int ret = 0;
> +
> +	PT_WARN_ON(map->leaf_level != level);
> +	PT_WARN_ON(!pt_can_have_leaf(&pts));
> +
> +	step = log2_to_int_t(unsigned int,
> +			     leaf_pgsize_lg2 - pt_table_item_lg2sz(&pts));
> +	need_contig = leaf_pgsize_lg2 != pt_table_item_lg2sz(&pts);
> +
> +	_pt_iter_first(&pts);
> +	start_index = pts.index;
> +	do {
> +		pts.type = pt_load_entry_raw(&pts);
> +		if (pts.type != PT_ENTRY_EMPTY || need_contig) {
> +			if (pts.index != start_index)
> +				pt_index_to_va(&pts);
> +			ret = clear_contig(&pts, map->iotlb_gather, step,
> +					   leaf_pgsize_lg2);
> +			if (ret)
> +				break;
> +		}
> +
> +		PT_WARN_ON(compute_best_pgsize(&pts, oa) != leaf_pgsize_lg2);
If I select CONFIG_DEBUG_GENERIC_PT=y and boot an AMD system with the V1 (host)
page table, we hit this warning in some cases. The code path looks OK. Maybe
silence this warning?
[ 31.985383] pt_iommu_amdv1_map_pages : oa 0x208b95d000 va 0xfef80000 last_va
0xfef9ffff pgsz_lg 0xc pgsize 0x1000 pgcount 0x20
[ 31.985384] __map_range_leaf oa 0x208b95e000 va 0xfef80000 last_va 0xfef9ffff
pgsize 0xd leaf_pgsize 0xc possible_sz 0x1ff000
[ 31.985391] ------------[ cut here ]------------
[ 31.985392] WARNING: CPU: 359 PID: 2540 at
drivers/iommu/generic_pt/fmt/../iommu_pt.h:493 __map_range_leaf+0x636/0x860
[ 31.985399] Modules linked in:
[ 31.985402] CPU: 359 UID: 0 PID: 2540 Comm: systemd-udevd Not tainted
6.17.0-rc3-genricpt+ #444 VOLUNTARY
[ 31.985405] Hardware name: AMD Corporation Titanite_4G/Titanite_4G, BIOS
RTI100EB 12/05/2024
[ 31.985406] RIP: 0010:__map_range_leaf+0x636/0x860
[ 31.985409] Code: 49 89 6e 18 48 8b 54 24 58 65 48 2b 15 6b 4d b8 01 0f 85 2a
02 00 00 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f e9 55 2e 67 ff <0f> 0b e9 07
fe ff ff 0f b6 48 21 e9 e5 fb ff ff 48 8b 7c 24 18 44
[ 31.985411] RSP: 0018:ff78b42ad7063558 EFLAGS: 00010297
[ 31.985413] RAX: 0000000000000000 RBX: ff453e2c423cdc08 RCX: 000000000000000d
[ 31.985414] RDX: 0000000000000000 RSI: 0000000000002000 RDI: ffffff7fffffffff
[ 31.985415] RBP: 000000208b95e000 R08: 00000000fef9ffff R09: 00000000fffeffff
[ 31.985416] R10: 000000000000000c R11: ff453e6b4c696000 R12: 0000000000003000
[ 31.985417] R13: ff78b42ad7063770 R14: ff78b42ad7063748 R15: 000000000000000c
[ 31.985418] FS: 00007f46c7e888c0(0000) GS:ff453e6aabbc2000(0000)
knlGS:0000000000000000
[ 31.985420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.985421] CR2: 00007f46c7e03000 CR3: 0000000141f6b002 CR4: 0000000000771ef0
[ 31.985422] PKRU: 55555554
[ 31.985423] Call Trace:
[ 31.985424] <TASK>
[ 31.985426] __map_range+0x399/0x5a0
[ 31.985429] ? down_trylock+0x20/0x30
[ 31.985434] __map_range+0x1af/0x5a0
[ 31.985436] ? _printk+0x52/0x70
[ 31.985441] pt_iommu_amdv1_map_pages+0x6e6/0xca0
[ 31.985444] ? srso_alias_return_thunk+0x5/0xfbef5
[ 31.985448] ? iommu_map_nosync+0x129/0x230
[ 31.985451] iommu_map_nosync+0x129/0x230
[ 31.985454] blk_rq_dma_map_iter_start+0x186/0x1c0
[ 31.985458] nvme_prep_rq+0x4ff/0x8b0
[ 31.985461] ? srso_alias_return_thunk+0x5/0xfbef5
[ 31.985463] nvme_queue_rqs+0xc0/0x1d0
[ 31.985466] blk_mq_dispatch_queue_requests+0xf2/0x140
[ 31.985469] blk_mq_flush_plug_list+0x71/0x170
[ 31.985472] __blk_flush_plug+0xcc/0x120
[ 31.985476] blk_finish_plug+0x1f/0x30
[ 31.985478] read_pages+0x1a8/0x260
[ 31.985483] ? filemap_add_folio+0xae/0xd0
[ 31.985485] page_cache_ra_unbounded+0x174/0x230
[ 31.985488] force_page_cache_ra+0x89/0xb0
[ 31.985491] filemap_get_pages+0x12a/0x720
[ 31.985494] filemap_read+0xda/0x3e0
[ 31.985497] ? srso_alias_return_thunk+0x5/0xfbef5
[ 31.985499] ? alloc_pages_mpol+0x76/0x140
[ 31.985502] ? srso_alias_return_thunk+0x5/0xfbef5
[ 31.985504] ? mod_memcg_lruvec_state+0x96/0x1a0
[ 31.985507] ? srso_alias_return_thunk+0x5/0xfbef5
[ 31.985509] ? __lruvec_stat_mod_folio+0x6d/0xa0
[ 31.985511] ? srso_alias_return_thunk+0x5/0xfbef5
[ 31.985512] ? srso_alias_return_thunk+0x5/0xfbef5
[ 31.985514] ? set_ptes.constprop.0+0x36/0x80
[ 31.985517] ? srso_alias_return_thunk+0x5/0xfbef5
[ 31.985519] ? __handle_mm_fault+0xa2c/0x14d0
[ 31.985522] blkdev_read_iter+0x6f/0x140
[ 31.985525] vfs_read+0x207/0x330
[ 31.985528] ksys_read+0x5c/0xd0
[ 31.985530] do_syscall_64+0x50/0x1e0
[ 31.985533] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 31.985535] RIP: 0033:0x7f46c8576852
[ 31.985537] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 1a b4 0c 00 e8 a5 1d 02 00 0f
1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0
ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 31.985538] RSP: 002b:00007ffc06f9c638 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 31.985540] RAX: ffffffffffffffda RBX: 00007f46c7e02028 RCX: 00007f46c8576852
[ 31.985541] RDX: 0000000000040000 RSI: 00007f46c7e02038 RDI: 000000000000000c
[ 31.985542] RBP: 0000555f80925280 R08: 00007f46c7e02010 R09: 00007f46c7e02010
[ 31.985543] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000040000
[ 31.985544] R13: 0000000000040000 R14: 00007f46c7e02010 R15: 0000555f809252d0
[ 31.985546] </TASK>
[ 31.985547] ---[ end trace 0000000000000000 ]---
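The figures in the two debug prints before the trace can be reproduced with a simplified model. This is a sketch only, not the kernel's compute_best_pgsize(). It assumes that possible_sz is a bitmap of candidate page sizes and that the printed pgsize and leaf_pgsize values are log2 sizes. The function name best_pgsize_lg2() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model, NOT the kernel's compute_best_pgsize(): return the log2
 * of the largest size in the possible_sz bitmap to which both va and oa are
 * aligned and which still fits inside [va, last_va]. Returns 0 if no
 * candidate size qualifies.
 */
static unsigned int best_pgsize_lg2(uint64_t va, uint64_t last_va,
				    uint64_t oa, uint64_t possible_sz)
{
	unsigned int lg2;

	for (lg2 = 63; lg2 > 0; lg2--) {
		uint64_t size = 1ULL << lg2;

		if (!(possible_sz & size))
			continue;	/* size not offered at this level */
		if ((va | oa) & (size - 1))
			continue;	/* va or oa not aligned to size */
		if (last_va - va < size - 1)
			continue;	/* size would run past last_va */
		return lg2;
	}
	return 0;
}
```

With va 0xfef80000, last_va 0xfef9ffff and possible_sz 0x1ff000, this model gives 12 (0xc) for oa 0x208b95d000 and 13 (0xd) for oa 0x208b95e000. Those match the pgsz_lg value in the first print and the pgsize value in the second. As a purely hypothetical comparison, pairing oa 0x208b95e000 with va 0xfef81000 gives 12 again. Nothing in the log shows which va the check actually used.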
-Vasant