public inbox for qemu-devel@nongnu.org
 help / color / mirror / Atom feed
From: Sairaj Kodilkar <sarunkod@amd.com>
To: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
	<qemu-devel@nongnu.org>
Cc: <sarunkod@amd.com>, <qemu@demindiro.com>, <mst@redhat.com>,
	<clement.mathieu--drif@eviden.com>, <pbonzini@redhat.com>,
	<richard.henderson@linaro.org>, <eduardo@habkost.net>,
	<boris.ostrovsky@oracle.com>
Subject: Re: [PATCH 1/2] amd_iommu: Follow root pointer before page walk and use 1-based levels
Date: Thu, 26 Mar 2026 12:22:37 +0530	[thread overview]
Message-ID: <7c01325b-c509-4121-9b16-c73e2e7dc2a0@amd.com> (raw)
In-Reply-To: <ba2d7018-7dcc-42d7-ba52-395d4e1d0298@oracle.com>



On 3/24/2026 7:55 PM, Alejandro Jimenez wrote:
>
> On 3/23/26 7:01 AM, Sairaj Kodilkar wrote:
>>
>> On 3/12/2026 2:09 AM, Alejandro Jimenez wrote:
>>> DTE[Mode] and PTE NextLevel encode page table levels as 1-based values, but
>>> fetch_pte() currently uses a 0-based level counter, making the logic
>>> harder to follow and requiring conversions between DTE mode and level.
>>>
>>> Switch the page table walk logic to use 1-based level accounting in
>>> fetch_pte() and the relevant macro helpers. To further simplify the page
>>> walking loop, split the root page table access from the walk i.e. rework
>>> fetch_pte() to follow the DTE Page Table Root Pointer and retrieve the top
>>> level pagetable entry before entering the loop, then iterate only over the
>>> PDE/PTE entries.
>>>
>>> The reworked algorithm fixes a page walk bug where the page size was
>>> calculated for the next level before checking if the current PTE was already
>>> a leaf/hugepage. That caused hugepage mappings to be reported as 4K pages,
>>> leading to performance degradation and failures in some setups.
>>>
>>> Fixes: a74bb3110a5b ("amd_iommu: Add helpers to walk AMD v1 Page Table
>>> format")
>>> Cc: qemu-stable@nongnu.org
>>> Reported-by: David Hoppenbrouwers <qemu@demindiro.com>
>>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
>>> ---
>>>    hw/i386/amd_iommu.c | 132 ++++++++++++++++++++++++++++++--------------
>>>    hw/i386/amd_iommu.h |  11 ++--
>>>    2 files changed, 97 insertions(+), 46 deletions(-)
>>>
> [...]
>
>> Hi Alejandro,
>>
>> amdvi_sync_shadow_page_table_range() does not check if DTE
>> **valid and translation valid** bit are set. I added this check in
>> my reply to david's patch. Do you think we should include that
>> check as well to ensure that only DTEs with **valid and translation valid**
>> bit are passed to the fetch_pte ?
>>
> Right now the check for DTE[V] and DTE[TV] are implemented in
> amdvi_as_to_dte(), which all callers of
> amdvi_sync_shadow_page_table_range() use to extract the DTE that will be
> passed as its argument. So I think if we were to add a check, an assert
> would be more appropriate.
>
> On the other hand, I am worried about sprinkling assertions around the
> code, and I am actually considering removing the ones the original series
> added to amdvi_sync_shadow_page_table_range(), fetch_pte(), and
> amdvi_notify_iommu(), since any failures in these cases are likely to cause
> immediate symptoms and I don't believe it creates any window for data
> corruption.
>
> With that in mind, to answer your question I would not add any additional
> check on amdvi_sync_shadow_page_table_range(), and would rely on the
> current convention to check when retrieving a DTE via the interface we
> provided for it and all the code already uses i.e. amdvi_as_to_dte().
>
>
> For a follow up cleanup, maybe amdvi_as_to_dte() should be renamed with a
> more suggestive name that clearly indicates that it validates for those
> conditions. But in context, all of its callers rely on the fact that it
> does this verification and its return values also encode it.
>
> While looking into this I noticed a chance for minor cleanup in this flow:
> amdvi_do_translate()
>      amdvi_as_to_dte()
>      amdvi_page_walk()
>
> Where amdvi_page_walk() does an unnecessary check for DTE[TV], that
> amd_vi_as_to_dte() already ensures is set.

Hi Alejandro,
I agree with you about sprinkling assertions, I think some of the assertions
should ideally deliver a events to the guest. But we can have this in later
series

Also for both the patches

Reviewed-by: Sairaj Kodilkar <sarunkod@amd.com>

Thanks
Sairaj

>
> Thank you,
> Alejandro
>
>> Thanks
>> Sairaj Kodilkar
>>



  reply	other threads:[~2026-03-26  6:58 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-11 20:39 [PATCH 0/2] amd_iommu: Fix page size reporting on hugepage mappings Alejandro Jimenez
2026-03-11 20:39 ` [PATCH 1/2] amd_iommu: Follow root pointer before page walk and use 1-based levels Alejandro Jimenez
2026-03-12 17:01   ` David Hoppenbrouwers
2026-03-23 11:01   ` Sairaj Kodilkar
2026-03-24 14:25     ` Alejandro Jimenez
2026-03-26  6:52       ` Sairaj Kodilkar [this message]
2026-03-11 20:39 ` [PATCH 2/2] amd_iommu: Reject non-decreasing NextLevel in fetch_pte() Alejandro Jimenez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c01325b-c509-4121-9b16-c73e2e7dc2a0@amd.com \
    --to=sarunkod@amd.com \
    --cc=alejandro.j.jimenez@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=clement.mathieu--drif@eviden.com \
    --cc=eduardo@habkost.net \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu@demindiro.com \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox