From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
To: Mark Burton <quic_mburton@quicinc.com>,
	Mark Cave-Ayland <mark.caveayland@nutanix.com>
Cc: "QEMU Developers" <qemu-devel@nongnu.org>,
	peterx@redhat.com, eric.auger@redhat.com,
	zhenzhong.duan@intel.com, alejandro.j.jimenez@oracle.com,
	peter.maydell@linaro.org, jasowang@redhat.com,
	pbonzini@redhat.com, tjeznach@rivosinc.com,
	steven.sistare@oracle.com, clement.mathieu--drif@eviden.com,
	joao.m.martins@oracle.com, jean-philippe@linaro.org,
	sarunkod@amd.com,
	"Phil Mathieu-Daudé" <philmd@linaro.org>
Subject: Re: MMIO through IOMMU from a TCG processor
Date: Thu, 9 Oct 2025 11:07:19 -0700
Message-ID: <0138de27-04af-4ec6-83bd-db917f867aa5@linaro.org>
In-Reply-To: <D7DA7B85-2439-4CC2-A852-604154ABDC99@quicinc.com>

Hi Mark,

"recorded section therefore seems to be incorrect".
do you observe a crash, or on assert failing at execution?

I don't know the code you mention in detail, but after investigating 
and fixing https://gitlab.com/qemu-project/qemu/-/issues/3040, I can 
share a few things.

Overall, what you describe looks like a race condition exposing a 
lifetime issue, especially when you say "we 'lose' the address space 
that has been returned by the translate function".
Either a value was not updated as expected and is out of sync, or it 
was freed too early. The lifetime of memory regions is definitely 
tricky in QEMU, and when you mix that with RCU, things can become very 
obscure in multithreaded scenarios.

In the bug above, the solution was to stop duplicating this 
information and to read it from a single source instead. The overhead 
of reading such atomic data is quite small, thanks to the use of RCU.
At KVM Forum, Paolo told me he had introduced this copy precisely to 
avoid issues, but the opposite happened in reality, which we both 
found quite funny.
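
To illustrate the pattern (a minimal sketch using QEMU's RCU API from
include/qemu/rcu.h; the Owner/Dispatch names are hypothetical, not the
actual memory-dispatch types):

    /* Read shared state through RCU instead of keeping a private copy
     * that can go stale. Owner/Dispatch are hypothetical names. */
    #include "qemu/osdep.h"
    #include "qemu/rcu.h"
    #include "qemu/atomic.h"

    typedef struct Dispatch Dispatch;

    typedef struct Owner {
        Dispatch *dispatch;  /* updaters publish with qatomic_rcu_set() */
    } Owner;

    static void reader(Owner *owner)
    {
        RCU_READ_LOCK_GUARD();
        /* Fetch the current pointer every time; it stays valid until
         * the end of this read-side critical section. */
        Dispatch *d = qatomic_rcu_read(&owner->dispatch);
        (void)d;  /* ... use d ... */
    }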

Additional questions:
- At which point of execution does it happen? Is it during PCI device 
initialization, or when remapping specific memory sections?
- Is the bug deterministic or random? If random, does increasing the 
number of attached PCI devices increase the probability of hitting it?

Additional tools:
- If you observe a crash, build with ASan. If you get a use-after-free 
error, it's probably an issue with RCU cleaning things up before you 
expect; this is what I had in the bug mentioned above.
- If your assert fails, I recommend capturing the execution with rr 
(https://github.com/rr-debugger/rr), using chaos mode (rr record 
--chaos), which randomizes the scheduling of threads. I don't know if 
you're familiar with it, but it lets you debug your execution 
"backward". Once you have captured a faulty execution, you can reach 
the crash or failing assert, then execute backward (reverse-continue) 
with a watchpoint set on the (correct) value that was updated in the 
meantime. This way, you'll find which sequence led to the 
desynchronization, and you'll have a good start for deducing the root 
cause. See the sketch after this list.
- Spend some time making the crash/assert almost deterministic; it 
will save you time later, especially when implementing a possible fix 
and proving that it works.
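
For ASan, passing the sanitizer flags directly to configure always
works (recent QEMU versions also have dedicated sanitizer switches):

    $ ./configure --extra-cflags=-fsanitize=address \
                  --extra-ldflags=-fsanitize=address

And a sketch of the rr workflow (the qemu command line is a
placeholder; the watched expression assumes the cpuas variable from
iotlb_to_section quoted below is in scope at the stop point):

    $ rr record --chaos ./qemu-system-aarch64 ...   # rerun until it fails
    $ rr replay                                     # replays under gdb
    (rr) continue                         # forward to the crash/failed assert
    (rr) watch -l cpuas->memory_dispatch  # watch the value that went stale
    (rr) reverse-continue                 # stop at the write that changed it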

I hope it helps.

Regards,
Pierrick

On 10/9/25 2:10 AM, Mark Burton wrote:
> 
> (Adding Pierrick)
> Thanks for getting back to me Mark.
> 
> I initially thought the same, and I think I have seen that issue; I have also taken that patch. However…
> 
> For MMIO access, as best I can tell, the initial calculation of the dispatch is based on the iotlb reported by the translate function (correct), while the subsequent use of the section number uses the dispatch table from the CPU’s address space… which gives you the wrong section.
> 
> I would very happily do a live debug with you (or anybody) if it would help… I’m more than willing to believe I’ve made a mistake, but I just don’t see how it’s supposed to work.
> 
> I have been looking at solutions, and right now, I don’t see anything obvious. As best I can tell, we “lose” the address space that has been returned by the translate function - so, either we would need a way to hold onto that, or we would have to re-call the function, or…
> All of those options look really really nasty to me.
> 
> The issue is going to be systems where SMMUs are used all over the place, specifically in front of MMIO. (Memory works OK because we get the memory pointer itself, all is fine; the issue seems only to be with MMIO accesses through IOMMU regions.)
> 
> Cheers
> Mark.
> 
> 
>> On 9 Oct 2025, at 10:43, Mark Cave-Ayland <mark.caveayland@nutanix.com> wrote:
>>
>> On 08/10/2025 13:38, Mark Burton wrote:
>>
>>> All, sorry for the wide CC, I’m trying to find somebody who understands this corner of the code… This is perhaps obscure, but I think it should work.
>>> I am trying to access an MMIO region through an IOMMU, from TCG.
>>> The IOMMU translation has provided an address space that is different from the CPU’s own address space.
>>> In address_space_translate_for_iotlb the section is calculated using the address space provide by the IOMMU translation.
>>>> d = flatview_to_dispatch(address_space_to_flatview(iotlb.target_as));
>>>>
>>> Later, we come to do the actual access (via e.g. do_st_mmio_leN), and at this point we pick up the CPU’s address space in iotlb_to_section, which is different, and the recorded section therefore seems to be incorrect.
>>>> CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
>>>> AddressSpaceDispatch *d = cpuas->memory_dispatch;
>>>> int section_index = index & ~TARGET_PAGE_MASK;
>>>> MemoryRegionSection *ret;
>>>>
>>>> assert(section_index < d->map.sections_nb);
>>>> ret = d->map.sections + section_index;
>>> What I don’t fully understand is how this is supposed to work….?
>>> Have I missed something obvious?
>>> Cheers
>>> Mark.
>>
>> What version of QEMU are you using? I'm wondering if you're getting caught out by a variant of this: https://gitlab.com/qemu-project/qemu/-/issues/3040.
>>
>>
>> ATB,
>>
>> Mark.
>>
> 


