From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Elias El Yandouzi <eliasely@amazon.com>,
julien@xen.org, pdurrant@amazon.com, dwmw@amazon.com,
Hongyan Xia <hongyxia@amazon.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Julien Grall <jgrall@amazon.com>,
xen-devel@lists.xenproject.org
Subject: Re: [PATCH V3 (resend) 01/19] x86: Create per-domain mapping of guest_root_pt
Date: Mon, 17 Jun 2024 09:33:42 +0200
Message-ID: <Zm_m1shUlyt_KvBJ@macbook>
In-Reply-To: <71f7b9c8-43f9-4703-b6e3-8b3fe8b740c0@suse.com>

On Fri, Jun 14, 2024 at 08:23:30AM +0200, Jan Beulich wrote:
> On 13.06.2024 18:31, Elias El Yandouzi wrote:
> > On 16/05/2024 08:17, Jan Beulich wrote:
> >> On 15.05.2024 20:25, Elias El Yandouzi wrote:
> >>> However, I noticed quite a weird bug while doing some testing. I may
> >>> need your expertise to find the root cause.
> >>
> >> Looks like you've overflowed the dom0 kernel stack, most likely because
> >> of recurring nested exceptions.
> >>
> >>> In the case where I have more vCPUs than pCPUs (and let's consider we
> >>> have one pCPU for two vCPUs), I noticed that I would always get a page
> >>> fault in dom0 kernel (5.10.0-13-amd64) at the exact same location. I did
> >>> a bit of investigation but I couldn't come to a clear conclusion.
> >>> Looking at the stack trace [1], I have the feeling the crash occurs in a
> >>> loop or a recursive call.
> >>>
> >>> I tried to identify where the crash occurred using addr2line:
> >>>
> >>> > addr2line -e vmlinux-5.10.0-29-amd64 0xffffffff810218a0
> >>> debian/build/build_amd64_none_amd64/arch/x86/xen/mmu_pv.c:880
> >>>
> >>> It turns out to point at the closing bracket of the function
> >>> xen_mm_unpin_all()[2].
> >>>
> >>> I thought the crash could happen while returning from the function in
> >>> the assembly epilogue but the output of objdump doesn't even show the
> >>> address.
> >>>
> >>> The only theory I could think of was that because we only have one pCPU,
> >>> we may never execute one of the two vCPUs, and never set up the mapping
> >>> to the guest_root_pt in write_ptbase(), hence the page fault. This is
> >>> just a random theory, I couldn't find any hint suggesting it would be
> >>> the case though. Any idea how I could debug this?
> >>
> >> I guess you want to instrument Xen enough to catch the top level fault (or
> >> the 2nd from top, depending on where the nesting actually starts) to see
> >> why that happens. Quite likely some guest mapping isn't set up properly.
> >>
> >
> > Julien helped me with this one and I believe we have identified the
> > problem.
> >
> > As you suggested, I wrote the mapping of the guest root PT into our
> > per-domain section, root_pt_l1tab, within the write_ptbase() function,
> > since there we'd always be in the case v == current, and
> > switch_cr3_cr4() would always flush the local TLB.
> >
> > However, there exists a path, in toggle_guest_mode(), where we could
> > call update_cr3()/make_cr3() without calling write_ptbase() and hence
> > not maintain mappings properly. Instead toggle_guest_mode() has a partly
> > open-coded version of write_ptbase().
> >
> > Would you rather see the mappings written in make_cr3(), or in
> > toggle_guest_mode() within the partly open-coded version of write_ptbase()?
>
> Likely the latter, but that's hard to tell without seeing the resulting
> code.

There's already a special case for XPTI in toggle_guest_mode() to deal
exactly with that AFAICT.  Maybe it would be better if write_ptbase()
could be made suitable to be used in _toggle_guest_pt() instead of
directly calling write_cr3(), as we could then avoid having to pile
open-coded bodges in toggle_guest_mode() and/or _toggle_guest_pt().

Thanks, Roger.