From: "Alejandro Vallejo" <alejandro.vallejo@cloud.com>
To: "Elias El Yandouzi" <eliasely@amazon.com>,
<xen-devel@lists.xenproject.org>
Cc: <julien@xen.org>, <pdurrant@amazon.com>, <dwmw@amazon.com>,
"Hongyan Xia" <hongyxia@amazon.com>,
"Julien Grall" <jgrall@amazon.com>
Subject: Re: [PATCH V4 03/15] x86/pv: Rewrite how building PV dom0 handles domheap mappings
Date: Mon, 09 Dec 2024 17:42:06 +0000 [thread overview]
Message-ID: <D67CSQZBRZ0E.34J8KW00AIFI6@cloud.com> (raw)
In-Reply-To: <20241111131148.52568-4-eliasely@amazon.com>
Hi,
I've been trying to run this series for a while, but it crashes very
frequentyly starting from the patch that generalizes the mapcache. I think I've
tracked it down to this patch.
On Mon Nov 11, 2024 at 1:11 PM GMT, Elias El Yandouzi wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
>
> Building a PV dom0 is allocating from the domheap but uses it like the
> xenheap. Use the pages as they should be.
>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Elias El Yandouzi <eliasely@amazon.com>
>
> ----
> Changes in V4:
> * Reduce the scope of l{1,2,4}start_mfn variables
> * Make the macro `UNMAP_MAP_AND_ADVANCE` return the new virtual address
>
> Changes in V3:
> * Fold following patch 'x86/pv: Map L4 page table for shim domain'
>
> Changes in V2:
> * Clarify the commit message
> * Break the patch in two parts
>
> Changes since Hongyan's version:
> * Rebase
> * Remove spurious newline
>
> diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
> index 18b7a3e4e025..b03df609cadb 100644
> --- a/xen/arch/x86/pv/dom0_build.c
> +++ b/xen/arch/x86/pv/dom0_build.c
> @@ -382,6 +382,7 @@ static int __init dom0_construct(struct domain *d,
> l3_pgentry_t *l3tab = NULL, *l3start = NULL;
> l2_pgentry_t *l2tab = NULL, *l2start = NULL;
> l1_pgentry_t *l1tab = NULL, *l1start = NULL;
> + mfn_t l3start_mfn = INVALID_MFN;
>
> /*
> * This fully describes the memory layout of the initial domain. All
> @@ -719,22 +720,34 @@ static int __init dom0_construct(struct domain *d,
> v->arch.pv.event_callback_cs = FLAT_COMPAT_KERNEL_CS;
> }
>
> +#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) ({ \
> + do { \
> + unmap_domain_page(virt_var); \
> + mfn_var = maddr_to_mfn(maddr); \
> + maddr += PAGE_SIZE; \
> + virt_var = map_domain_page(mfn_var); \
> + } while ( false ); \
> + virt_var; \
> +})
> +
> if ( !compat )
> {
> + mfn_t l4start_mfn;
> maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
> - l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> + l4tab = UNMAP_MAP_AND_ADVANCE(l4start_mfn, l4start, mpt_alloc);
In here l4start is mapped on the idle domain perdomain area, but...
> clear_page(l4tab);
> - init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
> - d, INVALID_MFN, true);
> - v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
> + init_xen_l4_slots(l4tab, l4start_mfn, d, INVALID_MFN, true);
> + v->arch.guest_table = pagetable_from_mfn(l4start_mfn);
> }
> else
> {
> /* Monitor table already created by switch_compat(). */
> - l4start = l4tab = __va(pagetable_get_paddr(v->arch.guest_table));
> + mfn_t l4start_mfn = pagetable_get_mfn(v->arch.guest_table);
> + l4start = l4tab = map_domain_page(l4start_mfn);
> /* See public/xen.h on why the following is needed. */
> maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
> l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> + UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
> }
>
> l4tab += l4_table_offset(v_start);
> @@ -743,15 +756,17 @@ static int __init dom0_construct(struct domain *d,
> {
> if ( !((unsigned long)l1tab & (PAGE_SIZE-1)) )
> {
> + mfn_t l1start_mfn;
> maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l1_page_table;
> - l1start = l1tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> + l1tab = UNMAP_MAP_AND_ADVANCE(l1start_mfn, l1start, mpt_alloc);
> clear_page(l1tab);
> if ( count == 0 )
> l1tab += l1_table_offset(v_start);
> if ( !((unsigned long)l2tab & (PAGE_SIZE-1)) )
> {
> + mfn_t l2start_mfn;
> maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
> - l2start = l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> + l2tab = UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
> clear_page(l2tab);
> if ( count == 0 )
> l2tab += l2_table_offset(v_start);
> @@ -761,19 +776,19 @@ static int __init dom0_construct(struct domain *d,
> {
> maddr_to_page(mpt_alloc)->u.inuse.type_info =
> PGT_l3_page_table;
> - l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> + UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
> }
> l3tab = l3start;
> clear_page(l3tab);
> if ( count == 0 )
> l3tab += l3_table_offset(v_start);
> - *l4tab = l4e_from_paddr(__pa(l3start), L4_PROT);
> + *l4tab = l4e_from_mfn(l3start_mfn, L4_PROT);
> l4tab++;
> }
> - *l3tab = l3e_from_paddr(__pa(l2start), L3_PROT);
> + *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
> l3tab++;
> }
> - *l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
> + *l2tab = l2e_from_mfn(l1start_mfn, L2_PROT);
> l2tab++;
> }
> if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
> @@ -792,27 +807,32 @@ static int __init dom0_construct(struct domain *d,
>
> if ( compat )
> {
> - l2_pgentry_t *l2t;
> -
> /* Ensure the first four L3 entries are all populated. */
> for ( i = 0, l3tab = l3start; i < 4; ++i, ++l3tab )
> {
> if ( !l3e_get_intpte(*l3tab) )
> {
> + mfn_t l2start_mfn;
> maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
> - l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> - clear_page(l2tab);
> - *l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
> + UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
> + clear_page(l2start);
> + *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
> }
> if ( i == 3 )
> l3e_get_page(*l3tab)->u.inuse.type_info |= PGT_pae_xen_l2;
> }
>
> - l2t = map_l2t_from_l3e(l3start[3]);
> - init_xen_pae_l2_slots(l2t, d);
> - unmap_domain_page(l2t);
> + UNMAP_DOMAIN_PAGE(l2start);
> + l2start = map_l2t_from_l3e(l3start[3]);
> + init_xen_pae_l2_slots(l2start, d);
> }
>
> +#undef UNMAP_MAP_AND_ADVANCE
> +
> + UNMAP_DOMAIN_PAGE(l1start);
> + UNMAP_DOMAIN_PAGE(l2start);
> + UNMAP_DOMAIN_PAGE(l3start);
... l4start is not unmapped here. This is a problem, because we're about to
change the page tables into dom0's and start using its mapcache.
IMO, we should be unmapping here, and remapping in dom0's context. Otherwise
l4start becomes a transiently stale pointer. Any remaining pointer obtained via
map_domain_page() is a dangling pointer after the mapcache+pagetable switch.
> +
> /* Pages that are part of page tables must be read only. */
> mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages, &flush_flags);
>
> @@ -987,6 +1007,8 @@ static int __init dom0_construct(struct domain *d,
> pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
> vphysmap_start, si);
>
> + UNMAP_DOMAIN_PAGE(l4start);
As it is, this unmap is operating on the wrong mapcache, I think. I don't quite
understand why I see intermittent boot crashes and not constant ones, but this
seems like a bug.
What we want, I think, is:
1. Increase the scope of l4start_mfn to be function-level.
2. Do UNMAP_DOMAIN_PAGE(l4start) along with l1start, l2start and l3start.
3. Include a pair of map_domain_page() and UNMAP_DOMAIN_PAGE() within the
conditional, surrounding pv_shim_setup_dom.
> +
> #ifdef CONFIG_COMPAT
> if ( compat )
> xlat_start_info(si, pv_shim ? XLAT_start_info_console_domU
I'll keep testing it in case I missed something, but this seems to work.
Cheers,
Alejandro
next prev parent reply other threads:[~2024-12-09 17:42 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-11 13:11 [PATCH V4 00/15] Remove the directmap Elias El Yandouzi
2024-11-11 13:11 ` [PATCH V4 01/15] x86: Create per-domain mapping for guest_root_pt Elias El Yandouzi
2024-12-16 16:01 ` Alejandro Vallejo
2024-11-11 13:11 ` [PATCH V4 02/15] x86/pv: Use copy_domain_page() to manage domheap pages during initrd relocation Elias El Yandouzi
2024-11-11 13:11 ` [PATCH V4 03/15] x86/pv: Rewrite how building PV dom0 handles domheap mappings Elias El Yandouzi
2024-12-09 17:42 ` Alejandro Vallejo [this message]
2024-11-11 13:11 ` [PATCH V4 04/15] x86: Initialize mapcache for PV, HVM, and idle domains Elias El Yandouzi
2024-11-11 18:46 ` Andrew Cooper
2024-11-11 13:11 ` [PATCH V4 05/15] x86: Add a boot option to enable and disable the direct map Elias El Yandouzi
2024-11-11 13:11 ` [PATCH V4 06/15] xen/x86: Add support for the PMAP Elias El Yandouzi
2024-11-11 13:11 ` [PATCH V4 07/15] x86/domain_page: Remove the fast paths when mfn is not in the directmap Elias El Yandouzi
2024-11-18 18:08 ` Alejandro Vallejo
2024-11-19 7:55 ` Jan Beulich
2024-11-11 13:11 ` [PATCH V4 08/15] xen/page_alloc: Add a path for xenheap when there is no direct map Elias El Yandouzi
2024-11-11 13:11 ` [PATCH V4 09/15] x86/setup: Leave early boot slightly earlier Elias El Yandouzi
2024-11-11 13:11 ` [PATCH V4 10/15] xen/page_alloc: vmap heap nodes when they are outside the direct map Elias El Yandouzi
2024-12-13 13:46 ` Alejandro Vallejo
2024-12-13 14:59 ` Alejandro Vallejo
2024-11-11 13:11 ` [PATCH V4 11/15] x86/setup: Do not create valid mappings when directmap=no Elias El Yandouzi
2024-11-11 13:11 ` [PATCH V4 12/15] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables() Elias El Yandouzi
2024-11-21 10:34 ` Michal Orzel
2024-11-11 13:11 ` [PATCH V4 13/15] xen/arm64: mm: Use per-pCPU page-tables Elias El Yandouzi
2024-11-11 13:11 ` [PATCH V4 14/15] xen/arm64: Implement a mapcache for arm64 Elias El Yandouzi
2024-11-11 13:11 ` [PATCH V4 15/15] xen/arm64: Allow the admin to enable/disable the directmap Elias El Yandouzi
2024-11-11 19:03 ` [PATCH V4 00/15] Remove " Andrew Cooper
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D67CSQZBRZ0E.34J8KW00AIFI6@cloud.com \
--to=alejandro.vallejo@cloud.com \
--cc=dwmw@amazon.com \
--cc=eliasely@amazon.com \
--cc=hongyxia@amazon.com \
--cc=jgrall@amazon.com \
--cc=julien@xen.org \
--cc=pdurrant@amazon.com \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.