From: "Alejandro Vallejo" <alejandro.vallejo@cloud.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: <xen-devel@lists.xenproject.org>,
"Jan Beulich" <jbeulich@suse.com>,
"Andrew Cooper" <andrew.cooper3@citrix.com>
Subject: Re: [PATCH v2 04/18] x86/pv: introduce function to populate perdomain area and use it to map Xen GDT
Date: Fri, 10 Jan 2025 15:50:17 +0000 [thread overview]
Message-ID: <D6YIGKABLDGN.J7DROWZ0H7BK@cloud.com> (raw)
In-Reply-To: <Z4Eur-PNej2JQAm_@macbook.local>
On Fri Jan 10, 2025 at 2:29 PM GMT, Roger Pau Monné wrote:
> On Thu, Jan 09, 2025 at 09:55:44AM +0000, Alejandro Vallejo wrote:
> > On Wed Jan 8, 2025 at 2:26 PM GMT, Roger Pau Monne wrote:
> > > The current code to update the Xen part of the GDT when running a PV guest
> > > relies on caching the direct map address of all the L1 tables used to map the
> > > GDT and LDT, so that entries can be modified.
> > >
> > > Introduce a new function that populates the per-domain region, either using the
> > > recursive linear mappings when the target vCPU is the current one, or by
> > > directly modifying the L1 table of the per-domain region.
> > >
> > > Using such function to populate per-domain addresses drops the need to keep a
> > > reference to per-domain L1 tables previously used to change the per-domain
> > > mappings.
> > >
> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > ---
> > > xen/arch/x86/domain.c | 11 +++-
> > > xen/arch/x86/include/asm/desc.h | 6 +-
> > > xen/arch/x86/include/asm/mm.h | 2 +
> > > xen/arch/x86/include/asm/processor.h | 5 ++
> > > xen/arch/x86/mm.c | 88 ++++++++++++++++++++++++++++
> > > xen/arch/x86/smpboot.c | 6 +-
> > > xen/arch/x86/traps.c | 10 ++--
> > > 7 files changed, 113 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > > index 1f680bf176ee..0bd0ef7e40f4 100644
> > > --- a/xen/arch/x86/domain.c
> > > +++ b/xen/arch/x86/domain.c
> > > @@ -1953,9 +1953,14 @@ static always_inline bool need_full_gdt(const struct domain *d)
> > >
> > > static void update_xen_slot_in_full_gdt(const struct vcpu *v, unsigned int cpu)
> > > {
> > > - l1e_write(pv_gdt_ptes(v) + FIRST_RESERVED_GDT_PAGE,
> > > - !is_pv_32bit_vcpu(v) ? per_cpu(gdt_l1e, cpu)
> > > - : per_cpu(compat_gdt_l1e, cpu));
> > > + ASSERT(v != current);
> >
> > For this assert, and others below. IIUC, curr_vcpu == current when we're
> > properly switched. When we're idling current == idle and curr_vcpu == prev_ctx.
> >
> > Granted, calling this in the middle of a lazy idle loop would be weird, but
> > would it make sense for PT consistency to use curr_vcpu here...
>
> Hm, this function is called in a very specific context, and the assert
> intends to reflect that. TBH I could just drop it, as
> populate_perdomain_mapping() will DTRT also when v == current. The
> expectation for the context is also that current == curr_vcpu.
>
> Note however that if v == current we would need a flush after the
> populate_perdomain_mapping() call, since populate_perdomain_mapping()
> doesn't perform any flushing of the modified entries. The main
> purpose of the ASSERT() is to notice this.
>
> > > +
> > > + populate_perdomain_mapping(v,
> > > + GDT_VIRT_START(v) +
> > > + (FIRST_RESERVED_GDT_PAGE << PAGE_SHIFT),
> > > + !is_pv_32bit_vcpu(v) ? &per_cpu(gdt_mfn, cpu)
> > > + : &per_cpu(compat_gdt_mfn,
> > > + cpu), 1);
> > > }
> > >
> > > static void load_full_gdt(const struct vcpu *v, unsigned int cpu)
> > > diff --git a/xen/arch/x86/include/asm/desc.h b/xen/arch/x86/include/asm/desc.h
> > > index a1e0807d97ed..33981bfca588 100644
> > > --- a/xen/arch/x86/include/asm/desc.h
> > > +++ b/xen/arch/x86/include/asm/desc.h
> > > @@ -44,6 +44,8 @@
> > >
> > > #ifndef __ASSEMBLY__
> > >
> > > +#include <xen/mm-frame.h>
> > > +
> > > #define GUEST_KERNEL_RPL(d) (is_pv_32bit_domain(d) ? 1 : 3)
> > >
> > > /* Fix up the RPL of a guest segment selector. */
> > > @@ -212,10 +214,10 @@ struct __packed desc_ptr {
> > >
> > > extern seg_desc_t boot_gdt[];
> > > DECLARE_PER_CPU(seg_desc_t *, gdt);
> > > -DECLARE_PER_CPU(l1_pgentry_t, gdt_l1e);
> > > +DECLARE_PER_CPU(mfn_t, gdt_mfn);
> > > extern seg_desc_t boot_compat_gdt[];
> > > DECLARE_PER_CPU(seg_desc_t *, compat_gdt);
> > > -DECLARE_PER_CPU(l1_pgentry_t, compat_gdt_l1e);
> > > +DECLARE_PER_CPU(mfn_t, compat_gdt_mfn);
> > > DECLARE_PER_CPU(bool, full_gdt_loaded);
> > >
> > > static inline void lgdt(const struct desc_ptr *gdtr)
> > > diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
> > > index 6c7e66ee21ab..b50a51327b2b 100644
> > > --- a/xen/arch/x86/include/asm/mm.h
> > > +++ b/xen/arch/x86/include/asm/mm.h
> > > @@ -603,6 +603,8 @@ int compat_arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg);
> > > int create_perdomain_mapping(struct domain *d, unsigned long va,
> > > unsigned int nr, l1_pgentry_t **pl1tab,
> > > struct page_info **ppg);
> > > +void populate_perdomain_mapping(const struct vcpu *v, unsigned long va,
> > > + mfn_t *mfn, unsigned long nr);
> > > void destroy_perdomain_mapping(struct domain *d, unsigned long va,
> > > unsigned int nr);
> > > void free_perdomain_mappings(struct domain *d);
> > > diff --git a/xen/arch/x86/include/asm/processor.h b/xen/arch/x86/include/asm/processor.h
> > > index d247ef8dd226..82ee89f736c2 100644
> > > --- a/xen/arch/x86/include/asm/processor.h
> > > +++ b/xen/arch/x86/include/asm/processor.h
> > > @@ -243,6 +243,11 @@ static inline unsigned long cr3_pa(unsigned long cr3)
> > > return cr3 & X86_CR3_ADDR_MASK;
> > > }
> > >
> > > +static inline mfn_t cr3_mfn(unsigned long cr3)
> > > +{
> > > + return maddr_to_mfn(cr3_pa(cr3));
> > > +}
> > > +
> > > static inline unsigned int cr3_pcid(unsigned long cr3)
> > > {
> > > return IS_ENABLED(CONFIG_PV) ? cr3 & X86_CR3_PCID_MASK : 0;
> > > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > > index 3d5dd22b6c36..0abea792486c 100644
> > > --- a/xen/arch/x86/mm.c
> > > +++ b/xen/arch/x86/mm.c
> > > @@ -6423,6 +6423,94 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
> > > return rc;
> > > }
> > >
> > > +void populate_perdomain_mapping(const struct vcpu *v, unsigned long va,
> > > + mfn_t *mfn, unsigned long nr)
> > > +{
> > > + l1_pgentry_t *l1tab = NULL, *pl1e;
> > > + const l3_pgentry_t *l3tab;
> > > + const l2_pgentry_t *l2tab;
> > > + struct domain *d = v->domain;
> > > +
> > > + ASSERT(va >= PERDOMAIN_VIRT_START &&
> > > + va < PERDOMAIN_VIRT_SLOT(PERDOMAIN_SLOTS));
> > > + ASSERT(!nr || !l3_table_offset(va ^ (va + nr * PAGE_SIZE - 1)));
> > > +
> > > + /* Use likely to force the optimization for the fast path. */
> > > + if ( likely(v == current) )
> >
> > ... and here? In particular I'd expect using curr_vcpu here means...
>
> I'm afraid not, this is a trap I've fallen originally when doing this
> series, as I indeed had v == curr_vcpu here (and no
> sync_local_execstate() call).
>
> However as a result of an interrupt, a call to sync_local_execstate()
> might happen, at which point the previous check of v == curr_vcpu
> becomes stale.
Wow, that's nasty! More than fair enough then. Guess the XSAVE wrappers (and
more generally all vCPU-local memory accessors) will have to take this into
account before poking into the contents of the perdomain region.
>
> > > + {
> > > + unsigned int i;
> > > +
> > > + /* Ensure page-tables are from current (if current != curr_vcpu). */
> > > + sync_local_execstate();
> >
> > ... this should not be needed.
>
> As kind of mentioned above, this is required to ensure the page-tables
> are in-sync with the vCPU in current, and cannot change as a result of
> an interrupt triggering a call to sync_local_execstate().
>
> Otherwise the page-tables could change while or after the call to
> populate_perdomain_mapping(), and the mappings could end up being
> created on the wrong page-tables.
>
> Thanks, Roger.
Cheers,
Alejandro
next prev parent reply other threads:[~2025-01-10 15:50 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-08 14:26 [PATCH v2 00/18] x86: adventures in Address Space Isolation Roger Pau Monne
2025-01-08 14:26 ` [PATCH v2 01/18] x86/mm: purge unneeded destroy_perdomain_mapping() Roger Pau Monne
2025-01-08 15:59 ` Alejandro Vallejo
2025-01-08 14:26 ` [PATCH v2 02/18] x86/domain: limit window where curr_vcpu != current on context switch Roger Pau Monne
2025-01-08 16:26 ` Alejandro Vallejo
2025-01-09 17:39 ` Roger Pau Monné
2025-01-09 8:59 ` Jan Beulich
2025-01-09 17:33 ` Roger Pau Monné
2025-01-14 15:02 ` Jan Beulich
2025-01-17 14:57 ` Roger Pau Monné
2025-01-08 14:26 ` [PATCH v2 03/18] x86/mm: introduce helper to detect per-domain L1 entries that need freeing Roger Pau Monne
2025-01-09 9:03 ` Jan Beulich
2025-01-08 14:26 ` [PATCH v2 04/18] x86/pv: introduce function to populate perdomain area and use it to map Xen GDT Roger Pau Monne
2025-01-09 9:10 ` Jan Beulich
2025-01-10 14:15 ` Roger Pau Monné
2025-01-09 9:55 ` Alejandro Vallejo
2025-01-10 14:29 ` Roger Pau Monné
2025-01-10 15:50 ` Alejandro Vallejo [this message]
2025-01-08 14:26 ` [PATCH v2 05/18] x86/mm: switch destroy_perdomain_mapping() parameter from domain to vCPU Roger Pau Monne
2025-01-09 10:02 ` Alejandro Vallejo
2025-01-10 14:30 ` Roger Pau Monné
2025-01-08 14:26 ` [PATCH v2 06/18] x86/pv: set/clear guest GDT mappings using {populate,destroy}_perdomain_mapping() Roger Pau Monne
2025-01-08 15:11 ` [PATCH v2.1 " Roger Pau Monne
2025-01-09 10:25 ` Alejandro Vallejo
2025-01-10 14:33 ` Roger Pau Monné
2025-01-14 15:30 ` Jan Beulich
2025-01-08 14:26 ` [PATCH v2 07/18] x86/pv: update guest LDT mappings using the linear entries Roger Pau Monne
2025-01-09 14:34 ` Alejandro Vallejo
2025-01-10 14:44 ` Roger Pau Monné
2025-01-10 15:36 ` Alejandro Vallejo
2025-01-14 15:42 ` Jan Beulich
2025-01-08 14:26 ` [PATCH v2 08/18] x86/pv: remove stashing of GDT/LDT L1 page-tables Roger Pau Monne
2025-01-08 14:26 ` [PATCH v2 09/18] x86/mm: simplify create_perdomain_mapping() interface Roger Pau Monne
2025-01-09 11:01 ` Alejandro Vallejo
2025-01-10 14:45 ` Roger Pau Monné
2025-01-08 14:26 ` [PATCH v2 10/18] x86/mm: switch {create,destroy}_perdomain_mapping() domain parameter to vCPU Roger Pau Monne
2025-01-14 16:27 ` Jan Beulich
2025-01-08 14:26 ` [PATCH v2 11/18] x86/pv: untie issuing FLUSH_ROOT_PGTBL from XPTI Roger Pau Monne
2025-01-08 14:26 ` [PATCH v2 12/18] x86/mm: move FLUSH_ROOT_PGTBL handling before TLB flush Roger Pau Monne
2025-01-08 14:26 ` [PATCH v2 13/18] x86/spec-ctrl: introduce Address Space Isolation command line option Roger Pau Monne
2025-01-09 14:58 ` Alejandro Vallejo
2025-01-10 14:55 ` Roger Pau Monné
2025-01-10 15:51 ` Alejandro Vallejo
2025-01-08 14:26 ` [PATCH v2 14/18] x86/mm: introduce per-vCPU L3 page-table Roger Pau Monne
2025-01-08 14:26 ` [PATCH v2 15/18] x86/mm: introduce a per-vCPU mapcache when using ASI Roger Pau Monne
2025-01-09 15:08 ` Alejandro Vallejo
2025-01-10 15:02 ` Roger Pau Monné
2025-01-10 16:12 ` Alejandro Vallejo
2025-01-10 16:19 ` Alejandro Vallejo
2025-01-10 18:43 ` Roger Pau Monné
2025-01-08 14:26 ` [PATCH v2 16/18] x86/pv: allow using a unique per-pCPU root page table (L4) Roger Pau Monne
2025-01-08 14:26 ` [PATCH v2 17/18] x86/mm: switch to a per-CPU mapped stack when using ASI Roger Pau Monne
2025-01-08 14:26 ` [PATCH v2 18/18] x86/mm: zero stack on context switch Roger Pau Monne
2025-01-14 16:20 ` [PATCH v2 00/18] x86: adventures in Address Space Isolation Jan Beulich
2025-01-17 14:45 ` Roger Pau Monné
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D6YIGKABLDGN.J7DROWZ0H7BK@cloud.com \
--to=alejandro.vallejo@cloud.com \
--cc=andrew.cooper3@citrix.com \
--cc=jbeulich@suse.com \
--cc=roger.pau@citrix.com \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.