All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Alejandro Vallejo" <alejandro.vallejo@cloud.com>
To: "Roger Pau Monne" <roger.pau@citrix.com>,
	<xen-devel@lists.xenproject.org>
Cc: "Jan Beulich" <jbeulich@suse.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>
Subject: Re: [PATCH 13/22] x86/hvm: use a per-pCPU monitor table in HAP mode
Date: Fri, 16 Aug 2024 19:02:54 +0100	[thread overview]
Message-ID: <D3HJ80ZGO0MR.2JCGJIV5JPYQP@cloud.com> (raw)
In-Reply-To: <20240726152206.28411-14-roger.pau@citrix.com>

On Fri Jul 26, 2024 at 4:21 PM BST, Roger Pau Monne wrote:
> Instead of allocating a monitor table for each vCPU when running in HVM HAP
> mode, use a per-pCPU monitor table, which gets the per-domain slot updated on
> guest context switch.
>
> This limits the amount of memory used for HVM HAP monitor tables to the amount
> of active pCPUs, rather than to the number of vCPUs.  It also simplifies vCPU
> allocation and teardown, since the monitor table handling is removed from
> there.
>
> Note the switch to using a per-CPU monitor table is done regardless of whether

s/per-CPU/per-pCPU/

> Address Space Isolation is enabled or not.  Partly for the memory usage
> reduction, and also because it allows to simplify the VM tear down path by not
> having to cleanup the per-vCPU monitor tables.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Note the monitor table is not made static because uses outside of the file
> where it's defined will be added by further patches.
> ---
>  xen/arch/x86/hvm/hvm.c             | 60 ++++++++++++++++++++++++
>  xen/arch/x86/hvm/svm/svm.c         |  5 ++
>  xen/arch/x86/hvm/vmx/vmcs.c        |  1 +
>  xen/arch/x86/hvm/vmx/vmx.c         |  4 ++
>  xen/arch/x86/include/asm/hap.h     |  1 -
>  xen/arch/x86/include/asm/hvm/hvm.h |  8 ++++
>  xen/arch/x86/mm.c                  |  8 ++++
>  xen/arch/x86/mm/hap/hap.c          | 75 ------------------------------
>  xen/arch/x86/mm/paging.c           |  4 +-
>  9 files changed, 87 insertions(+), 79 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 7f4b627b1f5f..3f771bc65677 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -104,6 +104,54 @@ static const char __initconst warning_hvm_fep[] =
>  static bool __initdata opt_altp2m_enabled;
>  boolean_param("altp2m", opt_altp2m_enabled);
>  
> +DEFINE_PER_CPU(root_pgentry_t *, monitor_pgt);
> +
> +static int allocate_cpu_monitor_table(unsigned int cpu)

To avoid ambiguity, could we call these *_pcpu_*() instead?

> +{
> +    root_pgentry_t *pgt = alloc_xenheap_page();
> +
> +    if ( !pgt )
> +        return -ENOMEM;
> +
> +    clear_page(pgt);
> +
> +    init_xen_l4_slots(pgt, _mfn(virt_to_mfn(pgt)), INVALID_MFN, NULL,
> +                      false, true, false);
> +
> +    ASSERT(!per_cpu(monitor_pgt, cpu));
> +    per_cpu(monitor_pgt, cpu) = pgt;
> +
> +    return 0;
> +}
> +
> +static void free_cpu_monitor_table(unsigned int cpu)
> +{
> +    root_pgentry_t *pgt = per_cpu(monitor_pgt, cpu);
> +
> +    if ( !pgt )
> +        return;
> +
> +    per_cpu(monitor_pgt, cpu) = NULL;
> +    free_xenheap_page(pgt);
> +}
> +
> +void hvm_set_cpu_monitor_table(struct vcpu *v)
> +{
> +    root_pgentry_t *pgt = this_cpu(monitor_pgt);
> +
> +    ASSERT(pgt);
> +
> +    setup_perdomain_slot(v, pgt);

Why not modify them as part of write_ptbase() instead? As it stands, it appears
to be modifying the PTEs of what may very well be our current PT, which makes
the perdomain slot be in a $DEITY-knows-what state until the next flush
(presumably the write to cr3 in write_ptbase()?; assuming no PCIDs).

Setting the slot up right before the cr3 change should reduce the potential for
misuse.

> +
> +    make_cr3(v, _mfn(virt_to_mfn(pgt)));
> +}
> +
> +void hvm_clear_cpu_monitor_table(struct vcpu *v)
> +{
> +    /* Poison %cr3, it will be updated when the vCPU is scheduled. */
> +    make_cr3(v, INVALID_MFN);

I think this would benefit from more exposition in the comment. If I'm getting
this right, after descheduling this vCPU we can't assume it'll be rescheduled
on the same pCPU, and if it's not it'll end up using a different monitor table.
This poison value is meant to highlight forgetting to set cr3 in the
"ctxt_switch_to()" path. 

All of that can be deduced from what you wrote and sufficient headscratching
but seeing how this is invoked from the context switch path it's not incredibly
clear wether you meant the perdomain slot would be updated by the next vCPU or
what I stated in the previous paragraph.

Assuming it is as I mentioned, maybe hvm_forget_cpu_monitor_table() would
convey what it does better? i.e: the vCPU forgets/unbinds the monitor table
from its internal state.

Cheers,
Alejandro


  reply	other threads:[~2024-08-16 18:03 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-26 15:21 [PATCH 00/22] x86: adventures in Address Space Isolation Roger Pau Monne
2024-07-26 15:21 ` [PATCH 01/22] x86/mm: drop l{1,2,3,4}e_write_atomic() Roger Pau Monne
2024-07-29  7:52   ` Jan Beulich
2024-07-29 12:53     ` Roger Pau Monné
2024-07-26 15:21 ` [PATCH 02/22] x86/mm: rename l{1,2,3,4}e_read_atomic() Roger Pau Monne
2024-07-29  7:53   ` Jan Beulich
2024-07-26 15:21 ` [PATCH 03/22] x86/dom0: only disable SMAP for the PV dom0 build Roger Pau Monne
2024-07-29  8:17   ` Roger Pau Monné
2024-07-29 11:53   ` Jan Beulich
2024-07-29 15:52     ` Andrew Cooper
2024-07-29 16:18       ` Roger Pau Monné
2024-07-29 17:51         ` Andrew Cooper
2024-07-30 10:55           ` Roger Pau Monné
2024-07-30 11:06             ` Andrew Cooper
2024-07-30 13:03               ` Roger Pau Monné
2024-07-29 15:59   ` Andrew Cooper
2024-07-29 16:32     ` Roger Pau Monné
2024-07-26 15:21 ` [PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot Roger Pau Monne
2024-08-13 15:54   ` Jan Beulich
2024-09-10  8:54     ` Roger Pau Monné
2024-09-10  9:00       ` Jan Beulich
2024-09-10  9:32         ` Roger Pau Monné
2024-07-26 15:21 ` [PATCH 05/22] x86/mm: make virt_to_xen_l1e() static Roger Pau Monne
2024-07-30 13:12   ` Andrew Cooper
2024-07-26 15:21 ` [PATCH 06/22] x86/mm: introduce a local domain variable to write_ptbase() Roger Pau Monne
2024-07-30 13:19   ` Andrew Cooper
2024-07-26 15:21 ` [PATCH 07/22] x86/spec-ctrl: initialize per-domain XPTI in spec_ctrl_init_domain() Roger Pau Monne
2024-08-14  9:47   ` Jan Beulich
2024-07-26 15:21 ` [PATCH 08/22] x86/mm: avoid passing a domain parameter to L4 init function Roger Pau Monne
2024-07-29 13:36   ` Alejandro Vallejo
2024-07-29 13:43     ` Jan Beulich
2024-07-29 14:18     ` Roger Pau Monné
2024-08-14 10:24   ` Jan Beulich
2024-07-26 15:21 ` [PATCH 09/22] x86/pv: untie issuing FLUSH_ROOT_PGTBL from XPTI Roger Pau Monne
2024-07-26 15:21 ` [PATCH 10/22] x86/mm: move FLUSH_ROOT_PGTBL handling before TLB flush Roger Pau Monne
2024-07-26 15:21 ` [PATCH 11/22] x86/mm: split setup of the per-domain slot on context switch Roger Pau Monne
2024-07-26 15:21 ` [PATCH 12/22] x86/spec-ctrl: introduce Address Space Isolation command line option Roger Pau Monne
2024-08-14 10:10   ` Jan Beulich
2024-09-25 13:31     ` Roger Pau Monné
2024-09-25 14:03       ` Jan Beulich
2024-09-25 15:27         ` Roger Pau Monné
2024-09-25 15:47           ` Jan Beulich
2024-07-26 15:21 ` [PATCH 13/22] x86/hvm: use a per-pCPU monitor table in HAP mode Roger Pau Monne
2024-08-16 18:02   ` Alejandro Vallejo [this message]
2024-08-19  8:29     ` Jan Beulich
2024-08-19 18:22     ` Alejandro Vallejo
2024-09-25 16:19     ` Roger Pau Monné
2024-07-26 15:21 ` [PATCH 14/22] x86/hvm: use a per-pCPU monitor table in shadow mode Roger Pau Monne
2024-07-26 15:21 ` [PATCH 15/22] x86/idle: allow using a per-pCPU L4 Roger Pau Monne
2024-08-21 16:42   ` Alejandro Vallejo
2024-09-27  9:29     ` Roger Pau Monné
2024-07-26 15:22 ` [PATCH 16/22] x86/mm: introduce a per-CPU L3 table for the per-domain slot Roger Pau Monne
2024-08-16 18:40   ` Alejandro Vallejo
2024-09-27  9:46     ` Roger Pau Monné
2024-07-26 15:22 ` [PATCH 17/22] x86/mm: introduce support to populate a per-CPU page-table region Roger Pau Monne
2024-07-26 15:22 ` [PATCH 18/22] x86/mm: allow modifying per-CPU entries of remote page-tables Roger Pau Monne
2024-07-26 15:22 ` [PATCH 19/22] x86/mm: introduce a per-CPU fixmap area Roger Pau Monne
2024-07-26 15:22 ` [PATCH 20/22] x86/pv: allow using a unique per-pCPU root page table (L4) Roger Pau Monne
2024-07-26 15:22 ` [PATCH 21/22] x86/mm: switch to a per-CPU mapped stack when using ASI Roger Pau Monne
2024-07-26 15:22 ` [PATCH 22/22] x86/mm: zero stack on stack switch or reset Roger Pau Monne
2024-07-29 15:40   ` Andrew Cooper
2024-07-30 10:49     ` Roger Pau Monné
2024-08-13 13:16   ` Jan Beulich
2024-09-27 10:22     ` Roger Pau Monné

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D3HJ80ZGO0MR.2JCGJIV5JPYQP@cloud.com \
    --to=alejandro.vallejo@cloud.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=roger.pau@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.