All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: xen-devel@lists.xenproject.org, alejandro.vallejo@cloud.com,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH 22/22] x86/mm: zero stack on stack switch or reset
Date: Tue, 30 Jul 2024 12:49:20 +0200	[thread overview]
Message-ID: <ZqjFB3U-7l8Nop_u@macbook> (raw)
In-Reply-To: <5e600017-e929-4ebf-b620-1e673b06fc1a@citrix.com>

On Mon, Jul 29, 2024 at 04:40:24PM +0100, Andrew Cooper wrote:
> On 26/07/2024 4:22 pm, Roger Pau Monne wrote:
> > With the stack mapped on a per-CPU basis there's no risk of other CPUs being
> > able to read the stack contents, but vCPUs running on the current pCPU could
> > read stack rubble from operations of previous vCPUs.
> >
> > The #DF stack is not zeroed because handling of #DF results in a panic.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> >  xen/arch/x86/include/asm/current.h | 30 +++++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/include/asm/current.h b/xen/arch/x86/include/asm/current.h
> > index 75b9a341f814..02b4118b03ef 100644
> > --- a/xen/arch/x86/include/asm/current.h
> > +++ b/xen/arch/x86/include/asm/current.h
> > @@ -177,6 +177,14 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
> >  # define SHADOW_STACK_WORK ""
> >  #endif
> >  
> > +#define ZERO_STACK                                              \
> > +    "test %[stk_size], %[stk_size];"                            \
> > +    "jz .L_skip_zeroing.%=;"                                    \
> > +    "std;"                                                      \
> > +    "rep stosb;"                                                \
> > +    "cld;"                                                      \
> > +    ".L_skip_zeroing.%=:"
> > +
> >  #if __GNUC__ >= 9
> >  # define ssaj_has_attr_noreturn(fn) __builtin_has_attribute(fn, __noreturn__)
> >  #else
> > @@ -187,10 +195,24 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
> >  #define switch_stack_and_jump(fn, instr, constr)                        \
> >      ({                                                                  \
> >          unsigned int tmp;                                               \
> > +        bool zero_stack = current->domain->arch.asi;                    \
> >          BUILD_BUG_ON(!ssaj_has_attr_noreturn(fn));                      \
> > +        ASSERT(IS_ALIGNED((unsigned long)guest_cpu_user_regs() -        \
> > +                          PRIMARY_STACK_SIZE +                          \
> > +                          sizeof(struct cpu_info), PAGE_SIZE));         \
> > +        if ( zero_stack )                                               \
> > +        {                                                               \
> > +            unsigned long stack_top = get_stack_bottom() &              \
> > +                                      ~(STACK_SIZE - 1);                \
> > +                                                                        \
> > +            clear_page((void *)stack_top + IST_MCE * PAGE_SIZE);        \
> > +            clear_page((void *)stack_top + IST_NMI * PAGE_SIZE);        \
> > +            clear_page((void *)stack_top + IST_DB  * PAGE_SIZE);        \
> > +        }                                                               \
> >          __asm__ __volatile__ (                                          \
> >              SHADOW_STACK_WORK                                           \
> >              "mov %[stk], %%rsp;"                                        \
> > +            ZERO_STACK                                                  \
> >              CHECK_FOR_LIVEPATCH_WORK                                    \
> >              instr "[fun]"                                               \
> >              : [val] "=&r" (tmp),                                        \
> > @@ -201,7 +223,13 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
> >                ((PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8),               \
> >                [stack_mask] "i" (STACK_SIZE - 1),                        \
> >                _ASM_BUGFRAME_INFO(BUGFRAME_bug, __LINE__,                \
> > -                                 __FILE__, NULL)                        \
> > +                                 __FILE__, NULL),                       \
> > +              /* For stack zeroing. */                                  \
> > +              "D" ((void *)guest_cpu_user_regs() - 1),                  \
> > +              [stk_size] "c"                                            \
> > +              (zero_stack ? PRIMARY_STACK_SIZE - sizeof(struct cpu_info)\
> > +                          : 0),                                         \
> > +              "a" (0)                                                   \
> >              : "memory" );                                               \
> >          unreachable();                                                  \
> >      })
> 
> This looks very expensive.
> 
> For starters, switch_stack_and_jump() is used twice in a typical context
> switch; once in the schedule tail, and again out of hvm_do_resume().

Right, it's the reset_stack_and_call_ind() at the end of context
switch and then the reset_stack_and_jump() in the HVM tail context
switch handlers.

One option would be to only do the stack zeroing from the
reset_stack_and_call_ind() call in context_switch().

I've got no idea how expensive this is, I might try to run some
benchmarks to get some figures.  I was planning on running two VMs
with 1 vCPU each, both pinned to the same pCPU.

> 
> Furthermore, #MC happen never (to many many significant figures), #DB
> happens never for HVM guests (but does happen for PV), and NMIs are
> either ~never, or 2Hz which is far less often than the 30ms default
> timeslice.
> 
> So, the overwhelming majority of the time, those 3 calls to clear_page()
> will be re-zeroing blocks of zeroes.
> 
> This can probably be avoided by making use of ist_exit (held in %r12) to
> only zero an IST stack when leaving it.  This leaves the IRET frame able
> to be recovered, but with e.g. RFDS, you can do that irrespective, and
> it's not terribly sensitive.

I could look into that, TBH I was bordeline with clearing the IST
stacks, as I wasn't convinced there could be anything sensitive there,
but again couldn't convince myself there's nothing sensitive now,
nor can be in the future.

> What about shadow stacks?  You're not zeroing those, and while they're
> less sensitive than the data stack, there ought to be some reasoning
> about them.

I've assumed that shadow stacks only contained the expected return
addresses, and hence won't be considered sensitive information, but
maybe I was too lax.

An attacker could get execution traces of the previous vCPU, and that
might be useful for some exploits?

Thanks, Roger.


  reply	other threads:[~2024-07-30 10:49 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-26 15:21 [PATCH 00/22] x86: adventures in Address Space Isolation Roger Pau Monne
2024-07-26 15:21 ` [PATCH 01/22] x86/mm: drop l{1,2,3,4}e_write_atomic() Roger Pau Monne
2024-07-29  7:52   ` Jan Beulich
2024-07-29 12:53     ` Roger Pau Monné
2024-07-26 15:21 ` [PATCH 02/22] x86/mm: rename l{1,2,3,4}e_read_atomic() Roger Pau Monne
2024-07-29  7:53   ` Jan Beulich
2024-07-26 15:21 ` [PATCH 03/22] x86/dom0: only disable SMAP for the PV dom0 build Roger Pau Monne
2024-07-29  8:17   ` Roger Pau Monné
2024-07-29 11:53   ` Jan Beulich
2024-07-29 15:52     ` Andrew Cooper
2024-07-29 16:18       ` Roger Pau Monné
2024-07-29 17:51         ` Andrew Cooper
2024-07-30 10:55           ` Roger Pau Monné
2024-07-30 11:06             ` Andrew Cooper
2024-07-30 13:03               ` Roger Pau Monné
2024-07-29 15:59   ` Andrew Cooper
2024-07-29 16:32     ` Roger Pau Monné
2024-07-26 15:21 ` [PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot Roger Pau Monne
2024-08-13 15:54   ` Jan Beulich
2024-09-10  8:54     ` Roger Pau Monné
2024-09-10  9:00       ` Jan Beulich
2024-09-10  9:32         ` Roger Pau Monné
2024-07-26 15:21 ` [PATCH 05/22] x86/mm: make virt_to_xen_l1e() static Roger Pau Monne
2024-07-30 13:12   ` Andrew Cooper
2024-07-26 15:21 ` [PATCH 06/22] x86/mm: introduce a local domain variable to write_ptbase() Roger Pau Monne
2024-07-30 13:19   ` Andrew Cooper
2024-07-26 15:21 ` [PATCH 07/22] x86/spec-ctrl: initialize per-domain XPTI in spec_ctrl_init_domain() Roger Pau Monne
2024-08-14  9:47   ` Jan Beulich
2024-07-26 15:21 ` [PATCH 08/22] x86/mm: avoid passing a domain parameter to L4 init function Roger Pau Monne
2024-07-29 13:36   ` Alejandro Vallejo
2024-07-29 13:43     ` Jan Beulich
2024-07-29 14:18     ` Roger Pau Monné
2024-08-14 10:24   ` Jan Beulich
2024-07-26 15:21 ` [PATCH 09/22] x86/pv: untie issuing FLUSH_ROOT_PGTBL from XPTI Roger Pau Monne
2024-07-26 15:21 ` [PATCH 10/22] x86/mm: move FLUSH_ROOT_PGTBL handling before TLB flush Roger Pau Monne
2024-07-26 15:21 ` [PATCH 11/22] x86/mm: split setup of the per-domain slot on context switch Roger Pau Monne
2024-07-26 15:21 ` [PATCH 12/22] x86/spec-ctrl: introduce Address Space Isolation command line option Roger Pau Monne
2024-08-14 10:10   ` Jan Beulich
2024-09-25 13:31     ` Roger Pau Monné
2024-09-25 14:03       ` Jan Beulich
2024-09-25 15:27         ` Roger Pau Monné
2024-09-25 15:47           ` Jan Beulich
2024-07-26 15:21 ` [PATCH 13/22] x86/hvm: use a per-pCPU monitor table in HAP mode Roger Pau Monne
2024-08-16 18:02   ` Alejandro Vallejo
2024-08-19  8:29     ` Jan Beulich
2024-08-19 18:22     ` Alejandro Vallejo
2024-09-25 16:19     ` Roger Pau Monné
2024-07-26 15:21 ` [PATCH 14/22] x86/hvm: use a per-pCPU monitor table in shadow mode Roger Pau Monne
2024-07-26 15:21 ` [PATCH 15/22] x86/idle: allow using a per-pCPU L4 Roger Pau Monne
2024-08-21 16:42   ` Alejandro Vallejo
2024-09-27  9:29     ` Roger Pau Monné
2024-07-26 15:22 ` [PATCH 16/22] x86/mm: introduce a per-CPU L3 table for the per-domain slot Roger Pau Monne
2024-08-16 18:40   ` Alejandro Vallejo
2024-09-27  9:46     ` Roger Pau Monné
2024-07-26 15:22 ` [PATCH 17/22] x86/mm: introduce support to populate a per-CPU page-table region Roger Pau Monne
2024-07-26 15:22 ` [PATCH 18/22] x86/mm: allow modifying per-CPU entries of remote page-tables Roger Pau Monne
2024-07-26 15:22 ` [PATCH 19/22] x86/mm: introduce a per-CPU fixmap area Roger Pau Monne
2024-07-26 15:22 ` [PATCH 20/22] x86/pv: allow using a unique per-pCPU root page table (L4) Roger Pau Monne
2024-07-26 15:22 ` [PATCH 21/22] x86/mm: switch to a per-CPU mapped stack when using ASI Roger Pau Monne
2024-07-26 15:22 ` [PATCH 22/22] x86/mm: zero stack on stack switch or reset Roger Pau Monne
2024-07-29 15:40   ` Andrew Cooper
2024-07-30 10:49     ` Roger Pau Monné [this message]
2024-08-13 13:16   ` Jan Beulich
2024-09-27 10:22     ` Roger Pau Monné

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZqjFB3U-7l8Nop_u@macbook \
    --to=roger.pau@citrix.com \
    --cc=alejandro.vallejo@cloud.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.