Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
       [not found] <20240116234901.3238852-1-avagin@google.com>
@ 2024-01-17 19:34 ` Dave Hansen
  2024-01-17 22:30   ` Andrei Vagin
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2024-01-17 19:34 UTC (permalink / raw)
  To: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, LKML
  Cc: x86, H. Peter Anvin

... adding LKML.  More context here:

https://lore.kernel.org/all/20240116234901.3238852-1-avagin@google.com/

On 1/16/24 15:49, Andrei Vagin wrote:
> +	/* xstate_size has to fit all requested components. */
> +	if (fx_sw->xstate_size != fpstate->user_size) {
> +		int min_xstate_size =
> +			xstate_calculate_size(fx_sw->xfeatures, false);
> +
> +		if (min_xstate_size < 0 ||
> +		    fx_sw->xstate_size < min_xstate_size ||
> +		    fx_sw->xstate_size > fpstate->user_size)
> +			goto setfx;
> +	}

The bug here is that the buffer from userspace is garbage and the (XSAVE
XSTATE_BV) metadata doesn't match the size of the buffer.  Right?

This proposed fix just checks another piece of user-supplied metadata
instead: fx_sw->xstate_size.

Can't userspace just provide more bad data there and end up with the
same problem?

Seems like the real problem here is that the fault_in_readable() doesn't
match the XRSTOR.  It's going to continue to be a problem as long as we
don't know what memory XRSTOR tried to access.  We can try all day long
to precalculate what XRSTOR _will_ do, but that seems a bit silly
because the CPU knows where the fault happened.  It told us in CR2 and
all we have to do is plumb that back to fault_in_readable().

It would take a little XSTATE_OP() munging to pass something back other
than 'err', but that doesn't seem insurmountable.

Anybody have better ideas?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-17 19:34 ` [PATCH] x86/fpu: verify xstate buffer size according with requested features Dave Hansen
@ 2024-01-17 22:30   ` Andrei Vagin
  2024-01-17 23:52     ` Dave Hansen
  0 siblings, 1 reply; 12+ messages in thread
From: Andrei Vagin @ 2024-01-17 22:30 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, LKML, x86, H. Peter Anvin

On Wed, Jan 17, 2024 at 11:34 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> .. adding LKML.  More context here:
>
> https://lore.kernel.org/all/20240116234901.3238852-1-avagin@google.com/
>
> On 1/16/24 15:49, Andrei Vagin wrote:
> > +     /* xstate_size has to fit all requested components. */
> > +     if (fx_sw->xstate_size != fpstate->user_size) {
> > +             int min_xstate_size =
> > +                     xstate_calculate_size(fx_sw->xfeatures, false);
> > +
> > +             if (min_xstate_size < 0 ||
> > +                 fx_sw->xstate_size < min_xstate_size ||
> > +                 fx_sw->xstate_size > fpstate->user_size)
> > +                     goto setfx;
> > +     }
>
> The bug here is that the buffer from userspace is garbage and the (XSAVE
> XSTATE_BV) metadata doesn't match the size of the buffer.  Right?

right

>
> This proposed fix just checks another piece of user-supplied metadata
> instead: fx_sw->xstate_size.
>
> Can't userspace just provide more bad data there and end up with the
> same problem?

It can't... I would not post this change if I thought otherwise...

>
> Seems like the real problem here is that the fault_in_readable() doesn't
> match the XRSTOR.  It's going to continue to be a problem as long as we
> don't know what memory XRSTOR tried to access.  We can try all day long
> to precalculate what XRSTOR _will_ do, but that seems a bit silly

I don't understand this part. The behavior of XRSTOR is well-defined
by CPU specifications, allowing us to easily precalculate the memory it
will attempt to access. What does it mean "we don't know what memory
XRSTOR tried to access"?

xrstor restores only features that are set in fx_sw->xfeatures.

> because the CPU knows where the fault happened.  It told us in CR2 and
> all we have to do is plumb that back to fault_in_readable().

I considered this option as well, but then I decided that this approach
is better. The most important aspect is that it always rejects bad
buffers, allowing a user space to detect an issue even when a fault
isn't triggered. I believe proper handling of xrstor page faults could
be a valuable additional improvement to this change. If we detect a
fault outside of a provided buffer, we can print a warning to signal
that check_xstate_in_sigframe is incomplete.

>
> It would take a little XSTATE_OP() munging to pass something back other
> than 'err', but that doesn't seem insurmountable.
>
> Anybody have better ideas?
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-17 22:30   ` Andrei Vagin
@ 2024-01-17 23:52     ` Dave Hansen
  2024-01-18  7:59       ` Andrei Vagin
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2024-01-17 23:52 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, LKML, x86, H. Peter Anvin

On 1/17/24 14:30, Andrei Vagin wrote:
> On Wed, Jan 17, 2024 at 11:34 AM Dave Hansen <dave.hansen@intel.com> wrote:
>> Seems like the real problem here is that the fault_in_readable() doesn't
>> match the XRSTOR.  It's going to continue to be a problem as long as we
>> don't know what memory XRSTOR tried to access.  We can try all day long
>> to precalculate what XRSTOR _will_ do, but that seems a bit silly
> 
> I don't understand this part. The behavior of XRSTOR is well-defined
> by CPU specifications, allowing us to easily precalculate the memory it
> will attempt to access. What does it mean "we don't know what memory
> XRSTOR tried to access"?
> 
> xrstor restores only features that are set in fx_sw->xfeatures.

XSAVE is a big old pile of fun.  Someone is confused here and that
someone might be me.  Let me walk through my logic a bit.  Maybe you can
point out where I've gone wrong. :)

Take a look at the "Legacy Region of an XSAVE Area" in the SDM.

	The x87 state component comprises bytes 23:0 and bytes 159:32.
	The SSE state component comprises bytes 31:24 and bytes 415:160.
	The XSAVE feature set does not use bytes 511:416; bytes 463:416
	are reserved.

and the next section:

	The XSAVE header of an XSAVE area comprises the 64 bytes
	starting at offset 512 from the area’s base address:
	...
	Bytes 7:0 of the XSAVE header is a state-component bitmap (see
	Section 13.1) called XSTATE_BV.

XRSTOR accesses memory based on XSTATE_BV (and XCOMP_BV which should be
irrelevant here).  So we're looking for something at 512 bytes up in the
XSAVE buffer.

Let's have gdb help us out a bit.  Where is sw_reserved?

(gdb) print/d &((struct fxregs_state *)0)->sw_reserved
$4 = 464

Where does XRSTOR itself look?

(gdb) print/d &((union fpregs_state *)0)->xsave->header.xfeatures
$5 = 512

(xfeatures aka. XSTATE_BV)

There _was_ a reason once upon a time that the kernel started sticking a
copy of XSTATE_BV in 'sw_reserved'.  I just forget what it is at the
moment.  It's horribly confusing to it laid out like this, but the SDM
is pretty clear: "the XSAVE feature set does not use bytes 511:416".

"fx_sw" is actually a software-defined and software-only-consumed area
of the XSAVE buffer, thus the '_sw'.  Nothing in the '_sw' section tells
us how the hardware will behave.

>> because the CPU knows where the fault happened.  It told us in CR2 and
>> all we have to do is plumb that back to fault_in_readable().
> 
> I considered this option as well, but then I decided that this approach
> is better. The most important aspect is that it always rejects bad
> buffers, allowing a user space to detect an issue even when a fault
> isn't triggered. I believe proper handling of xrstor page faults could
> be a valuable additional improvement to this change. If we detect a
> fault outside of a provided buffer, we can print a warning to signal
> that check_xstate_in_sigframe is incomplete.

I'm not really following the logic there.  What's the downside of taking
the fault?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-17 23:52     ` Dave Hansen
@ 2024-01-18  7:59       ` Andrei Vagin
  2024-01-18 18:27         ` Dave Hansen
  2024-01-18 19:07         ` Thomas Gleixner
  0 siblings, 2 replies; 12+ messages in thread
From: Andrei Vagin @ 2024-01-18  7:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, LKML, x86, H. Peter Anvin

On Wed, Jan 17, 2024 at 3:52 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/17/24 14:30, Andrei Vagin wrote:
> > On Wed, Jan 17, 2024 at 11:34 AM Dave Hansen <dave.hansen@intel.com> wrote:
> >> Seems like the real problem here is that the fault_in_readable() doesn't
> >> match the XRSTOR.  It's going to continue to be a problem as long as we
> >> don't know what memory XRSTOR tried to access.  We can try all day long
> >> to precalculate what XRSTOR _will_ do, but that seems a bit silly
> >
> > I don't understand this part. The behavior of XRSTOR is well-defined
> > by CPU specifications, allowing us to easily precalculate the memory it
> > will attempt to access. What does it mean "we don't know what memory
> > XRSTOR tried to access"?
> >
> > xrstor restores only features that are set in fx_sw->xfeatures.
>
> XSAVE is a big old pile of fun.  Someone is confused here and that
> someone might be me.  Let me walk through my logic a bit.  Maybe you can
> point out where I've gone wrong. :)

I could be wrong too, so thank you for looking at this and helping to
find the right solution.

>
> Take a look at the "Legacy Region of an XSAVE Area" in the SDM.
>
>         The x87 state component comprises bytes 23:0 and bytes 159:32.
>         The SSE state component comprises bytes 31:24 and bytes 415:160.
>         The XSAVE feature set does not use bytes 511:416; bytes 463:416
>         are reserved.
>
> and the next section:
>
>         The XSAVE header of an XSAVE area comprises the 64 bytes
>         starting at offset 512 from the area’s base address:
>         ...
>         Bytes 7:0 of the XSAVE header is a state-component bitmap (see
>         Section 13.1) called XSTATE_BV.
>
> XRSTOR accesses memory based on XSTATE_BV (and XCOMP_BV which should be
> irrelevant here).  So we're looking for something at 512 bytes up in the
> XSAVE buffer.
>
> Let's have gdb help us out a bit.  Where is sw_reserved?
>
> (gdb) print/d &((struct fxregs_state *)0)->sw_reserved
> $4 = 464
>
> Where does XRSTOR itself look?
>
> (gdb) print/d &((union fpregs_state *)0)->xsave->header.xfeatures
> $5 = 512
>
> (xfeatures aka. XSTATE_BV)
>
> There _was_ a reason once upon a time that the kernel started sticking a
> copy of XSTATE_BV in 'sw_reserved'.  I just forget what it is at the
> moment.  It's horribly confusing to it laid out like this, but the SDM
> is pretty clear: "the XSAVE feature set does not use bytes 511:416".
>
> "fx_sw" is actually a software-defined and software-only-consumed area
> of the XSAVE buffer, thus the '_sw'.  Nothing in the '_sw' section tells
> us how the hardware will behave.

I think you don't take into account the requested-feature bitmap (RFBM),
which is the logical-and of edx:eax and XCR0. In our case, edx:eax
is set to the value of fx_sw->xfeatures.

>
> >> because the CPU knows where the fault happened.  It told us in CR2 and
> >> all we have to do is plumb that back to fault_in_readable().
> >
> > I considered this option as well, but then I decided that this approach
> > is better. The most important aspect is that it always rejects bad
> > buffers, allowing a user space to detect an issue even when a fault
> > isn't triggered. I believe proper handling of xrstor page faults could
> > be a valuable additional improvement to this change. If we detect a
> > fault outside of a provided buffer, we can print a warning to signal
> > that check_xstate_in_sigframe is incomplete.
>
> I'm not really following the logic there.  What's the downside of taking
> the fault?

Let's consider a scenario where someone messed up with an fpu state on a
signal frame. With my approach, a mistake can be promptly detected.
However, if we incorporate the page fault handling of xrstor, a mistake
will only be identified if xrstor triggers a fault. In cases where a
buffer is allocated in a large memory mapping, xrstor may silently read
memory beyond the buffer. Next time, a page beyond a buffer might be
swapped out, xrstore triggers a fault leading to application crashes.

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-18  7:59       ` Andrei Vagin
@ 2024-01-18 18:27         ` Dave Hansen
  2024-01-18 19:54           ` Thomas Gleixner
  2024-01-18 19:07         ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2024-01-18 18:27 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, LKML, x86, H. Peter Anvin

On 1/17/24 23:59, Andrei Vagin wrote:
> On Wed, Jan 17, 2024 at 3:52 PM Dave Hansen <dave.hansen@intel.com> wrote:
...
>> "fx_sw" is actually a software-defined and software-only-consumed area
>> of the XSAVE buffer, thus the '_sw'.  Nothing in the '_sw' section tells
>> us how the hardware will behave.
> 
> I think you don't take into account the requested-feature bitmap (RFBM),
> which is the logical-and of edx:eax and XCR0. In our case, edx:eax
> is set to the value of fx_sw->xfeatures.

Ahh, I did miss that indeed.

Let's step back and look at what's in play:

1. fx_sw->xstate_size, which is eventually passed to
   fault_in_readable()
2. fx_sw->xfeatures
3. XSTATE_BV (aka. fpu->...->header.xfeatures)
4. What XRSTOR actually does, which is #2 OR'd with #3.
5. xstate_calculate_size(fx_sw->xfeatures)

The bug that you've reported here is essentially that the size passed to
fault_in_readable() doesn't match what XRSTOR actually does (#4).   Note
that today, fault_in_readable() may end up faulting in _too_ much memory
if there's a bit clear in XSTATE_BV.

The proposed fix adds the #5 calculation.  It's conservative because
fx_sw->xfeatures is represents a superset of what XRSTOR will actually
restore.  But now, instead of just faulting in too much memory, a
too-small fx_sw->xstate_size will end up zapping all the XSAVE state.

This is all freakishly complicated and changes a bunch of behavior.
It's way too much to be done in a patch with a 5-line commit message.

I suspect this needs some real refactoring.  I really think fx_sw should
remain unmodified after being copied in from userspace.  If you want to
start interpreting 'fx_sw->xstate_size' as the fault_in_readable() size,
then that needs to be a separate logical variable.

>>>> because the CPU knows where the fault happened.  It told us in CR2 and
>>>> all we have to do is plumb that back to fault_in_readable().
>>>
>>> I considered this option as well, but then I decided that this approach
>>> is better. The most important aspect is that it always rejects bad
>>> buffers, allowing a user space to detect an issue even when a fault
>>> isn't triggered. I believe proper handling of xrstor page faults could
>>> be a valuable additional improvement to this change. If we detect a
>>> fault outside of a provided buffer, we can print a warning to signal
>>> that check_xstate_in_sigframe is incomplete.
>>
>> I'm not really following the logic there.  What's the downside of taking
>> the fault?
> 
> Let's consider a scenario where someone messed up with an fpu state on a
> signal frame. With my approach, a mistake can be promptly detected.
> However, if we incorporate the page fault handling of xrstor, a mistake
> will only be identified if xrstor triggers a fault. In cases where a
> buffer is allocated in a large memory mapping, xrstor may silently read
> memory beyond the buffer. Next time, a page beyond a buffer might be
> swapped out, xrstore triggers a fault leading to application crashes.

I think that's an orthogonal problem really.

Fault loops are nasty.  There's a reason that the architecture provides
CR2 instead of depending on software to, for instance, figure out why
and how every instruction faulted.  It's easy to look into the past and
as the CPU where the fault happened.

On the other hand, we have XRSTOR.  Sure, it's _possible_ to look into
the future and figure out what memory XRSTOR will touch.  But, we
apparently stink at looking into the future (thus this bug).  Not
because we're stupid, but simply because looking into the future is hard.

I'd much rather fix the fault loop problem by looking into the past than
predicting the future.  The fault handling _must_ be correct or we get
hangs.

If we have nice, reliable fault handling and then decide that we've got
XRSTOR's running amok reading random memory all over the place that need
a nicer error message, then we can add that code to predict the future.
If our "predict the future" code goes wrong, then we lose an error
message -- not a big deal.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-18 18:27         ` Dave Hansen
@ 2024-01-18 19:54           ` Thomas Gleixner
  2024-01-18 22:02             ` Dave Hansen
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2024-01-18 19:54 UTC (permalink / raw)
  To: Dave Hansen, Andrei Vagin
  Cc: Andrei Vagin, Ingo Molnar, Borislav Petkov, Dave Hansen, LKML,
	x86, H. Peter Anvin

On Thu, Jan 18 2024 at 10:27, Dave Hansen wrote:
> If we have nice, reliable fault handling and then decide that we've got
> XRSTOR's running amok reading random memory all over the place that need
> a nicer error message, then we can add that code to predict the future.
> If our "predict the future" code goes wrong, then we lose an error
> message -- not a big deal.

After staring more at it, it's arguable to pass fpstate->user_size to
fault_in_readable() and ignore fx_sw->xstate_size completely.

That's a guaranteed to be reliable size which prevents endless loops
because arguably that's the maximum size which can be touched by XRSTOR,
no?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-18 19:54           ` Thomas Gleixner
@ 2024-01-18 22:02             ` Dave Hansen
  2024-01-18 22:11               ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2024-01-18 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Andrei Vagin
  Cc: Andrei Vagin, Ingo Molnar, Borislav Petkov, Dave Hansen, LKML,
	x86, H. Peter Anvin

On 1/18/24 11:54, Thomas Gleixner wrote:
> On Thu, Jan 18 2024 at 10:27, Dave Hansen wrote:
>> If we have nice, reliable fault handling and then decide that we've got
>> XRSTOR's running amok reading random memory all over the place that need
>> a nicer error message, then we can add that code to predict the future.
>> If our "predict the future" code goes wrong, then we lose an error
>> message -- not a big deal.
> After staring more at it, it's arguable to pass fpstate->user_size to
> fault_in_readable() and ignore fx_sw->xstate_size completely.
> 
> That's a guaranteed to be reliable size which prevents endless loops
> because arguably that's the maximum size which can be touched by XRSTOR,
> no?

I like it.  It takes fx_sw completely out of the picture, which was the
root of the problem in the first place.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-18 22:02             ` Dave Hansen
@ 2024-01-18 22:11               ` Thomas Gleixner
  2024-01-22  3:58                 ` Andrei Vagin
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2024-01-18 22:11 UTC (permalink / raw)
  To: Dave Hansen, Andrei Vagin
  Cc: Andrei Vagin, Ingo Molnar, Borislav Petkov, Dave Hansen, LKML,
	x86, H. Peter Anvin

On Thu, Jan 18 2024 at 14:02, Dave Hansen wrote:
> On 1/18/24 11:54, Thomas Gleixner wrote:
>> On Thu, Jan 18 2024 at 10:27, Dave Hansen wrote:
>>> If we have nice, reliable fault handling and then decide that we've got
>>> XRSTOR's running amok reading random memory all over the place that need
>>> a nicer error message, then we can add that code to predict the future.
>>> If our "predict the future" code goes wrong, then we lose an error
>>> message -- not a big deal.
>> After staring more at it, it's arguable to pass fpstate->user_size to
>> fault_in_readable() and ignore fx_sw->xstate_size completely.
>> 
>> That's a guaranteed to be reliable size which prevents endless loops
>> because arguably that's the maximum size which can be touched by XRSTOR,
>> no?
>
> I like it.  It takes fx_sw completely out of the picture, which was the
> root of the problem in the first place.

Correct.

I really don't care about the esoteric case where this might
theoretically result in a unjustified application abort.

You really need to twist your brain around 6 corners and then squint
twice to construct that case. Of course syzcaller might trigger it, but
fuzzing the sigreturn frame is a #GP, #PF and whatever lottery anyway.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-18 22:11               ` Thomas Gleixner
@ 2024-01-22  3:58                 ` Andrei Vagin
  2024-01-22 13:26                   ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Andrei Vagin @ 2024-01-22  3:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Andrei Vagin, Ingo Molnar, Borislav Petkov,
	Dave Hansen, LKML, x86, H. Peter Anvin

On Thu, Jan 18, 2024 at 2:11 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Thu, Jan 18 2024 at 14:02, Dave Hansen wrote:
> > On 1/18/24 11:54, Thomas Gleixner wrote:
> >> On Thu, Jan 18 2024 at 10:27, Dave Hansen wrote:
> >>> If we have nice, reliable fault handling and then decide that we've got
> >>> XRSTOR's running amok reading random memory all over the place that need
> >>> a nicer error message, then we can add that code to predict the future.
> >>> If our "predict the future" code goes wrong, then we lose an error
> >>> message -- not a big deal.
> >> After staring more at it, it's arguable to pass fpstate->user_size to
> >> fault_in_readable() and ignore fx_sw->xstate_size completely.
> >>
> >> That's a guaranteed to be reliable size which prevents endless loops
> >> because arguably that's the maximum size which can be touched by XRSTOR,
> >> no?

fpstate->user_size isn't constant.  It can be modified from the XFD #NM
handler. For example, it happens when a process invokes one of amx
instructions for the first time. It means we have to be able to restore
an fpu state from signal frames generated with a smaller
fpstate->user_size. Can it trigger any issues?

> >
> > I like it.  It takes fx_sw completely out of the picture, which was the
> > root of the problem in the first place.
>
> Correct.
>
> I really don't care about the esoteric case where this might
> theoretically result in a unjustified application abort.
>
> You really need to twist your brain around 6 corners and then squint
> twice to construct that case. Of course syzcaller might trigger it, but
> fuzzing the sigreturn frame is a #GP, #PF and whatever lottery anyway.

In my case, the bug was triggered by gVisor (it is like the user-mode
Linux).

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-22  3:58                 ` Andrei Vagin
@ 2024-01-22 13:26                   ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2024-01-22 13:26 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Dave Hansen, Andrei Vagin, Ingo Molnar, Borislav Petkov,
	Dave Hansen, LKML, x86, H. Peter Anvin

On Sun, Jan 21 2024 at 19:58, Andrei Vagin wrote:
> On Thu, Jan 18, 2024 at 2:11 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> >> After staring more at it, it's arguable to pass fpstate->user_size to
>> >> fault_in_readable() and ignore fx_sw->xstate_size completely.
>> >>
>> >> That's a guaranteed to be reliable size which prevents endless loops
>> >> because arguably that's the maximum size which can be touched by XRSTOR,
>> >> no?
>
> fpstate->user_size isn't constant.  It can be modified from the XFD #NM
> handler. For example, it happens when a process invokes one of amx
> instructions for the first time. It means we have to be able to restore
> an fpu state from signal frames generated with a smaller
> fpstate->user_size. Can it trigger any issues?

I know, but the #NM handler does not run in the signal restore
path. fpstate->user_size is guaranteed to be correct at that point.

The untested below should just work.

Thanks,

        tglx
---
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -274,8 +274,7 @@ static int __restore_fpregs_from_user(vo
  * Attempt to restore the FPU registers directly from user memory.
  * Pagefaults are handled and any errors returned are fatal.
  */
-static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
-				     bool fx_only, unsigned int size)
+static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
 {
 	struct fpu *fpu = &current->thread.fpu;
 	int ret;
@@ -309,7 +308,7 @@ static bool restore_fpregs_from_user(voi
 		if (ret != X86_TRAP_PF)
 			return false;
 
-		if (!fault_in_readable(buf, size))
+		if (!fault_in_readable(buf, fpu->fpstate->user_size))
 			goto retry;
 		return false;
 	}
@@ -339,7 +338,6 @@ static bool __fpu_restore_sig(void __use
 	struct user_i387_ia32_struct env;
 	bool success, fx_only = false;
 	union fpregs_state *fpregs;
-	unsigned int state_size;
 	u64 user_xfeatures = 0;
 
 	if (use_xsave()) {
@@ -349,17 +347,14 @@ static bool __fpu_restore_sig(void __use
 			return false;
 
 		fx_only = !fx_sw_user.magic1;
-		state_size = fx_sw_user.xstate_size;
 		user_xfeatures = fx_sw_user.xfeatures;
 	} else {
 		user_xfeatures = XFEATURE_MASK_FPSSE;
-		state_size = fpu->fpstate->user_size;
 	}
 
 	if (likely(!ia32_fxstate)) {
 		/* Restore the FPU registers directly from user memory. */
-		return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only,
-						state_size);
+		return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only);
 	}
 
 	/*

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-18  7:59       ` Andrei Vagin
  2024-01-18 18:27         ` Dave Hansen
@ 2024-01-18 19:07         ` Thomas Gleixner
  2024-01-22  6:43           ` Andrei Vagin
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2024-01-18 19:07 UTC (permalink / raw)
  To: Andrei Vagin, Dave Hansen
  Cc: Andrei Vagin, Ingo Molnar, Borislav Petkov, Dave Hansen, LKML,
	x86, H. Peter Anvin

On Wed, Jan 17 2024 at 23:59, Andrei Vagin wrote:
> On Wed, Jan 17, 2024 at 3:52 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> I'm not really following the logic there.  What's the downside of taking
>> the fault?
>
> Let's consider a scenario where someone messed up with an fpu state on a
> signal frame.

Then he can rightfully keep the pieces...

> With my approach, a mistake can be promptly detected.

How so? Everything which ends up at the 'setfx:' label will just
silently fall back to FX only and init all other components.

> However, if we incorporate the page fault handling of xrstor, a
> mistake will only be identified if xrstor triggers a fault. In cases
> where a buffer is allocated in a large memory mapping, xrstor may
> silently read memory beyond the buffer.

It's either failing the restore due to invalid data (#GP) or it will
restore garbage. User space asked for it.

> Next time, a page beyond a buffer might be swapped out, xrstore
> triggers a fault leading to application crashes.

If it's swapped out it will be swapped back in, no crash.

There are two ways for crashing:

      1) There is no mapping or a non-sufficient mapping
         i.e. fault_in_readable() fails.

      2) The data in the buffer is invalid.

Crashing the application in both cases is just fine.

The nasty part is that the expected size of the user space buffer is
taken from fx_sw->xstate_size.

So you can construct a sigreturn frame where

   1) fx_sw->xstate_size is smaller than the size required by
      the valid bits in fx_sw->xfeatures.

   2) user space unmapped parts of the stack so that not all
      of the buffer (as required by XRSTOR) is accessible.

Now XRSTOR tries to restore and accesses the unmapped part of the stack,
which results in a fault. But fault_in_readable() succeeds because 'buf
+ fx_sw->xstate_size' is within the still mapped stack.

So it goes back and tries XRSTOR again. Lather, rinse and repeat.

That's what Andrej is trying to prevent by calculating the size required
by the valid bits in fx_sw->xfeatures and validating that against
fx_sw->xstate_size.

That fx_sw construct is yet another horror from the past. It's not much
better than xsave itself.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features
  2024-01-18 19:07         ` Thomas Gleixner
@ 2024-01-22  6:43           ` Andrei Vagin
  0 siblings, 0 replies; 12+ messages in thread
From: Andrei Vagin @ 2024-01-22  6:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Andrei Vagin, Ingo Molnar, Borislav Petkov,
	Dave Hansen, LKML, x86, H. Peter Anvin

On Thu, Jan 18, 2024 at 11:07 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Wed, Jan 17 2024 at 23:59, Andrei Vagin wrote:
> > On Wed, Jan 17, 2024 at 3:52 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >> I'm not really following the logic there.  What's the downside of taking
> >> the fault?
> >
> > Let's consider a scenario where someone messed up with an fpu state on a
> > signal frame.
>
> Then he can rightfully keep the pieces...
>
> > With my approach, a mistake can be promptly detected.
>
> How so? Everything which ends up at the 'setfx:' label will just
> silently fall back to FX only and init all other components.

and it will trigger the x86_fpu_xstate_check_failed tracing event.

Usually, when I want to check that our code generates sigframe fpu states
properly, I enable this event type, run tests and check that no event
has been triggered.

Plus, we have tests that check fpu register values, so they will fail if
registers are changed in unexpected values.


Thanks,
Andrei

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-01-22 13:26 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20240116234901.3238852-1-avagin@google.com>
2024-01-17 19:34 ` [PATCH] x86/fpu: verify xstate buffer size according with requested features Dave Hansen
2024-01-17 22:30   ` Andrei Vagin
2024-01-17 23:52     ` Dave Hansen
2024-01-18  7:59       ` Andrei Vagin
2024-01-18 18:27         ` Dave Hansen
2024-01-18 19:54           ` Thomas Gleixner
2024-01-18 22:02             ` Dave Hansen
2024-01-18 22:11               ` Thomas Gleixner
2024-01-22  3:58                 ` Andrei Vagin
2024-01-22 13:26                   ` Thomas Gleixner
2024-01-18 19:07         ` Thomas Gleixner
2024-01-22  6:43           ` Andrei Vagin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox