public inbox for linux-kernel@vger.kernel.org
* ptrace vs FSGSBASE
@ 2016-04-29 18:22 Andy Lutomirski
  2016-05-02 12:40 ` Oleg Nesterov
  2016-05-02 14:27 ` Oleg Nesterov
  0 siblings, 2 replies; 6+ messages in thread
From: Andy Lutomirski @ 2016-04-29 18:22 UTC (permalink / raw)
  To: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen,
	Borislav Petkov, Oleg Nesterov, Brian Gerst

Suppose I'm a ptracer.  Wtf is supposed to happen when I write to
fs_base or gs_base?

Here are some scenarios:

1. I read fs_base using ptrace.  I think I should get the actual
fs_base without any nonsense.

2. I read all the regs (PEEKUSER or whatever) and then write them all
back verbatim.  At the very least, I think that if I do this
atomically using PTRACE_SETREGSET, the task's state needs to remain
unchanged.  Since ptrace doesn't seem to have any real concept of
atomic register state changes right now (although we could add such a
thing for GETREGSET), it would be convenient if writing the unchanged
state back in numerical order using POKEUSER also left it
unchanged.

3. I write fs_base on a non-FSGSBASE system.  I think it should have
the obvious effect of setting FSBASE to the value I wrote.

4. I write fs_base on an FSGSBASE system.  I think it should have the
obvious effect of setting FSBASE to the value I wrote.  It would be
nice if it behaved identically to #3 as well.

#3 means that writing fs_base using ptrace, if it's set to a value that
doesn't match the base associated with the current selector, needs to
zero out fs.

#4 means that it should do the same on an fsgsbase system.

Due to the strange design of user_regs, fs_base and gs_base are
numerically below fs and gs, so #2 means that writing fs on an
FSGSBASE system shouldn't override a just-written fs_base value.
But writing fs on a non-FSGSBASE system needs to override the
just-written base.
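
To make the ordering point concrete, here is a small sketch of the
x86-64 register layout that POKEUSER offsets index into.  The struct is
a local mirror (model_user_regs and the model_* helpers are made-up
names, not kernel or libc symbols), written from the field order in
arch/x86/include/asm/user_64.h.  Treat the exact offsets as an
assumption to check against your own headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the x86-64 user_regs_struct field order (assumed; not
 * pulled from a system header so the sketch builds on any machine). */
struct model_user_regs {
	uint64_t r15, r14, r13, r12, bp, bx, r11, r10, r9, r8;
	uint64_t ax, cx, dx, si, di, orig_ax, ip, cs, flags, sp, ss;
	uint64_t fs_base, gs_base;
	uint64_t ds, es, fs, gs;
};

/* Byte offsets a tracer would pass to PTRACE_POKEUSER in this model. */
static size_t model_off_fs_base(void)
{
	return offsetof(struct model_user_regs, fs_base);
}

static size_t model_off_fs(void)
{
	return offsetof(struct model_user_regs, fs);
}

/* Nonzero if a write-back in ascending offset order hits the base
 * before it hits the selector. */
static int model_base_written_before_selector(void)
{
	return model_off_fs_base() < model_off_fs();
}
```

So a tracer that replays registers in ascending offset order pokes
fs_base first and fs second, which is why the fs write is the one that
has to decide whether to keep or clobber the base.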

But wouldn't users expect that writing a value into fs would actually
change the base even on FSGSBASE systems?

The best thing I've come up with so far is to have POKEUSER to fs and
gs behave differently on FSGSBASE systems, which IMO sucks.  On
the other hand, it wouldn't surprise me all that much if no one ever
does this in the first place.
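
A toy model of that idea, for illustration only.  None of this is
kernel code: model_fs_state, the model_* functions, and the
MODEL_TLS_SEL / MODEL_TLS_BASE pair are all hypothetical, and the
"implied base" lookup is a stand-in for a real descriptor-table read.
It only encodes the constraints listed above: a mismatched fs_base
write zeroes fs, and a write to fs overrides the base only when
FSGSBASE is absent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a tracee's FS state; not kernel code. */
struct model_fs_state {
	uint16_t fs;       /* selector */
	uint64_t fs_base;  /* segment base */
};

/* Hypothetical selector/base pair standing in for a GDT TLS entry. */
#define MODEL_TLS_SEL  0x63
#define MODEL_TLS_BASE 0x7000ULL

/* Base implied by a selector in this model; all others imply 0. */
static uint64_t model_implied_base(uint16_t sel)
{
	return sel == MODEL_TLS_SEL ? MODEL_TLS_BASE : 0;
}

/* Write fs_base: a base that disagrees with the selector zeroes fs. */
static void model_poke_fs_base(struct model_fs_state *t, uint64_t base)
{
	if (base != model_implied_base(t->fs))
		t->fs = 0;
	t->fs_base = base;
}

/* Write fs: without FSGSBASE a nonzero selector overrides the base;
 * with FSGSBASE the just-written base is kept. */
static void model_poke_fs(struct model_fs_state *t, uint16_t sel,
			  bool fsgsbase)
{
	t->fs = sel;
	if (!fsgsbase && sel != 0)
		t->fs_base = model_implied_base(sel);
}

/* Replay a saved state in ascending offset order: fs_base, then fs. */
static void model_restore(struct model_fs_state *t,
			  const struct model_fs_state *saved, bool fsgsbase)
{
	model_poke_fs_base(t, saved->fs_base);
	model_poke_fs(t, saved->fs, fsgsbase);
}
```

In this model, restoring { fs = 0x2b, fs_base = 12345 } round-trips
when fsgsbase is true, while the same two pokes with fsgsbase false end
with the base reset to the selector's implied value.  That difference
is the part that seems ugly.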

See this, too:

https://lkml.org/lkml/2007/11/21/82

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: ptrace vs FSGSBASE
  2016-04-29 18:22 ptrace vs FSGSBASE Andy Lutomirski
@ 2016-05-02 12:40 ` Oleg Nesterov
  2016-05-02 14:27 ` Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2016-05-02 12:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen,
	Borislav Petkov, Brian Gerst




* Re: ptrace vs FSGSBASE
  2016-04-29 18:22 ptrace vs FSGSBASE Andy Lutomirski
  2016-05-02 12:40 ` Oleg Nesterov
@ 2016-05-02 14:27 ` Oleg Nesterov
  2016-05-02 15:38   ` Andy Lutomirski
  1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2016-05-02 14:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen,
	Borislav Petkov, Brian Gerst

Hi Andy,

let me first say that I never knew how this code (and the hardware)
actually works, I am not sure I even understand what ARCH_SET_.S
exactly does ;)

What is even worse, I do not understand your question. So it is not
that I am trying to help, I am asking you to help me understand the
problem.

On 04/29, Andy Lutomirski wrote:
>
> 1. I read fs_base using ptrace.  I think I should get the actual
> fs_base without any nonsense.

Which fs_base? The member of user_regs_struct? But this structure/layout
is just the ABI, so to me it seems correct that getreg() tries to look
at ->fs and/or ->fsindex.

IOW. getreg(fs) should return the same value as prctl(ARCH_GET_FS)
returns if called by the tracee, no?
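
For reference, a minimal sketch of the tracee-side read being compared
against here.  read_own_fs_base is a hypothetical helper name; the
sketch assumes x86-64 Linux with the kernel uapi headers installed and
simply reports failure on any other build.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* for syscall() under strict -std modes */
#endif
#include <stddef.h>

#if defined(__linux__) && defined(__x86_64__)
#include <asm/prctl.h>     /* ARCH_GET_FS */
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>        /* syscall() */
#endif

/* Read the calling thread's FS base via arch_prctl(ARCH_GET_FS).
 * Returns 0 on success, -1 on failure or on non-x86-64 builds. */
static int read_own_fs_base(unsigned long *base)
{
#if defined(__linux__) && defined(__x86_64__)
	return syscall(SYS_arch_prctl, ARCH_GET_FS, base) == 0 ? 0 : -1;
#else
	(void)base;
	return -1;
#endif
}
```

Whether the value returned this way always matches what ptrace reports
for fs_base is exactly the open question.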

> 2. I read all the regs (PEEKUSER or whatever) and then write them all
> back verbatim.  At the very least, I think that if I do this
> atomically using PTRACE_SETREGSET, the task's state needs to remain
> unchanged.

Agreed... do you mean this doesn't work?

> Since ptrace doesn't seem to have any real concept of
> atomic register state changes right now

Could you spell that out, please?

I can't understand what "atomically" means in this context.

Oleg.


* Re: ptrace vs FSGSBASE
  2016-05-02 15:38   ` Andy Lutomirski
@ 2016-05-02 15:35     ` Oleg Nesterov
  2016-05-02 17:26       ` Andy Lutomirski
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2016-05-02 15:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen,
	Borislav Petkov, Brian Gerst

On 05/02, Andy Lutomirski wrote:
>
> On Mon, May 2, 2016 at 7:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >>
> >> 1. I read fs_base using ptrace.  I think I should get the actual
> >> fs_base without any nonsense.
> >
> > Which fs_base? The member of user_regs_struct? But this structure/layout
> > is just the ABI, so to me it seems correct that getreg() tries to look
> > at ->fs and/or ->fsindex.
>
> Yeah, the member of user_regs_struct.

Still can't understand this... user_regs_struct is just the set of offsets
we use to "name" the registers for getreg/putreg. We simply do not have
"the actual fs_base" we could use in getreg(), we need to calculate it.

> > I can't understand what "atomically" means in this context.
>
> I mean "change fs and fs_base to these two values in a single syscall
> so that the kernel can do something intelligent."
>
> Let me give some background:
> [... snip ...]

Thanks Andy. I need to re-read your explanation, but it seems I am starting
to understand. And yes, I didn't bother to look at putreg() when I wrote
my reply.

> If you write, say, 0x2b to
> fs and 12345 to fs_base using the ptrace API, you'd end up with FS ==
> 0x2b and FSBASE == 0,

Hmm. I can be easily wrong again but afaics in this case do_arch_prctl()
will change fs/fs_base first and set

	fsindex = FS_TLS_SEL
	fs = 0

and then... and then I simply can't understand what set_segment_reg(fs)
will/should do in this case. Nor can I understand the "thread.fs != value"
check before do_arch_prctl(ARCH_SET_FS). Confused.

Oleg.


* Re: ptrace vs FSGSBASE
  2016-05-02 14:27 ` Oleg Nesterov
@ 2016-05-02 15:38   ` Andy Lutomirski
  2016-05-02 15:35     ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2016-05-02 15:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen,
	Borislav Petkov, Brian Gerst

On Mon, May 2, 2016 at 7:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Hi Andy,
>
> let me first say that I never knew how this code (and the hardware)
> actually works, I am not sure I even understand what ARCH_SET_.S
> exactly does ;)
>
> What is even worse, I do not understand your question. So it is not
> that I am trying to help, I am asking you to help me understand the
> problem.
>
> On 04/29, Andy Lutomirski wrote:
>>
>> 1. I read fs_base using ptrace.  I think I should get the actual
>> fs_base without any nonsense.
>
> Which fs_base? The member of user_regs_struct? But this structure/layout
> is just the ABI, so to me it seems correct that getreg() tries to look
> at ->fs and/or ->fsindex.

Yeah, the member of user_regs_struct.

>
> IOW. getreg(fs) should return the same value as prctl(ARCH_GET_FS)
> returns if called by the tracee, no?

Hah, nice can of worms there.  You're assuming that ARCH_GET_FS
actually worked...

>
>> 2. I read all the regs (PEEKUSER or whatever) and then write them all
>> back verbatim.  At the very least, I think that if I do this
>> atomically using PTRACE_SETREGSET, the task's state needs to remain
>> unchanged.
>
> Agreed... do you mean this doesn't work?

I'm not 100% sure.  It probably does right now.  See below.

>
>> Since ptrace doesn't seem to have any real concept of
>> atomic register state changes right now
>
> Could you spell that out, please?
>
> I can't understand what "atomically" means in this context.

I mean "change fs and fs_base to these two values in a single syscall
so that the kernel can do something intelligent."

Let me give some background:

On 32-bit systems, there are the FS and GS registers.  For any value
of FS, there is an implied base address of the FS segment.  A debugger
could, if it cared, try to figure out that implied base, except that
no one ever added the API for that.  If a debugger read FS and wrote
the same value back to FS, then the process would probably end up in
the same state it started in (modulo several bugs, all but one of
which are now fixed in -tip AFAIK.)  All was well.

On current 64-bit Linux systems, there is a degree of
independent control of FS and FSBASE.  A process can call ARCH_SET_FS
and pass an offset >4G, which will result in FS == 0 and FSBASE ==
whatever the process passed.  This is already a bit screwy.  Suppose a
debugger writes zero to FS.  If this were an actual MOV instruction on
an Intel chip, FSBASE would be reset to zero (and then the context
switch code would corrupt it).  But writing zero to FS through ptrace
should have no effect and currently has no effect.  If FS != 0, then
FSBASE has some implied value.  On old kernels, reading
user_regs_struct::fs_base would give either zero or garbage, depending
on which set of bugs you managed to hit.  If you write, say, 0x2b to
fs and 12345 to fs_base using the ptrace API, you'd end up with FS ==
0x2b and FSBASE == 0, because the fs_base write went to an ignored
field.
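
A rough model of the ARCH_SET_FS behavior described in this paragraph,
as a sketch only.  model_thread, model_arch_set_fs, model_fsbase and
MODEL_FS_TLS_SEL are hypothetical names.  The branch for bases above 4G
follows the text above; the branch for small bases (a TLS descriptor
with fsindex set to a TLS selector and fs cleared) is an assumption
based on the fsindex = FS_TLS_SEL remark elsewhere in this thread, and
the exact 4G boundary check is also assumed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's FS_TLS_SEL selector value. */
#define MODEL_FS_TLS_SEL 0x63

/* Toy per-thread state, loosely following the fsindex/fs naming used
 * in this thread; not the kernel's struct thread_struct. */
struct model_thread {
	uint16_t fsindex;   /* selector loaded into FS */
	uint64_t fs;        /* 64-bit base, used only when fsindex == 0 */
	uint32_t tls_base;  /* base stored in the modeled GDT TLS slot */
};

/* Model of ARCH_SET_FS: small bases go through a TLS descriptor,
 * bases above 4G leave FS == 0 and set the base directly. */
static void model_arch_set_fs(struct model_thread *t, uint64_t addr)
{
	if (addr <= 0xffffffffULL) {
		t->tls_base = (uint32_t)addr;
		t->fsindex = MODEL_FS_TLS_SEL;
		t->fs = 0;
	} else {
		t->fsindex = 0;
		t->fs = addr;
	}
}

/* Base used for FS-relative accesses in this model. */
static uint64_t model_fsbase(const struct model_thread *t)
{
	if (t->fsindex == 0)
		return t->fs;
	if (t->fsindex == MODEL_FS_TLS_SEL)
		return t->tls_base;
	return 0;  /* any other selector is treated as zero-based */
}
```

The point of the model is that the selector and the base are two views
of one piece of state here, so a ptrace write to either one has to pick
a story for the other.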

On Ivy Bridge and up, there's a new CPU feature that lets user code
override FSBASE on its own, making pretty much any combination of FS
and FSBASE possible.  But how should this interact with ptrace?  If a
debugger sets fs_base = 12345 and *then* sets fs to 0x2b, does the
debugger expect the write to fs to override FSBASE (which it would if
done using MOV) causing FSBASE to reset to zero?  Or should FSBASE
actually end up containing 12345?  The issue comes up because, on
these newer systems, 0x2b/12345 is actually a reasonable combination
of values, whereas, on older systems, it was not.


--Andy


* Re: ptrace vs FSGSBASE
  2016-05-02 15:35     ` Oleg Nesterov
@ 2016-05-02 17:26       ` Andy Lutomirski
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Lutomirski @ 2016-05-02 17:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: X86 ML, linux-kernel@vger.kernel.org, Roland McGrath, Andi Kleen,
	Borislav Petkov, Brian Gerst

On Mon, May 2, 2016 at 8:35 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 05/02, Andy Lutomirski wrote:
>>
>> On Mon, May 2, 2016 at 7:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >>
>> >> 1. I read fs_base using ptrace.  I think I should get the actual
>> >> fs_base without any nonsense.
>> >
>> > Which fs_base? The member of user_regs_struct? But this structure/layout
>> > is just the ABI, so to me it seems correct that getreg() tries to look
>> > at ->fs and/or ->fsindex.
>>
>> Yeah, the member of user_regs_struct.
>
> Still can't understand this... user_regs_struct is just the set of offsets
> we use to "name" the registers for getreg/putreg. We simply do not have
> "the actual fs_base" we could use in getreg(), we need to calculate it.

Right.  When I said writing to fs_base, I meant using POKEUSER or
similar to write to the thing referred to as fs_base via the helpers
in arch/x86/kernel/ptrace.c

>
>> > I can't understand what "atomically" means in this context.
>>
>> I mean "change fs and fs_base to these two values in a single syscall
>> so that the kernel can do something intelligent."
>>
>> Let me give some background:
>> [... snip ...]
>
> Thanks Andy. I need to re-read your explanation, but it seems I am starting
> to understand. And yes, I didn't bother to look at putreg() when I wrote
> my reply.
>
>> If you write, say, 0x2b to
>> fs and 12345 to fs_base using the ptrace API, you'd end up with FS ==
>> 0x2b and FSBASE == 0,
>
> Hmm. I can be easily wrong again but afaics in this case do_arch_prctl()
> will change fs/fs_base first and set
>
>         fsindex = FS_TLS_SEL
>         fs = 0
>
> and then... and then I simply can't understand what set_segment_reg(fs)
> will/should do in this case.

Exactly.  Hence my uncertainty as to what to do.

> Nor can I understand the "thread.fs != value"
> check before do_arch_prctl(ARCH_SET_FS). Confused.

I think that code was an optimization that doesn't make much sense.

It wouldn't surprise me if almost no one uses any of this
functionality right now.

--Andy

