what's papered over by set_fs(USER_DS) in amd64 signal delivery?

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* what's papered over by set_fs(USER_DS) in amd64 signal delivery?
@ 2010-09-24 15:52 Al Viro
  2010-09-24 16:07 ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2010-09-24 15:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tglx, mingo, linux-kernel

	What the hell is going on in amd64 handle_signal()?  We do

#ifdef CONFIG_X86_64
        /*
         * This has nothing to do with segment registers,
         * despite the name.  This magic affects uaccess.h
         * macros' behavior.  Reset it to the normal setting.
         */
        set_fs(USER_DS);
#endif

in the end of sigframe creation; first of all, we'd just done a bunch of
copying to user-controlled addresses, protected only by access_ok(), so
if we *did* have something different we are already fucked badly.  Moreover,
if we had been on our way to userland (and we wouldn't have reached that
code otherwise) with wrong ->addr_limit and would *not* get a signal, we'd
be left with the same ->addr_limit for the next syscall to be done.  Fucked
again?

I've tried to find amd64-specific magic that would require that, but so far
I've found nothing of that kind.  It looks like until the i386/amd64 merge
*both* used to have that thing and during the merge it has suddenly grown
that ifdef, so another way to put it is "why the reasons allowing to kill
it on i386 do not apply to amd64?"

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?
  2010-09-24 15:52 what's papered over by set_fs(USER_DS) in amd64 signal delivery? Al Viro
@ 2010-09-24 16:07 ` Linus Torvalds
  2010-09-24 16:57   ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2010-09-24 16:07 UTC (permalink / raw)
  To: Al Viro; +Cc: tglx, mingo, linux-kernel

On Fri, Sep 24, 2010 at 8:52 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>        What the hell is going on in amd64 handle_signal()?  We do
>
> #ifdef CONFIG_X86_64
>        /*
>         * This has nothing to do with segment registers,
>         * despite the name.  This magic affects uaccess.h
>         * macros' behavior.  Reset it to the normal setting.
>         */
>        set_fs(USER_DS);
> #endif

I _think_ it is historical, and probably relates to just restoring all
the user mode state at signal delivery to a known state. IOW, I think
it really does go hand-in-hand with  the whole "clear bits in the
eflags register" thing.

x86-64 has historically had some left-over crap that we already
cleaned up in 32-bit mode, for the simple reason that the original
x86-64 code was forked from an earlier base, and then hacked up
somewhat. So I think this "#ifdef CONFIG_X86_64" is just a case of
that.

But maybe we should have a WARN_ON_ONCE() to verify it, rather than
just kill it outright.

                                         Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?
  2010-09-24 16:07 ` Linus Torvalds
@ 2010-09-24 16:57   ` Al Viro
  2010-09-25  2:25     ` Brian Gerst
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2010-09-24 16:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tglx, mingo, linux-kernel

On Fri, Sep 24, 2010 at 09:07:33AM -0700, Linus Torvalds wrote:

> I _think_ it is historical, and probably relates to just restoring all
> the user mode state at signal delivery to a known state. IOW, I think
> it really does go hand-in-hand with  the whole "clear bits in the
> eflags register" thing.
> 
> x86-64 has historically had some left-over crap that we already
> cleaned up in 32-bit mode, for the simple reason that the original
> x86-64 code was forked from an earlier base, and then hacked up
> somewhat. So I think this "#ifdef CONFIG_X86_64" is just a case of
> that.

Nope - look at the commit that merged them; i386 used to have it as well
right until the merge.  And yes, I agree that it's most likely a leftover
from ancient times.
 
> But maybe we should have a WARN_ON_ONCE() to verify it, rather than
> just kill it outright.

Um...  If we do that, I'd suggest taking it in front of setup_...frame()
and doing do_exit() with loud warning along the lines of "someone has
come that -><- close to rooting the box".  Anybody who can trigger that
(assuming it can be triggered at all) can overwrite arbitrary parts of
kernel memory.

I'd still like to hear from the x86 maintainers, just in case there's
something more non-trivial to that than "it has always been that way"...

FWIW, looking at the history it seems that the thing has happened almost
by accident; the critical part is
-       regs->cs = USER_CS; regs->ss = USER_DS;
-       regs->ds = USER_DS; regs->es = USER_DS;
-       regs->gs = USER_DS; regs->fs = USER_DS;
+       {
+               unsigned long seg = USER_DS;
+               __asm__("mov %w0,%%fs ; mov %w0,%%gs":"=r" (seg) :"0" (seg));
+               set_fs(seg);
+               regs->xds = seg;
+               regs->xes = seg;
+               regs->xss = seg;
+               regs->xcs = USER_CS;
in 2.1.2.  And that's when we had
	* fs and gs evicted from pt_regs
	* fs and gs not saved restored on kernel entry/exit
	* just introduced set_fs() to start with (that went in 2.1.0)

A bit before my time, so I'm not sure what's been going on there...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?
  2010-09-24 16:57   ` Al Viro
@ 2010-09-25  2:25     ` Brian Gerst
  2010-09-25  2:48       ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Gerst @ 2010-09-25  2:25 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, tglx, mingo, linux-kernel

On Fri, Sep 24, 2010 at 12:57 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Sep 24, 2010 at 09:07:33AM -0700, Linus Torvalds wrote:
>
>> I _think_ it is historical, and probably relates to just restoring all
>> the user mode state at signal delivery to a known state. IOW, I think
>> it really does go hand-in-hand with  the whole "clear bits in the
>> eflags register" thing.
>>
>> x86-64 has historically had some left-over crap that we already
>> cleaned up in 32-bit mode, for the simple reason that the original
>> x86-64 code was forked from an earlier base, and then hacked up
>> somewhat. So I think this "#ifdef CONFIG_X86_64" is just a case of
>> that.
>
> Nope - look at the commit that merged them; i386 used to have it as well
> right until the merge.  And yes, I agree that it's most likely a leftover
> from ancient times.
>
>> But maybe we should have a WARN_ON_ONCE() to verify it, rather than
>> just kill it outright.
>
> Um...  If we do that, I'd suggest taking it in front of setup_...frame()
> and doing do_exit() with loud warning along the lines of "someone has
> come that -><- close to rooting the box".  Anybody who can trigger that
> (assuming it can be triggered at all) can overwrite arbitrary parts of
> kernel memory.
>
> I'd still like to hear from the x86 maintainers, just in case there's
> something more non-trivial to that than "it has always been that way"...
>
> FWIW, looking at the history it seems that the thing has happened almost
> by accident; the critical part is
> -       regs->cs = USER_CS; regs->ss = USER_DS;
> -       regs->ds = USER_DS; regs->es = USER_DS;
> -       regs->gs = USER_DS; regs->fs = USER_DS;
> +       {
> +               unsigned long seg = USER_DS;
> +               __asm__("mov %w0,%%fs ; mov %w0,%%gs":"=r" (seg) :"0" (seg));
> +               set_fs(seg);
> +               regs->xds = seg;
> +               regs->xes = seg;
> +               regs->xss = seg;
> +               regs->xcs = USER_CS;
> in 2.1.2.  And that's when we had
>        * fs and gs evicted from pt_regs
>        * fs and gs not saved restored on kernel entry/exit
>        * just introduced set_fs() to start with (that went in 2.1.0)
>
> A bit before my time, so I'm not sure what's been going on there...

I believe it can be safely removed.  Looking through the history, the
corresponding set_fs() calls were removed from 32-bit by commit
b93b6ca3.  This is just an artifact from ancient i386 code where
set_fs (which is grossly misnamed now) really did set the %fs
register.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?
  2010-09-25  2:25     ` Brian Gerst
@ 2010-09-25  2:48       ` Al Viro
  2010-09-25  3:51         ` Brian Gerst
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2010-09-25  2:48 UTC (permalink / raw)
  To: Brian Gerst; +Cc: Linus Torvalds, tglx, mingo, linux-kernel

On Fri, Sep 24, 2010 at 10:25:15PM -0400, Brian Gerst wrote:
> > + ?? ?? ?? ?? ?? ?? ?? __asm__("mov %w0,%%fs ; mov %w0,%%gs":"=r" (seg) :"0" (seg));
> > + ?? ?? ?? ?? ?? ?? ?? set_fs(seg);
> > + ?? ?? ?? ?? ?? ?? ?? regs->xds = seg;
> > + ?? ?? ?? ?? ?? ?? ?? regs->xes = seg;
> > + ?? ?? ?? ?? ?? ?? ?? regs->xss = seg;
> > + ?? ?? ?? ?? ?? ?? ?? regs->xcs = USER_CS;
> > in 2.1.2. ??And that's when we had
> > ?? ?? ?? ??* fs and gs evicted from pt_regs
> > ?? ?? ?? ??* fs and gs not saved restored on kernel entry/exit
> > ?? ?? ?? ??* just introduced set_fs() to start with (that went in 2.1.0)
> >
> > A bit before my time, so I'm not sure what's been going on there...
> 
> I believe it can be safely removed.  Looking through the history, the
> corresponding set_fs() calls were removed from 32-bit by commit
> b93b6ca3.  This is just an artifact from ancient i386 code where
> set_fs (which is grossly misnamed now) really did set the %fs
> register.

Not quite.  If you look at the tree where it has shown up (2.1.2), you'll see
that
	a) by that time it _wasn't_ an assignment to %fs
	b) the same patch that has introduced that call there does direct
assignment to %fs right next to that set_fs().  See that __asm__ above?

Again, I agree that it almost certainly can be dropped.  I really wonder
about the history, though.  It predates git and bk by far (late 1996).
Linus, do you have any recollection regarding that stuff?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?
  2010-09-25  2:48       ` Al Viro
@ 2010-09-25  3:51         ` Brian Gerst
  2010-09-25  5:20           ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Gerst @ 2010-09-25  3:51 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, tglx, mingo, linux-kernel

On Fri, Sep 24, 2010 at 10:48 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Sep 24, 2010 at 10:25:15PM -0400, Brian Gerst wrote:
>> > + ?? ?? ?? ?? ?? ?? ?? __asm__("mov %w0,%%fs ; mov %w0,%%gs":"=r" (seg) :"0" (seg));
>> > + ?? ?? ?? ?? ?? ?? ?? set_fs(seg);
>> > + ?? ?? ?? ?? ?? ?? ?? regs->xds = seg;
>> > + ?? ?? ?? ?? ?? ?? ?? regs->xes = seg;
>> > + ?? ?? ?? ?? ?? ?? ?? regs->xss = seg;
>> > + ?? ?? ?? ?? ?? ?? ?? regs->xcs = USER_CS;
>> > in 2.1.2. ??And that's when we had
>> > ?? ?? ?? ??* fs and gs evicted from pt_regs
>> > ?? ?? ?? ??* fs and gs not saved restored on kernel entry/exit
>> > ?? ?? ?? ??* just introduced set_fs() to start with (that went in 2.1.0)
>> >
>> > A bit before my time, so I'm not sure what's been going on there...
>>
>> I believe it can be safely removed.  Looking through the history, the
>> corresponding set_fs() calls were removed from 32-bit by commit
>> b93b6ca3.  This is just an artifact from ancient i386 code where
>> set_fs (which is grossly misnamed now) really did set the %fs
>> register.
>
> Not quite.  If you look at the tree where it has shown up (2.1.2), you'll see
> that
>        a) by that time it _wasn't_ an assignment to %fs
>        b) the same patch that has introduced that call there does direct
> assignment to %fs right next to that set_fs().  See that __asm__ above?
>
> Again, I agree that it almost certainly can be dropped.  I really wonder
> about the history, though.  It predates git and bk by far (late 1996).
> Linus, do you have any recollection regarding that stuff?
>

In the beginning, the i386 kernel used a non-flat segmented memory
layout.  USER_[CD]S were 3GB segments at base 0, and KERNEL_[CD]S were
1GB segments at base 3GB.  This meant that the kernel could not access
userspace addresses without using a fs segment override (%fs was saved
in pt_regs, reloaded with USER_DS on kernel entry, and restored on
kernel exit).  You had to reload %fs with KERNEL_DS for the *_user
functions to address the kernel segment.

v2.1.2 introduced the modern flat memory layout with 4GB segments at
base 0.  %fs no longer was used for userspace access, so it wasn't
saved in pt_regs or touched in any way until a task switch.  Instead
of the hardware enforcing the limit, the check was moved to software.

Originally the signal handler had to set regs->xfs = USER_DS so that
the signal handler had a known state when it ran.  That had nothing to
do with the kernel's userspace access mechanism.  It was converted to
do both the immediate reloading of the %fs register (since it was no
longer saved in pt_regs and wouldn't get restored on kernel exit), and
to a new set_fs(USER_DS) call which meant something completely
different.  That is the origin of the code we are trying to remove
now.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?
  2010-09-25  3:51         ` Brian Gerst
@ 2010-09-25  5:20           ` Al Viro
  2010-09-25  9:54             ` Brian Gerst
  2010-09-26  9:13             ` Ingo Molnar
  0 siblings, 2 replies; 9+ messages in thread
From: Al Viro @ 2010-09-25  5:20 UTC (permalink / raw)
  To: Brian Gerst; +Cc: Linus Torvalds, tglx, mingo, linux-kernel

On Fri, Sep 24, 2010 at 11:51:11PM -0400, Brian Gerst wrote:
> > Again, I agree that it almost certainly can be dropped. ??I really wonder
> > about the history, though. ??It predates git and bk by far (late 1996).
> > Linus, do you have any recollection regarding that stuff?
> >
> 
> In the beginning, the i386 kernel used a non-flat segmented memory
> layout.  USER_[CD]S were 3GB segments at base 0, and KERNEL_[CD]S were
> 1GB segments at base 3GB.  This meant that the kernel could not access
> userspace addresses without using a fs segment override (%fs was saved
> in pt_regs, reloaded with USER_DS on kernel entry, and restored on
> kernel exit).  You had to reload %fs with KERNEL_DS for the *_user
> functions to address the kernel segment.

I know.

> v2.1.2 introduced the modern flat memory layout with 4GB segments at
> base 0.  %fs no longer was used for userspace access, so it wasn't
> saved in pt_regs or touched in any way until a task switch.  Instead
> of the hardware enforcing the limit, the check was moved to software.

Yes.

> Originally the signal handler had to set regs->xfs = USER_DS so that
> the signal handler had a known state when it ran.  That had nothing to
> do with the kernel's userspace access mechanism.  It was converted to
> do both the immediate reloading of the %fs register (since it was no
> longer saved in pt_regs and wouldn't get restored on kernel exit), and
> to a new set_fs(USER_DS) call which meant something completely
> different.  That is the origin of the code we are trying to remove
> now.

That still makes no sense.  2.0 mechanism guaranteed that even if you forgot
to restore %fs to USER_DS, you wouldn't leak that to userland.  But this
one didn't - each place like that became a roothole, no matter what you
did on signal delivery.  Simply because there might have been no unblocked
signals with userland handlers.  IOW, that set_fs() seems to have been
useless from the day 1, unless I'm missing something really subtle, like
e.g. some processes deliberately running (in 2.0) with %fs set to something
with lower limit, with signal handlers allowed to switch back to normal
for duration.  And even that would've been broken, since there wouldn't be
a matching set_fs() in sigreturn()...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?
  2010-09-25  5:20           ` Al Viro
@ 2010-09-25  9:54             ` Brian Gerst
  2010-09-26  9:13             ` Ingo Molnar
  1 sibling, 0 replies; 9+ messages in thread
From: Brian Gerst @ 2010-09-25  9:54 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, tglx, mingo, linux-kernel

On Sat, Sep 25, 2010 at 1:20 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Sep 24, 2010 at 11:51:11PM -0400, Brian Gerst wrote:
>> > Again, I agree that it almost certainly can be dropped. ??I really wonder
>> > about the history, though. ??It predates git and bk by far (late 1996).
>> > Linus, do you have any recollection regarding that stuff?
>> >
>>
>> In the beginning, the i386 kernel used a non-flat segmented memory
>> layout.  USER_[CD]S were 3GB segments at base 0, and KERNEL_[CD]S were
>> 1GB segments at base 3GB.  This meant that the kernel could not access
>> userspace addresses without using a fs segment override (%fs was saved
>> in pt_regs, reloaded with USER_DS on kernel entry, and restored on
>> kernel exit).  You had to reload %fs with KERNEL_DS for the *_user
>> functions to address the kernel segment.
>
> I know.
>
>> v2.1.2 introduced the modern flat memory layout with 4GB segments at
>> base 0.  %fs no longer was used for userspace access, so it wasn't
>> saved in pt_regs or touched in any way until a task switch.  Instead
>> of the hardware enforcing the limit, the check was moved to software.
>
> Yes.
>
>> Originally the signal handler had to set regs->xfs = USER_DS so that
>> the signal handler had a known state when it ran.  That had nothing to
>> do with the kernel's userspace access mechanism.  It was converted to
>> do both the immediate reloading of the %fs register (since it was no
>> longer saved in pt_regs and wouldn't get restored on kernel exit), and
>> to a new set_fs(USER_DS) call which meant something completely
>> different.  That is the origin of the code we are trying to remove
>> now.
>
> That still makes no sense.  2.0 mechanism guaranteed that even if you forgot
> to restore %fs to USER_DS, you wouldn't leak that to userland.  But this
> one didn't - each place like that became a roothole, no matter what you
> did on signal delivery.  Simply because there might have been no unblocked
> signals with userland handlers.  IOW, that set_fs() seems to have been
> useless from the day 1

That's what I was getting at.  The code it replaced did something
totally different (setting up user-mode register state).  The asm()
part of the replacement was the correct thing to do.  The set_fs()
call was unnecessary.

> unless I'm missing something really subtle, like
> e.g. some processes deliberately running (in 2.0) with %fs set to something
> with lower limit, with signal handlers allowed to switch back to normal
> for duration.  And even that would've been broken, since there wouldn't be
> a matching set_fs() in sigreturn()...

If the user process used a different %fs it didn't matter to the
kernel, because %fs was saved to pt_regs and reloaded with USER_DS on
kernel entry.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?
  2010-09-25  5:20           ` Al Viro
  2010-09-25  9:54             ` Brian Gerst
@ 2010-09-26  9:13             ` Ingo Molnar
  1 sibling, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2010-09-26  9:13 UTC (permalink / raw)
  To: Al Viro; +Cc: Brian Gerst, Linus Torvalds, tglx, mingo, linux-kernel

* Al Viro <viro@ZenIV.linux.org.uk> wrote:

> [...]  IOW, that set_fs() seems to have been useless from the day 1, 
> unless I'm missing something really subtle, like e.g. some processes 
> deliberately running (in 2.0) with %fs set to something with lower 
> limit, with signal handlers allowed to switch back to normal for 
> duration.  And even that would've been broken, since there wouldn't be 
> a matching set_fs() in sigreturn()...

I dont recall us ever having done anything particularly 'clever' with 
in-kernel set_fs()/restore_fs(). Beyond fork/clone it was always 
supposed to be set/restored in a balanced manner. We sometimes leaked it 
unintentionally, and those were security holes.

( Cleverness with security primitives was in fact always actively
  discouraged, even in the early days, as cleverness has the uncanny
  tendency to bit-rot and then has the tendency to slow-convert to a
  security hole by stealth. We always wanted obvious, boringly dumb,
  fail-safe primitives, which can take a few years of bitrot robustly. )

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-09-26  9:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-24 15:52 what's papered over by set_fs(USER_DS) in amd64 signal delivery? Al Viro
2010-09-24 16:07 ` Linus Torvalds
2010-09-24 16:57   ` Al Viro
2010-09-25  2:25     ` Brian Gerst
2010-09-25  2:48       ` Al Viro
2010-09-25  3:51         ` Brian Gerst
2010-09-25  5:20           ` Al Viro
2010-09-25  9:54             ` Brian Gerst
2010-09-26  9:13             ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox