context switch vs. signal delivery [was: Re: Accelerating user mode linux]

public inbox for linux-kernel@vger.kernel.org

* context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-02 13:28 Accelerating user mode linux Alan Cox
@ 2002-08-03 11:38 ` Ingo Molnar
  2002-08-04  6:46   ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2002-08-03 11:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

On 2 Aug 2002, Alan Cox wrote:

> The numbers look very different on a real processor. Signal delivery is
> indeed not stunningly fast but relative to a context switch it's very low
> indeed.

actually the opposite is true, on a 2.2 GHz P4:

  $ ./lat_sig catch
  Signal handler overhead: 3.091 microseconds

  $ ./lat_ctx -s 0 2
  2 0.90

ie. *process to process* context switches are 3.4 times faster than signal
delivery. Ie. we can switch to a helper thread and back, and still be
faster than a *single* signal.
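
The arithmetic behind the two claims can be checked directly. This is a minimal sketch, assuming both lmbench figures are in microseconds; the helper names are hypothetical and not part of lmbench.

```c
#include <assert.h>

/* How many times more expensive one signal delivery is than one
 * context switch. Both arguments are in microseconds. */
double signal_to_switch_ratio(double signal_us, double switch_us)
{
    assert(switch_us > 0.0);
    return signal_us / switch_us;
}

/* Cost of switching to a helper and back again: two context switches. */
double helper_round_trip_us(double switch_us)
{
    return 2.0 * switch_us;
}
```

With the numbers above, the ratio is 3.091 / 0.90, roughly 3.4, and the round trip is 1.80 microseconds, which is below the 3.091 microseconds measured for one signal.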

signals are in essence 'lightweight' threads created and destroyed for the
purpose of a single asynchronous event, it's IMO a very inefficient and
baroque concept for almost anything (but debugging and a number of very
special uses). I'd guess that with a sane threading library a helper
thread is faster for almost everything.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-03 11:38 ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Ingo Molnar
@ 2002-08-04  6:46   ` Andi Kleen
  2002-08-05  5:35     ` Linus Torvalds
  2002-08-05 10:40     ` Ingo Molnar
  0 siblings, 2 replies; 20+ messages in thread
From: Andi Kleen @ 2002-08-04  6:46 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> actually the opposite is true, on a 2.2 GHz P4:
> 
>   $ ./lat_sig catch
>   Signal handler overhead: 3.091 microseconds
> 
>   $ ./lat_ctx -s 0 2
>   2 0.90
> 
> ie. *process to process* context switches are 3.4 times faster than signal
> delivery. Ie. we can switch to a helper thread and back, and still be
> faster than a *single* signal.

This is because the signal save/restore does a lot of unnecessary stuff.
One optimization I implemented at one time was adding a SA_NOFP signal
bit that told the kernel that the signal handler did not intend
to modify floating point state (few signal handlers need FP). The kernel
would then skip saving the FPU state, which gave quite a speedup in signal
latency.

Linux got a lot slower in signal delivery when the SSE2 support was
added. The SA_NOFP bit got this speed back.

The targets were certain applications that use signal handlers for async
IO.

If there is interest I can dig up the old patches. They were really simple.

x86-64 does it also faster by FXSAVE'ing directly to the user space
frame with exception handling instead of copying manually. But that's
not possible in i386 because it still has to use the baroque iBCS 
FP context format on the stack.

-Andi


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-04  6:46   ` Andi Kleen
@ 2002-08-05  5:35     ` Linus Torvalds
  2002-08-05  5:42       ` Arnaldo Carvalho de Melo
                         ` (3 more replies)
  2002-08-05 10:40     ` Ingo Molnar
  1 sibling, 4 replies; 20+ messages in thread
From: Linus Torvalds @ 2002-08-05  5:35 UTC (permalink / raw)
  To: linux-kernel

In article <m3u1mb5df3.fsf@averell.firstfloor.org>,
Andi Kleen  <ak@muc.de> wrote:
>Ingo Molnar <mingo@elte.hu> writes:
>
>
>> actually the opposite is true, on a 2.2 GHz P4:
>> 
>>   $ ./lat_sig catch
>>   Signal handler overhead: 3.091 microseconds
>> 
>>   $ ./lat_ctx -s 0 2
>>   2 0.90
>> 
>> ie. *process to process* context switches are 3.4 times faster than signal
>> delivery. Ie. we can switch to a helper thread and back, and still be
>> faster than a *single* signal.
>
>This is because the signal save/restore does a lot of unnecessary stuff.
>One optimization I implemented at one time was adding a SA_NOFP signal
>bit that told the kernel that the signal handler did not intend 
>to modify floating point state (few signal handlers need FP) It would 
>not save the FPU state then and reached quite some speedup in signal
>latency. 
>
>Linux got a lot slower in signal delivery when the SSE2 support was
>added. That got this speed back.

This will break _horribly_ when (if) glibc starts using SSE2 for things
like memcpy() etc.

I agree that it is really sad that we have to save/restore FP on
signals, but I think it's unavoidable. Your hack may work for you, but
it just gets really dangerous in general. Having signals randomly
subtly corrupt some SSE2 state just because the signal handler uses
something like memcpy (without even realizing that that could lead to
trouble) is bad, bad, bad.

In other words, "not intending to" does not imply "will not".  It's just
potentially too easy to change SSE2 state by mistake. 

And yes, this signal handler thing is clearly visible on benchmarks. 
MUCH too clearly visible.  I just didn't see any safe alternatives
(and I still don't ;( )

		Linus


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  5:35     ` Linus Torvalds
@ 2002-08-05  5:42       ` Arnaldo Carvalho de Melo
  2002-08-05  6:37       ` Lincoln Dale
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2002-08-05  5:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Em Mon, Aug 05, 2002 at 05:35:13AM +0000, Linus Torvalds escreveu:
> This will break _horribly_ when (if) glibc starts using SSE2 for things
> like memcpy() etc.

Hmm, related: wasn't one way of giving userspace access to the kernel's
optimized versions of memcpy et al., through a page containing these functions
that would be mapped into the process address space (I don't remember the
exact details), still being considered?

- Arnaldo


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  5:35     ` Linus Torvalds
  2002-08-05  5:42       ` Arnaldo Carvalho de Melo
@ 2002-08-05  6:37       ` Lincoln Dale
  2002-08-05 15:39       ` Jamie Lokier
  2002-08-06  5:31       ` Mark Mielke
  3 siblings, 0 replies; 20+ messages in thread
From: Lincoln Dale @ 2002-08-05  6:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

At 05:35 AM 5/08/2002 +0000, Linus Torvalds wrote:
> >Linux got a lot slower in signal delivery when the SSE2 support was
> >added. That got this speed back.
>
>This will break _horribly_ when (if) glibc starts using SSE2 for things
>like memcpy() etc.
>
>I agree that it is really sad that we have to save/restore FP on
>signals, but I think it's unavoidable. Your hack may work for you, but
>it just gets really dangerous in general. Having signals randomly
>subtly corrupt some SSE2 state just because the signal handler uses
>something like memcpy (without even realizing that that could lead to
>trouble) is bad, bad, bad.

how about putting the onus on userspace to tell the kernel if/when it uses 
extensions that require FP state to be saved/restored?
if/when glibc starts using SSE2, it could then use these extensions.

could be as simple as user-space setting some bit somewhere.

>And yes, this signal handler thing is clearly visible on benchmarks.
>MUCH too clearly visible.  I just didn't see any safe alternatives
>(and I still don't ;( )

it probably isn't worthwhile penalising all users of signal just for those 
few userspace apps that actually do use SSE2.


cheers,

lincoln.



* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
       [not found]     ` <ail2qh$bf0$1@penguin.transmeta.com.suse.lists.linux.kernel>
@ 2002-08-05  8:38       ` Andi Kleen
  2002-08-05 14:24         ` Jeff Dike
  2002-08-05 16:19         ` Linus Torvalds
  0 siblings, 2 replies; 20+ messages in thread
From: Andi Kleen @ 2002-08-05  8:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

torvalds@transmeta.com (Linus Torvalds) writes:

> >This is because the signal save/restore does a lot of unnecessary stuff.
> >One optimization I implemented at one time was adding a SA_NOFP signal
> >bit that told the kernel that the signal handler did not intend 
> >to modify floating point state (few signal handlers need FP) It would 
> >not save the FPU state then and reached quite some speedup in signal
> >latency. 
> >
> >Linux got a lot slower in signal delivery when the SSE2 support was
> >added. That got this speed back.
> 
> This will break _horribly_ when (if) glibc starts using SSE2 for things
> like memcpy() etc.
> 
> I agree that it is really sad that we have to save/restore FP on
> signals, but I think it's unavoidable. Your hack may work for you, but
> it just gets really dangerous in general. Having signals randomly
> subtly corrupt some SSE2 state just because the signal handler uses
> something like memcpy (without even realizing that that could lead to
> trouble) is bad, bad, bad.

I think the possibility at least for memcpy is rather remote. Any sane
SSE memcpy would only kick in for really big arguments (for small
memcpys it doesn't make any sense at all because of the context save/possible
reformatting penalty overhead). So only people doing really
big memcpys could be possibly hurt, and that is rather unlikely.

But your point stands, one definitely needs to be very careful with it.

Also for special things like UML, which can ensure their environment is sane,
it could still be a useful optimization. I did it originally for async IO
handling in some project. At least offering the choice does not hurt.
If it could speed up UML I think it would certainly be worth it.

After all Linux should give you enough rope to shoot yourself in the foot ;)

> 
> In other words, "not intending to" does not imply "will not".  It's just
> potentially too easy to change SSE2 state by mistake. 
> 
> And yes, this signal handler thing is clearly visible on benchmarks. 
> MUCH too clearly visible.  I just didn't see any safe alternatives
> (and I still don't ;( )

In theory you could do a superhack: put the FP context into an unmapped
page on the stack and only save with lazy FPU or access to the unmapped
page. Unfortunately the details get too nasty
(where to find the unmapped page? is the tlb manipulation worth it if the
page was mapped? how to store the address of the unmapped page for nested 
signal handlers for the page fault handler?) so I discarded this idea.

-Andi


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-04  6:46   ` Andi Kleen
  2002-08-05  5:35     ` Linus Torvalds
@ 2002-08-05 10:40     ` Ingo Molnar
  2002-08-05 14:59       ` Larry McVoy
  2002-08-05 15:41       ` Jamie Lokier
  1 sibling, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2002-08-05 10:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

On 4 Aug 2002, Andi Kleen wrote:

> > actually the opposite is true, on a 2.2 GHz P4:
> > 
> >   $ ./lat_sig catch
> >   Signal handler overhead: 3.091 microseconds
> > 
> >   $ ./lat_ctx -s 0 2
> >   2 0.90
> > 
> > ie. *process to process* context switches are 3.4 times faster than signal
> > delivery. Ie. we can switch to a helper thread and back, and still be
> > faster than a *single* signal.
> 
> This is because the signal save/restore does a lot of unnecessary stuff.
> One optimization I implemented at one time was adding a SA_NOFP signal
> bit that told the kernel that the signal handler did not intend to
> modify floating point state (few signal handlers need FP) It would not
> save the FPU state then and reached quite some speedup in signal
> latency.

well, we have an optimization in this area already - if the thread
receiving the signal has not used any FPU registers during its current
scheduled atom yet then we do not save the FPU state into the signal
frame.

lat_sig uses the FPU so this cost is added. If the FPU saving cost is
removed then signal delivery latency is still 2.0 usecs - slightly more
than twice as expensive as a context-switch - so it's not a win. And
threads can do queued events that amortize context switch overhead, while
queued signals generate per-event signal delivery, so signal delivery
costs are not amortized.
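
The amortization argument can be put as a small cost model. This is only a sketch under the assumptions stated in the comments; the function names are hypothetical, and the model ignores the cost of actually processing each event.

```c
#include <assert.h>

/* Per-event overhead when a helper thread handles `batch` queued events
 * per wakeup: one switch to the helper and one switch back, shared by
 * the whole batch. Times are in microseconds. */
double helper_cost_per_event_us(double switch_us, int batch)
{
    assert(batch > 0);
    return 2.0 * switch_us / (double)batch;
}

/* Per-event overhead with queued signals: every event pays one full
 * delivery, so the cost does not shrink with the queue depth. */
double signal_cost_per_event_us(double delivery_us)
{
    return delivery_us;
}
```

With 0.90 microseconds per switch, a batch of one costs 1.80 microseconds per event and a batch of ten costs 0.18, while the signal path stays at 2.0 microseconds per event even with FPU saving removed.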

(Not that i advocate SIGIO or helper threads for high-performance IO -
Ben's aio interface is the fastest and most correct approach.)

	Ingo


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  8:38       ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Andi Kleen
@ 2002-08-05 14:24         ` Jeff Dike
  2002-08-05 16:19         ` Linus Torvalds
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff Dike @ 2002-08-05 14:24 UTC (permalink / raw)
  To: Andi Kleen, Linus Torvalds; +Cc: linux-kernel

ak@suse.de said:
> Also for special things like UML, which can ensure their environment is
> sane, it could still be a useful optimization.

I use libc, and I haven't been able to convince myself that it isn't
going to use FP instructions or registers on my behalf.  I use it as little
as possible, but it still makes me nervous.

> If it could speed up UML I think it would certainly be
> worth it.

After Ingo's numbers, I like the idea of just having a separate address
space and process for the UML kernel, and have that process ptrace UML 
processes and handle system calls and interrupts on their behalf.  One
context switch at the start of a system call and one at the end, as opposed
to a signal delivery and sigreturn.

This also solves the jail mode mprotect performance horrors.

The one thing standing in my way is the need for the kernel process to
be able to change the address space of its processes.

I made a proposal for that, and Alan didn't like it.  So, we'll see what
he likes better.

				Jeff


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05 10:40     ` Ingo Molnar
@ 2002-08-05 14:59       ` Larry McVoy
  2002-08-05 15:41       ` Jamie Lokier
  1 sibling, 0 replies; 20+ messages in thread
From: Larry McVoy @ 2002-08-05 14:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

> > > actually the opposite is true, on a 2.2 GHz P4:
> > > 
> > >   $ ./lat_sig catch
> > >   Signal handler overhead: 3.091 microseconds
> > > 
> > >   $ ./lat_ctx -s 0 2
> > >   2 0.90
> > > 
> > > ie. *process to process* context switches are 3.4 times faster than signal
> > > delivery. Ie. we can switch to a helper thread and back, and still be
> > > faster than a *single* signal.

Has someone gone through the lat_ctx.c and lat_sig.c code and convinced 
themselves these are measuring things which ought to be compared like this?
When I wrote that code I didn't anticipate this comparison, so somebody
should go look.

I'd suggest that if you want to measure how fast you can communicate using
signals versus pipes (or sockets or whatever), someone write up a test
which has two processes bounce a token between each other using signals
and then compare that with lat_pipe.  It's not clear to me that you are
comparing apples to apples.

If someone does write the test, we'll add it to LMbench if it reveals
anything useful.  It should be easy enough to do.  I can do it if it
isn't obvious.
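
One possible shape for such a test is sketched below. It is not LMbench code: the function name, the choice of SIGUSR1, and the use of sigsuspend() are all assumptions made for illustration. A parent and a child pass a token back and forth, and each hop goes through a real handler invocation, so the handler setup and sigreturn costs are part of what is timed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t tokens_seen;

static void on_token(int signo)
{
    (void)signo;
    tokens_seen++;
}

/* Sleep until the handler has run once more. SIGUSR1 stays blocked outside
 * sigsuspend(), so a token cannot arrive between the check and the wait. */
static void wait_for_token(const sigset_t *wait_mask)
{
    sig_atomic_t before = tokens_seen;
    while (tokens_seen == before)
        sigsuspend(wait_mask);
}

/* Returns the mean time in nanoseconds for one parent->child->parent
 * round trip, or -1.0 on error. */
double signal_pingpong_ns(int rounds)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    struct timespec t0, t1;
    pid_t parent = getpid(), child;
    int i;

    assert(rounds > 0);

    sa.sa_handler = on_token;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0 ||
        sigprocmask(SIG_BLOCK, &block, &old_mask) != 0)
        return -1.0;
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGUSR1);

    child = fork();
    if (child < 0) {
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        return -1.0;
    }
    if (child == 0) {
        /* Child: wait for the token, then send it straight back. */
        for (i = 0; i < rounds; i++) {
            wait_for_token(&wait_mask);
            kill(parent, SIGUSR1);
        }
        _exit(0);
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < rounds; i++) {
        kill(child, SIGUSR1);
        wait_for_token(&wait_mask);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    waitpid(child, NULL, 0);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);

    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)rounds;
}
```

Each round trip here includes two kill() calls, two handler deliveries and two process switches, so the result is comparable to a lat_pipe round trip rather than to the lat_sig or lat_ctx figures alone.
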
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  5:35     ` Linus Torvalds
  2002-08-05  5:42       ` Arnaldo Carvalho de Melo
  2002-08-05  6:37       ` Lincoln Dale
@ 2002-08-05 15:39       ` Jamie Lokier
  2002-08-05 16:38         ` Linus Torvalds
  2002-08-06  5:31       ` Mark Mielke
  3 siblings, 1 reply; 20+ messages in thread
From: Jamie Lokier @ 2002-08-05 15:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> I agree that it is really sad that we have to save/restore FP on
> signals, but I think it's unavoidable.

Couldn't you mark the FPU as unused for the duration of the
handler, and let the lazy FPU mechanism save the state when it is used
by the signal handler?

> And yes, this signal handler thing is clearly visible on benchmarks. 
> MUCH too clearly visible.  I just didn't see any safe alternatives
> (and I still don't ;( )

I use SEGVs to trap access to read-only pages for garbage collection,
and I know I'm not the only one.  That's a lot of SEGVs...

Fwiw, I have timed SIGSEGV handling time on Linux on various Intel CPUs,
on a PA-RISC running HP-UX and on a few Sparcs running Solaris.  Linux
came out faster in all cases.  Best case: 8 microseconds to trap a page
fault, handle the SEGV and mprotect() one page (600MHz P3).  Worst case:
37 microseconds (133MHz Pentium).

That's about 5000 cycles.  I'm sure we can do better than that.
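
For reference, 8 microseconds at 600 MHz is 4800 cycles and 37 microseconds at 133 MHz is about 4900. The trap-and-unprotect pattern being timed can be sketched roughly as follows. This is an illustration, not the code that produced those numbers: the names are hypothetical, it guards a single anonymous page, and it calls mprotect() from the handler, which works on Linux but is not on the POSIX list of async-signal-safe functions.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;
static size_t guarded_size;
static volatile sig_atomic_t faults_handled;

/* On a write to the read-only page: make it writable and return, so the
 * faulting store is restarted and succeeds. */
static void on_segv(int signo, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_page;

    (void)signo;
    (void)ctx;
    if (addr < base || addr >= base + guarded_size)
        _exit(1);                 /* a genuine crash, not our page */
    if (mprotect(guarded_page, guarded_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    faults_handled++;
}

/* Write-protects one page `writes` times, stores to it each time, and
 * returns how many faults the handler absorbed (or -1 on setup error). */
int count_write_faults(int writes)
{
    struct sigaction sa, old_sa;
    int i;

    assert(writes >= 0);
    guarded_size = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, guarded_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    sa.sa_sigaction = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &sa, &old_sa) != 0)
        return -1;

    faults_handled = 0;
    for (i = 0; i < writes; i++) {
        mprotect(guarded_page, guarded_size, PROT_READ);
        ((volatile char *)guarded_page)[0] = (char)i;   /* traps once */
    }

    sigaction(SIGSEGV, &old_sa, NULL);
    munmap(guarded_page, guarded_size);
    return (int)faults_handled;
}
```

Every protected store costs a page fault, a signal delivery, an mprotect() and a sigreturn, which is why the per-signal overhead discussed in this thread matters for this kind of user.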

For sophisticated user space uses, like the above, I'd like to see
a trap handling mechanism that saves only the _minimum_ state.
Userspace can take care of the rest.  Maybe even without a sigreturn in
some cases.

-- Jamie


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05 10:40     ` Ingo Molnar
  2002-08-05 14:59       ` Larry McVoy
@ 2002-08-05 15:41       ` Jamie Lokier
  2002-08-05 15:44         ` Jamie Lokier
  1 sibling, 1 reply; 20+ messages in thread
From: Jamie Lokier @ 2002-08-05 15:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

Ingo Molnar wrote:
> And threads can do queued events that amortizes context switch
> overhead, while queued signals generate per-event signal delivery, so
> signal delivery costs are not amortized.
> 
> (Not that i advocate SIGIO or helper threads for high-performance IO -
> Ben's aio interface is the fastest and most correct approach.)

Isn't the per-event queued signal cost amortised when using sigwaitinfo()?

-- Jamie


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05 15:41       ` Jamie Lokier
@ 2002-08-05 15:44         ` Jamie Lokier
  0 siblings, 0 replies; 20+ messages in thread
From: Jamie Lokier @ 2002-08-05 15:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

Jamie Lokier wrote:
> Ingo Molnar wrote:
> > And threads can do queued events that amortizes context switch
> > overhead, while queued signals generate per-event signal delivery, so
> > signal delivery costs are not amortized.
> > 
> > (Not that i advocate SIGIO or helper threads for high-performance IO -
> > Ben's aio interface is the fastest and most correct approach.)
> 
> Isn't the per-event queued signal cost amortised when using sigwaitinfo()?

Of course I meant:

  Isn't the per-event queued signal cost amortised when using sigtimedwait()?
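
What the question refers to can be sketched as follows, assuming a POSIX system with realtime signals. The function name and the use of SIGRTMIN are illustrative choices. The signal is kept blocked and the queued events are collected synchronously with sigtimedwait(), so no handler frame is built and no sigreturn happens; each event still costs one system call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Queues `nevents` realtime signals to this process while the signal is
 * blocked, then drains them with sigtimedwait(). Returns the number of
 * events collected, or -1 if the mask could not be changed. */
int drain_queued_signals(int nevents)
{
    sigset_t set, old_mask;
    siginfo_t info;
    struct timespec no_wait = { 0, 0 };   /* poll, never sleep */
    int i, drained = 0;

    assert(nevents >= 0);

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, &old_mask) != 0)
        return -1;

    /* Realtime signals queue, so each event stays pending separately. */
    for (i = 0; i < nevents; i++) {
        union sigval payload;
        payload.sival_int = i;
        if (sigqueue(getpid(), SIGRTMIN, payload) != 0)
            break;
    }

    /* Collect events one by one without ever running a handler. */
    while (sigtimedwait(&set, &info, &no_wait) == SIGRTMIN)
        drained++;

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return drained;
}
```

Whether this counts as amortized depends on what is being compared: the FP save/restore and signal frame work discussed in this thread is avoided, but the per-event kernel entry is not.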

cheers,
-- Jamie


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  8:38       ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Andi Kleen
  2002-08-05 14:24         ` Jeff Dike
@ 2002-08-05 16:19         ` Linus Torvalds
  1 sibling, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2002-08-05 16:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On 5 Aug 2002, Andi Kleen wrote:
> 
> I think the possibility at least for memcpy is rather remote. Any sane
> SSE memcpy would only kick in for really big arguments (for small
> memcpys it doesn't make any sense at all because of the context save/possible
> reformatting penalty overhead). So only people doing really
> big memcpys could be possibly hurt, and that is rather unlikely.

And this is why the kernel _has_ to save the FP state.

It's the "only happens in a blue moon" bugs that are the absolute _worst_ 
bugs. I want to optimize the kernel until I'm blue in the face, but the 
kernel must NEVER EVER have a "non-stable" interface.

Signal handlers that don't restore state are hard as _hell_ to debug. Most 
of the time it doesn't really matter (unless the lack of restore is 
something really major like one of the most common integer registers), but 
then depending on what libraries you use, and just _exactly_ when the 
signal comes in, you get subtle data corruption that may not show up until 
much later.

At which point your programmer wonders if he mistakenly wandered into 
MS-Windows land.

No thank you. I'll take slow signal handlers over ones that _sometimes_ 
don't work.

> After all Linux should give you enough rope to shoot yourself in the foot ;)

On purpose, yes. It's ok to take careful aim, and say "I'm now shooting 
myself in the foot".

And yes, it's also ok to say "I don't know what I'm doing, so I may be
shooting myself in the foot" (this is obviously the most common
foot-shooter).

And if you come to me and complain about how drunk you were, and how you 
shot yourself in the foot by mistake due to that, I'll just ignore you.

BUT - and this is a big BUT - if you are doing everything right, and you 
actually know what you're doing, and you end up shooting yourself in the 
foot because the kernel was taking a shortcut, then I think the kernel is 
_wrong_.

And I'd rather have a slow kernel that does things right, than a fast 
kernel which screws with people.

> In theory you could do a superhack: put the FP context into an unmapped
> page on the stack and only save with lazy FPU or access to the unmapped
> page. 

That would be extremely interesting especially with signal handlers that 
do a longjmp() thing.

The real fix for a lot of programs on x86 would be for them to never ever 
use FP in the first place, in which case the kernel would be able to just 
not save and restore it at all.

However, glibc fiddles with the fpu at startup, even for non-FP programs. 
Dunno what to do about that.

		Linus


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05 15:39       ` Jamie Lokier
@ 2002-08-05 16:38         ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2002-08-05 16:38 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel

On Mon, 5 Aug 2002, Jamie Lokier wrote:

> Linus Torvalds wrote:
> > I agree that it is really sad that we have to save/restore FP on
> > signals, but I think it's unavoidable.
> 
> Couldn't you mark the FPU as unused for the duration of the
> handler, and let the lazy FPU mechanism save the state when it is used
> by the signal handler?

Nope. Believe me, I gave some thought to clever things to do. 

The kernel won't even _see_ a longjmp() out of a signal handler, so the
kernel has a really hard time trying to do any clever lazy stuff.

Also, people who play games with FP actually change the FP data on the
stack frame, and depend on signal return to reload it. Admittedly I've 
only ever seen this on SIGFPE, but anyway - this is all done with integer 
instructions that just touch bitpatterns on the stack.. The kernel can't 
catch it sanely.

> For sophisticated user space uses, like the above, I'd like to see
> a trap handling mechanism that saves only the _minimum_ state.

I would not mind an extra per-signal flag that says "don't bother with FP
saves" (the same way we already have "don't restart" etc), but I would be
very nervous if glibc used it by default (even if glibc doesn't use SSE2
in memcpy, gcc itself can do it, and obviously _users_ may just do it
themselves).

So it would have to be explicitly enabled with a SA_NOFPSIGHANDLER flag or 
something.

(And yes, it's the FP stuff that takes most of the time. I think the 
lmbench numbers for signal delivery tripled when that went in).

		Linus


* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
       [not found] ` <Pine.LNX.4.44.0208050922570.1753-100000@home.transmeta.com.suse.lists.linux.kernel>
@ 2002-08-05 16:46   ` Andi Kleen
  2002-08-05 21:30     ` Jamie Lokier
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2002-08-05 16:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds <torvalds@transmeta.com> writes:
> 
> So it would have to be explicitly enabled with a SA_NOFPSIGHANDLER flag or 
> something.

That is all my patch was doing. It added SA_NOFP, off by default.
Nothing about enabling it by default. The first hunk is a minor optimization.

-Andi

I attached the old patch for 2.4.9 for reference. If you think it is ok
I can submit it for 2.5

--- linux-work/arch/i386/kernel/signal.c-NOFP	Fri Aug 24 16:36:14 2001
+++ linux-work/arch/i386/kernel/signal.c	Fri Aug 31 00:04:24 2001
@@ -322,15 +322,15 @@
 
 static int
 setup_sigcontext(struct sigcontext *sc, struct _fpstate *fpstate,
-		 struct pt_regs *regs, unsigned long mask)
+		 struct pt_regs *regs, unsigned long mask, int fp)
 {
-	int tmp, err = 0;
+	int err = 0;
+	int tmp;
 
-	tmp = 0;
-	__asm__("movl %%gs,%0" : "=r"(tmp): "0"(tmp));
-	err |= __put_user(tmp, (unsigned int *)&sc->gs);
-	__asm__("movl %%fs,%0" : "=r"(tmp): "0"(tmp));
-	err |= __put_user(tmp, (unsigned int *)&sc->fs);
+	__asm__("movl %%gs,%0" : "=r"(tmp));
+	err |= __put_user(tmp & 0xffff, (unsigned int *)&sc->gs);
+	__asm__("movl %%fs,%0" : "=r"(tmp));
+	err |= __put_user(tmp & 0xffff, (unsigned int *)&sc->fs);
 
 	err |= __put_user(regs->xes, (unsigned int *)&sc->es);
 	err |= __put_user(regs->xds, (unsigned int *)&sc->ds);
@@ -350,11 +350,12 @@
 	err |= __put_user(regs->esp, &sc->esp_at_signal);
 	err |= __put_user(regs->xss, (unsigned int *)&sc->ss);
 
-	tmp = save_i387(fpstate);
-	if (tmp < 0)
+	if (fp)
+	  fp = save_i387(fpstate);
+	if (fp < 0)
 	  err = 1;
 	else
-	  err |= __put_user(tmp ? fpstate : NULL, &sc->fpstate);
+	  err |= __put_user(fp ? fpstate : NULL, &sc->fpstate);
 
 	/* non-iBCS2 extensions.. */
 	err |= __put_user(mask, &sc->oldmask);
@@ -410,7 +411,8 @@
 	if (err)
 		goto give_sigsegv;
 
-	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0]);
+	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0], 
+				(ka->sa.sa_flags&SA_NOFP));
 	if (err)
 		goto give_sigsegv;
 
@@ -491,7 +493,7 @@
 			  &frame->uc.uc_stack.ss_flags);
 	err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
 	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
-			        regs, set->sig[0]);
+			        regs, set->sig[0], !!(ka->sa.sa_flags&SA_NOFP));
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
 	if (err)
 		goto give_sigsegv;
--- linux-work/arch/i386/kernel/i387.c-NOFP	Fri Feb 23 19:09:08 2001
+++ linux-work/arch/i386/kernel/i387.c	Fri Aug 31 00:01:52 2001
@@ -323,11 +323,6 @@
 	if ( !current->used_math )
 		return 0;
 
-	/* This will cause a "finit" to be triggered by the next
-	 * attempted FPU operation by the 'current' process.
-	 */
-	current->used_math = 0;
-
 	if ( HAVE_HWFP ) {
 		if ( cpu_has_fxsr ) {
 			return save_i387_fxsave( buf );
@@ -335,6 +330,11 @@
 			return save_i387_fsave( buf );
 		}
 	} else {
+		/* This will cause a "finit" to be triggered by the next
+		 * attempted FPU operation by the 'current' process.
+		 */
+		current->used_math = 0;
+       
 		return save_i387_soft( &current->thread.i387.soft, buf );
 	}
 }
--- linux-work/include/asm-i386/signal.h-NOFP	Thu Sep 13 22:27:41 2001
+++ linux-work/include/asm-i386/signal.h	Thu Oct 18 18:31:29 2001
@@ -80,6 +80,7 @@
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
+ * SA_NOFP    Don't save FP state.	
  *
  * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single
  * Unix names RESETHAND and NODEFER respectively.
@@ -97,6 +98,7 @@
 #define SA_INTERRUPT	0x20000000 /* dummy -- ignored */
 
 #define SA_RESTORER	0x04000000
+#define SA_NOFP		0x02000000
 
 /* 
  * sigaltstack controls


* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
  2002-08-05 16:46   ` Andi Kleen
@ 2002-08-05 21:30     ` Jamie Lokier
  2002-08-05 21:35       ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Jamie Lokier @ 2002-08-05 21:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linus Torvalds, linux-kernel

A couple of questions.

Andi Kleen wrote:
> +	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0], 
> +				(ka->sa.sa_flags&SA_NOFP));

>  	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
> -			        regs, set->sig[0]);
> +			        regs, set->sig[0], !!(ka->sa.sa_flags&SA_NOFP));

1: Why the inconsistency between the two ways the SA_NOFP flag is checked?

2: What happens when the user's signal handler decides it wants to save
the FPU state itself (after all) and proceed with some FPU use.  Will
sigreturn restore the user-saved FPU state?  Just curious.

-- Jamie


* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
  2002-08-05 21:30     ` Jamie Lokier
@ 2002-08-05 21:35       ` Andi Kleen
  2002-08-05 22:09         ` Jamie Lokier
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2002-08-05 21:35 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andi Kleen, Linus Torvalds, linux-kernel

On Mon, Aug 05, 2002 at 10:30:06PM +0100, Jamie Lokier wrote:
> Andi Kleen wrote:
> > +	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0], 
> > +				(ka->sa.sa_flags&SA_NOFP));
> 
> >  	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
> > -			        regs, set->sig[0]);
> > +			        regs, set->sig[0], !!(ka->sa.sa_flags&SA_NOFP));
> 
> 1: Why the inconsistency between the two ways the SA_NOFP flag is checked?

I don't remember. Probably there was some reason in an earlier version
of the code. The !! could be probably removed now.
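
The difference between the two call sites can be modelled outside the kernel. This is a toy sketch with hypothetical function names; only the SA_NOFP value and the `if (fp)` test are taken from the posted patch.

```c
#include <assert.h>

#define SA_NOFP 0x02000000   /* value from the posted patch, not a mainline flag */

/* Argument as passed at the first call site: the raw masked bit. */
int fp_arg_raw(unsigned long sa_flags)
{
    return (int)(sa_flags & SA_NOFP);
}

/* Argument as passed at the second call site: normalised to 0 or 1. */
int fp_arg_normalised(unsigned long sa_flags)
{
    return !!(sa_flags & SA_NOFP);
}

/* Mirrors the `if (fp)` test in the patched setup_sigcontext(): a
 * nonzero argument takes the branch that calls save_i387(). */
int takes_save_branch(int fp)
{
    return fp != 0;
}
```

Both forms are equivalent for a truth test, so the `!!` changes nothing here. Read literally, though, the hunks pass a nonzero value when SA_NOFP is set, and that is the case in which the posted code takes the save_i387() branch; this is an observation about the text of the patch as posted, not about what was actually tested.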

> 
> 2: What happens when the user's signal handler decides it wants to save
> the FPU state itself (after all) and proceed with some FPU use.  Will
> sigreturn restore the user-saved FPU state?  Just curious.

Nope it won't because there is no saved state. The previous context's FPU 
state will be silently corrupted.

-Andi


* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
  2002-08-05 21:35       ` Andi Kleen
@ 2002-08-05 22:09         ` Jamie Lokier
  2002-08-05 22:16           ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Jamie Lokier @ 2002-08-05 22:09 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linus Torvalds, linux-kernel

Andi Kleen wrote:
> > 2: What happens when the user's signal handler decides it wants to save
> > the FPU state itself (after all) and proceed with some FPU use.  Will
> > sigreturn restore the user-saved FPU state?  Just curious.
> 
> Nope it won't because there is no saved state. The previous context's FPU 
> state will be silently corrupted.

I meant if the user's signal handler decides it wants to save the FPU
state directly into the signal context struct, after deciding to do
that.  Won't that work?

-- Jamie


* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
  2002-08-05 22:09         ` Jamie Lokier
@ 2002-08-05 22:16           ` Andi Kleen
  0 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2002-08-05 22:16 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andi Kleen, Linus Torvalds, linux-kernel

On Mon, Aug 05, 2002 at 11:09:41PM +0100, Jamie Lokier wrote:
> Andi Kleen wrote:
> > > 2: What happens when the user's signal handler decides it wants to save
> > > the FPU state itself (after all) and proceed with some FPU use.  Will
> > > sigreturn restore the user-saved FPU state?  Just curious.
> > 
> > Nope it won't because there is no saved state. The previous context's FPU 
> > state will be silently corrupted.
> 
> I meant if the user's signal handler decides it wants to save the FPU
> state directly into the signal context struct, after deciding to do
> that.  Won't that work?

In theory yes. The space should be already allocated on the stack, it just
has to be filled in.
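
A rough userspace sketch of where a handler would find that space, assuming
Linux with glibc on x86-64 and a handler installed with SA_SIGINFO. It only
checks whether the signal frame carries an FPU save area; it does not fill
one in, and it does not use SA_NOFP, which exists only in the patch being
discussed. The names probe_handler, probe_signal_fpstate, fpstate_seen and
handler_calls are made up for this example.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

/* 1: handler saw a non-NULL FPU save area, 0: the pointer was NULL,
 * -1: not probed (handler did not run, or unsupported platform). */
static volatile sig_atomic_t fpstate_seen = -1;
static volatile sig_atomic_t handler_calls;

static void probe_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)info;
	handler_calls++;
#if defined(__linux__) && defined(__GLIBC__) && defined(__x86_64__)
	ucontext_t *uc = ctx;
	/* fpregs points at the FPU save area inside the signal frame;
	 * sigreturn restores the FPU state from whatever is stored there. */
	fpstate_seen = (uc->uc_mcontext.fpregs != NULL);
#else
	(void)ctx;
#endif
}

/* Installs the handler, sends SIGUSR1 to ourselves, reports the result. */
int probe_signal_fpstate(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = probe_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;
	raise(SIGUSR1);	/* the handler runs before raise() returns */
	return fpstate_seen;
}
```

A handler that wanted to save the state itself would have to write a valid
image into that area before returning; whether sigreturn then accepts it
under the proposed flag is exactly the "in theory" part.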

-Andi


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  5:35     ` Linus Torvalds
                         ` (2 preceding siblings ...)
  2002-08-05 15:39       ` Jamie Lokier
@ 2002-08-06  5:31       ` Mark Mielke
  3 siblings, 0 replies; 20+ messages in thread
From: Mark Mielke @ 2002-08-06  5:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Mon, Aug 05, 2002 at 05:35:13AM +0000, Linus Torvalds wrote:
> And yes, this signal handler thing is clearly visible on benchmarks. 
> MUCH too clearly visible.  I just didn't see any safe alternatives
> (and I still don't ;( )

To some degree, the original approach taken by Intel may be an alternative...

That is, the signal handler is responsible for saving state of all CPU
resources that it intends to use, and restoring state before returning
control to the caller. (the 'interrupt' qualifier from C)

I could see this offered as a GCC optimization, but without the compiler
smarts to detect what is needed and what is not, it would be very difficult
to add this support in a seamless manner.

For example:

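    /* Sketch of proposed syntax: the signal_handler attribute,
       __attribute_enabled__ and __signal_fast are hypothetical. */
    typedef void (*__fastsighandler_t) (int) __attribute__ ((signal_handler));

    #define signal(number, handler) \
        (__attribute_enabled__((handler, signal_handler)) \
            ? __signal_fast(number, handler) \
            : __signal(number, handler))

    /* GCC takes attributes on a declaration, not after the declarator
       of a definition, so declare first and define separately. */
    void handle_sigint (int signo) __attribute__ ((signal_handler));

    void handle_sigint (int signo)
    {
        sigint_received++;
    }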



mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



Thread overview: 20+ messages
     [not found] <1028294887.18635.71.camel@irongate.swansea.linux.org.uk.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0208031332120.7531-100000@localhost.localdomain.suse.lists.linux.kernel>
     [not found]   ` <m3u1mb5df3.fsf@averell.firstfloor.org.suse.lists.linux.kernel>
     [not found]     ` <ail2qh$bf0$1@penguin.transmeta.com.suse.lists.linux.kernel>
2002-08-05  8:38       ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Andi Kleen
2002-08-05 14:24         ` Jeff Dike
2002-08-05 16:19         ` Linus Torvalds
     [not found] <20020805163910.C7130@kushida.apsleyroad.org.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0208050922570.1753-100000@home.transmeta.com.suse.lists.linux.kernel>
2002-08-05 16:46   ` Andi Kleen
2002-08-05 21:30     ` Jamie Lokier
2002-08-05 21:35       ` Andi Kleen
2002-08-05 22:09         ` Jamie Lokier
2002-08-05 22:16           ` Andi Kleen
2002-08-02 13:28 Accelerating user mode linux Alan Cox
2002-08-03 11:38 ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Ingo Molnar
2002-08-04  6:46   ` Andi Kleen
2002-08-05  5:35     ` Linus Torvalds
2002-08-05  5:42       ` Arnaldo Carvalho de Melo
2002-08-05  6:37       ` Lincoln Dale
2002-08-05 15:39       ` Jamie Lokier
2002-08-05 16:38         ` Linus Torvalds
2002-08-06  5:31       ` Mark Mielke
2002-08-05 10:40     ` Ingo Molnar
2002-08-05 14:59       ` Larry McVoy
2002-08-05 15:41       ` Jamie Lokier
2002-08-05 15:44         ` Jamie Lokier
