Accelerating user mode linux

public inbox for linux-kernel@vger.kernel.org

* Accelerating user mode linux
@ 2002-08-01 20:16 Alan Cox
  2002-08-02  4:40 ` Jeff Dike
  0 siblings, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-01 20:16 UTC (permalink / raw)
  To: linux-kernel

Proposal for a sigaltmm()

There is a performance problem when running virtualised environments
(notably user mode linux). The cost of the mprotect calls needed to
handle syscalls and protect the UML kernel from its user space is large, and
the alternatives like a separate process and ptrace are not pretty either.

The cunning plan goes like this

Add
	current->alt_mm
	A per task flag for 'supervisory' mode

Tasks start with current->alt_mm NULL and the flag set to supervisory
On exec/exit tear down alt_mm as well as mm

Signal delivery checks if alt_mm != NULL && supervisory is clear.
If so, it sets supervisory, switches mm/alt_mm, flushes the TLB and
continues handling the signal in the new space.

We add
	sys_switchmm(address);

This switches to the altmm (creating one if it doesn't exist as a copy of
the current mm), flushes the TLB and jumps to the address given.
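
The rules above can be modelled in user space as a small state machine.
This is only a sketch under assumptions: struct mm and struct task are toy
stand-ins, the model_* names are made up for illustration, and the detail
that the switch call clears the supervisory flag is inferred from the
description rather than stated in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; only what the proposal mentions. */
struct mm { int id; };

struct task {
    struct mm *mm;      /* address space currently in use */
    struct mm *alt_mm;  /* the other address space, NULL until first switch */
    bool supervisory;   /* set while running in the supervisor's space */
};

static void swap_mm(struct task *t)
{
    struct mm *tmp = t->mm;

    assert(t->alt_mm != NULL);
    t->mm = t->alt_mm;
    t->alt_mm = tmp;
    /* a real kernel would flush the TLB here */
}

/* Tasks start with alt_mm NULL and the flag set to supervisory. */
void task_init(struct task *t, struct mm *mm)
{
    t->mm = mm;
    t->alt_mm = NULL;
    t->supervisory = true;
}

/* Model of the switch call. 'fresh' is caller-supplied storage for the
 * copy made on first use (an assumption of this model). Clearing the
 * flag here is also an assumption. */
void model_switchmm(struct task *t, struct mm *fresh)
{
    if (t->alt_mm == NULL) {
        *fresh = *t->mm;        /* alt_mm starts as a copy of the current mm */
        t->alt_mm = fresh;
    }
    swap_mm(t);
    t->supervisory = false;
}

/* Model of signal delivery; returns true if the address space was switched. */
bool model_deliver_signal(struct task *t)
{
    if (t->alt_mm != NULL && !t->supervisory) {
        t->supervisory = true;
        swap_mm(t);
        return true;
    }
    return false;
}
```

A task that never calls the switch function keeps alt_mm NULL, so signal
delivery takes the unchanged path.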

Any opinions, spanners to throw in the works ?

Alan


* Re: Accelerating user mode linux
  2002-08-01 20:16 Accelerating user mode linux Alan Cox
@ 2002-08-02  4:40 ` Jeff Dike
  2002-08-02  9:50   ` Alan Cox
  2002-08-02 11:34   ` Richard Zidlicky
  0 siblings, 2 replies; 56+ messages in thread
From: Jeff Dike @ 2002-08-02  4:40 UTC (permalink / raw)
  To: Alan Cox, linux-kernel

alan@redhat.com said:
> We add
> 	sys_switchmm(address);
> This switches to the altmm (creating one if it doesn't exist as a copy
> of the current mm), flushes the TLB and jumps to the address given. 

You didn't explicitly say (and so I had to ask :-) that this is intended
to be the mechanism by which UML returns to userspace, rather than the
normal sigreturn you'd get by just returning from the handler.

So, this would make the entry to userspace look like:
	restore registers
		.
		.
		.
	sys_switchmm(ip);

The problem with this is that it needs to be atomic wrt signals.  There
can't be an interrupt in the middle of that sequence.  So, sys_switchmm
would also have to restore the old signal mask, which you'd have to pass
in unless you're going to read it off the signal frame.  Also, it would
have to be open coded because you've already restored the stack pointer.

So, the entry to userspace starts looking like:
	block signals
	restore registers
	sys_switchmm(ip, new_sigmask);

Well, except for the blocking signals part, this is sigreturn under a 
different name and partly moved into userspace.  

Your objection to returning through sigreturn was performance.  Is performance
a veto of adding an mm switch to sigreturn, or is it possible to make it
acceptable?

Also, is a new sigreturn_mm() reasonable?  This would be close to sys_switchmm,
except that it would restore registers and would be a plug replacement for
sigreturn[_rt].  I don't favor this because it would probably have to choose 
whether to be an _rt return or not, and I'd like the option of having UML 
register some signals as SA_SIGINFO (currently, they are all non-SA_SIGINFO).

Comments, brickbats, spanners?

				Jeff


* Re: Accelerating user mode linux
  2002-08-02  4:40 ` Jeff Dike
@ 2002-08-02  9:50   ` Alan Cox
  2002-08-02 18:28     ` Jeff Dike
  2002-08-02 11:34   ` Richard Zidlicky
  1 sibling, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-02  9:50 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> can't be an interrupt in the middle of that sequence.  So, sys_switchmm
> would also have to restore the old signal mask, which you'd have to pass
> in unless you're going to read it off the signal frame.  Also, it would
> have to be open coded because you've already restored the stack pointer.

Uggh.. you are right. You end up needing sigreturn handling

> Your objection to returning through sigreturn was performance.  Is performance
> a veto of adding an mm switch to sigreturn, or is it possible to make it
> acceptable?

It's not a veto. I was trying to avoid having to add any more branches to
the fast paths in the kernel.  The remaining sigreturn question is 
"how do you get into 'user' mode the first time"


* Re: Accelerating user mode linux
  2002-08-02  9:50   ` Alan Cox
@ 2002-08-02 18:28     ` Jeff Dike
  2002-08-02 17:48       ` Alan Cox
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff Dike @ 2002-08-02 18:28 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

alan@redhat.com said:
> It's not a veto. I was trying to avoid having to add any more branches
> to the fast paths in the kernel.  

Unless I'm missing something, a test for altmm and a branch to out of line
mm switching should be about three instructions on x86 including a correctly
predicted branch not taken in the non-altmm case.
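
In C, the shape being described is a hinted test with the slow path kept
out of line. A minimal sketch, assuming GCC or Clang (__builtin_expect and
the noinline attribute are compiler extensions); struct task,
model_sigreturn and switch_to_alt_mm are illustrative names, not kernel
code:

```c
#include <assert.h>
#include <stddef.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

struct mm { int id; };

struct task {
    struct mm *alt_mm;  /* NULL for every ordinary task */
    int switches;       /* counts slow-path entries, for the demo only */
};

/* Out-of-line slow path; noinline keeps it away from the hot code. */
__attribute__((noinline)) static void switch_to_alt_mm(struct task *t)
{
    assert(t->alt_mm != NULL);
    t->switches++;      /* stand-in for the real mm switch and TLB flush */
}

/* Fast path: one load, one test, one branch hinted as not taken. */
void model_sigreturn(struct task *t)
{
    if (unlikely(t->alt_mm != NULL))
        switch_to_alt_mm(t);
    /* ... the rest of the normal return path would follow here ... */
}
```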

> The remaining sigreturn question is
> "how do you get into 'user' mode the first time"

Last night I told you it was by building a signal frame by hand and returning
through it.  That's no longer true.  Now, every UML thread (except the idle
thread, which I think can be reasonably expected not to try to enter userspace)
is in a host signal handler when in the kernel.

All entrances to userspace happen by returning through that signal frame.
Special userspace returns (exec and fork et al) fiddle the sigcontext in
that frame beforehand.  Normal system calls stuff the return value in the
appropriate slot in the sigcontext before returning, as well.

So, there's nothing special about entering userspace for the first time.
Everything is under a signal frame, so any time something needs to enter
userspace, it just returns through it.

> This switches to the altmm (creating one if it doesn't exist as a copy
> of the current mm)

About this business of creating a UML kernel address space for each UML
user thread - I prefer to have a single kernel address space to which all
signals are delivered.

This has the slight disadvantage that the process address space isn't directly
accessible, but I can live with that.  A virt_to_phys translation isn't too
painful.

A single separate kernel address space has the following attractions for me:
	there are some cases where 3G of KVA would be very useful
	it would make the UML kernel completely invisible to processes, which
is important for honeypots
	apps which consume huge amounts of VM might run on the host, but
crap out inside a UML

This raises the question of how the process address spaces are created.  For
a variety of reasons unrelated to altmm (which I can go into if anyone's
interested), I want address spaces to be separate user-visible objects.  

You'd create a new empty one by opening /proc/new-mm or something and get
back a file descriptor as a handle to it.  mmap/munmap/mprotect would be
extended to take a file descriptor pointing to the address space to be
changed.

So, altmm would look like this:

When it starts up, UML would call sigaltmm, passing a descriptor to its own 
address space and register its signal handlers with a new flag, SA_IN_MM.

sigaction would have an mm field in which this descriptor would be put (and
would contain -1 in the non-altmm case).

The sigcontext would have an extra int in it which would be the descriptor
of the address space to which sigreturn will return.

Like now, UML would arrange that everything is under a host signal handler.
When it enters userspace it would change the address space fd in the sigcontext
if necessary.

Does this sound sane?

				Jeff


* Re: Accelerating user mode linux
  2002-08-02 18:28     ` Jeff Dike
@ 2002-08-02 17:48       ` Alan Cox
  2002-08-02 22:33         ` Jeff Dike
  0 siblings, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-02 17:48 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> So, there's nothing special about entering userspace for the first time.
> Everything is under a signal frame, so any time something needs to enter
> userspace, it just returns through it.

Ok

> This has the slight disadvantage that the process address space isn't directly
> accessible, but I can live with that.  A virt_to_phys translation isn't too
> painful.

Right

> This raises the question of how the process address spaces are created.  For
> a variety of reasons unrelated to altmm (which I can go into if anyone's
> interested), I want address spaces to be separate user-visible objects.  

That really makes all the existing code not work with it. Doing an altmm
is easy in the sense that it doesn't require 20 new syscalls and doesn't
slow down the main kernel paths for a single odd case.

I can see why there is a need to manipulate the other mm. I need to think
about the right way to handle it.


* Re: Accelerating user mode linux
  2002-08-02 17:48       ` Alan Cox
@ 2002-08-02 22:33         ` Jeff Dike
  2002-08-02 21:57           ` Alan Cox
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff Dike @ 2002-08-02 22:33 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

alan@redhat.com said:
> That really makes all the existing code not work with it.

Can you be more specific?  If you're thinking I'm talking about breaking
mmap, munmap, and mprotect by adding another argument, I'm not.  I'm talking
about adding new syscalls, mmap2, munmap2, mprotect2 (or something more
imaginative), which have the extra argument, having them take -1 as meaning
"fiddle the current address space" and persuading libc to use them instead
of the current syscalls.  Then we would start the current ones on their way
to the happy syscall hunting grounds in the sky.
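
To make the calling convention concrete, here is a user-space sketch of
what such wrappers could look like from libc's side. Everything here is
hypothetical: mmap_in, munmap_in and MM_CURRENT are illustrative names, no
such syscalls exist, and a descriptor other than -1 simply fails with
ENOSYS because there is nothing to forward it to.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

#define MM_CURRENT (-1)   /* "fiddle the current address space" */

/* Hypothetical mmap variant taking an address space descriptor first. */
void *mmap_in(int mm_fd, void *addr, size_t len, int prot, int flags,
              int fd, off_t off)
{
    if (mm_fd == MM_CURRENT)
        return mmap(addr, len, prot, flags, fd, off);
    errno = ENOSYS;       /* no kernel support for other address spaces */
    return MAP_FAILED;
}

/* Hypothetical munmap variant with the same convention. */
int munmap_in(int mm_fd, void *addr, size_t len)
{
    assert(addr != NULL);
    if (mm_fd == MM_CURRENT)
        return munmap(addr, len);
    errno = ENOSYS;
    return -1;
}
```

Existing callers would pass MM_CURRENT and see no change in behaviour;
only a UML-like caller would ever pass a real descriptor.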

> Doing an altmm is easy in the sense that it doesn't require 20 new
> syscalls

I don't think I mentioned 20 new syscalls anywhere :-)  If you count the
ones above as replacements and not new, I'm talking about one new syscall -
switch_mm(), which I didn't mention before, that would switch to a given
address space.  This would be the basis of UML's switch_mm.

> and doesn't slow down the main kernel paths for a single odd
> case.

Which main kernel paths are you referring to here?

				Jeff


* Re: Accelerating user mode linux
  2002-08-02 22:33         ` Jeff Dike
@ 2002-08-02 21:57           ` Alan Cox
  2002-08-03  0:54             ` Jeff Dike
  0 siblings, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-02 21:57 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> mmap, munmap, and mprotect by adding another argument, I'm not.  I'm talking
> about adding new syscalls, mmap2, munmap2, mprotect2 (or something more
> imaginative), which have the extra argument, having them take -1 as meaning
> "fiddle the current address space" and persuading libc to use them instead
> of the current syscalls.  Then we would start the current ones on their way
> to the happy syscall hunting grounds in the sky.

That's a lot more invasive than I want to be.


* Re: Accelerating user mode linux
  2002-08-02 21:57           ` Alan Cox
@ 2002-08-03  0:54             ` Jeff Dike
  0 siblings, 0 replies; 56+ messages in thread
From: Jeff Dike @ 2002-08-03  0:54 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

alan@redhat.com said:
> That's a lot more invasive than I want to be.

OK, that was my best thinking on the subject.  I'll be interested to see what
you like.

				Jeff



* Re: Accelerating user mode linux
  2002-08-02  4:40 ` Jeff Dike
  2002-08-02  9:50   ` Alan Cox
@ 2002-08-02 11:34   ` Richard Zidlicky
  2002-08-02 13:28     ` Alan Cox
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Zidlicky @ 2002-08-02 11:34 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

On Thu, Aug 01, 2002 at 11:40:28PM -0500, Jeff Dike wrote:
> 
> Your objection to returning through sigreturn was performance.  Is performance
> a veto of adding an mm switch to sigreturn, or is it possible to make it
> acceptable?

I once ported Basilisk to run natively on linux-m68k. It ran *slow*, so
I looked at what the problem was: signal delivery in Linux is
exorbitantly slow. E.g. a SIGILL delivery costs ~1650 cycles on a 68060;
by comparison, sigreturn and getpid are 200-250 and sched_yield with a
context switch is around 400.

So sigreturn is not the place I would be looking for the biggest
speedups.

Richard


* Re: Accelerating user mode linux
  2002-08-02 11:34   ` Richard Zidlicky
@ 2002-08-02 13:28     ` Alan Cox
  2002-08-03 11:38       ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Ingo Molnar
  0 siblings, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-02 13:28 UTC (permalink / raw)
  To: Richard Zidlicky; +Cc: Jeff Dike, Alan Cox, linux-kernel

On Fri, 2002-08-02 at 12:34, Richard Zidlicky wrote:
> I once ported Basilisk to run natively on linux-m68k. It ran *slow*, so
> I looked at what the problem was: signal delivery in Linux is
> exorbitantly slow. E.g. a SIGILL delivery costs ~1650 cycles on a 68060;
> by comparison, sigreturn and getpid are 200-250 and sched_yield with a
> context switch is around 400.

The numbers look very different on a real processor. Signal delivery is
indeed not stunningly fast but relative to a context switch it's very low
indeed.



* context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-02 13:28     ` Alan Cox
@ 2002-08-03 11:38       ` Ingo Molnar
  2002-08-03 12:33         ` context switch vs. signal delivery [was: Re: Accelerating user mode Alan Cox
  2002-08-04  6:46         ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Andi Kleen
  0 siblings, 2 replies; 56+ messages in thread
From: Ingo Molnar @ 2002-08-03 11:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

On 2 Aug 2002, Alan Cox wrote:

> The numbers look very different on a real processor. Signal delivery is
> indeed not stunningly fast but relative to a context switch it's very low
> indeed.

actually the opposite is true, on a 2.2 GHz P4:

  $ ./lat_sig catch
  Signal handler overhead: 3.091 microseconds

  $ ./lat_ctx -s 0 2
  2 0.90

ie. *process to process* context switches are 3.4 times faster than signal
delivery. Ie. we can switch to a helper thread and back, and still be
faster than a *single* signal.

signals are in essence 'lightweight' threads created and destroyed for the
purpose of a single asynchronous event, it's IMO a very inefficient and
baroque concept for almost anything (but debugging and a number of very
special uses). I'd guess that with a sane threading library a helper
thread is faster for almost everything.

	Ingo


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-03 11:38       ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Ingo Molnar
@ 2002-08-03 12:33         ` Alan Cox
  2002-08-03 15:29           ` Jeff Dike
  2002-08-04  6:46         ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Andi Kleen
  1 sibling, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-03 12:33 UTC (permalink / raw)
  To: mingo; +Cc: Alan Cox, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

> actually the opposite is true, on a 2.2 GHz P4:
> 
>   $ ./lat_sig catch
>   Signal handler overhead: 3.091 microseconds
> 
>   $ ./lat_ctx -s 0 2
>   2 0.90
> 
> ie. *process to process* context switches are 3.4 times faster than signal
> delivery. Ie. we can switch to a helper thread and back, and still be
> faster than a *single* signal.

That's interesting indeed. I'd not tried it with the O(1) scheduler.

> signals are in essence 'lightweight' threads created and destroyed for the
> purpose of a single asynchronous event, it's IMO a very inefficient and
> baroque concept for almost anything (but debugging and a number of very
> special uses). I'd guess that with a sane threading library a helper
> thread is faster for almost everything.

Which would argue UML ought to have a positively microkernel view of
syscalls - sending a message ?


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-03 12:33         ` context switch vs. signal delivery [was: Re: Accelerating user mode Alan Cox
@ 2002-08-03 15:29           ` Jeff Dike
  2002-08-05 13:46             ` Udo A. Steinberg
  2002-08-05 22:06             ` Martin Waitz
  0 siblings, 2 replies; 56+ messages in thread
From: Jeff Dike @ 2002-08-03 15:29 UTC (permalink / raw)
  To: Alan Cox, mingo; +Cc: Richard Zidlicky, linux-kernel

alan@redhat.com said:
> Which would argue UML ought to have a positively microkernel view of
> syscalls - sending a message ? 

Indeed.  Ingo's mail got me thinking that

alan@redhat.com said:
> the alternatives like a separate process and ptrace are not pretty either

might not be so bad after all.

All I would need to make this work is for one process to be able to change
the mm of another.

Then, the current UML tracing thread would handle the kernel side of things
and sit in its own address space nicely protected from its processes.

				Jeff



* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-03 15:29           ` Jeff Dike
@ 2002-08-05 13:46             ` Udo A. Steinberg
  2002-08-05 20:44               ` Richard Zidlicky
  2002-08-05 22:06             ` Martin Waitz
  1 sibling, 1 reply; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-05 13:46 UTC (permalink / raw)
  To: Jeff Dike; +Cc: alan, mingo, rz, linux-kernel

On Sat, 03 Aug 2002 10:29:42 -0500
Jeff Dike <jdike@karaya.com> wrote:

> alan@redhat.com said:
> > the alternatives like a separate process and ptrace are not pretty either

I have implemented a usermode version of the Fiasco µ-kernel that uses
a separate process for the kernel and one process for each task. The kernel
process attaches to all tasks via ptrace.
When the kernel wants to change the MM of a task it puts some trampoline code
on a page mapped into each task's address space and has the task execute that
code on behalf of the kernel.
With that setup we have complete address space protection without all the
trouble of jail at the expense of a few context switches for each mmap, munmap
or mprotect operation.

I would also very much like an extension that would allow one process to modify
the MM of another, possibly via an extended ptrace interface or a new syscall.
Also it would be nice if there was an alternate way to get at the cr2 register,
trap number and error code other than from a SIGSEGV handler.

> All I would need to make this work is for one process to be able to change
> the mm of another.

Yes, exactly.

> Then, the current UML tracing thread would handle the kernel side of things
> and sit in its own address space nicely protected from its processes.

Yes. I already have this part working for our kernel, so it's not just theory.
I believe things could run yet another bit faster if we didn't have to do the
trampoline map operations.

-Udo.


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-05 13:46             ` Udo A. Steinberg
@ 2002-08-05 20:44               ` Richard Zidlicky
  2002-08-05 22:34                 ` Udo A. Steinberg
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Zidlicky @ 2002-08-05 20:44 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: Jeff Dike, alan, mingo, linux-kernel

On Mon, Aug 05, 2002 at 03:46:07PM +0200, Udo A. Steinberg wrote:
> On Sat, 03 Aug 2002 10:29:42 -0500
> Jeff Dike <jdike@karaya.com> wrote:
> 
> > alan@redhat.com said:
> > > the alternatives like a separate process and ptrace are not pretty either
> 
> I have implemented a usermode version of the Fiasco µ-kernel that uses
> a separate process for the kernel and one process for each task. The kernel
> process attaches to all tasks via ptrace.
> When the kernel wants to change the MM of a task it puts some trampoline code
> on a page mapped into each task's address space and has the task execute that
> code on behalf of the kernel.
> With that setup we have complete address space protection without all the
> trouble of jail at the expense of a few context switches for each mmap, munmap
> or mprotect operation.

Very interesting. What is the handiest way to do "syscalls" in this model?
Ptrace is still basically signal driven, so I would expect it still has some
unnecessary overhead?

> I would also very much like an extension that would allow one process to modify
> the MM of another, possibly via an extended ptrace interface or a new syscall.
> Also it would be nice if there was an alternate way to get at the cr2 register,
> trap number and error code other than from a SIGSEGV handler.

that's what signals are for, too bad they are slow.

> > Then, the current UML tracing thread would handle the kernel side of things
> > and sit in its own address space nicely protected from its processes.
> 
> Yes. I already have this part working for our kernel, so it's not just theory.
> I believe things could run yet another bit faster if we didn't have to do the
> trampoline map operations.

they are very expensive because of the way ptrace accesses the other process
memory, did you try a piece of shared memory ?

Richard


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-05 20:44               ` Richard Zidlicky
@ 2002-08-05 22:34                 ` Udo A. Steinberg
  2002-08-06  0:42                   ` Jeff Dike
  0 siblings, 1 reply; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-05 22:34 UTC (permalink / raw)
  To: Richard Zidlicky; +Cc: jdike, alan, mingo, linux-kernel

On Mon, 5 Aug 2002 22:44:15 +0200
Richard Zidlicky <rz@linux-m68k.org> wrote:

> Very interesting. What is the handiest way to do "syscalls" in this model?
> Ptrace is still basically signal driven, so I would expect it still has some
> unnecessary overhead?

When a task wants to do a syscall (i.e. int 0x30 in Fiasco), the kernel process
tracing the task sees the signal in its SIGCHLD handler. It pulls the registers
out of the task's address space using PTRACE_GETREGS and sets up an interrupt
frame on the kernel stack. EIP and ESP in the saved signal context are frobbed
so that the signal handler falls right into the correct interrupt gate when it
returns. iret works the other way round. The SIGSEGV handler in the kernel
process copies the registers back to the task and restarts the task's process
after restoring kernel state.

> > I would also very much like an extension that would allow one process to modify
> > the MM of another, possibly via an extended ptrace interface or a new syscall.
> > Also it would be nice if there was an alternate way to get at the cr2 register,
> > trap number and error code other than from a SIGSEGV handler.
> 
> that's what signals are for, too bad they are slow.

As it is now, in order to get at the page fault address one has to invoke a SIGSEGV
handler in the task, then look at the task's signal context to determine the pagefault
address, trapno etc. It would be much faster if the kernel could cancel the SIGSEGV signal
in the task's process and read out the pagefault info from the TCB via a ptrace
extension. That saves the cost of running a signal handler in the task and a bunch of
context switches.

> they are very expensive because of the way ptrace accesses the other process
> memory, did you try a piece of shared memory ?

Yes, trampoline page is shared between kernel and task. Nevertheless there are
context switches that wouldn't be necessary if the kernel could tweak the task's
mm directly.

-Udo.


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-05 22:34                 ` Udo A. Steinberg
@ 2002-08-06  0:42                   ` Jeff Dike
  2002-08-06  0:16                     ` Udo A. Steinberg
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff Dike @ 2002-08-06  0:42 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: Richard Zidlicky, alan, mingo, linux-kernel

us15@os.inf.tu-dresden.de said:
> When a task wants to do a syscall (i.e. int 0x30 in Fiasco), the kernel
> process tracing the task sees the signal in its SIGCHLD handler. It
> pulls the registers out of the task's address space using
> PTRACE_GETREGS and sets up an interrupt frame on the kernel stack.

Hmmm, I would have the kernel process let the system call bump it out of
wait() rather than delivering a SIGCHLD.  And, I'd be inclined to longjmp
over to the kernel stack.

Or, even better, have it already running on the appropriate kernel stack,
so it can just read the system call from PTRACE_GETREGS and call into the
main kernel.
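
The wait()-driven shape of that loop can be sketched with stock ptrace. In
this sketch (not UML code; count_syscall_stops is a made-up name) the
tracer never installs a SIGCHLD handler. It sits in waitpid() and is
bumped out each time the child stops at a system call boundary. Reading
the call number with PTRACE_GETREGS is left out because the register
layout is architecture specific, and any non-syscall signal stop is simply
resumed without forwarding the signal.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a traced child that makes a getpid() call and exits. Returns the
 * number of syscall stops seen through waitpid(), or -1 on error. */
int count_syscall_stops(void)
{
    int status, stops = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(127);
        raise(SIGSTOP);             /* hand control to the tracer */
        syscall(SYS_getpid);
        _exit(0);
    }

    assert(pid > 0);
    /* First stop is the child's own SIGSTOP. */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;

    for (;;) {
        /* Resume until the next syscall entry or exit; deliver no signal. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) != 0) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return -1;
        }
        if (waitpid(pid, &status, 0) != pid)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
            stops++;                /* a "call into the main kernel" would go here */
    }
    return stops;
}
```

The getpid() call alone accounts for an entry stop and an exit stop, so
the count is at least two when tracing is permitted on the host.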

Similarly, with other signals, like the timer, SIGIO, or page faults, it
would just annul the signal and call into the IRQ system.  Although page 
faults will be difficult because of the inability to read err or cr2, as 
you've pointed out.

				Jeff


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06  0:42                   ` Jeff Dike
@ 2002-08-06  0:16                     ` Udo A. Steinberg
  2002-08-06  2:55                       ` Jeff Dike
  0 siblings, 1 reply; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-06  0:16 UTC (permalink / raw)
  To: Jeff Dike; +Cc: rz, alan, mingo, linux-kernel

On Mon, 05 Aug 2002 19:42:31 -0500
Jeff Dike <jdike@karaya.com> wrote:

> Similarly, with other signals, like the timer, SIGIO, or page faults, it
> would just annul the signal and call into the IRQ system.  Although page 
> faults will be difficult because of the inability to read err or cr2, as 
> you've pointed out.

Jeff,

If my understanding of UML is right, you implement interrupts with socket
pairs where the interrupt handler writes a byte into one end and the other
end receives an async notification (SIGIO). In order to stop the right task
with a SIGIO, you change the socket owner on each context switch using fcntl.

If you have one process per task and a kernel process, the kernel process
cannot change socket ownership over to the next task's process, because it's
not allowed to. Only the process itself could set the ownership to its own pid,
but then each task switch would have to be done with a trampoline too.

The issue boils down to how the kernel process can stop a task process in
order to force the task into kernel. You can of course kill (taskpid, SIG)
but that has a race if the task tries to enter kernel at the same time.
SIG will be pending in the task until it is scheduled next.

-Udo.


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06  0:16                     ` Udo A. Steinberg
@ 2002-08-06  2:55                       ` Jeff Dike
  2002-08-06  8:10                         ` Udo A. Steinberg
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff Dike @ 2002-08-06  2:55 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: rz, alan, mingo, linux-kernel

us15@os.inf.tu-dresden.de said:
> If my understanding of UML is right, you implement interrupts with
> socket pairs where the interrupt handler writes a byte into one end
> and the other end receives an async notification (SIGIO). 

It sounds like you're confusing two mechanisms.  Device interrupts are 
implemented with something that supports SIGIO (socketpair, tty) with one
end outside UML and one end inside UML generating the SIGIOs.

I use socketpairs in the way you describe to implement context switching.
Out-of-context processes are sleeping in a read on their socket, and are
woken up by a soon-to-be-out-of-context process writing a byte down it.
There's no SIGIO there at all.
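
In isolation the sleep and wake pattern looks like the sketch below. This
is an illustration rather than UML's code; wake_sleeper is a made-up name
and the child's exit status is used only to report which byte it was woken
with.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child sleeps in read() on its end of a socketpair; the parent wakes
 * it by writing one byte. Returns the byte the child saw, or -1 on error.
 * 255 is reserved as the child's failure code. */
int wake_sleeper(unsigned char token)
{
    int sv[2], status;
    ssize_t n;
    pid_t pid;

    assert(token != 255);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        unsigned char c;

        close(sv[0]);
        /* "Out of context": blocked here until someone writes a byte. */
        _exit(read(sv[1], &c, 1) == 1 ? c : 255);
    }

    close(sv[1]);
    n = write(sv[0], &token, 1);    /* the outgoing side wakes the sleeper */
    close(sv[0]);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) || n != 1)
        return -1;
    return WEXITSTATUS(status);
}
```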

I also use socketpairs with SIGIO to implement IPIs on SMP UML.

> In order to
> stop the right task with a SIGIO, you change the socket owner on each
> context switch using fcntl. 

Yup.  More precisely, in order to ensure that the correct process receives
SIGIO when input comes in from the outside, I F_SETOWN the descriptors to
the incoming process during a context switch.

> If you have one process per task and a kernel process, the kernel
> process cannot change socket ownership over to the next task's
> process, because it's not allowed to.

Why not?  I see nothing at all in the implementation of F_SETOWN that requires
that it be called by the current owner:

		case F_SETOWN:
			lock_kernel();
			filp->f_owner.pid = arg;
			filp->f_owner.uid = current->uid;
			filp->f_owner.euid = current->euid;
			...

There are no general checks earlier in do_fcntl or sys_fcntl either.

				Jeff


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06  2:55                       ` Jeff Dike
@ 2002-08-06  8:10                         ` Udo A. Steinberg
  2002-08-06 11:20                           ` Jeff Dike
  0 siblings, 1 reply; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-06  8:10 UTC (permalink / raw)
  To: Jeff Dike; +Cc: rz, alan, mingo, linux-kernel

On Mon, 05 Aug 2002 21:55:05 -0500
Jeff Dike <jdike@karaya.com> wrote:
> 
> > If you have one process per task and a kernel process, the kernel
> > process cannot change socket ownership over to the next task's
> > process, because it's not allowed to.
> 
> Why not?  I see nothing at all in the implementation of F_SETOWN that requires
> that it be called by the current owner:
> 
> 		case F_SETOWN:
> 			lock_kernel();
> 			filp->f_owner.pid = arg;
> 			filp->f_owner.uid = current->uid;
> 			filp->f_owner.euid = current->euid;
> 			...

Ok, I was looking at sockets and not ttys, and that has the following in
net/core/sock.c:
                   case F_SETOWN:
                        /*
                         * This is a little restrictive, but it's the only
                         * way to make sure that you can't send a sigurg to
                         * another process.
                         */
                        if (current->pgrp != -arg &&
                                current->pid != arg &&
                                !capable(CAP_KILL)) return(-EPERM);
                        sk->proc = arg;
                        return(0);

So it wouldn't work with socketpairs, but with ttys it should.

-Udo.


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06  8:10                         ` Udo A. Steinberg
@ 2002-08-06 11:20                           ` Jeff Dike
  2002-08-06 11:13                             ` Udo A. Steinberg
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff Dike @ 2002-08-06 11:20 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: rz, alan, mingo, linux-kernel

us15@os.inf.tu-dresden.de said:
>                         if (current->pgrp != -arg &&
>                                 current->pid != arg &&
>                                 !capable(CAP_KILL)) return(-EPERM); 

What's the problem here?  This will let UML do F_SETOWN as well.

				Jeff


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06 11:20                           ` Jeff Dike
@ 2002-08-06 11:13                             ` Udo A. Steinberg
  2002-08-06 12:53                               ` Jeff Dike
  0 siblings, 1 reply; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-06 11:13 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 06:20:52 -0500
Jeff Dike <jdike@karaya.com> wrote:

> us15@os.inf.tu-dresden.de said:
> >                         if (current->pgrp != -arg &&
> >                                 current->pid != arg &&
> >                                 !capable(CAP_KILL)) return(-EPERM); 
> 
> What's the problem here?  This will let UML do F_SETOWN as well.

It will let the incoming process take over ownership of the socket,
which is probably what you mean and what you currently use.

I'm talking about a setup with the kernel residing in its own process.
On iret it would have to change ownership of the socket to another task,
i.e. process with kernel_pid wants to set task_pid as the owner of the
socket. The above code fragment doesn't permit this, as far as I can see.
What it does permit is the incoming task setting itself as the socket
owner, but that requires the incoming task to always run a trampoline
first which accomplishes that.

-Udo.
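
The asymmetry described above can be made concrete with a small userspace
model of the quoted check. This is only a sketch: struct caller and
sock_setown_allowed() are names invented here, the logic is copied from the
quoted fragment rather than from any particular kernel tree, and the pids
in the usage note are arbitrary.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Userspace model of the quoted net/core/sock.c test; not kernel code.
 * The struct and function names are hypothetical. */
struct caller {
    pid_t pid;       /* models current->pid */
    pid_t pgrp;      /* models current->pgrp */
    bool  cap_kill;  /* models capable(CAP_KILL) */
};

/* Returns true if the quoted check would let `c` do F_SETOWN with `arg`.
 * A positive arg names a process, a negative arg names a process group. */
bool sock_setown_allowed(const struct caller *c, int arg)
{
    if (c->pgrp != -arg && c->pid != arg && !c->cap_kill)
        return false;  /* the kernel returns -EPERM here */
    return true;
}
```

With a kernel process (pid 100) and a task process (pid 101) in the same
process group, the model refuses sock_setown_allowed(&kernel, 101) but
accepts sock_setown_allowed(&task, 101), which is the trampoline case.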

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06 11:13                             ` Udo A. Steinberg
@ 2002-08-06 12:53                               ` Jeff Dike
  2002-08-06 13:04                                 ` Udo A. Steinberg
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff Dike @ 2002-08-06 12:53 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> It will let the incoming process take over ownership of the socket,
> which is probably what you mean and what you currently use. 

Yup.

> On iret it would have to change ownership of the socket to another
> task, i.e. process with kernel_pid wants to set task_pid as the owner
> of the socket. The above code fragment doesn't permit this, as far as
> I can see.

Why not?  There is nothing there that prevents that.

				Jeff


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06 12:53                               ` Jeff Dike
@ 2002-08-06 13:04                                 ` Udo A. Steinberg
  2002-08-06 14:12                                   ` Jeff Dike
  0 siblings, 1 reply; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-06 13:04 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 08:53:24 -0400
Jeff Dike <jdike@karaya.com> wrote:

> > On iret it would have to change ownership of the socket to another
> > task, i.e. process with kernel_pid wants to set task_pid as the owner
> > of the socket. The above code fragment doesn't permit this, as far as
> > I can see.
> 
> Why not?  There is nothing there that prevents that.

In the following code the parent (i.e. kernel) tries to set the child (i.e. task)
as owner for the socket. Does this work for you? It doesn't for me, for the
reason I described earlier.

#include <sys/types.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <stdio.h>	/* perror */
#include <unistd.h>

int main (void) {

  int sockets[2], flags;
  pid_t pid;

  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sockets)) {
    perror ("socketpair");
    return -1;
  }

  switch (pid = fork ()) {

    case -1:
      perror ("fork");
      return -1;

    case 0:
      /* child: sleep until a signal arrives; never fall into the parent's code */
      pause ();
      return 0;

    default:
      if ((flags = fcntl (sockets[0], F_GETFL)) < 0) {
        perror ("fcntl, GETFL");
        return -1;
      }
      if (fcntl (sockets[0], F_SETFL, flags | O_NONBLOCK | O_ASYNC) < 0) {
        perror ("fcntl, SETFL");
        return -1;
      }
      if (fcntl (sockets[0], F_SETOWN, pid) < 0) {
        perror ("fcntl, SETOWN");
        return -1;
      }
  }

  return 0;
}

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06 13:04                                 ` Udo A. Steinberg
@ 2002-08-06 14:12                                   ` Jeff Dike
  2002-08-06 16:02                                     ` Udo A. Steinberg
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff Dike @ 2002-08-06 14:12 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> Does this work for you? 

No :-)

> It doesn't for me, for the reason I described
> earlier.

Indeed.  I misread the !capable(CAP_KILL) as "I am not allowed to kill the
other guy", which clearly you are when you just forked it.

This looks like a bug to me.  If you own the process, you can send it any
signal you want, so you should be allowed to sign it up for SIGURG/SIGIO via
F_SETOWN.

				Jeff

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06 14:12                                   ` Jeff Dike
@ 2002-08-06 16:02                                     ` Udo A. Steinberg
  2002-08-06 17:42                                       ` Jeff Dike
  0 siblings, 1 reply; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-06 16:02 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 10:12:25 -0400
Jeff Dike <jdike@karaya.com> wrote:

> Indeed.  I misread the !capable(CAP_KILL) as "I am not allowed to kill the
> other guy", which clearly you are when you just forked it.
> This looks like a bug to me.  If you own the process, you can send it any
> signal you want, so you should be allowed to sign it up for SIGURG/SIGIO via
> F_SETOWN.

I'm glad we agree on that one :)

Even if we avoid sockets with their broken SIGIO ownership check and use
pseudo-terminals as UML does, there's still a problem:

When the task is registered as socket owner and is just about to enter the
kernel due to a syscall, it will stop with a SIGTRAP and the tracing kernel
process will run sometime and see a SIGCHLD. But after the task stopped and
before the kernel process can change SIGIO ownership back, a new interrupt
could come in and the SIGIO would remain pending in the task's process until
the task was scheduled to run next time.

How do you solve this?

-Udo.

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06 16:02                                     ` Udo A. Steinberg
@ 2002-08-06 17:42                                       ` Jeff Dike
  2002-08-06 18:01                                         ` Udo A. Steinberg
  2002-08-08  1:27                                         ` Udo A. Steinberg
  0 siblings, 2 replies; 56+ messages in thread
From: Jeff Dike @ 2002-08-06 17:42 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> I'm glad we agree on that one :) 

Yup, sorry.  That test is wrong, and is slated to be fixed at some point.

> When the task is registered as socket owner and is just about to enter
> the kernel due to a syscall, it will stop with a SIGTRAP and the
> tracing kernel process will run sometime and see a SIGCHLD. But after
> the task stopped and before the kernel process can change SIGIO
> ownership back, a new interrupt could come in and the SIGIO would
> remain pending in the task's process until the task was scheduled to
> run next time.
>
> How do you solve this?

A couple of ways.  The system call path can call sigio_handler to clear
out any pending IO.  The SIGIO that was trapped in the process will cause
another call to sigio_handler which won't turn up any IO, but I don't 
consider that to be a problem.

The kernel process can examine the signal pending mask of the process after
it has transferred SIGIO to itself.  This can be done either through 
/proc/<pid>/status or a ptrace extension, since we're happily postulating 
new things for it to do anyway.  If there is a SIGIO pending, it calls
sigio_handler.

Any other possibilities that you see?

				Jeff
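
The /proc/<pid>/status option can be sketched roughly as follows. The
function names are invented for illustration. The field name is a parameter
because the "SigPnd" line is what this sketch assumes the status file
provides; newer kernels also report process-wide pending signals on a
separate "ShdPnd" line. Signal N is taken to be bit N-1 of the hex mask.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Look up `field` (e.g. "SigPnd") in the text of /proc/<pid>/status and
 * test the bit for `signo`.  Returns 1 if pending, 0 if not, -1 if the
 * field is missing.  Hypothetical helper, not an existing API. */
int signal_pending_in_status(const char *status, const char *field, int signo)
{
    char key[32];
    snprintf(key, sizeof key, "\n%s:", field);   /* match at start of a line */

    const char *line = strstr(status, key);
    if (line == NULL)
        return -1;

    /* strtoull skips the tab and parses the hex mask that follows */
    unsigned long long mask = strtoull(line + strlen(key), NULL, 16);
    return (int)((mask >> (signo - 1)) & 1ULL);
}

/* Read /proc/<pid>/status and apply the check above; -1 on I/O error. */
int signal_pending(pid_t pid, const char *field, int signo)
{
    char path[64], buf[4096];
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    return signal_pending_in_status(buf, field, signo);
}
```

After taking SIGIO ownership back, the kernel process would call something
like signal_pending(task_pid, "SigPnd", SIGIO) and run sigio_handler if it
returns 1.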

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06 17:42                                       ` Jeff Dike
@ 2002-08-06 18:01                                         ` Udo A. Steinberg
  2002-08-08  1:27                                         ` Udo A. Steinberg
  1 sibling, 0 replies; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-06 18:01 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 13:42:18 -0400
Jeff Dike <jdike@karaya.com> wrote:

> A couple of ways.  The system call path can call sigio_handler to clear
> out any pending IO.  The SIGIO that was trapped in the process will cause
> another call to sigio_handler which won't turn up any IO, but I don't 
> consider that to be a problem.

It is not a problem at all, just a small performance penalty.

> The kernel process can examine the signal pending mask of the process after
> it has transferred SIGIO to itself.  This can be done either through 
> /proc/<pid>/status or a ptrace extension, since we're happily postulating 
> new things for it to do anyway.  If there is a SIGIO pending, it calls
> sigio_handler.

I don't like the idea of having to fiddle with the proc filesystem. Some
people might not even mount it. A ptrace extension to look at and modify
the pending signal mask of a traced process would be very handy.

> Any other possibilities that you see?

Right now I'm doing something hackish. If the process enters with a syscall
(int 0x30 in my case) after the kernel expects it to enter due to an interrupt,
I just restart the task until it enters with the pending interrupt signal (SIGIO).
The task will do that before it can step on the int instruction again, and after
it returns to usermode it will step on the int again. This works well with faults.
The problem is traps, because the EIP points past the instruction. In that
case the EIP needs to be adjusted. Ugly, I know.

-Udo.

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-06 17:42                                       ` Jeff Dike
  2002-08-06 18:01                                         ` Udo A. Steinberg
@ 2002-08-08  1:27                                         ` Udo A. Steinberg
  2002-08-08  3:14                                           ` Jeff Dike
  1 sibling, 1 reply; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-08  1:27 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 13:42:18 -0400
Jeff Dike <jdike@karaya.com> wrote:
> 
> The kernel process can examine the signal pending mask of the process after
> it has transferred SIGIO to itself.  This can be done either through 
> /proc/<pid>/status or a ptrace extension, since we're happily postulating 
> new things for it to do anyway.  If there is a SIGIO pending, it calls
> sigio_handler.
> 
> Any other possibilities that you see?

Another possibility could be the kernel process and the task processes sharing
a pending signal queue, either for one particular signal or all signals. The
kernel process would block SIGIO while the task runs and when the task enters
kernel mode with a SIGIO still trapped in the task process, SIGIO would get
delivered in the kernel and cleared from the shared pending queue, which is
just what we want.

Someone actually already tried implementing it with a clone extension, see
http://www.rhdv.cistron.nl/sigqueue.html

-Udo.

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-08  1:27                                         ` Udo A. Steinberg
@ 2002-08-08  3:14                                           ` Jeff Dike
  2002-08-08  2:21                                             ` Benjamin LaHaise
  2002-08-08  9:03                                             ` Udo A. Steinberg
  0 siblings, 2 replies; 56+ messages in thread
From: Jeff Dike @ 2002-08-08  3:14 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> SIGIO would get delivered in the kernel and cleared from the shared
> pending queue, which is just what we want.

Not really.  What we really want is for signals not to be delivered at all.

That's why the ptrace signal annulling capability is nice.

I'm not sure if this makes any sense, but coupling the new aio mechanism with
something that queues up siginfos might be interesting.  It would be a magic
descriptor that would feed you signals when you read it.

Is that at all sane?

				Jeff
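
For comparison, Linux later gained signalfd(2), which is close to the
"magic descriptor" idea: a file descriptor that hands back queued signals
as records when read. The sketch below uses that real interface, but note
the limitation: it only returns signals pending for the calling process,
not for another process, so it does not by itself give a tracer the
tracee's signals. The two wrapper function names are made up here.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block `signo` so no handler runs, and return a non-blocking descriptor
 * that yields it on read().  Returns -1 on error. */
int open_signal_fd(int signo)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    return signalfd(-1, &set, SFD_NONBLOCK);
}

/* Read one queued signal from the descriptor.  Returns its number, or -1
 * if nothing is queued (read() fails with EAGAIN in that case). */
int read_signal(int fd)
{
    struct signalfd_siginfo si;
    ssize_t n = read(fd, &si, sizeof si);
    return n == (ssize_t)sizeof si ? (int)si.ssi_signo : -1;
}
```

The signal is consumed by the read, so no handler frame is ever built and
no FP state is saved for it.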

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-08  3:14                                           ` Jeff Dike
@ 2002-08-08  2:21                                             ` Benjamin LaHaise
  2002-08-08  9:03                                             ` Udo A. Steinberg
  1 sibling, 0 replies; 56+ messages in thread
From: Benjamin LaHaise @ 2002-08-08  2:21 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Udo A. Steinberg, linux-kernel

On Wed, Aug 07, 2002 at 10:14:42PM -0500, Jeff Dike wrote:
> I'm not sure if this makes any sense, but coupling the new aio mechanism with
> something that queues up siginfos might be interesting.  It would be a magic
> descriptor that would feed you signals when you read it.
> 
> Is that at all sane?

Delivering signals from aio completion is indeed possible.  There is 
even a field in the iocb structure for doing this in order to provide 
complete posix compatibility (well, except for the fact that structure 
initialization is enforced).

		-ben
-- 
"You will be reincarnated as a toad; and you will be much happier."

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-08  3:14                                           ` Jeff Dike
  2002-08-08  2:21                                             ` Benjamin LaHaise
@ 2002-08-08  9:03                                             ` Udo A. Steinberg
  2002-08-08 17:19                                               ` Jeff Dike
  1 sibling, 1 reply; 56+ messages in thread
From: Udo A. Steinberg @ 2002-08-08  9:03 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

On Wed, 07 Aug 2002 22:14:42 -0500
Jeff Dike <jdike@karaya.com> wrote:
> 
> Not really.  What we really want is for signals not to be delivered at all.
> That's why the ptrace signal annulling capability is nice.
> 
> I'm not sure if this makes any sense, but coupling the new aio mechanism with
> something that queues up siginfos might be interesting.  It would be a magic
> descriptor that would feed you signals when you read it.
> 
> Is that at all sane?

I know that we're trying to avoid signal handlers, because they are expensive.
But the signal would not need to be delivered in the task. We need a mechanism to
stop the task and force it into the kernel. The task is uncooperative and doesn't
dequeue signals itself. When it gets a signal it stops. The kernel then sees the
signal and accepts it using sigwaitinfo, at which point it is no longer pending
in the task either. The siginfo structure then provides the necessary info,
i.e. which fd caused the i/o.

When running in a kernel context, you actually need to deliver SIGIO in order
to interrupt the current context.

If you have a magic aio descriptor, how does the task process read signals
from it and stop?

-Udo.
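
The "accept it without delivering it" step can be sketched within a single
process using standard calls. The signal is kept blocked so that it stays
pending, and sigtimedwait() with a zero timeout (a non-blocking relative of
sigwaitinfo()) dequeues it. The helper names are invented for this sketch,
and it only covers the caller's own signals. Dequeuing a signal that is
pending in a different, traced process is the part that needs new kernel
support.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Block `signo` so it stays pending instead of running a handler.
 * Returns 0 on success. */
int hold_signal(int signo)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Dequeue one pending `signo` without blocking.  Returns signo and fills
 * *info, or -1 if the signal is not pending. */
int accept_pending(int signo, siginfo_t *info)
{
    sigset_t set;
    struct timespec zero = { 0, 0 };   /* poll: do not wait */
    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigtimedwait(&set, info, &zero);
}
```

Once accept_pending() has returned the signal, it is no longer pending, so
a second call returns -1.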

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-08  9:03                                             ` Udo A. Steinberg
@ 2002-08-08 17:19                                               ` Jeff Dike
  0 siblings, 0 replies; 56+ messages in thread
From: Jeff Dike @ 2002-08-08 17:19 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> The task is uncooperative and doesn't dequeue signals itself. When it
> gets a signal it stops. The kernel then sees the signal and accepts it
> using sigwaitinfo, at which point it is no longer pending in the task
> either. The siginfo structure then provides the necessary info, i.e.
> which fd caused the i/o.

I think this is more or less what I had in mind.  The thing that is missing
is for sigwaitinfo to be able to dequeue another process' signals, which is
where the shared signal queue would come in.

> If you have a magic aio descriptor, how does the task process read
> signals from it and stop? 

I was looking at this as a way of dequeueing signals from the other process.
The task process would have the signal queued and wake up the kernel process
as happens now.  The kernel process would have /proc/<task-pid>/sigqueue
or something opened and would read siginfos from it.  Those would then be 
dequeued from the task process.

This almost suffices for getting page fault information, except that, for
some reason, siginfo doesn't say whether the faulting access was a read or
a write.

And now that I'm thinking about it, aio doesn't really come into it.  This
would be strictly synchronous.

				Jeff

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-03 15:29           ` Jeff Dike
  2002-08-05 13:46             ` Udo A. Steinberg
@ 2002-08-05 22:06             ` Martin Waitz
  2002-08-06  0:49               ` Jeff Dike
  1 sibling, 1 reply; 56+ messages in thread
From: Martin Waitz @ 2002-08-05 22:06 UTC (permalink / raw)
  To: linux-kernel, Jeff Dike

hi :)

On Sat, Aug 03, 2002 at 10:29:42AM -0500, Jeff Dike wrote:
> alan@redhat.com said:
> > the alternatives like a separate process and ptrace are not pretty either
> 
> might not be so bad after all.

there is already a group at our university doing that:
http://www3.informatik.uni-erlangen.de/Research/Projects/UMLinux/umlinux.html

-- 
CU,		  / Friedrich-Alexander University Erlangen, Germany
Martin Waitz	//  [Tali on IRCnet]  [tali.home.pages.de] _________
______________/// - - - - - - - - - - - - - - - - - - - - ///
this is a manually generated mail, it contains           //
typos and is valid even without capital letters.        /
			    -
Those who would give up essential liberty to purchase a little
temporary safety deserve neither liberty nor safety.
			Benjamin Franklin  (1706 - 1790)

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
  2002-08-05 22:06             ` Martin Waitz
@ 2002-08-06  0:49               ` Jeff Dike
  0 siblings, 0 replies; 56+ messages in thread
From: Jeff Dike @ 2002-08-06  0:49 UTC (permalink / raw)
  To: Martin Waitz; +Cc: linux-kernel, Hans-Joerg Hoexer

tali@admingilde.org said:
> there is already a group at our university doing that: http://
> www3.informatik.uni-erlangen.de/Research/Projects/UMLinux/umlinux.html

Yeah, I know.  Hans-Joerg and I have been talking about whether and how
much it makes sense to start sharing code.

				Jeff


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-03 11:38       ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Ingo Molnar
  2002-08-03 12:33         ` context switch vs. signal delivery [was: Re: Accelerating user mode Alan Cox
@ 2002-08-04  6:46         ` Andi Kleen
  2002-08-05  5:35           ` Linus Torvalds
  2002-08-05 10:40           ` Ingo Molnar
  1 sibling, 2 replies; 56+ messages in thread
From: Andi Kleen @ 2002-08-04  6:46 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> actually the opposite is true, on a 2.2 GHz P4:
> 
>   $ ./lat_sig catch
>   Signal handler overhead: 3.091 microseconds
> 
>   $ ./lat_ctx -s 0 2
>   2 0.90
> 
> ie. *process to process* context switches are 3.4 times faster than signal
> delivery. Ie. we can switch to a helper thread and back, and still be
> faster than a *single* signal.

This is because the signal save/restore does a lot of unnecessary stuff.
One optimization I implemented at one time was adding a SA_NOFP signal
bit that told the kernel that the signal handler did not intend 
to modify floating point state (few signal handlers need FP). The kernel
would then not save the FPU state, which gave quite a speedup in signal
latency. 

Linux got a lot slower in signal delivery when the SSE2 support was
added. That got this speed back.

The targets were certain applications that use signal handlers for async
IO. 

If there is interest I can dig up the old patches. They were really simple.

x86-64 also does it faster by FXSAVE'ing directly to the user space
frame with exception handling instead of copying manually. But that's
not possible on i386 because it still has to use the baroque iBCS 
FP context format on the stack.

-Andi

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-04  6:46         ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Andi Kleen
@ 2002-08-05  5:35           ` Linus Torvalds
  2002-08-05  5:42             ` Arnaldo Carvalho de Melo
                               ` (3 more replies)
  2002-08-05 10:40           ` Ingo Molnar
  1 sibling, 4 replies; 56+ messages in thread
From: Linus Torvalds @ 2002-08-05  5:35 UTC (permalink / raw)
  To: linux-kernel

In article <m3u1mb5df3.fsf@averell.firstfloor.org>,
Andi Kleen  <ak@muc.de> wrote:
>Ingo Molnar <mingo@elte.hu> writes:
>
>
>> actually the opposite is true, on a 2.2 GHz P4:
>> 
>>   $ ./lat_sig catch
>>   Signal handler overhead: 3.091 microseconds
>> 
>>   $ ./lat_ctx -s 0 2
>>   2 0.90
>> 
>> ie. *process to process* context switches are 3.4 times faster than signal
>> delivery. Ie. we can switch to a helper thread and back, and still be
>> faster than a *single* signal.
>
>This is because the signal save/restore does a lot of unnecessary stuff.
>One optimization I implemented at one time was adding a SA_NOFP signal
>bit that told the kernel that the signal handler did not intend 
>to modify floating point state (few signal handlers need FP) It would 
>not save the FPU state then and reached quite some speedup in signal
>latency. 
>
>Linux got a lot slower in signal delivery when the SSE2 support was
>added. That got this speed back.

This will break _horribly_ when (if) glibc starts using SSE2 for things
like memcpy() etc.

I agree that it is really sad that we have to save/restore FP on
signals, but I think it's unavoidable. Your hack may work for you, but
it just gets really dangerous in general. Having signals randomly
subtly corrupt some SSE2 state just because the signal handler uses
something like memcpy (without even realizing that that could lead to
trouble) is bad, bad, bad.

In other words, "not intending to" does not imply "will not".  It's just
potentially too easy to change SSE2 state by mistake. 

And yes, this signal handler thing is clearly visible on benchmarks. 
MUCH too clearly visible.  I just didn't see any safe alternatives
(and I still don't ;( )

		Linus

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  5:35           ` Linus Torvalds
@ 2002-08-05  5:42             ` Arnaldo Carvalho de Melo
  2002-08-05  6:37             ` Lincoln Dale
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 56+ messages in thread
From: Arnaldo Carvalho de Melo @ 2002-08-05  5:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Em Mon, Aug 05, 2002 at 05:35:13AM +0000, Linus Torvalds escreveu:
> This will break _horribly_ when (if) glibc starts using SSE2 for things
> like memcpy() etc.

Hmm, related: wasn't one way of giving userspace access to the kernel's
optimized versions of memcpy et al. a page with these functions that would
be mapped into the process address space (I don't remember the exact details)?
Is that still being considered?

- Arnaldo

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  5:35           ` Linus Torvalds
  2002-08-05  5:42             ` Arnaldo Carvalho de Melo
@ 2002-08-05  6:37             ` Lincoln Dale
  2002-08-05 15:39             ` Jamie Lokier
  2002-08-06  5:31             ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Mark Mielke
  3 siblings, 0 replies; 56+ messages in thread
From: Lincoln Dale @ 2002-08-05  6:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

At 05:35 AM 5/08/2002 +0000, Linus Torvalds wrote:
> >Linux got a lot slower in signal delivery when the SSE2 support was
> >added. That got this speed back.
>
>This will break _horribly_ when (if) glibc starts using SSE2 for things
>like memcpy() etc.
>
>I agree that it is really sad that we have to save/restore FP on
>signals, but I think it's unavoidable. Your hack may work for you, but
>it just gets really dangerous in general. having signals randomly
>subtly corrupt some SSE2 state just because the signal handler uses
>something like memcpy (without even realizing that that could lead to
>trouble) is bad, bad, bad.

how about putting the onus on userspace to tell the kernel if/when it uses 
extensions that require FP state to be saved/restored?
if/when glibc starts using SSE2, it could then use these extensions.

could be as simple as user-space setting some bit somewhere.

>And yes, this signal handler thing is clearly visible on benchmarks.
>MUCH too clearly visible.  I just didn't see any safe alternatives
>(and I still don't ;( )

it probably isn't worthwhile penalising all users of signal just for those 
few userspace apps that actually do use SSE2.


cheers,

lincoln.


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  5:35           ` Linus Torvalds
  2002-08-05  5:42             ` Arnaldo Carvalho de Melo
  2002-08-05  6:37             ` Lincoln Dale
@ 2002-08-05 15:39             ` Jamie Lokier
  2002-08-05 16:38               ` Linus Torvalds
  2002-08-06  5:31             ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Mark Mielke
  3 siblings, 1 reply; 56+ messages in thread
From: Jamie Lokier @ 2002-08-05 15:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> I agree that it is really sad that we have to save/restore FP on
> signals, but I think it's unavoidable.

Couldn't you mark the FPU as unused for the duration of the
handler, and let the lazy FPU mechanism save the state when it is used
by the signal handler?

> And yes, this signal handler thing is clearly visible on benchmarks. 
> MUCH too clearly visible.  I just didn't see any safe alternatives
> (and I still don't ;( )

I use SEGVs to trap access to read-only pages for garbage collection,
and I know I'm not the only one.  That's a lot of SEGVs...

Fwiw, I have timed SIGSEGV handling time on Linux on various Intel CPUs,
on a PA-RISC running HP-UX and on a few Sparcs running Solaris.  Linux
came out faster in all cases.  Best case: 8 microseconds to trap a page
fault, handle the SEGV and mprotect() one page (600MHz P3).  Worst case:
37 microseconds (133MHz Pentium).

That's about 5000 cycles.  I'm sure we can do better than that.

For sophisticated user space uses, like the above, I'd like to see
a trap handling mechanism that saves only the _minimum_ state.
Userspace can take care of the rest.  Maybe even without a sigreturn in
some cases.

-- Jamie
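
For readers unfamiliar with the technique, the SEGV-based write barrier
mentioned above looks roughly like this. It is a minimal sketch, not the
collector being described: one page is mapped read-only, the first store
faults, and the handler makes the page writable so the store can restart.
The function and variable names are invented, and calling mprotect() from
a signal handler is relied on in practice on Linux rather than guaranteed
by POSIX.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;  /* SEGVs seen by the handler */
static size_t page_size;

/* SIGSEGV handler: count the fault and make the faulting page writable.
 * Returning from the handler restarts the faulting store. */
static void on_segv(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    fault_count++;
    if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);  /* not a page we can unprotect: give up */
}

/* Map one read-only page, store to it twice, and return the number of
 * SEGVs taken, or -1 on setup failure or if the stores did not land. */
int count_write_faults(void)
{
    struct sigaction sa, old;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *p = mmap(NULL, page_size, PROT_READ,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;      /* ask for si_addr, the faulting address */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap((void *)p, page_size);
        return -1;
    }

    fault_count = 0;
    p[0] = 42;   /* faults once; the handler unprotects the page */
    p[1] = 43;   /* page is writable now, so no second fault */

    sigaction(SIGSEGV, &old, NULL);
    int stored = (p[0] == 42 && p[1] == 43);
    int faults = (int)fault_count;
    munmap((void *)p, page_size);
    return stored ? faults : -1;
}
```

Each first write to a protected page costs a page fault, a full signal
frame setup, an mprotect() call and a sigreturn, which is where the
microseconds quoted above go.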

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05 15:39             ` Jamie Lokier
@ 2002-08-05 16:38               ` Linus Torvalds
  2002-08-05 20:01                 ` context switch vs. signal delivery [was: Re: Accelerating usermode linux] Oliver Neukum
  0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2002-08-05 16:38 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel

On Mon, 5 Aug 2002, Jamie Lokier wrote:

> Linus Torvalds wrote:
> > I agree that it is really sad that we have to save/restore FP on
> > signals, but I think it's unavoidable.
> 
> Couldn't you mark the FPU as unused for the duration of the
> handler, and let the lazy FPU mechanism save the state when it is used
> by the signal handler?

Nope. Believe me, I gave some thought to clever things to do. 

The kernel won't even _see_ a longjmp() out of a signal handler, so the
kernel has a really hard time trying to do any clever lazy stuff.

Also, people who play games with FP actually change the FP data on the
stack frame, and depend on signal return to reload it. Admittedly I've 
only ever seen this on SIGFPE, but anyway - this is all done with integer 
instructions that just touch bitpatterns on the stack.. The kernel can't 
catch it sanely.

> For sophisticated user space uses, like the above, I'd like to see
> a trap handling mechanism that saves only the _minimum_ state.

I would not mind an extra per-signal flag that says "don't bother with FP
saves" (the same way we already have "don't restart" etc), but I would be
very nervous if glibc used it by default (even if glibc doesn't use SSE2
in memcpy, gcc itself can do it, and obviously _users_ may just do it
themselves).

So it would have to be explicitly enabled with a SA_NOFPSIGHANDLER flag or 
something.

(And yes, it's the FP stuff that takes most of the time. I think the 
lmbench numbers for signal delivery tripled when that went in).

		Linus

* Re: context switch vs. signal delivery [was: Re: Accelerating usermode linux]
  2002-08-05 16:38               ` Linus Torvalds
@ 2002-08-05 20:01                 ` Oliver Neukum
  2002-08-05 20:23                   ` Linus Torvalds
  0 siblings, 1 reply; 56+ messages in thread
From: Oliver Neukum @ 2002-08-05 20:01 UTC (permalink / raw)
  To: Linus Torvalds, Jamie Lokier; +Cc: linux-kernel


> Also, people who play games with FP actually change the FP data on the
> stack frame, and depend on signal return to reload it. Admittedly I've
> only ever seen this on SIGFPE, but anyway - this is all done with integer
> instructions that just touch bitpatterns on the stack.. The kernel can't
> catch it sanely.

Could the fp state be put on its own page and the dirty bit
evaluated in the decision whether to restore fpu state ?

	Regards
		Oliver

* Re: context switch vs. signal delivery [was: Re: Accelerating usermode linux]
  2002-08-05 20:01                 ` context switch vs. signal delivery [was: Re: Accelerating usermode linux] Oliver Neukum
@ 2002-08-05 20:23                   ` Linus Torvalds
  0 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2002-08-05 20:23 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Jamie Lokier, linux-kernel

On Mon, 5 Aug 2002, Oliver Neukum wrote:
> 
> > Also, people who play games with FP actually change the FP data on the
> > stack frame, and depend on signal return to reload it. Admittedly I've
> > only ever seen this on SIGFPE, but anyway - this is all done with integer
> > instructions that just touch bitpatterns on the stack.. The kernel can't
> > catch it sanely.
> 
> Could the fp state be put on its own page and the dirty bit
> evaluated in the decision whether to restore fpu state ?

I'm sure anything is _possible_, but there are a few problems with that 
approach. In particular, playing VM games tends to be quite expensive on 
SMP, since you need to make sure that the TLB entry for that page is 
invalidated on all the other CPUs before you insert the FPU page.

Also, you'd need to play games with dirty bit handling, since the page
_is_ dirty (it contains FP data), so the VM must know to write it out if
it pages things. That's ok - we have separate per-page and per-TLB-entry 
dirty bits anyway, but right now the VM layer knows it can move the TLB 
entry dirty bit into the per-page dirty bit and drop it - which wouldn't 
be the case if we also have a FPU dirty bit.

That's fixable - we could just make a "software TLB dirty bit" that is
updated whenever the hardware TLB dirty bit is cleared and moved into the
per-page dirty bit.

But the end result sounds rather complicated, especially since all the
page table walking necessary for setting this all up is likely to be about 
as expensive as the thing we're trying to avoid..

Rule of thumb: it almost never pays to be "clever".

		Linus


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  5:35           ` Linus Torvalds
                               ` (2 preceding siblings ...)
  2002-08-05 15:39             ` Jamie Lokier
@ 2002-08-06  5:31             ` Mark Mielke
  3 siblings, 0 replies; 56+ messages in thread
From: Mark Mielke @ 2002-08-06  5:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Mon, Aug 05, 2002 at 05:35:13AM +0000, Linus Torvalds wrote:
> And yes, this signal handler thing is clearly visible on benchmarks. 
> MUCH too clearly visible.  I just didn't see any safe alternatives
> (and I still don't ;( )

To some degree, the original approach taken by Intel may be an alternative...

That is, the signal handler is responsible for saving state of all CPU
resources that it intends to use, and restoring state before returning
control to the caller. (the 'interrupt' qualifier from C)

I could see this offered as a GCC optimization, but without the compiler
smarts to detect what is needed and what is not, it would be very difficult
to add this support in a seamless manner.

For example:

    typedef void (*__fastsighandler_t) (int) __attribute__ ((signal_handler));

    #define signal(number, handler) \
        (__attribute_enabled__((handler, signal_handler)) \
            ? __signal_fast(number, handler) \
            : __signal(number, handler))

    void handle_sigint (int) __attribute__ ((signal_handler))
    {
        sigint_received++;
    }



mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-04  6:46         ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Andi Kleen
  2002-08-05  5:35           ` Linus Torvalds
@ 2002-08-05 10:40           ` Ingo Molnar
  2002-08-05 14:59             ` Larry McVoy
  2002-08-05 15:41             ` Jamie Lokier
  1 sibling, 2 replies; 56+ messages in thread
From: Ingo Molnar @ 2002-08-05 10:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel


On 4 Aug 2002, Andi Kleen wrote:

> > actually the opposite is true, on a 2.2 GHz P4:
> > 
> >   $ ./lat_sig catch
> >   Signal handler overhead: 3.091 microseconds
> > 
> >   $ ./lat_ctx -s 0 2
> >   2 0.90
> > 
> > ie. *process to process* context switches are 3.4 times faster than signal
> > delivery. Ie. we can switch to a helper thread and back, and still be
> > faster than a *single* signal.
> 
> This is because the signal save/restore does a lot of unnecessary stuff.
> One optimization I implemented at one time was adding a SA_NOFP signal
> bit that told the kernel that the signal handler did not intend to
> modify floating point state (few signal handlers need FP) It would not
> save the FPU state then and reached quite some speedup in signal
> latency.

well, we have an optimization in this area already - if the thread
receiving the signal has not used any FPU registers during its current
scheduled atom yet then we do not save the FPU state into the signal
frame.

lat_sig uses the FPU so this cost is added. If the FPU saving cost is
removed then signal delivery latency is still 2.0 usecs - slightly more
than twice as expensive as a context-switch - so it's not a win. And
threads can do queued events that amortize context switch overhead, while
queued signals generate per-event signal delivery, so signal delivery
costs are not amortized.

(Not that i advocate SIGIO or helper threads for high-performance IO -
Ben's aio interface is the fastest and most correct approach.)

	Ingo



* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05 10:40           ` Ingo Molnar
@ 2002-08-05 14:59             ` Larry McVoy
  2002-08-05 15:41             ` Jamie Lokier
  1 sibling, 0 replies; 56+ messages in thread
From: Larry McVoy @ 2002-08-05 14:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

> > > actually the opposite is true, on a 2.2 GHz P4:
> > > 
> > >   $ ./lat_sig catch
> > >   Signal handler overhead: 3.091 microseconds
> > > 
> > >   $ ./lat_ctx -s 0 2
> > >   2 0.90
> > > 
> > > ie. *process to process* context switches are 3.4 times faster than signal
> > > delivery. Ie. we can switch to a helper thread and back, and still be
> > > faster than a *single* signal.

Has someone gone through the lat_ctx.c and lat_sig.c code and convinced 
themselves these are measuring things which ought to be compared like this?
When I wrote that code I didn't anticipate this comparison, so somebody
should go look.

I'd suggest that if you want to measure how fast you can communicate using
signals versus pipes (or sockets or whatever), someone write up a test
which has two processes bounce a token between each other using signals
and then compare that with lat_pipe.  It's not clear to me that you are
comparing apples to apples.

If someone does write the test, we'll add it to LMbench if it reveals
anything useful.  It should be easy enough to do.  I can do it if it
isn't obvious.
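
A minimal sketch of such a token-bouncing test, assuming a POSIX system.
The names on_token and signal_pingpong_usec are made up for illustration
and are not part of LMbench. The parent and child pass SIGUSR1 back and
forth and the parent reports the mean round-trip time, which could then
be set against lat_pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Empty handler: its only job is to make the kernel build and tear down
 * a signal frame, which is the cost being measured. */
static void on_token(int sig) { (void)sig; }

/* Bounce a SIGUSR1 "token" between parent and child `rounds` times.
 * Returns the mean round-trip time in microseconds, or -1.0 on error. */
double signal_pingpong_usec(int rounds)
{
    struct sigaction sa;
    sigset_t block, waitmask;
    struct timeval t0, t1;
    pid_t child;
    int i;

    if (rounds <= 0)
        return -1.0;

    sa.sa_handler = on_token;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1.0;

    /* Keep SIGUSR1 blocked except inside sigsuspend(), so a token that
     * arrives early stays pending instead of being lost. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &waitmask) < 0)
        return -1.0;
    sigdelset(&waitmask, SIGUSR1);

    child = fork();
    if (child < 0)
        return -1.0;
    if (child == 0) {
        pid_t parent = getppid();
        for (i = 0; i < rounds; i++) {
            sigsuspend(&waitmask);   /* wait for the token */
            kill(parent, SIGUSR1);   /* send it back */
        }
        _exit(0);
    }

    gettimeofday(&t0, NULL);
    for (i = 0; i < rounds; i++) {
        kill(child, SIGUSR1);        /* send the token */
        sigsuspend(&waitmask);       /* wait for it to come back */
    }
    gettimeofday(&t1, NULL);
    waitpid(child, NULL, 0);

    return ((t1.tv_sec - t0.tv_sec) * 1e6 +
            (t1.tv_usec - t0.tv_usec)) / rounds;
}
```

Each round trip here includes two kill() calls, two signal deliveries and
two context switches, so the figure is not directly comparable to the
lat_sig number quoted above.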
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05 10:40           ` Ingo Molnar
  2002-08-05 14:59             ` Larry McVoy
@ 2002-08-05 15:41             ` Jamie Lokier
  2002-08-05 15:44               ` Jamie Lokier
  1 sibling, 1 reply; 56+ messages in thread
From: Jamie Lokier @ 2002-08-05 15:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

Ingo Molnar wrote:
> And threads can do queued events that amortize context switch
> overhead, while queued signals generate per-event signal delivery, so
> signal delivery costs are not amortized.
> 
> (Not that i advocate SIGIO or helper threads for high-performance IO -
> Ben's aio interface is the fastest and most correct approach.)

Isn't the per-event queued signal cost amortised when using sigwaitinfo()?

-- Jamie


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05 15:41             ` Jamie Lokier
@ 2002-08-05 15:44               ` Jamie Lokier
  0 siblings, 0 replies; 56+ messages in thread
From: Jamie Lokier @ 2002-08-05 15:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

Jamie Lokier wrote:
> Ingo Molnar wrote:
> > And threads can do queued events that amortize context switch
> > overhead, while queued signals generate per-event signal delivery, so
> > signal delivery costs are not amortized.
> > 
> > (Not that i advocate SIGIO or helper threads for high-performance IO -
> > Ben's aio interface is the fastest and most correct approach.)
> 
> Isn't the per-event queued signal cost amortised when using sigwaitinfo()?

Of course I meant:

  Isn't the per-event queued signal cost amortised when using sigtimedwait()?
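
A rough sketch of the pattern in question, assuming Linux with POSIX
realtime signals. The function name drain_queued_signals is made up for
illustration. The signal stays blocked, so no handler runs and no signal
frame is set up; each queued event is instead collected synchronously by
one sigtimedwait() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Queue `n` realtime signals to ourselves, then collect them with
 * sigtimedwait() instead of letting a handler run for each one.
 * Returns how many were collected, or -1 on setup failure. */
int drain_queued_signals(int n)
{
    sigset_t set;
    siginfo_t info;
    struct timespec zero = { 0, 0 };  /* poll: do not sleep on an empty queue */
    union sigval val;
    int i, got = 0;

    /* Block the signal so it is queued rather than delivered to a handler. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
        return -1;

    for (i = 0; i < n; i++) {
        val.sival_int = i;            /* per-event payload */
        if (sigqueue(getpid(), SIGRTMIN, val) < 0)
            return -1;
    }

    /* One system call per event, but no handler invocation and no
     * sigreturn for any of them. */
    while (sigtimedwait(&set, &info, &zero) == SIGRTMIN)
        got++;
    return got;
}
```

This still costs one system call per event; what it avoids is the frame
setup and sigreturn that go with running a handler.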

cheers,
-- Jamie


[parent not found: <1028294887.18635.71.camel@irongate.swansea.linux.org.uk.suse.lists.linux.kernel>]

[parent not found: <Pine.LNX.4.44.0208031332120.7531-100000@localhost.localdomain.suse.lists.linux.kernel>]

[parent not found: <m3u1mb5df3.fsf@averell.firstfloor.org.suse.lists.linux.kernel>]

[parent not found: <ail2qh$bf0$1@penguin.transmeta.com.suse.lists.linux.kernel>]

* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
       [not found]     ` <ail2qh$bf0$1@penguin.transmeta.com.suse.lists.linux.kernel>
@ 2002-08-05  8:38       ` Andi Kleen
  2002-08-05 14:24         ` Jeff Dike
  2002-08-05 16:19         ` Linus Torvalds
  0 siblings, 2 replies; 56+ messages in thread
From: Andi Kleen @ 2002-08-05  8:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

torvalds@transmeta.com (Linus Torvalds) writes:

> >This is because the signal save/restore does a lot of unnecessary stuff.
> >One optimization I implemented at one time was adding a SA_NOFP signal
> >bit that told the kernel that the signal handler did not intend 
> >to modify floating point state (few signal handlers need FP) It would 
> >not save the FPU state then and reached quite some speedup in signal
> >latency. 
> >
> >Linux got a lot slower in signal delivery when the SSE2 support was
> >added. That got this speed back.
> 
> This will break _horribly_ when (if) glibc starts using SSE2 for things
> like memcpy() etc.
> 
> I agree that it is really sad that we have to save/restore FP on
> signals, but I think it's unavoidable. Your hack may work for you, but
> it just gets really dangerous in general. Having signals randomly
> subtly corrupt some SSE2 state just because the signal handler uses
> something like memcpy (without even realizing that that could lead to
> trouble) is bad, bad, bad.

I think the possibility at least for memcpy is rather remote. Any sane
SSE memcpy would only kick in for really big arguments (for small
memcpys it doesn't make any sense at all because of the context save/possible
reformatting penalty overhead). So only people doing really
big memcpys could be possibly hurt, and that is rather unlikely.

But your point stands, one definitely needs to be very careful with it.

Also for special things like UML, which can ensure their environment is sane, it 
could still be a useful optimization. I did it originally for async IO 
handling in some project. At least offering the choice does not hurt.
If it would speed up UML I think it would certainly be worth it.

After all Linux should give you enough rope to shoot yourself in the foot ;)

> 
> In other words, "not intending to" does not imply "will not".  It's just
> potentially too easy to change SSE2 state by mistake. 
> 
> And yes, this signal handler thing is clearly visible on benchmarks. 
> MUCH too clearly visible.  I just didn't see any safe alternatives
> (and I still don't ;( )

In theory you could do a superhack: put the FP context into an unmapped
page on the stack and only save with lazy FPU or access to the unmapped
page. Unfortunately the details get too nasty
(where to find the unmapped page? is the tlb manipulation worth it if the
page was mapped? how to store the address of the unmapped page for nested 
signal handlers for the page fault handler?) so I discarded this idea.

-Andi


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  8:38       ` Andi Kleen
@ 2002-08-05 14:24         ` Jeff Dike
  2002-08-05 16:19         ` Linus Torvalds
  1 sibling, 0 replies; 56+ messages in thread
From: Jeff Dike @ 2002-08-05 14:24 UTC (permalink / raw)
  To: Andi Kleen, Linus Torvalds; +Cc: linux-kernel

ak@suse.de said:
> Also for special things like UML, which can ensure their environment is
> sane, it could still be a useful optimization.

I use libc, and I haven't been able to convince myself that it isn't
going to use FP instructions or registers on my behalf.  I use it as little
as possible, but it still makes me nervous.

> If it would speed up UML I think it would certainly be
> worth it.

After Ingo's numbers, I like the idea of just having a separate address
space and process for the UML kernel, and have that process ptrace UML 
processes and handle system calls and interrupts on their behalf.  One
context switch at the start of a system call and one at the end, as opposed
to a signal delivery and sigreturn.

This also solves the jail mode mprotect performance horrors.

The one thing standing in my way is the need for the kernel process to
be able to change the address space of its processes.

I made a proposal for that, and Alan didn't like it.  So, we'll see what
he likes better.

				Jeff


* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
  2002-08-05  8:38       ` Andi Kleen
  2002-08-05 14:24         ` Jeff Dike
@ 2002-08-05 16:19         ` Linus Torvalds
  1 sibling, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2002-08-05 16:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On 5 Aug 2002, Andi Kleen wrote:
> 
> I think the possibility at least for memcpy is rather remote. Any sane
> SSE memcpy would only kick in for really big arguments (for small
> memcpys it doesn't make any sense at all because of the context save/possible
> reformatting penalty overhead). So only people doing really
> big memcpys could be possibly hurt, and that is rather unlikely.

And this is why the kernel _has_ to save the FP state.

It's the "only happens in a blue moon" bugs that are the absolute _worst_ 
bugs. I want to optimize the kernel until I'm blue in the face, but the 
kernel must NEVER EVER have a "non-stable" interface.

Signal handlers that don't restore state are hard as _hell_ to debug. Most 
of the time it doesn't really matter (unless the lack of restore is 
something really major like one of the most common integer registers), but 
then depending on what libraries you use, and just _exactly_ when the 
signal comes in, you get subtle data corruption that may not show up until 
much later.

At which point your programmer wonders if he mistakenly wandered into 
MS-Windows land.

No thank you. I'll take slow signal handlers over ones that _sometimes_ 
don't work.

> After all Linux should give you enough rope to shoot yourself in the foot ;)

On purpose, yes. It's ok to take careful aim, and say "I'm now shooting 
myself in the foot".

And yes, it's also ok to say "I don't know what I'm doing, so I may be
shooting myself in the foot" (this is obviously the most common
foot-shooter).

And if you come to me and complain about how drunk you were, and how you 
shot yourself in the foot by mistake due to that, I'll just ignore you.

BUT - and this is a big BUT - if you are doing everything right, and you 
actually know what you're doing, and you end up shooting yourself in the 
foot because the kernel was taking a shortcut, then I think the kernel is 
_wrong_.

And I'd rather have a slow kernel that does things right, than a fast 
kernel which screws with people.

> In theory you could do a superhack: put the FP context into an unmapped
> page on the stack and only save with lazy FPU or access to the unmapped
> page. 

That would be extremely interesting especially with signal handlers that 
do a longjmp() thing.

The real fix for a lot of programs on x86 would be for them to never ever 
use FP in the first place, in which case the kernel would be able to just 
not save and restore it at all.

However, glibc fiddles with the fpu at startup, even for non-FP programs. 
Dunno what to do about that.

		Linus


[parent not found: <20020805163910.C7130@kushida.apsleyroad.org.suse.lists.linux.kernel>]

[parent not found: <Pine.LNX.4.44.0208050922570.1753-100000@home.transmeta.com.suse.lists.linux.kernel>]

* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
       [not found] ` <Pine.LNX.4.44.0208050922570.1753-100000@home.transmeta.com.suse.lists.linux.kernel>
@ 2002-08-05 16:46   ` Andi Kleen
  2002-08-05 21:30     ` Jamie Lokier
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2002-08-05 16:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds <torvalds@transmeta.com> writes:
> 
> So it would have to be explicitly enabled with a SA_NOFPSIGHANDLER flag or 
> something.

That is all my patch was doing. It added an SA_NOFP flag, defaulting
to off.
Nothing about enabling it by default. The first hunk is a minor optimization.

-Andi

I attached the old patch for 2.4.9 for reference. If you think it is ok
I can submit it for 2.5

--- linux-work/arch/i386/kernel/signal.c-NOFP	Fri Aug 24 16:36:14 2001
+++ linux-work/arch/i386/kernel/signal.c	Fri Aug 31 00:04:24 2001
@@ -322,15 +322,15 @@
 
 static int
 setup_sigcontext(struct sigcontext *sc, struct _fpstate *fpstate,
-		 struct pt_regs *regs, unsigned long mask)
+		 struct pt_regs *regs, unsigned long mask, int fp)
 {
-	int tmp, err = 0;
+	int err = 0;
+	int tmp;
 
-	tmp = 0;
-	__asm__("movl %%gs,%0" : "=r"(tmp): "0"(tmp));
-	err |= __put_user(tmp, (unsigned int *)&sc->gs);
-	__asm__("movl %%fs,%0" : "=r"(tmp): "0"(tmp));
-	err |= __put_user(tmp, (unsigned int *)&sc->fs);
+	__asm__("movl %%gs,%0" : "=r"(tmp));
+	err |= __put_user(tmp & 0xffff, (unsigned int *)&sc->gs);
+	__asm__("movl %%fs,%0" : "=r"(tmp));
+	err |= __put_user(tmp & 0xffff, (unsigned int *)&sc->fs);
 
 	err |= __put_user(regs->xes, (unsigned int *)&sc->es);
 	err |= __put_user(regs->xds, (unsigned int *)&sc->ds);
@@ -350,11 +350,12 @@
 	err |= __put_user(regs->esp, &sc->esp_at_signal);
 	err |= __put_user(regs->xss, (unsigned int *)&sc->ss);
 
-	tmp = save_i387(fpstate);
-	if (tmp < 0)
+	if (fp)
+	  fp = save_i387(fpstate);
+	if (fp < 0)
 	  err = 1;
 	else
-	  err |= __put_user(tmp ? fpstate : NULL, &sc->fpstate);
+	  err |= __put_user(fp ? fpstate : NULL, &sc->fpstate);
 
 	/* non-iBCS2 extensions.. */
 	err |= __put_user(mask, &sc->oldmask);
@@ -410,7 +411,8 @@
 	if (err)
 		goto give_sigsegv;
 
-	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0]);
+	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0], 
+				(ka->sa.sa_flags&SA_NOFP));
 	if (err)
 		goto give_sigsegv;
 
@@ -491,7 +493,7 @@
 			  &frame->uc.uc_stack.ss_flags);
 	err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
 	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
-			        regs, set->sig[0]);
+			        regs, set->sig[0], !!(ka->sa.sa_flags&SA_NOFP));
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
 	if (err)
 		goto give_sigsegv;
--- linux-work/arch/i386/kernel/i387.c-NOFP	Fri Feb 23 19:09:08 2001
+++ linux-work/arch/i386/kernel/i387.c	Fri Aug 31 00:01:52 2001
@@ -323,11 +323,6 @@
 	if ( !current->used_math )
 		return 0;
 
-	/* This will cause a "finit" to be triggered by the next
-	 * attempted FPU operation by the 'current' process.
-	 */
-	current->used_math = 0;
-
 	if ( HAVE_HWFP ) {
 		if ( cpu_has_fxsr ) {
 			return save_i387_fxsave( buf );
@@ -335,6 +330,11 @@
 			return save_i387_fsave( buf );
 		}
 	} else {
+		/* This will cause a "finit" to be triggered by the next
+		 * attempted FPU operation by the 'current' process.
+		 */
+		current->used_math = 0;
+       
 		return save_i387_soft( &current->thread.i387.soft, buf );
 	}
 }
--- linux-work/include/asm-i386/signal.h-NOFP	Thu Sep 13 22:27:41 2001
+++ linux-work/include/asm-i386/signal.h	Thu Oct 18 18:31:29 2001
@@ -80,6 +80,7 @@
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
+ * SA_NOFP    Don't save FP state.	
  *
  * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single
  * Unix names RESETHAND and NODEFER respectively.
@@ -97,6 +98,7 @@
 #define SA_INTERRUPT	0x20000000 /* dummy -- ignored */
 
 #define SA_RESTORER	0x04000000
+#define SA_NOFP		0x02000000
 
 /* 
  * sigaltstack controls


* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
  2002-08-05 16:46   ` Andi Kleen
@ 2002-08-05 21:30     ` Jamie Lokier
  2002-08-05 21:35       ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Jamie Lokier @ 2002-08-05 21:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linus Torvalds, linux-kernel

A couple of questions.

Andi Kleen wrote:
> +	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0], 
> +				(ka->sa.sa_flags&SA_NOFP));

>  	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
> -			        regs, set->sig[0]);
> +			        regs, set->sig[0], !!(ka->sa.sa_flags&SA_NOFP));

1: Why the inconsistency between the two ways the SA_NOFP flag is checked?

2: What happens when the user's signal handler decides it wants to save
the FPU state itself (after all) and proceed with some FPU use.  Will
sigreturn restore the user-saved FPU state?  Just curious.

-- Jamie


* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
  2002-08-05 21:30     ` Jamie Lokier
@ 2002-08-05 21:35       ` Andi Kleen
  2002-08-05 22:09         ` Jamie Lokier
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2002-08-05 21:35 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andi Kleen, Linus Torvalds, linux-kernel

On Mon, Aug 05, 2002 at 10:30:06PM +0100, Jamie Lokier wrote:
> Andi Kleen wrote:
> > +	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0], 
> > +				(ka->sa.sa_flags&SA_NOFP));
> 
> >  	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
> > -			        regs, set->sig[0]);
> > +			        regs, set->sig[0], !!(ka->sa.sa_flags&SA_NOFP));
> 
> 1: Why the inconsistency between the two ways the SA_NOFP flag is checked?

I don't remember. Probably there was some reason in an earlier version
of the code. The !! could probably be removed now.
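
To make the difference concrete, here is a small sketch in C. The SA_NOFP
value comes from the patch; the helper names nofp_raw, nofp_bool and
want_fp_save are made up for illustration and are not kernel functions.
The two spellings agree under a plain truth test and differ only in the
value that is passed along. Reading the hunks as posted, the argument is
nonzero when SA_NOFP is set and setup_sigcontext() saves FP state only
when it is nonzero, which looks inverted relative to the flag's
description, so want_fp_save() shows the sense the flag presumably
intends (an assumption, not something this thread confirms).

```c
/* Flag value taken from the patch above. */
#define SA_NOFP 0x02000000

/* The raw mask, as passed at the first call site: 0 or 0x02000000. */
int nofp_raw(unsigned long sa_flags)
{
    return (int)(sa_flags & SA_NOFP);
}

/* The normalized form, as passed at the second call site: 0 or 1. */
int nofp_bool(unsigned long sa_flags)
{
    return !!(sa_flags & SA_NOFP);
}

/* Hypothetical helper, not in the patch: 1 when FP state should be
 * saved, i.e. when the handler did NOT ask to skip the FP save. */
int want_fp_save(unsigned long sa_flags)
{
    return !(sa_flags & SA_NOFP);
}
```

Since setup_sigcontext() only tests the argument for truth before calling
save_i387(), either of the first two forms behaves the same there.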

> 
> 2: What happens when the user's signal handler decides it wants to save
> the FPU state itself (after all) and proceed with some FPU use.  Will
> sigreturn restore the user-saved FPU state?  Just curious.

Nope it won't because there is no saved state. The previous context's FPU 
state will be silently corrupted.

-Andi


* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
  2002-08-05 21:35       ` Andi Kleen
@ 2002-08-05 22:09         ` Jamie Lokier
  2002-08-05 22:16           ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Jamie Lokier @ 2002-08-05 22:09 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linus Torvalds, linux-kernel

Andi Kleen wrote:
> > 2: What happens when the user's signal handler decides it wants to save
> > the FPU state itself (after all) and proceed with some FPU use.  Will
> > sigreturn restore the user-saved FPU state?  Just curious.
> 
> Nope it won't because there is no saved state. The previous context's FPU 
> state will be silently corrupted.

I meant if the user's signal handler decides it wants to save the FPU
state directly into the signal context struct, after deciding to do
that.  Won't that work?

-- Jamie


* Re: context switch vs. signal delivery [was: Re: Accelerating user  mode linux]
  2002-08-05 22:09         ` Jamie Lokier
@ 2002-08-05 22:16           ` Andi Kleen
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2002-08-05 22:16 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andi Kleen, Linus Torvalds, linux-kernel

On Mon, Aug 05, 2002 at 11:09:41PM +0100, Jamie Lokier wrote:
> Andi Kleen wrote:
> > > 2: What happens when the user's signal handler decides it wants to save
> > > the FPU state itself (after all) and proceed with some FPU use.  Will
> > > sigreturn restore the user-saved FPU state?  Just curious.
> > 
> > Nope it won't because there is no saved state. The previous context's FPU 
> > state will be silently corrupted.
> 
> I meant if the user's signal handler decides it wants to save the FPU
> state directly into the signal context struct, after deciding to do
> that.  Won't that work?

In theory yes. The space should be already allocated on the stack, it just
has to be filled in.

-Andi


end of thread, other threads:[~2002-08-08 16:13 UTC | newest]

Thread overview: 56+ messages
2002-08-01 20:16 Accelerating user mode linux Alan Cox
2002-08-02  4:40 ` Jeff Dike
2002-08-02  9:50   ` Alan Cox
2002-08-02 18:28     ` Jeff Dike
2002-08-02 17:48       ` Alan Cox
2002-08-02 22:33         ` Jeff Dike
2002-08-02 21:57           ` Alan Cox
2002-08-03  0:54             ` Jeff Dike
2002-08-02 11:34   ` Richard Zidlicky
2002-08-02 13:28     ` Alan Cox
2002-08-03 11:38       ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Ingo Molnar
2002-08-03 12:33         ` context switch vs. signal delivery [was: Re: Accelerating user mode Alan Cox
2002-08-03 15:29           ` Jeff Dike
2002-08-05 13:46             ` Udo A. Steinberg
2002-08-05 20:44               ` Richard Zidlicky
2002-08-05 22:34                 ` Udo A. Steinberg
2002-08-06  0:42                   ` Jeff Dike
2002-08-06  0:16                     ` Udo A. Steinberg
2002-08-06  2:55                       ` Jeff Dike
2002-08-06  8:10                         ` Udo A. Steinberg
2002-08-06 11:20                           ` Jeff Dike
2002-08-06 11:13                             ` Udo A. Steinberg
2002-08-06 12:53                               ` Jeff Dike
2002-08-06 13:04                                 ` Udo A. Steinberg
2002-08-06 14:12                                   ` Jeff Dike
2002-08-06 16:02                                     ` Udo A. Steinberg
2002-08-06 17:42                                       ` Jeff Dike
2002-08-06 18:01                                         ` Udo A. Steinberg
2002-08-08  1:27                                         ` Udo A. Steinberg
2002-08-08  3:14                                           ` Jeff Dike
2002-08-08  2:21                                             ` Benjamin LaHaise
2002-08-08  9:03                                             ` Udo A. Steinberg
2002-08-08 17:19                                               ` Jeff Dike
2002-08-05 22:06             ` Martin Waitz
2002-08-06  0:49               ` Jeff Dike
2002-08-04  6:46         ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Andi Kleen
2002-08-05  5:35           ` Linus Torvalds
2002-08-05  5:42             ` Arnaldo Carvalho de Melo
2002-08-05  6:37             ` Lincoln Dale
2002-08-05 15:39             ` Jamie Lokier
2002-08-05 16:38               ` Linus Torvalds
2002-08-05 20:01                 ` context switch vs. signal delivery [was: Re: Accelerating usermode linux] Oliver Neukum
2002-08-05 20:23                   ` Linus Torvalds
2002-08-06  5:31             ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Mark Mielke
2002-08-05 10:40           ` Ingo Molnar
2002-08-05 14:59             ` Larry McVoy
2002-08-05 15:41             ` Jamie Lokier
2002-08-05 15:44               ` Jamie Lokier
     [not found] <1028294887.18635.71.camel@irongate.swansea.linux.org.uk.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0208031332120.7531-100000@localhost.localdomain.suse.lists.linux.kernel>
     [not found]   ` <m3u1mb5df3.fsf@averell.firstfloor.org.suse.lists.linux.kernel>
     [not found]     ` <ail2qh$bf0$1@penguin.transmeta.com.suse.lists.linux.kernel>
2002-08-05  8:38       ` Andi Kleen
2002-08-05 14:24         ` Jeff Dike
2002-08-05 16:19         ` Linus Torvalds
     [not found] <20020805163910.C7130@kushida.apsleyroad.org.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0208050922570.1753-100000@home.transmeta.com.suse.lists.linux.kernel>
2002-08-05 16:46   ` Andi Kleen
2002-08-05 21:30     ` Jamie Lokier
2002-08-05 21:35       ` Andi Kleen
2002-08-05 22:09         ` Jamie Lokier
2002-08-05 22:16           ` Andi Kleen
