* Accelerating user mode linux
From: Alan Cox @ 2002-08-01 20:16 UTC
To: linux-kernel

Proposal for a sigaltmm()

There is a performance problem when running virtualised environments (notably user mode linux). The cost of the mprotect calls needed to handle syscalls and to protect the UML kernel from its user space is large, and the alternatives like a separate process and ptrace are not pretty either.

The cunning plan goes like this:

	Add current->alt_mm
	Add a per-task flag for 'supervisory' mode

Tasks start with current->alt_mm NULL and the flag set to supervisory.

On exec/exit, tear down alt_mm as well as mm.

Signal delivery checks whether alt_mm != NULL && supervisory is clear. If so, it sets supervisory, switches mm/alt_mm, flushes the tlb and continues handling the signal in the new space.

We add

	sys_switchmm(address);

This switches to the altmm (creating one if it doesn't exist as a copy of the current mm), flushes the tlb and jumps to the address given.

Any opinions, spanners to throw in the works?

Alan
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Accelerating user mode linux
From: Jeff Dike @ 2002-08-02 4:40 UTC
To: Alan Cox, linux-kernel

alan@redhat.com said:
> We add
> sys_switchmm(address);
> This switches to the altmm (creating one if it doesn't exist as a copy
> of the current mm), flushes the tlb and jumps to the address given.

You didn't explicitly say (and so I had to ask :-) that this is intended to be the mechanism by which UML returns to userspace, rather than the normal sigreturn you'd get by just returning from the handler.

So, this would make the entry to userspace look like:

	restore registers
	.
	.
	.
	sys_switchmm(ip);

The problem with this is that it needs to be atomic wrt signals. There can't be an interrupt in the middle of that sequence. So, sys_switchmm would also have to restore the old signal mask, which you'd have to pass in unless you're going to read it off the signal frame. Also, it would have to be open coded because you've already restored the stack pointer.

So, the entry to userspace starts looking like:

	block signals
	restore registers
	sys_switchmm(ip, new_sigmask);

Well, except for the blocking signals part, this is sigreturn under a different name and partly moved into userspace.

Your objection to returning through sigreturn was performance. Is performance a veto of adding an mm switch to sigreturn, or is it possible to make it acceptable?

Also, is a new sigreturn_mm() reasonable? This would be close to sys_switchmm, except that it would restore registers and would be a drop-in replacement for sigreturn[_rt]. I don't favor this because it would probably have to choose whether to be an _rt return or not, and I'd like the option of having UML register some signals as SA_SIGINFO (currently, they are all non-SA_SIGINFO).

Comments, brickbats, spanners?

Jeff
* Re: Accelerating user mode linux
From: Alan Cox @ 2002-08-02 9:50 UTC
To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> can't be an interrupt in the middle of that sequence. So, sys_switchmm
> would also have to restore the old signal mask, which you'd have to pass
> in unless you're going to read it off the signal frame. Also, it would
> have to be open coded because you've already restored the stack pointer.

Uggh.. you are right. You end up needing sigreturn handling.

> Your objection to returning through sigreturn was performance. Is performance
> a veto of adding an mm switch to sigreturn, or is it possible to make it
> acceptable?

It's not a veto. I was trying to avoid having to add any more branches to the fast paths in the kernel.

The remaining sigreturn question is "how do you get into 'user' mode the first time?"
* Re: Accelerating user mode linux
From: Jeff Dike @ 2002-08-02 18:28 UTC
To: Alan Cox; +Cc: linux-kernel

alan@redhat.com said:
> It's not a veto. I was trying to avoid having to add any more branches
> to the fast paths in the kernel.

Unless I'm missing something, a test for altmm and a branch to out-of-line mm switching should be about three instructions on x86, including a correctly predicted branch not taken in the non-altmm case.

> The remaining sigreturn question is
> "how do you get into 'user' mode the first time"

Last night I told you it was by building a signal frame by hand and returning through it. That's no longer true. Now, every UML thread (except the idle thread, which I think can reasonably be expected not to try to enter userspace) is in a host signal handler when in the kernel. All entrances to userspace happen by returning through that signal frame. Special userspace returns (exec and fork et al.) fiddle the sigcontext in that frame beforehand. Normal system calls also stuff the return value into the appropriate slot in the sigcontext before returning.

So, there's nothing special about entering userspace for the first time. Everything is under a signal frame, so any time something needs to enter userspace, it just returns through it.

> This switches to the altmm (creating one if it doesn't exist as a copy
> of the current mm)

About this business of creating a UML kernel address space for each UML user thread: I prefer to have a single kernel address space to which all signals are delivered.

This has the slight disadvantage that the process address space isn't directly accessible, but I can live with that. A virt_to_phys translation isn't too painful.
A single separate kernel address space has the following attractions for me:

	there are some cases where 3G of KVA would be very useful
	it would make the UML kernel completely invisible to processes, which is important for honeypots
	apps which consume huge amounts of VM might run on the host, but crap out inside a UML

This raises the question of how the process address spaces are created. For a variety of reasons unrelated to altmm (which I can go into if anyone's interested), I want address spaces to be separate user-visible objects. You'd create a new empty one by opening /proc/new-mm or something and get back a file descriptor as a handle to it. mmap/munmap/mprotect would be extended to take a file descriptor pointing to the address space to be changed.

So, altmm would look like this:

When it starts up, UML would call sigaltmm, passing a descriptor to its own address space, and register its signal handlers with a new flag, SA_IN_MM. sigaction would have an mm field in which this descriptor would be put (and which would contain -1 in the non-altmm case).

The sigcontext would have an extra int in it which would be the descriptor of the address space to which sigreturn will return.

Like now, UML would arrange that everything is under a host signal handler. When it enters userspace, it would change the address space fd in the sigcontext if necessary.

Does this sound sane?

Jeff
* Re: Accelerating user mode linux
From: Alan Cox @ 2002-08-02 17:48 UTC
To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> So, there's nothing special about entering userspace for the first time.
> Everything is under a signal frame, so any time something needs to enter
> userspace, it just returns through it.

Ok

> This has the slight disadvantage that the process address space isn't directly
> accessible, but I can live with that. A virt_to_phys translation isn't too
> painful.

Right

> This raises the question of how the process address spaces are created. For
> a variety of reasons unrelated to altmm (which I can go into if anyone's
> interested), I want address spaces to be separate user-visible objects.

That really makes all the existing code not work with it. Doing an altmm is easy in the sense that it doesn't require 20 new syscalls and doesn't slow down the main kernel paths for a single odd case.

I can see why there is a need to manipulate the other mm. I need to think about the right way to handle it.
* Re: Accelerating user mode linux
From: Jeff Dike @ 2002-08-02 22:33 UTC
To: Alan Cox; +Cc: linux-kernel

alan@redhat.com said:
> That really makes all the existing code not work with it.

Can you be more specific? If you're thinking I'm talking about breaking mmap, munmap, and mprotect by adding another argument, I'm not. I'm talking about adding new syscalls, mmap2, munmap2, mprotect2 (or something more imaginative), which have the extra argument, having them take -1 as meaning "fiddle the current address space", and persuading libc to use them instead of the current syscalls. Then we would start the current ones on their way to the happy syscall hunting grounds in the sky.

> Doing an altmm is easy in the sense that it doesn't require 20 new
> syscalls

I don't think I mentioned 20 new syscalls anywhere :-)

If you count the ones above as replacements and not new, I'm talking about one new syscall, switch_mm(), which I didn't mention before, that would switch to a given address space. This would be the basis of UML's switch_mm.

> and doesn't slow down the main kernel paths for a single odd
> case.

Which main kernel paths are you referring to here?

Jeff
* Re: Accelerating user mode linux
From: Alan Cox @ 2002-08-02 21:57 UTC
To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> mmap, munmap, and mprotect by adding another argument, I'm not. I'm talking
> about adding new syscalls, mmap2, munmap2, mprotect2 (or something more
> imaginative), which have the extra argument, having them take -1 as meaning
> "fiddle the current address space", and persuading libc to use them instead
> of the current syscalls. Then we would start the current ones on their way
> to the happy syscall hunting grounds in the sky.

That's a lot more invasive than I want to be.
* Re: Accelerating user mode linux
From: Jeff Dike @ 2002-08-03 0:54 UTC
To: Alan Cox; +Cc: linux-kernel

alan@redhat.com said:
> That's a lot more invasive than I want to be.

OK, that was my best thinking on the subject. I'll be interested to see what you like.

Jeff
* Re: Accelerating user mode linux
From: Richard Zidlicky @ 2002-08-02 11:34 UTC
To: Jeff Dike; +Cc: Alan Cox, linux-kernel

On Thu, Aug 01, 2002 at 11:40:28PM -0500, Jeff Dike wrote:
>
> Your objection to returning through sigreturn was performance. Is performance
> a veto of adding an mm switch to sigreturn, or is it possible to make it
> acceptable?

I once ported Basilisk to work natively on linux-m68k. It works *slowly*, so I looked at what the problem is: the signal delivery in Linux is exorbitantly slow. E.g. a SIGILL delivery costs ~1650 cycles on a 68060; compared to that, sigreturn and getpid are 200-250 and sched_yield with a context switch around 400.

So sigreturn is not the place I would be looking for the biggest speedups.

Richard
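Richard's 68060 figures can be put side by side as simple ratios (cycle counts as quoted; 225 is just the midpoint of his 200-250 range, and one sched_yield is taken to include exactly one context switch). On that reading, even on his machine a round trip to another process and back would cost about half of one signal delivery:

```latex
\frac{1650}{400} \approx 4.1, \qquad
\frac{1650}{225} \approx 7.3, \qquad
2 \times 400 = 800 < 1650
```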
* Re: Accelerating user mode linux
From: Alan Cox @ 2002-08-02 13:28 UTC
To: Richard Zidlicky; +Cc: Jeff Dike, Alan Cox, linux-kernel

On Fri, 2002-08-02 at 12:34, Richard Zidlicky wrote:
> I once ported Basilisk to work natively on linux-m68k. It works
> *slowly*, so I looked at what the problem is: the signal delivery in
> Linux is exorbitantly slow. E.g. a SIGILL delivery costs ~1650 cycles
> on a 68060; compared to that, sigreturn and getpid are 200-250 and
> sched_yield with a context switch around 400.

The numbers look very different on a real processor. Signal delivery is indeed not stunningly fast, but relative to a context switch it's very low indeed.
* context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Ingo Molnar @ 2002-08-03 11:38 UTC
To: Alan Cox; +Cc: Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

On 2 Aug 2002, Alan Cox wrote:

> The numbers look very different on a real processor. Signal delivery is
> indeed not stunningly fast, but relative to a context switch it's very low
> indeed.

actually the opposite is true, on a 2.2 GHz P4:

	$ ./lat_sig catch
	Signal handler overhead: 3.091 microseconds

	$ ./lat_ctx -s 0 2
	2 0.90

ie. *process to process* context switches are 3.4 times faster than signal delivery. Ie. we can switch to a helper thread and back, and still be faster than a *single* signal.

signals are in essence 'lightweight' threads created and destroyed for the purpose of a single asynchronous event, it's IMO a very inefficient and baroque concept for almost anything (but debugging and a number of very special uses). I'd guess that with a sane threading library a helper thread is faster for almost everything.

	Ingo
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Alan Cox @ 2002-08-03 12:33 UTC
To: mingo; +Cc: Alan Cox, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

> actually the opposite is true, on a 2.2 GHz P4:
>
> $ ./lat_sig catch
> Signal handler overhead: 3.091 microseconds
>
> $ ./lat_ctx -s 0 2
> 2 0.90
>
> ie. *process to process* context switches are 3.4 times faster than signal
> delivery. Ie. we can switch to a helper thread and back, and still be
> faster than a *single* signal.

That's interesting indeed. I'd not tried it with the O(1) scheduler.

> signals are in essence 'lightweight' threads created and destroyed for the
> purpose of a single asynchronous event, it's IMO a very inefficient and
> baroque concept for almost anything (but debugging and a number of very
> special uses). I'd guess that with a sane threading library a helper
> thread is faster for almost everything.

Which would argue UML ought to have a positively microkernel view of syscalls: sending a message?
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Jeff Dike @ 2002-08-03 15:29 UTC
To: Alan Cox, mingo; +Cc: Richard Zidlicky, linux-kernel

alan@redhat.com said:
> Which would argue UML ought to have a positively microkernel view of
> syscalls: sending a message?

Indeed. Ingo's mail got me thinking that

alan@redhat.com said:
> the alternatives like a separate process and ptrace are not pretty either

might not be so bad after all.

All I would need to make this work is for one process to be able to change the mm of another. Then, the current UML tracing thread would handle the kernel side of things and sit in its own address space, nicely protected from its processes.

Jeff
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Udo A. Steinberg @ 2002-08-05 13:46 UTC
To: Jeff Dike; +Cc: alan, mingo, rz, linux-kernel

On Sat, 03 Aug 2002 10:29:42 -0500
Jeff Dike <jdike@karaya.com> wrote:

> alan@redhat.com said:
> > the alternatives like a separate process and ptrace are not pretty either

I have implemented a usermode version of the Fiasco µ-kernel that uses a separate process for the kernel and one process for each task. The kernel process attaches to all tasks via ptrace.
When the kernel wants to change the MM of a task it puts some trampoline code on a page mapped into each task's address space and has the task execute that code on behalf of the kernel.
With that setup we have complete address space protection without all the trouble of jail, at the expense of a few context switches for each mmap, munmap or mprotect operation.

I would also very much like an extension that would allow one process to modify the MM of another, possibly via an extended ptrace interface or a new syscall.
Also it would be nice if there was an alternate way to get at the cr2 register, trap number and error code other than from a SIGSEGV handler.

> All I would need to make this work is for one process to be able to change
> the mm of another.

Yes, exactly.

> Then, the current UML tracing thread would handle the kernel side of things
> and sit in its own address space, nicely protected from its processes.

Yes. I already have this part working for our kernel, so it's not just theory.
I believe things could run yet another bit faster if we didn't have to do the trampoline map operations.

-Udo.
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Richard Zidlicky @ 2002-08-05 20:44 UTC
To: Udo A. Steinberg; +Cc: Jeff Dike, alan, mingo, linux-kernel

On Mon, Aug 05, 2002 at 03:46:07PM +0200, Udo A. Steinberg wrote:
> On Sat, 03 Aug 2002 10:29:42 -0500
> Jeff Dike <jdike@karaya.com> wrote:
>
> > alan@redhat.com said:
> > > the alternatives like a separate process and ptrace are not pretty either
>
> I have implemented a usermode version of the Fiasco µ-kernel that uses
> a separate process for the kernel and one process for each task. The kernel
> process attaches to all tasks via ptrace.
> When the kernel wants to change the MM of a task it puts some trampoline code
> on a page mapped into each task's address space and has the task execute that
> code on behalf of the kernel.
> With that setup we have complete address space protection without all the
> trouble of jail, at the expense of a few context switches for each mmap, munmap
> or mprotect operation.

very interesting, what is the handiest way to do "syscalls" in this model? Ptrace is still basically signal driven, so I would expect it still has some unnecessary overhead?

> I would also very much like an extension that would allow one process to modify
> the MM of another, possibly via an extended ptrace interface or a new syscall.
> Also it would be nice if there was an alternate way to get at the cr2 register,
> trap number and error code other than from a SIGSEGV handler.

that's what signals are for, too bad they are slow.

> > Then, the current UML tracing thread would handle the kernel side of things
> > and sit in its own address space, nicely protected from its processes.
>
> Yes. I already have this part working for our kernel, so it's not just theory.
> I believe things could run yet another bit faster if we didn't have to do the
> trampoline map operations.

they are very expensive because of the way ptrace accesses the other process's memory; did you try a piece of shared memory?

Richard
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Udo A. Steinberg @ 2002-08-05 22:34 UTC
To: Richard Zidlicky; +Cc: jdike, alan, mingo, linux-kernel

On Mon, 5 Aug 2002 22:44:15 +0200
Richard Zidlicky <rz@linux-m68k.org> wrote:

> very interesting, what is the handiest way to do "syscalls" in this model?
> Ptrace is still basically signal driven, so I would expect it still has some
> unnecessary overhead?

Task wants to do a syscall (i.e. int 0x30 in Fiasco), the kernel process tracing the task sees the signal in its SIGCHLD handler. It pulls the registers out of the task's address space using PTRACE_GETREGS and sets up an interrupt frame on the kernel stack. EIP and ESP in the saved signal context are frobbed in a way that the signal handler falls right into the correct interrupt gate when it returns.

iret works the other way round. The SIGSEGV handler in the kernel process copies registers back to the task and restarts the task's process after restoring kernel state.

> > I would also very much like an extension that would allow one process to modify
> > the MM of another, possibly via an extended ptrace interface or a new syscall.
> > Also it would be nice if there was an alternate way to get at the cr2 register,
> > trap number and error code other than from a SIGSEGV handler.
>
> that's what signals are for, too bad they are slow.

As it is now, in order to get at the page fault address one has to invoke a SIGSEGV handler in the task, then look at the task's signal context to determine the page fault address, trapno etc. It would be much faster if the kernel could cancel the SIGSEGV signal in the task's process and read out the page fault info from the TCB via a ptrace extension. That saves the cost of running a signal handler in the task and a bunch of context switches.
> they are very expensive because of the way ptrace accesses the other process's
> memory; did you try a piece of shared memory?

Yes, the trampoline page is shared between kernel and task. Nevertheless there are context switches that wouldn't be necessary if the kernel could tweak the task's mm directly.

-Udo.
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Jeff Dike @ 2002-08-06 0:42 UTC
To: Udo A. Steinberg; +Cc: Richard Zidlicky, alan, mingo, linux-kernel

us15@os.inf.tu-dresden.de said:
> Task wants to do a syscall (i.e. int 0x30 in Fiasco), the kernel
> process tracing the task sees the signal in its SIGCHLD handler. It
> pulls the registers out of the task's address space using
> PTRACE_GETREGS and sets up an interrupt frame on the kernel stack.

Hmm, I would have the kernel process let the system call bump it out of wait() rather than delivering a SIGCHLD. And I'd be inclined to longjmp over to the kernel stack. Or, even better, have it already running on the appropriate kernel stack, so it can just read the system call with PTRACE_GETREGS and call into the main kernel.

Similarly, with other signals, like the timer, SIGIO, or page faults, it would just annul the signal and call into the IRQ system. Page faults will be difficult, though, because of the inability to read err or cr2, as you've pointed out.

Jeff
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Udo A. Steinberg @ 2002-08-06 0:16 UTC
To: Jeff Dike; +Cc: rz, alan, mingo, linux-kernel

On Mon, 05 Aug 2002 19:42:31 -0500
Jeff Dike <jdike@karaya.com> wrote:

> Similarly, with other signals, like the timer, SIGIO, or page faults, it
> would just annul the signal and call into the IRQ system. Page faults
> will be difficult, though, because of the inability to read err or cr2,
> as you've pointed out.

Jeff,

If my understanding of UML is right, you implement interrupts with socket pairs where the interrupt handler writes a byte into one end and the other end receives an async notification (SIGIO). In order to stop the right task with a SIGIO, you change the socket owner on each context switch using fcntl.

If you have one process per task and a kernel process, the kernel process cannot change socket ownership over to the next task's process, because it's not allowed to. Only the process itself could set the ownership to its pid, but then each task switch would have to be done with a trampoline too.

The issue boils down to how the kernel process can stop a task process in order to force the task into the kernel. You can of course kill(taskpid, SIG), but that has a race if the task tries to enter the kernel at the same time: SIG will be pending in the task until it is scheduled next.

-Udo.
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Jeff Dike @ 2002-08-06 2:55 UTC
To: Udo A. Steinberg; +Cc: rz, alan, mingo, linux-kernel

us15@os.inf.tu-dresden.de said:
> If my understanding of UML is right, you implement interrupts with
> socket pairs where the interrupt handler writes a byte into one end
> and the other end receives an async notification (SIGIO).

It sounds like you're confusing two mechanisms. Device interrupts are implemented with something that supports SIGIO (socketpair, tty), with one end outside UML and one end inside UML generating the SIGIOs.

I use socketpairs in the way you describe to implement context switching. Out-of-context processes are sleeping in a read on their socket, and are woken up by a soon-to-be-out-of-context process writing a byte down it. There's no SIGIO there at all.

I also use socketpairs with SIGIO to implement IPIs on SMP UML.

> In order to
> stop the right task with a SIGIO, you change the socket owner on each
> context switch using fcntl.

Yup. More precisely, in order to ensure that the correct process receives SIGIO when input comes in from the outside, I F_SETOWN the descriptors to the incoming process during a context switch.

> If you have one process per task and a kernel process, the kernel
> process cannot change socket ownership over to the next task's
> process, because it's not allowed to.

Why not? I see nothing at all in the implementation of F_SETOWN that requires that it be called by the current owner:

	case F_SETOWN:
		lock_kernel();
		filp->f_owner.pid = arg;
		filp->f_owner.uid = current->uid;
		filp->f_owner.euid = current->euid;
		...

There are no general checks earlier in do_fcntl or sys_fcntl either.

Jeff
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Udo A. Steinberg @ 2002-08-06 8:10 UTC
To: Jeff Dike; +Cc: rz, alan, mingo, linux-kernel

On Mon, 05 Aug 2002 21:55:05 -0500
>
> > If you have one process per task and a kernel process, the kernel
> > process cannot change socket ownership over to the next task's
> > process, because it's not allowed to.
>
> Why not? I see nothing at all in the implementation of F_SETOWN that requires
> that it be called by the current owner:
>
> 	case F_SETOWN:
> 		lock_kernel();
> 		filp->f_owner.pid = arg;
> 		filp->f_owner.uid = current->uid;
> 		filp->f_owner.euid = current->euid;
> 		...

Ok, I was looking at sockets and not ttys, and that has the following in net/core/sock.c:

	case F_SETOWN:
		/*
		 * This is a little restrictive, but it's the only
		 * way to make sure that you can't send a sigurg to
		 * another process.
		 */
		if (current->pgrp != -arg &&
		    current->pid != arg &&
		    !capable(CAP_KILL)) return(-EPERM);
		sk->proc = arg;
		return(0);

So it wouldn't work with socketpairs, but with ttys it should.

-Udo.
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Jeff Dike @ 2002-08-06 11:20 UTC
To: Udo A. Steinberg; +Cc: rz, alan, mingo, linux-kernel

us15@os.inf.tu-dresden.de said:
> if (current->pgrp != -arg &&
>     current->pid != arg &&
>     !capable(CAP_KILL)) return(-EPERM);

What's the problem here? This will let UML do F_SETOWN as well.

Jeff
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Udo A. Steinberg @ 2002-08-06 11:13 UTC
To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 06:20:52 -0500
Jeff Dike <jdike@karaya.com> wrote:

> us15@os.inf.tu-dresden.de said:
> > if (current->pgrp != -arg &&
> >     current->pid != arg &&
> >     !capable(CAP_KILL)) return(-EPERM);
>
> What's the problem here? This will let UML do F_SETOWN as well.

It will let the incoming process take over ownership of the socket, which is probably what you mean and what you currently use.

I'm talking about a setup with the kernel residing in its own process. On iret it would have to change ownership of the socket to another task, i.e. process with kernel_pid wants to set task_pid as the owner of the socket. The above code fragment doesn't permit this, as far as I can see. What it does permit is the incoming task setting itself as the socket owner, but that requires that the incoming task always runs a trampoline first which accomplishes that.

-Udo.
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode
From: Jeff Dike @ 2002-08-06 12:53 UTC
To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> It will let the incoming process take over ownership of the socket,
> which is probably what you mean and what you currently use.

Yup.

> On iret it would have to change ownership of the socket to another
> task, i.e. process with kernel_pid wants to set task_pid as the owner
> of the socket. The above code fragment doesn't permit this, as far as
> I can see.

Why not? There is nothing there that prevents that.

Jeff
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Udo A. Steinberg @ 2002-08-06 13:04 UTC
To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 08:53:24 -0400
Jeff Dike <jdike@karaya.com> wrote:

> > On iret it would have to change ownership of the socket to another
> > task, i.e. process with kernel_pid wants to set task_pid as the owner
> > of the socket. The above code fragment doesn't permit this, as far as
> > I can see.
>
> Why not? There is nothing there that prevents that.

In the following code the parent (i.e. kernel) tries to set the child
(i.e. task) as owner of the socket. Does this work for you? It doesn't
for me, for the reason I described earlier.

#include <sys/types.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <stdio.h>      /* perror() */
#include <unistd.h>

int main (void)
{
    int sockets[2], flags;
    pid_t pid;

    if (socketpair (AF_UNIX, SOCK_STREAM, 0, sockets)) {
        perror ("socketpair");
        return -1;
    }

    switch (pid = fork ()) {
    case -1:
        perror ("fork");
        return -1;

    case 0:
        /* child ("task"): just wait; never fall through into the parent code */
        pause ();
        _exit (0);

    default:
        /* parent ("kernel"): make the socket async, then hand it to the child */
        if ((flags = fcntl (sockets[0], F_GETFL)) < 0) {
            perror ("fcntl, GETFL");
            return -1;
        }
        if (fcntl (sockets[0], F_SETFL, flags | O_NONBLOCK | O_ASYNC) < 0) {
            perror ("fcntl, SETFL");
            return -1;
        }
        /* this is the call that fails with EPERM */
        if (fcntl (sockets[0], F_SETOWN, pid) < 0) {
            perror ("fcntl, SETOWN");
            return -1;
        }
    }

    return 0;
}
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jeff Dike @ 2002-08-06 14:12 UTC
To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> Does this work for you?

No :-)

> It doesn't for me, for the reason I described
> earlier.

Indeed. I misread the !capable(CAP_KILL) as "I am not allowed to kill the
other guy", which clearly you are when you just forked it.

This looks like a bug to me. If you own the process, you can send it any
signal you want, so you should be allowed to sign it up for SIGURG/SIGIO via
F_SETOWN.

Jeff
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Udo A. Steinberg @ 2002-08-06 16:02 UTC
To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 10:12:25 -0400
Jeff Dike <jdike@karaya.com> wrote:

> Indeed. I misread the !capable(CAP_KILL) as "I am not allowed to kill the
> other guy", which clearly you are when you just forked it.
> This looks like a bug to me. If you own the process, you can send it any
> signal you want, so you should be allowed to sign it up for SIGURG/SIGIO via
> F_SETOWN.

I'm glad we agree on that one :)

Supposing we use pseudo-terminals, as UML does, rather than sockets with
their broken SIGIO, there's still a problem:

When the task is registered as socket owner and is just about to enter the
kernel due to a syscall, it will stop with a SIGTRAP, and the tracing kernel
process will run some time later and see a SIGCHLD. But after the task has
stopped and before the kernel process can change SIGIO ownership back, a new
interrupt could come in, and the SIGIO would remain pending in the task's
process until the task is next scheduled to run.

How do you solve this?

-Udo.
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jeff Dike @ 2002-08-06 17:42 UTC
To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> I'm glad we agree on that one :)

Yup, sorry. That test is wrong, and is slated to be fixed at some point.

> When the task is registered as socket owner and is just about to enter
> the kernel due to a syscall, it will stop with a SIGTRAP and the
> tracing kernel process will run sometime and see a SIGCHLD. But after
> the task stopped and before the kernel process can change SIGIO
> ownership back, a new interrupt could come in and the SIGIO would
> remain pending in the task's process until the task was scheduled to
> run next time.
>
> How do you solve this?

A couple of ways.

The system call path can call sigio_handler to clear out any pending IO.
The SIGIO that was trapped in the process will cause another call to
sigio_handler which won't turn up any IO, but I don't consider that to be
a problem.

The kernel process can examine the signal pending mask of the process after
it has transferred SIGIO to itself. This can be done either through
/proc/<pid>/status or a ptrace extension, since we're happily postulating
new things for it to do anyway. If there is a SIGIO pending, it calls
sigio_handler.

Any other possibilities that you see?

Jeff
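The /proc variant of the second approach can be sketched in userspace. This assumes the Linux /proc/<pid>/status layout with hexadecimal SigPnd (per-thread) and, on newer kernels, ShdPnd (process-wide) lines, where bit n-1 stands for signal n. The helper name sigio_pending() is hypothetical, and sigio_handler itself is UML-internal and not shown:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if SIGIO is pending for pid (in either the
 * per-thread SigPnd mask or the shared ShdPnd mask), 0 if not, and -1 if
 * the status file cannot be read or has no pending-mask lines.
 */
static int sigio_pending(pid_t pid)
{
    char path[64], line[256];
    unsigned long long mask, pending = 0;
    int found = 0;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        /* Lines look like "SigPnd:\t0000000000000000" (hex, signal n = bit n-1). */
        if (sscanf(line, "SigPnd: %llx", &mask) == 1 ||
            sscanf(line, "ShdPnd: %llx", &mask) == 1) {
            pending |= mask;
            found = 1;
        }
    }
    fclose(f);
    if (!found)
        return -1;
    return (int)((pending >> (SIGIO - 1)) & 1ULL);
}
```

A caller would check sigio_pending(task_pid) == 1 after taking SIGIO ownership back and then run its own I/O handler. Reading and parsing a text file on every check is not free, and it depends on /proc being mounted.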
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Udo A. Steinberg @ 2002-08-06 18:01 UTC
To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 13:42:18 -0400
Jeff Dike <jdike@karaya.com> wrote:

> A couple of ways. The system call path can call sigio_handler to clear
> out any pending IO. The SIGIO that was trapped in the process will cause
> another call to sigio_handler which won't turn up any IO, but I don't
> consider that to be a problem.

It is not a problem at all, just a small performance penalty.

> The kernel process can examine the signal pending mask of the process after
> it has transferred SIGIO to itself. This can be done either through
> /proc/<pid>/status or a ptrace extension, since we're happily postulating
> new things for it to do anyway. If there is a SIGIO pending, it calls
> sigio_handler.

I don't like the idea of having to fiddle with the proc filesystem. Some
people might not even mount it. A ptrace extension to look at and modify
the pending signal mask of a traced process would be very handy.

> Any other possibilities that you see?

Right now I'm doing something hackish. If the process enters with a
syscall (int 0x30 in my case) after the kernel expects it to enter due to
an interrupt, I just restart the task until it enters with the pending
interrupt signal (SIGIO). The task will do that before it can step on the
int instruction again, and after it returns to user mode it will step on
the int again. This works well with faults. The problem is traps, because
there the EIP points past the instruction, so the EIP needs to be
adjusted. Ugly, I know.

-Udo.
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Udo A. Steinberg @ 2002-08-08 1:27 UTC
To: Jeff Dike; +Cc: linux-kernel

On Tue, 06 Aug 2002 13:42:18 -0400
Jeff Dike <jdike@karaya.com> wrote:

>
> The kernel process can examine the signal pending mask of the process after
> it has transferred SIGIO to itself. This can be done either through
> /proc/<pid>/status or a ptrace extension, since we're happily postulating
> new things for it to do anyway. If there is a SIGIO pending, it calls
> sigio_handler.
>
> Any other possibilities that you see?

Another possibility could be the kernel process and the task processes
sharing a pending signal queue, either for one particular signal or for all
signals. The kernel process would block SIGIO while the task runs, and when
the task enters kernel mode with a SIGIO still trapped in the task process,
SIGIO would get delivered in the kernel and cleared from the shared pending
queue, which is just what we want.

Someone actually already tried implementing it with a clone extension, see
http://www.rhdv.cistron.nl/sigqueue.html

-Udo.
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jeff Dike @ 2002-08-08 3:14 UTC
To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> SIGIO would get delivered in the kernel and cleared from the shared
> pending queue, which is just what we want.

Not really. What we really want is for signals not to be delivered at all.
That's why the ptrace signal annulling capability is nice.

I'm not sure if this makes any sense, but coupling the new aio mechanism with
something that queues up siginfos might be interesting. It would be a magic
descriptor that would feed you signals when you read it.

Is that at all sane?

Jeff
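The "magic descriptor that would feed you signals" later became a real Linux interface, signalfd(), added in 2.6.22. A minimal sketch of reading one signal through it follows. It does not remove one limitation: a signalfd only dequeues signals pending for the calling process, so by itself it does not let the UML kernel process read another task's queue. The helper names open_signal_fd() and read_one_signal() are illustrative:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/*
 * Block sig and return a descriptor from which pending instances of sig
 * can be read as struct signalfd_siginfo records, or -1 on error.
 */
static int open_signal_fd(int sig)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, sig);
    /* The signal must stay blocked, or normal delivery would consume it. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Read one queued signal; returns its number, or -1 on error. */
static int read_one_signal(int fd)
{
    struct signalfd_siginfo si;

    if (read(fd, &si, sizeof(si)) != (ssize_t)sizeof(si))
        return -1;
    return (int)si.ssi_signo;
}
```

The read is an ordinary synchronous read(), so no handler runs and no signal frame is built; the full siginfo (sender, fd for SIGIO with F_SETSIG, and so on) arrives in the record instead.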
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Benjamin LaHaise @ 2002-08-08 2:21 UTC
To: Jeff Dike; +Cc: Udo A. Steinberg, linux-kernel

On Wed, Aug 07, 2002 at 10:14:42PM -0500, Jeff Dike wrote:
> I'm not sure if this makes any sense, but coupling the new aio mechanism with
> something that queues up siginfos might be interesting. It would be a magic
> descriptor that would feed you signals when you read it.
>
> Is that at all sane?

Delivering signals from aio completion is indeed possible. There is even a
field in the iocb structure for doing this in order to provide complete
POSIX compatibility (well, except for the fact that structure
initialization is enforced).

		-ben
--
"You will be reincarnated as a toad; and you will be much happier."
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Udo A. Steinberg @ 2002-08-08 9:03 UTC
To: Jeff Dike; +Cc: linux-kernel

On Wed, 07 Aug 2002 22:14:42 -0500
Jeff Dike <jdike@karaya.com> wrote:

>
> Not really. What we really want is for signals not to be delivered at all.
> That's why the ptrace signal annulling capability is nice.
>
> I'm not sure if this makes any sense, but coupling the new aio mechanism with
> something that queues up siginfos might be interesting. It would be a magic
> descriptor that would feed you signals when you read it.
>
> Is that at all sane?

I know that we're trying to avoid signal handlers, because they are
expensive. But the signal would not need to be delivered in the task. We
need a mechanism to stop the task and force it into the kernel.

The task is uncooperative and doesn't dequeue signals itself. When it gets
a signal, it stops. The kernel then sees the signal and accepts it using
sigwaitinfo, at which point it is no longer pending in the task either.
The siginfo structure then provides the necessary info, i.e. which fd
caused the I/O.

When running in a kernel context, you actually need to deliver SIGIO in
order to interrupt the current context.

If you have a magic aio descriptor, how does the task process read signals
from it and stop?

-Udo.
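Udo's "accept it using sigwaitinfo, and the siginfo says which fd" step can be shown in a single process with Linux's F_SETSIG fcntl, which makes the kernel fill in si_fd when it queues SIGIO. This is a sketch under that assumption (F_SETSIG is Linux-specific and needs _GNU_SOURCE); arm_sigio() and wait_sigio_fd() are hypothetical helpers, and nothing here dequeues another process's signals, which is the piece Jeff says is still missing:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Route SIGIO for fd to this process with a full siginfo. 0 on success. */
static int arm_sigio(int fd)
{
    sigset_t set;
    int flags;

    /* Block SIGIO first: its default action would terminate the process. */
    sigemptyset(&set);
    sigaddset(&set, SIGIO);
    if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    /* Any non-zero F_SETSIG value, even SIGIO itself, makes the kernel
       queue the signal with si_fd and si_band filled in. */
    if (fcntl(fd, F_SETSIG, SIGIO) < 0)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK | O_ASYNC) < 0 ? -1 : 0;
}

/* Accept the pending SIGIO synchronously; returns the fd that raised it. */
static int wait_sigio_fd(void)
{
    sigset_t set;
    siginfo_t info;

    sigemptyset(&set);
    sigaddset(&set, SIGIO);
    if (sigwaitinfo(&set, &info) != SIGIO)
        return -1;
    return info.si_fd;
}
```

After socketpair(), arm_sigio(sv[0]) and a one-byte write to sv[1], wait_sigio_fd() should return sv[0] without any handler having run.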
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jeff Dike @ 2002-08-08 17:19 UTC
To: Udo A. Steinberg; +Cc: linux-kernel

us15@os.inf.tu-dresden.de said:
> The task is uncooperative and doesn't dequeue signals itself. When it
> gets a signal it stops. The kernel then sees the signal and accepts it
> using sigwaitinfo, at which point it is no longer pending in the task
> either. The siginfo structure then provides the necessary info, i.e.
> which fd caused the i/o.

I think this is more or less what I had in mind. The thing that is missing
is for sigwaitinfo to be able to dequeue another process' signals, which is
where the shared signal queue would come in.

> If you have a magic aio descriptor, how does the task process read
> signals from it and stop?

I was looking at this as a way of dequeueing signals from the other
process. The task process would have the signal queued and wake up the
kernel process as happens now. The kernel process would have
/proc/<task-pid>/sigqueue or something opened and would read siginfos from
it. Those would then be dequeued from the task process.

This almost suffices for getting page fault information, except that, for
some reason, siginfo doesn't say whether the faulting access was a read or
a write.

And now that I'm thinking about it, aio doesn't really come into it. This
would be strictly synchronous.

Jeff
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Martin Waitz @ 2002-08-05 22:06 UTC
To: linux-kernel, Jeff Dike

hi :)

On Sat, Aug 03, 2002 at 10:29:42AM -0500, Jeff Dike wrote:
> alan@redhat.com said:
> > the alternatives like a separate process and ptrace are not pretty either
>
> might not be so bad after all.

there is already a group at our university doing that:
http://www3.informatik.uni-erlangen.de/Research/Projects/UMLinux/umlinux.html

--
CU,              / Friedrich-Alexander University Erlangen, Germany
Martin Waitz    //  [Tali on IRCnet]  [tali.home.pages.de]
this is a manually generated mail; it contains typos
and is valid even without capital letters.
- Whoever is willing to give up fundamental liberties to obtain short-term
  security deserves neither liberty nor security.
  Benjamin Franklin (1706 - 1790)
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jeff Dike @ 2002-08-06 0:49 UTC
To: Martin Waitz; +Cc: linux-kernel, Hans-Joerg Hoexer

tali@admingilde.org said:
> there is already a group at our university doing that:
> http://www3.informatik.uni-erlangen.de/Research/Projects/UMLinux/umlinux.html

Yeah, I know. Hans-Joerg and I have been talking about whether and how much
it makes sense to start sharing code.

Jeff
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Andi Kleen @ 2002-08-04 6:46 UTC
To: Ingo Molnar; +Cc: Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> actually the opposite is true, on a 2.2 GHz P4:
>
> $ ./lat_sig catch
> Signal handler overhead: 3.091 microseconds
>
> $ ./lat_ctx -s 0 2
> 2 0.90
>
> ie. *process to process* context switches are 3.4 times faster than signal
> delivery. Ie. we can switch to a helper thread and back, and still be
> faster than a *single* signal.

This is because the signal save/restore does a lot of unnecessary work.
One optimization I implemented at one time was adding an SA_NOFP signal
bit that told the kernel that the signal handler did not intend to modify
floating point state (few signal handlers need FP). The kernel then skipped
saving the FPU state, which gave quite a speedup in signal latency.

Linux got a lot slower in signal delivery when SSE2 support was added.
SA_NOFP got that speed back. The targets were certain applications that
use signal handlers for async IO.

If there is interest I can dig up the old patches. They were really simple.

x86-64 also does it faster by FXSAVE'ing directly to the user space frame
with exception handling instead of copying manually. But that's not
possible on i386, because it still has to use the baroque iBCS FP context
format on the stack.

-Andi
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Linus Torvalds @ 2002-08-05 5:35 UTC
To: linux-kernel

In article <m3u1mb5df3.fsf@averell.firstfloor.org>,
Andi Kleen  <ak@muc.de> wrote:
>Ingo Molnar <mingo@elte.hu> writes:
>
>> actually the opposite is true, on a 2.2 GHz P4:
>>
>> $ ./lat_sig catch
>> Signal handler overhead: 3.091 microseconds
>>
>> $ ./lat_ctx -s 0 2
>> 2 0.90
>>
>> ie. *process to process* context switches are 3.4 times faster than signal
>> delivery. Ie. we can switch to a helper thread and back, and still be
>> faster than a *single* signal.
>
>This is because the signal save/restore does a lot of unnecessary stuff.
>One optimization I implemented at one time was adding a SA_NOFP signal
>bit that told the kernel that the signal handler did not intend
>to modify floating point state (few signal handlers need FP) It would
>not save the FPU state then and reached quite some speedup in signal
>latency.
>
>Linux got a lot slower in signal delivery when the SSE2 support was
>added. That got this speed back.

This will break _horribly_ when (if) glibc starts using SSE2 for things
like memcpy() etc.

I agree that it is really sad that we have to save/restore FP on
signals, but I think it's unavoidable. Your hack may work for you, but
it just gets really dangerous in general. Having signals randomly and
subtly corrupt some SSE2 state just because the signal handler uses
something like memcpy (without even realizing that that could lead to
trouble) is bad, bad, bad.

In other words, "not intending to" does not imply "will not". It's just
potentially too easy to change SSE2 state by mistake.
And yes, this signal handler thing is clearly visible on benchmarks.
MUCH too clearly visible. I just didn't see any safe alternatives
(and I still don't ;( )

		Linus
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Arnaldo Carvalho de Melo @ 2002-08-05 5:42 UTC
To: Linus Torvalds; +Cc: linux-kernel

On Mon, Aug 05, 2002 at 05:35:13AM +0000, Linus Torvalds wrote:
> This will break _horribly_ when (if) glibc starts using SSE2 for things
> like memcpy() etc.

Hmm, related: wasn't there an idea of giving userspace access to the
kernel's optimized versions of memcpy et al. through a page containing
these functions, mapped into the process address space (I don't remember
the exact details)? Is that still being considered?

- Arnaldo
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Lincoln Dale @ 2002-08-05 6:37 UTC
To: Linus Torvalds; +Cc: linux-kernel

At 05:35 AM 5/08/2002 +0000, Linus Torvalds wrote:
> >Linux got a lot slower in signal delivery when the SSE2 support was
> >added. That got this speed back.
>
>This will break _horribly_ when (if) glibc starts using SSE2 for things
>like memcpy() etc.
>
>I agree that it is really sad that we have to save/restore FP on
>signals, but I think it's unavoidable. Your hack may work for you, but
>it just gets really dangerous in general. having signals randomly
>subtly corrupt some SSE2 state just because the signal handler uses
>something like memcpy (without even realizing that that could lead to
>trouble) is bad, bad, bad.

How about putting the onus on userspace to tell the kernel if/when it uses
extensions that require FP state to be saved/restored?

If/when glibc starts using SSE2, it could then use these extensions.
It could be as simple as userspace setting some bit somewhere.

>And yes, this signal handler thing is clearly visible on benchmarks.
>MUCH too clearly visible. I just didn't see any safe alternatives
>(and I still don't ;( )

It probably isn't worthwhile penalising all users of signals just for
those few userspace apps that actually do use SSE2.

cheers,

lincoln.
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jamie Lokier @ 2002-08-05 15:39 UTC
To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> I agree that it is really sad that we have to save/restore FP on
> signals, but I think it's unavoidable.

Couldn't you mark the FPU as unused for the duration of the handler, and
let the lazy FPU mechanism save the state when it is used by the signal
handler?

> And yes, this signal handler thing is clearly visible on benchmarks.
> MUCH too clearly visible. I just didn't see any safe alternatives
> (and I still don't ;( )

I use SEGVs to trap access to read-only pages for garbage collection, and I
know I'm not the only one. That's a lot of SEGVs...

Fwiw, I have timed SIGSEGV handling on Linux on various Intel CPUs, on a
PA-RISC running HP-UX and on a few Sparcs running Solaris. Linux came out
faster in all cases.

Best case: 8 microseconds to trap a page fault, handle the SEGV and
mprotect() one page (600MHz P3). Worst case: 37 microseconds (133MHz
Pentium). That's about 5000 cycles. I'm sure we can do better than that.

For sophisticated user space uses, like the above, I'd like to see a trap
handling mechanism that saves only the _minimum_ state. Userspace can take
care of the rest. Maybe even without a sigreturn in some cases.

-- Jamie
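The garbage-collection technique Jamie describes (trap writes to read-only pages, then unprotect them) can be sketched as a one-page write barrier. This is illustrative only: the names are hypothetical, and calling mprotect() from a SIGSEGV handler relies on Linux behaviour, since POSIX does not list mprotect() as async-signal-safe:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile char *heap_page;            /* the one page we write-protect */
static long page_size;
static volatile sig_atomic_t barrier_hits;  /* write faults taken so far */

/* On a write fault inside heap_page: record it and make the page writable.
   Returning from the handler re-executes the faulting store. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    char *addr = si->si_addr;
    (void)ctx;

    if (addr < heap_page || addr >= heap_page + page_size) {
        signal(sig, SIG_DFL);   /* not ours: the re-executed access kills us */
        return;
    }
    barrier_hits++;
    /* Not async-signal-safe per POSIX, but a plain system call on Linux. */
    mprotect((void *)heap_page, (size_t)page_size, PROT_READ | PROT_WRITE);
}

/* Map one read-only page and install the handler. 0 on success. */
static int setup_write_barrier(void)
{
    struct sigaction sa;
    void *p;

    page_size = sysconf(_SC_PAGESIZE);
    p = mmap(NULL, (size_t)page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    heap_page = p;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

After setup_write_barrier(), the first store to heap_page faults once, the handler records it and reopens the page, and the store is re-executed; later stores to the same page cost nothing. Timing that first store over many fresh pages would be one way to reproduce a per-fault measurement like Jamie's.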
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Linus Torvalds @ 2002-08-05 16:38 UTC
To: Jamie Lokier; +Cc: linux-kernel

On Mon, 5 Aug 2002, Jamie Lokier wrote:
> Linus Torvalds wrote:
> > I agree that it is really sad that we have to save/restore FP on
> > signals, but I think it's unavoidable.
>
> Couldn't you mark the FPU as unused for the duration of the
> handler, and let the lazy FPU mechanism save the state when it is used
> by the signal handler?

Nope. Believe me, I gave some thought to clever things to do. The kernel
won't even _see_ a longjmp() out of a signal handler, so the kernel has a
really hard time trying to do any clever lazy stuff.

Also, people who play games with FP actually change the FP data on the
stack frame, and depend on signal return to reload it. Admittedly I've
only ever seen this on SIGFPE, but anyway - this is all done with integer
instructions that just touch bitpatterns on the stack. The kernel can't
catch it sanely.

> For sophisticated user space uses, like the above, I'd like to see
> a trap handling mechanism that saves only the _minimum_ state.

I would not mind an extra per-signal flag that says "don't bother with FP
saves" (the same way we already have "don't restart" etc), but I would be
very nervous if glibc used it by default (even if glibc doesn't use SSE2 in
memcpy, gcc itself can do it, and obviously _users_ may just do it
themselves).

So it would have to be explicitly enabled with a SA_NOFPSIGHANDLER flag or
something.

(And yes, it's the FP stuff that takes most of the time. I think the
lmbench numbers for signal delivery tripled when that went in).

		Linus
* Re: context switch vs. signal delivery [was: Re: Accelerating usermode linux]
From: Oliver Neukum @ 2002-08-05 20:01 UTC
To: Linus Torvalds, Jamie Lokier; +Cc: linux-kernel

> Also, people who play games with FP actually change the FP data on the
> stack frame, and depend on signal return to reload it. Admittedly I've
> only ever seen this on SIGFPE, but anyway - this is all done with integer
> instructions that just touch bitpatterns on the stack.. The kernel can't
> catch it sanely.

Could the FP state be put on its own page, and the dirty bit evaluated
when deciding whether to restore the FPU state?

	Regards
		Oliver
* Re: context switch vs. signal delivery [was: Re: Accelerating usermode linux]
From: Linus Torvalds @ 2002-08-05 20:23 UTC
To: Oliver Neukum; +Cc: Jamie Lokier, linux-kernel

On Mon, 5 Aug 2002, Oliver Neukum wrote:
>
> > Also, people who play games with FP actually change the FP data on the
> > stack frame, and depend on signal return to reload it. Admittedly I've
> > only ever seen this on SIGFPE, but anyway - this is all done with integer
> > instructions that just touch bitpatterns on the stack.. The kernel can't
> > catch it sanely.
>
> Could the fp state be put on its own page and the dirty bit
> evaluated in the decision whether to restore fpu state ?

I'm sure anything is _possible_, but there are a few problems with that
approach.

In particular, playing VM games tends to be quite expensive on SMP, since
you need to make sure that the TLB entry for that page is invalidated on
all the other CPUs before you insert the FPU page.

Also, you'd need to play games with dirty bit handling, since the page _is_
dirty (it contains FP data), so the VM must know to write it out if it
pages things. That's ok - we have separate per-page and per-TLB-entry dirty
bits anyway, but right now the VM layer knows it can move the TLB entry
dirty bit into the per-page dirty bit and drop it - which wouldn't be the
case if we also have an FPU dirty bit.

That's fixable - we could just make a "software TLB dirty bit" that is
updated whenever the hardware TLB dirty bit is cleared and moved into the
per-page dirty bit. But the end result sounds rather complicated,
especially since all the page table walking necessary for setting this all
up is likely to be about as expensive as the thing we're trying to avoid..

Rule of thumb: it almost never pays to be "clever".
		Linus
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Mark Mielke @ 2002-08-06 5:31 UTC
To: Linus Torvalds; +Cc: linux-kernel

On Mon, Aug 05, 2002 at 05:35:13AM +0000, Linus Torvalds wrote:
> And yes, this signal handler thing is clearly visible on benchmarks.
> MUCH too clearly visible. I just didn't see any safe alternatives
> (and I still don't ;( )

To some degree, the original approach taken by Intel may be an
alternative... That is, the signal handler is responsible for saving the
state of all CPU resources that it intends to use, and restoring that
state before returning control to the caller (the 'interrupt' qualifier
from C).

I could see this offered as a GCC optimization, but without the compiler
smarts to detect what is needed and what is not, it would be very difficult
to add this support in a seamless manner. For example:

    typedef void (*__fastsighandler_t) (int) __attribute__ ((signal_handler));

    #define signal(number, handler) \
        (__attribute_enabled__((handler, signal_handler)) \
            ? __signal_fast(number, handler) \
            : __signal(number, handler))

    void handle_sigint (int) __attribute__ ((signal_handler))
    {
        sigint_received++;
    }

mark

--
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com
Neighbourhood Coder, Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Ingo Molnar @ 2002-08-05 10:40 UTC
To: Andi Kleen; +Cc: Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

On 4 Aug 2002, Andi Kleen wrote:

> > actually the opposite is true, on a 2.2 GHz P4:
> >
> > $ ./lat_sig catch
> > Signal handler overhead: 3.091 microseconds
> >
> > $ ./lat_ctx -s 0 2
> > 2 0.90
> >
> > ie. *process to process* context switches are 3.4 times faster than signal
> > delivery. Ie. we can switch to a helper thread and back, and still be
> > faster than a *single* signal.
>
> This is because the signal save/restore does a lot of unnecessary stuff.
> One optimization I implemented at one time was adding a SA_NOFP signal
> bit that told the kernel that the signal handler did not intend to
> modify floating point state (few signal handlers need FP) It would not
> save the FPU state then and reached quite some speedup in signal
> latency.

well, we have an optimization in this area already - if the thread
receiving the signal has not used any FPU registers during its current
scheduled atom yet, then we do not save the FPU state into the signal
frame. lat_sig uses the FPU, so this cost is added.

If the FPU saving cost is removed, then signal delivery latency is still
2.0 usecs - slightly more than twice as expensive as a context switch - so
it's not a win. And threads can do queued events that amortize context
switch overhead, while queued signals generate per-event signal delivery,
so signal delivery costs are not amortized.

(Not that i advocate SIGIO or helper threads for high-performance IO -
Ben's aio interface is the fastest and most correct approach.)
Ingo ^ permalink raw reply [flat|nested] 56+ messages in thread
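Ingo's amortization argument can be illustrated with a pipe used as an event queue: the producer may post several events before the consumer runs, and a single wakeup plus a single `read()` then collects all of them, whereas each queued signal goes through its own handler invocation. The following is a minimal single-process sketch; the helper `post_and_drain` is hypothetical, and a real design would place the consumer in a separate thread or process.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Write `n` one-byte events into a pipe, then collect them with a single
   read().  Returns how many events that one read() picked up, or -1. */
int post_and_drain(int n)
{
    int fds[2];
    char buf[256];
    ssize_t got;

    if (n < 1 || n > (int)sizeof buf || pipe(fds) < 0)
        return -1;

    for (int i = 0; i < n; i++) {
        char ev = (char)i;                 /* event payload: its index */
        if (write(fds[1], &ev, 1) != 1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }

    got = read(fds[0], buf, sizeof buf);   /* one wakeup, many events */
    close(fds[0]);
    close(fds[1]);
    return (int)got;
}
```

For example, `post_and_drain(8)` posts eight events and drains all eight with one `read()`, which is the batching that per-event signal delivery cannot do.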
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Larry McVoy @ 2002-08-05 14:59 UTC
  To: Ingo Molnar
  Cc: Andi Kleen, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

> > > actually the opposite is true, on a 2.2 GHz P4:
> > >
> > > $ ./lat_sig catch
> > > Signal handler overhead: 3.091 microseconds
> > >
> > > $ ./lat_ctx -s 0 2
> > > 2 0.90
> > >
> > > ie. *process to process* context switches are 3.4 times faster than signal
> > > delivery. Ie. we can switch to a helper thread and back, and still be
> > > faster than a *single* signal.

Has someone gone through the lat_ctx.c and lat_sig.c code and convinced
themselves these are measuring things which ought to be compared like
this? When I wrote that code I didn't anticipate this comparison, so
somebody should go look.

I'd suggest that if you want to measure how fast you can communicate
using signals versus pipes (or sockets or whatever), someone write up a
test which has two processes bounce a token between each other using
signals and then compare that with lat_pipe. It's not clear to me that
you are comparing apples to apples.

If someone does write the test, we'll add it to LMbench if it reveals
anything useful. It should be easy enough to do. I can do it if it isn't
obvious.
-- 
---
Larry McVoy            lm at bitmover.com           http://www.bitmover.com/lm
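The test Larry describes can be sketched as follows: two processes bounce a token (here `SIGUSR1`) back and forth, and the mean round-trip time is reported. This is not LMbench code; the function name and structure are assumptions. One round trip includes two signal deliveries plus the switches between the processes, so a fair comparison would time the same loop over a pipe, as `lat_pipe` does.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_token;

/* The token arrives as SIGUSR1; the handler just records it. */
static void on_token(int sig)
{
    (void)sig;
    got_token = 1;
}

/* Sleep until the token arrives.  SIGUSR1 stays blocked outside
   sigsuspend(), so a token sent early is held pending, not lost. */
static void wait_token(const sigset_t *waitmask)
{
    while (!got_token)
        sigsuspend(waitmask);
    got_token = 0;
}

/* Bounce SIGUSR1 between this process and a child `rounds` times.
   Returns the mean round-trip time in microseconds, or -1.0 on error. */
double signal_pingpong_usec(int rounds)
{
    struct sigaction sa;
    sigset_t block, old, waitmask;
    struct timespec t0, t1;
    pid_t child;

    if (rounds <= 0)
        return -1.0;

    sa.sa_handler = on_token;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1.0;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);
    waitmask = old;
    sigdelset(&waitmask, SIGUSR1);

    child = fork();
    if (child < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1.0;
    }
    if (child == 0) {                      /* child: echo every token back */
        pid_t parent = getppid();
        for (int i = 0; i < rounds; i++) {
            wait_token(&waitmask);
            kill(parent, SIGUSR1);
        }
        _exit(0);
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds; i++) {     /* parent: send, then wait */
        kill(child, SIGUSR1);
        wait_token(&waitmask);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    waitpid(child, NULL, 0);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e6 +
            (double)(t1.tv_nsec - t0.tv_nsec) / 1e3) / rounds;
}
```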
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jamie Lokier @ 2002-08-05 15:41 UTC
  To: Ingo Molnar
  Cc: Andi Kleen, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

Ingo Molnar wrote:
> And threads can do queued events that amortize context switch
> overhead, while queued signals generate per-event signal delivery, so
> signal delivery costs are not amortized.
>
> (Not that i advocate SIGIO or helper threads for high-performance IO -
> Ben's aio interface is the fastest and most correct approach.)

Isn't the per-event queued signal cost amortised when using sigwaitinfo()?

-- Jamie
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jamie Lokier @ 2002-08-05 15:44 UTC
  To: Ingo Molnar
  Cc: Andi Kleen, Richard Zidlicky, Jeff Dike, Alan Cox, linux-kernel

Jamie Lokier wrote:
> Ingo Molnar wrote:
> > And threads can do queued events that amortize context switch
> > overhead, while queued signals generate per-event signal delivery, so
> > signal delivery costs are not amortized.
> >
> > (Not that i advocate SIGIO or helper threads for high-performance IO -
> > Ben's aio interface is the fastest and most correct approach.)
>
> Isn't the per-event queued signal cost amortised when using sigwaitinfo()?

Of course I meant:

Isn't the per-event queued signal cost amortised when using sigtimedwait()?

cheers,
-- Jamie
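Jamie's point can be made concrete. If the signal is blocked and consumed synchronously with `sigtimedwait()`, no handler runs and no signal frame is built or torn down. Each call still dequeues only one signal, though, so the saving is the frame setup and the `sigreturn`, not the system call itself. A minimal sketch, in which the helper name and the choice of `SIGRTMIN` are illustrative:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Queue `n` real-time signals to ourselves while the signal is blocked,
   then consume them synchronously with sigtimedwait(): no handler runs
   and no signal frame is built.  Returns the number collected, or -1. */
int drain_queued_rt_signals(int n)
{
    sigset_t set, old;
    siginfo_t info;
    struct timespec zero = { 0, 0 };   /* poll: return at once when empty */
    int sig = SIGRTMIN;
    int collected = 0;

    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigprocmask(SIG_BLOCK, &set, &old) < 0)
        return -1;

    for (int i = 0; i < n; i++) {
        union sigval v;
        v.sival_int = i;               /* per-event payload */
        if (sigqueue(getpid(), sig, v) < 0)
            break;
    }

    /* Real-time signals queue, so each call dequeues one event;
       info.si_value.sival_int identifies which one. */
    while (sigtimedwait(&set, &info, &zero) == sig)
        collected++;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return collected;
}
```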
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Andi Kleen @ 2002-08-05 8:38 UTC
  To: Linus Torvalds; +Cc: linux-kernel

torvalds@transmeta.com (Linus Torvalds) writes:

> >This is because the signal save/restore does a lot of unnecessary stuff.
> >One optimization I implemented at one time was adding a SA_NOFP signal
> >bit that told the kernel that the signal handler did not intend
> >to modify floating point state (few signal handlers need FP) It would
> >not save the FPU state then and reached quite some speedup in signal
> >latency.
> >
> >Linux got a lot slower in signal delivery when the SSE2 support was
> >added. That got this speed back.
>
> This will break _horribly_ when (if) glibc starts using SSE2 for things
> like memcpy() etc.
>
> I agree that it is really sad that we have to save/restore FP on
> signals, but I think it's unavoidable. Your hack may work for you, but
> it just gets really dangerous in general. having signals randomly
> subtly corrupt some SSE2 state just because the signal handler uses
> something like memcpy (without even realizing that that could lead to
> trouble) is bad, bad, bad.

I think the possibility at least for memcpy is rather remote. Any sane
SSE memcpy would only kick in for really big arguments (for small
memcpys it doesn't make any sense at all because of the context
save/possible reformatting penalty overhead). So only people doing really
big memcpys could possibly be hurt, and that is rather unlikely.

But your point stands, one definitely needs to be very careful with it.

Also for special things like UML who can ensure their environment is
sane it could still be a useful optimization. I did it originally for
async IO handling in some project.

At least offering the choice does not hurt. If it could speed up UML I
think it would be certainly worth it. After all Linux should give you
enough rope to shoot yourself in the foot ;)

>
> In other words, "not intending to" does not imply "will not". It's just
> potentially too easy to change SSE2 state by mistake.
>
> And yes, this signal handler thing is clearly visible on benchmarks.
> MUCH too clearly visible. I just didn't see any safe alternatives
> (and I still don't ;( )

In theory you could do a superhack: put the FP context into an unmapped
page on the stack and only save with lazy FPU or access to the unmapped
page. Unfortunately the details get too nasty (where to find the
unmapped page? is the tlb manipulation worth it if the page was mapped?
how to store the address of the unmapped page for nested signal handlers
for the page fault handler?) so I discarded this idea.

-Andi
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jeff Dike @ 2002-08-05 14:24 UTC
  To: Andi Kleen, Linus Torvalds; +Cc: linux-kernel

ak@suse.de said:
> Also for special things like UML who can ensure their environment is
> sane it could still be a useful optimization.

I use libc, and I haven't been able to convince myself that it isn't
going to use FP instructions or registers on my behalf. I use it as
little as possible, but it still makes me nervous.

> If it could speed up UML I think it would be certainly
> worth it.

After Ingo's numbers, I like the idea of just having a separate address
space and process for the UML kernel, and having that process ptrace UML
processes and handle system calls and interrupts on their behalf. One
context switch at the start of a system call and one at the end, as
opposed to a signal delivery and sigreturn.

This also solves the jail mode mprotect performance horrors.

The one thing standing in my way is the need for the kernel process to be
able to change the address space of its processes. I made a proposal for
that, and Alan didn't like it. So, we'll see what he likes better.

				Jeff
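The structure Jeff describes, a separate tracer process that stops its children at every system call, rests on ptrace's `PTRACE_SYSCALL` request. The following Linux-only sketch does nothing but count the syscall-entry and syscall-exit stops; a UML-style kernel would instead read and rewrite the child's registers at each stop. `count_syscall_stops` and `traced_child` are illustrative names. Each stop costs the two switches Jeff mentions, one into the tracer and one back.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child: ask to be traced, stop so the tracer can set options, then make
   a few ordinary system calls and exit. */
static void traced_child(int calls)
{
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
        _exit(1);
    raise(SIGSTOP);
    for (int i = 0; i < calls; i++)
        (void)getppid();
    _exit(0);
}

/* Tracer: resume the child with PTRACE_SYSCALL so it stops at every
   system-call entry and exit.  Returns the number of such stops, or -1. */
long count_syscall_stops(int calls)
{
    int status;
    long stops = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        traced_child(calls);           /* never returns */

    /* First stop is the child's raise(SIGSTOP). */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    /* Report syscall stops as SIGTRAP|0x80 so they can't be mistaken
       for a genuine SIGTRAP. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0)
        return -1;

    for (;;) {
        /* Run to the next syscall entry/exit; data 0 = inject no signal. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0)
            return -1;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;   /* a UML-style tracer would emulate the call here */
    }
    return stops;
}
```

With three `getppid()` calls the count is at least six (an entry and an exit stop for each), plus stops for whatever other calls libc and `_exit()` make.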
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Linus Torvalds @ 2002-08-05 16:19 UTC
  To: Andi Kleen; +Cc: linux-kernel

On 5 Aug 2002, Andi Kleen wrote:
>
> I think the possibility at least for memcpy is rather remote. Any sane
> SSE memcpy would only kick in for really big arguments (for small
> memcpys it doesn't make any sense at all because of the context save/possible
> reformatting penalty overhead). So only people doing really
> big memcpys could possibly be hurt, and that is rather unlikely.

And this is why the kernel _has_ to save the FP state.

It's the "only happens in a blue moon" bugs that are the absolute _worst_
bugs. I want to optimize the kernel until I'm blue in the face, but the
kernel must NEVER EVER have a "non-stable" interface.

Signal handlers that don't restore state are hard as _hell_ to debug. Most
of the time it doesn't really matter (unless the lack of restore is
something really major like one of the most common integer registers), but
then depending on what libraries you use, and just _exactly_ when the
signal comes in, you get subtle data corruption that may not show up until
much later.

At which point your programmer wonders if he mistakenly wandered into
MS-Windows land.

No thank you. I'll take slow signal handlers over ones that _sometimes_
don't work.

> After all Linux should give you enough rope to shoot yourself in the foot ;)

On purpose, yes. It's ok to take careful aim, and say "I'm now shooting
myself in the foot".

And yes, it's also ok to say "I don't know what I'm doing, so I may be
shooting myself in the foot" (this is obviously the most common
foot-shooter).

And if you come to me and complain about how drunk you were, and how you
shot yourself in the foot by mistake due to that, I'll just ignore you.

BUT - and this is a big BUT - if you are doing everything right, and you
actually know what you're doing, and you end up shooting yourself in the
foot because the kernel was taking a shortcut, then I think the kernel is
_wrong_.

And I'd rather have a slow kernel that does things right, than a fast
kernel which screws with people.

> In theory you could do a superhack: put the FP context into an unmapped
> page on the stack and only save with lazy FPU or access to the unmapped
> page.

That would be extremely interesting especially with signal handlers that
do a longjmp() thing.

The real fix for a lot of programs on x86 would be for them to never ever
use FP in the first place, in which case the kernel would be able to just
not save and restore it at all. However, glibc fiddles with the fpu at
startup, even for non-FP programs. Dunno what to do about that.

		Linus
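Linus's rule also applies one level up. The kernel restores registers and FPU state on sigreturn, but `errno` is an ordinary per-thread variable in memory, so a handler that calls anything able to fail must save and restore it itself, or the interrupted code sees a value it never caused. A minimal sketch; the function names are illustrative:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t handled;

/* The handler makes a call that fails and overwrites errno, then puts the
   interrupted code's errno back before returning. */
static void clobbering_handler(int sig)
{
    int saved_errno = errno;

    (void)sig;
    if (write(-1, "x", 1) < 0)
        handled = 1;            /* expected: the call failed, errno is EBADF */
    errno = saved_errno;        /* restore what the interrupted code saw */
}

/* Install clobbering_handler for `signo`; returns 0 on success. */
int install_clobbering_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = clobbering_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

int handler_ran(void)
{
    return handled != 0;
}
```

Removing the final `errno = saved_errno;` line reproduces, in miniature, the "happens only when the signal lands at the wrong moment" bug class Linus describes.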
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Andi Kleen @ 2002-08-05 16:46 UTC
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds <torvalds@transmeta.com> writes:
>
> So it would have to be explicitly enabled with a SA_NOFPSIGHANDLER flag or
> something.

That is all my patch was doing. It added a SA_NOFP flag, defaulting to
off. Nothing about enabling it by default. The first hunk is a minor
optimization.

-Andi

I attached the old patch for 2.4.9 for reference. If you think it is ok I
can submit it for 2.5.

--- linux-work/arch/i386/kernel/signal.c-NOFP	Fri Aug 24 16:36:14 2001
+++ linux-work/arch/i386/kernel/signal.c	Fri Aug 31 00:04:24 2001
@@ -322,15 +322,15 @@
 
 static int
 setup_sigcontext(struct sigcontext *sc, struct _fpstate *fpstate,
-		 struct pt_regs *regs, unsigned long mask)
+		 struct pt_regs *regs, unsigned long mask, int fp)
 {
-	int tmp, err = 0;
+	int err = 0;
+	int tmp;
 
-	tmp = 0;
-	__asm__("movl %%gs,%0" : "=r"(tmp): "0"(tmp));
-	err |= __put_user(tmp, (unsigned int *)&sc->gs);
-	__asm__("movl %%fs,%0" : "=r"(tmp): "0"(tmp));
-	err |= __put_user(tmp, (unsigned int *)&sc->fs);
+	__asm__("movl %%gs,%0" : "=r"(tmp));
+	err |= __put_user(tmp & 0xffff, (unsigned int *)&sc->gs);
+	__asm__("movl %%fs,%0" : "=r"(tmp));
+	err |= __put_user(tmp & 0xffff, (unsigned int *)&sc->fs);
 
 	err |= __put_user(regs->xes, (unsigned int *)&sc->es);
 	err |= __put_user(regs->xds, (unsigned int *)&sc->ds);
@@ -350,11 +350,12 @@
 	err |= __put_user(regs->esp, &sc->esp_at_signal);
 	err |= __put_user(regs->xss, (unsigned int *)&sc->ss);
 
-	tmp = save_i387(fpstate);
-	if (tmp < 0)
+	if (fp)
+		fp = save_i387(fpstate);
+	if (fp < 0)
 	  err = 1;
 	else
-	  err |= __put_user(tmp ? fpstate : NULL, &sc->fpstate);
+	  err |= __put_user(fp ? fpstate : NULL, &sc->fpstate);
 
 	/* non-iBCS2 extensions.. */
 	err |= __put_user(mask, &sc->oldmask);
@@ -410,7 +411,8 @@
 	if (err)
 		goto give_sigsegv;
 
-	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0]);
+	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0],
+				(ka->sa.sa_flags&SA_NOFP));
 	if (err)
 		goto give_sigsegv;
 
@@ -491,7 +493,7 @@
 			  &frame->uc.uc_stack.ss_flags);
 	err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
 	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
-			        regs, set->sig[0]);
+			        regs, set->sig[0], !!(ka->sa.sa_flags&SA_NOFP));
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
 	if (err)
 		goto give_sigsegv;
--- linux-work/arch/i386/kernel/i387.c-NOFP	Fri Feb 23 19:09:08 2001
+++ linux-work/arch/i386/kernel/i387.c	Fri Aug 31 00:01:52 2001
@@ -323,11 +323,6 @@
 	if ( !current->used_math )
 		return 0;
 
-	/* This will cause a "finit" to be triggered by the next
-	 * attempted FPU operation by the 'current' process.
-	 */
-	current->used_math = 0;
-
 	if ( HAVE_HWFP ) {
 		if ( cpu_has_fxsr ) {
 			return save_i387_fxsave( buf );
@@ -335,6 +330,11 @@
 			return save_i387_fsave( buf );
 		}
 	} else {
+		/* This will cause a "finit" to be triggered by the next
+		 * attempted FPU operation by the 'current' process.
+		 */
+		current->used_math = 0;
+
 		return save_i387_soft( &current->thread.i387.soft, buf );
 	}
 }
--- linux-work/include/asm-i386/signal.h-NOFP	Thu Sep 13 22:27:41 2001
+++ linux-work/include/asm-i386/signal.h	Thu Oct 18 18:31:29 2001
@@ -80,6 +80,7 @@
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
+ * SA_NOFP Don't save FP state.
  *
  * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single
  * Unix names RESETHAND and NODEFER respectively.
@@ -97,6 +98,7 @@
 #define SA_INTERRUPT	0x20000000 /* dummy -- ignored */
 
 #define SA_RESTORER	0x04000000
+#define SA_NOFP		0x02000000
 
 /*
  * sigaltstack controls
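From the application side, Linus's condition ("explicitly enabled") would look something like the following minimal sketch: the handler opts in only when the headers actually provide `SA_NOFP`. The flag comes from the patch above and is assumed here not to be present in the system headers, so normally this compiles to an ordinary `sigaction()` call. The function names are illustrative. Linus's objection still applies: the flag is safe only if nothing the handler calls, directly or through libc, touches FP or SSE state.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_sigusr1;

/* Integer-only handler: touches no FP/SSE state itself, so it is the kind
   of handler that could opt out of the FPU save. */
static void int_only_handler(int sig)
{
    (void)sig;
    got_sigusr1 = 1;
}

/* Install the handler, requesting the cheap path only if SA_NOFP exists.
   SA_NOFP is the flag from the patch above (assumed absent from stock
   headers), so the default remains a normal, fully saved delivery. */
int install_nofp_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = int_only_handler;
    sigemptyset(&sa.sa_mask);
#ifdef SA_NOFP
    sa.sa_flags |= SA_NOFP;    /* explicit opt-in; default stays off */
#endif
    return sigaction(signo, &sa, NULL);
}

int sigusr1_seen(void)
{
    return got_sigusr1 != 0;
}
```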
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jamie Lokier @ 2002-08-05 21:30 UTC
  To: Andi Kleen; +Cc: Linus Torvalds, linux-kernel

A couple of questions.

Andi Kleen wrote:
> +	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0],
> +				(ka->sa.sa_flags&SA_NOFP));

>  	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
> -			        regs, set->sig[0]);
> +			        regs, set->sig[0], !!(ka->sa.sa_flags&SA_NOFP));

1: Why the inconsistency between the two ways the SA_NOFP flag is checked?

2: What happens when the user's signal handler decides it wants to save
the FPU state itself (after all) and proceed with some FPU use. Will
sigreturn restore the user-saved FPU state? Just curious.

-- Jamie
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Andi Kleen @ 2002-08-05 21:35 UTC
  To: Jamie Lokier; +Cc: Andi Kleen, Linus Torvalds, linux-kernel

On Mon, Aug 05, 2002 at 10:30:06PM +0100, Jamie Lokier wrote:
> Andi Kleen wrote:
> > +	err |= setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0],
> > +				(ka->sa.sa_flags&SA_NOFP));
>
> >  	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
> > -			        regs, set->sig[0]);
> > +			        regs, set->sig[0], !!(ka->sa.sa_flags&SA_NOFP));
>
> 1: Why the inconsistency between the two ways the SA_NOFP flag is checked?

I don't remember. Probably there was some reason in an earlier version
of the code. The !! could probably be removed now.

>
> 2: What happens when the user's signal handler decides it wants to save
> the FPU state itself (after all) and proceed with some FPU use. Will
> sigreturn restore the user-saved FPU state? Just curious.

Nope it won't because there is no saved state. The previous context's FPU
state will be silently corrupted.

-Andi
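On the `!!` question: `sa_flags & SA_NOFP` evaluates to either 0 or 0x02000000, and `!!` folds that to 0 or 1. Since `setup_sigcontext()` tests `fp` only for truth before overwriting it, both spellings behave identically there. Read against the header comment ("Don't save FP state"), however, the posted hunk appears to call `save_i387()` exactly when SA_NOFP *is* set, because `if (fp) fp = save_i387(fpstate);` runs only when the flag is present. The sketch below makes the intended polarity explicit. `should_save_fp` and `PATCH_SA_NOFP` are hypothetical names; the value is taken from the patch.

```c
/* Value of SA_NOFP from the patch; a distinct name is used so it cannot
   collide with any real header definition. */
#define PATCH_SA_NOFP 0x02000000UL

/* Decide whether the FPU state should be written to the signal frame.
   SA_NOFP means "don't save", so the flag test is negated; the `!`
   also yields a clean 0 or 1, which is what `!!` was for. */
int should_save_fp(unsigned long sa_flags)
{
    return !(sa_flags & PATCH_SA_NOFP);
}
```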
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Jamie Lokier @ 2002-08-05 22:09 UTC
  To: Andi Kleen; +Cc: Linus Torvalds, linux-kernel

Andi Kleen wrote:
> > 2: What happens when the user's signal handler decides it wants to save
> > the FPU state itself (after all) and proceed with some FPU use. Will
> > sigreturn restore the user-saved FPU state? Just curious.
>
> Nope it won't because there is no saved state. The previous context's FPU
> state will be silently corrupted.

I meant if the user's signal handler decides it wants to save the FPU
state directly into the signal context struct, after deciding to do that.
Won't that work?

-- Jamie
* Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
From: Andi Kleen @ 2002-08-05 22:16 UTC
  To: Jamie Lokier; +Cc: Andi Kleen, Linus Torvalds, linux-kernel

On Mon, Aug 05, 2002 at 11:09:41PM +0100, Jamie Lokier wrote:
> Andi Kleen wrote:
> > > 2: What happens when the user's signal handler decides it wants to save
> > > the FPU state itself (after all) and proceed with some FPU use. Will
> > > sigreturn restore the user-saved FPU state? Just curious.
> >
> > Nope it won't because there is no saved state. The previous context's FPU
> > state will be silently corrupted.
>
> I meant if the user's signal handler decides it wants to save the FPU
> state directly into the signal context struct, after deciding to do
> that. Won't that work?

In theory yes. The space should already be allocated on the stack, it
just has to be filled in.

-Andi
Thread overview: 56+ messages
2002-08-01 20:16 Accelerating user mode linux Alan Cox
2002-08-02 4:40 ` Jeff Dike
2002-08-02 9:50 ` Alan Cox
2002-08-02 18:28 ` Jeff Dike
2002-08-02 17:48 ` Alan Cox
2002-08-02 22:33 ` Jeff Dike
2002-08-02 21:57 ` Alan Cox
2002-08-03 0:54 ` Jeff Dike
2002-08-02 11:34 ` Richard Zidlicky
2002-08-02 13:28 ` Alan Cox
2002-08-03 11:38 ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Ingo Molnar
2002-08-03 12:33 ` context switch vs. signal delivery [was: Re: Accelerating user mode Alan Cox
2002-08-03 15:29 ` Jeff Dike
2002-08-05 13:46 ` Udo A. Steinberg
2002-08-05 20:44 ` Richard Zidlicky
2002-08-05 22:34 ` Udo A. Steinberg
2002-08-06 0:42 ` Jeff Dike
2002-08-06 0:16 ` Udo A. Steinberg
2002-08-06 2:55 ` Jeff Dike
2002-08-06 8:10 ` Udo A. Steinberg
2002-08-06 11:20 ` Jeff Dike
2002-08-06 11:13 ` Udo A. Steinberg
2002-08-06 12:53 ` Jeff Dike
2002-08-06 13:04 ` Udo A. Steinberg
2002-08-06 14:12 ` Jeff Dike
2002-08-06 16:02 ` Udo A. Steinberg
2002-08-06 17:42 ` Jeff Dike
2002-08-06 18:01 ` Udo A. Steinberg
2002-08-08 1:27 ` Udo A. Steinberg
2002-08-08 3:14 ` Jeff Dike
2002-08-08 2:21 ` Benjamin LaHaise
2002-08-08 9:03 ` Udo A. Steinberg
2002-08-08 17:19 ` Jeff Dike
2002-08-05 22:06 ` Martin Waitz
2002-08-06 0:49 ` Jeff Dike
2002-08-04 6:46 ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Andi Kleen
2002-08-05 5:35 ` Linus Torvalds
2002-08-05 5:42 ` Arnaldo Carvalho de Melo
2002-08-05 6:37 ` Lincoln Dale
2002-08-05 15:39 ` Jamie Lokier
2002-08-05 16:38 ` Linus Torvalds
2002-08-05 20:01 ` context switch vs. signal delivery [was: Re: Accelerating usermode linux] Oliver Neukum
2002-08-05 20:23 ` Linus Torvalds
2002-08-06 5:31 ` context switch vs. signal delivery [was: Re: Accelerating user mode linux] Mark Mielke
2002-08-05 10:40 ` Ingo Molnar
2002-08-05 14:59 ` Larry McVoy
2002-08-05 15:41 ` Jamie Lokier
2002-08-05 15:44 ` Jamie Lokier
[not found] <1028294887.18635.71.camel@irongate.swansea.linux.org.uk.suse.lists.linux.kernel>
[not found] ` <Pine.LNX.4.44.0208031332120.7531-100000@localhost.localdomain.suse.lists.linux.kernel>
[not found] ` <m3u1mb5df3.fsf@averell.firstfloor.org.suse.lists.linux.kernel>
[not found] ` <ail2qh$bf0$1@penguin.transmeta.com.suse.lists.linux.kernel>
2002-08-05 8:38 ` Andi Kleen
2002-08-05 14:24 ` Jeff Dike
2002-08-05 16:19 ` Linus Torvalds
[not found] <20020805163910.C7130@kushida.apsleyroad.org.suse.lists.linux.kernel>
[not found] ` <Pine.LNX.4.44.0208050922570.1753-100000@home.transmeta.com.suse.lists.linux.kernel>
2002-08-05 16:46 ` Andi Kleen
2002-08-05 21:30 ` Jamie Lokier
2002-08-05 21:35 ` Andi Kleen
2002-08-05 22:09 ` Jamie Lokier
2002-08-05 22:16 ` Andi Kleen