* Using %cr2 to reference "current"
@ 2001-11-06 7:18 H. Peter Anvin
2001-11-06 8:01 ` Robert Love
` (2 more replies)
0 siblings, 3 replies; 59+ messages in thread
From: H. Peter Anvin @ 2001-11-06 7:18 UTC (permalink / raw)
To: linux-kernel
2.4.13-ac8 uses %cr2 rather than (%esp & 0xffffe000) to get "current".
I've been trying to figure out the point of this... writing a control
register is microcode on all the x86 implementations I know (and you
have to re-set it after every pagefault), and reading one probably is
one on most (not Transmeta, but...)
On the other hand, %esp is a GPR and available to the core directly,
and so are usually plain immediates.
Is using %cr2 really faster than the old implementation, or is there
another reason? It seems that the alignment constraints on the stack
still remain, since the %esp solution still remains in places...
It might also be worth considering a segment-register based
implementation instead. The reason we're not using %fs and %gs in the
kernel anymore is because of the setup slowness, but perhaps using
them (use %fs since it's much more likely to be NULL and thus faster
to restore) would be faster than using %cr2?
-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <amsp@zytor.com>
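The older %esp-based lookup discussed in this message can be sketched in portable C. This is only an illustration under stated assumptions: the task structure is taken to sit at the bottom of an 8 KiB, 8 KiB-aligned block with the kernel stack above it, and THREAD_SIZE and current_from_sp() are names made up for the sketch, not kernel identifiers. For that block size the mask ~(THREAD_SIZE - 1) is 0xffffe000 on a 32-bit machine.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: task struct and kernel stack share one
 * 8 KiB block that is aligned to its own size. */
#define THREAD_SIZE 8192u

/* Hypothetical helper: round a stack pointer value down to the start
 * of its THREAD_SIZE block, which is where "current" would live. */
static uintptr_t current_from_sp(uintptr_t sp)
{
    /* The masking trick only works for power-of-two block sizes. */
    assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);
    return sp & ~((uintptr_t)THREAD_SIZE - 1);
}
```

Any stack address inside the block maps to the same base, so no state has to be reloaded on a task switch beyond %esp itself; the cost is that every block must be aligned to THREAD_SIZE.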
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 7:18 Using %cr2 to reference "current" H. Peter Anvin
@ 2001-11-06 8:01 ` Robert Love
2001-11-06 10:55 ` Alan Cox
2001-11-06 14:14 ` Manfred Spraul
2001-11-06 10:58 ` Alan Cox
2001-11-06 17:02 ` Using %cr2 to reference "current" Linus Torvalds
2 siblings, 2 replies; 59+ messages in thread
From: Robert Love @ 2001-11-06 8:01 UTC (permalink / raw)
To: manfred; +Cc: linux-kernel, hpa
On Tue, 2001-11-06 at 02:18, H. Peter Anvin wrote:
> 2.4.13-ac8 uses %cr2 rather than (%esp & 0xffffe000) to get "current".
> I've been trying to figure out the point of this...
<snip>
I too am confused. More so, the difference between hard_get_current and
get_current is confusing. I further question things because I suspect
there is a problem: hard_get_current is commented as "for within NMI,
do_page_fault, cpu_init" but all these functions call other functions
that may very well use get_current. How is this going to work?
Further, the preemptible kernel patch oopses with this patch (IOW, don't
use 2.4.13-ac8 + preempt-kernel, unless you remove all these bits like I
did :>). I think it may be because of:
Manfred Spraul wrote:
> error_code:
> [...]
> - GET_CURRENT(%ebx)
> call *%edi
> addl $8,%esp
> + GET_CURRENT(%ebx)
> The pointer to current was loaded into %ebx before the call to the error
> handler, now that only happens after the call. As far as I can see the
> load before the call is not required.
this change but I am unsure. Would Manfred or someone knowledgeable in
this mind letting me pick their brain?
Robert Love
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 8:01 ` Robert Love
@ 2001-11-06 10:55 ` Alan Cox
2001-11-06 17:31 ` Michael Barabanov
2001-11-06 14:14 ` Manfred Spraul
1 sibling, 1 reply; 59+ messages in thread
From: Alan Cox @ 2001-11-06 10:55 UTC (permalink / raw)
To: Robert Love; +Cc: manfred, linux-kernel, hpa
> I too am confused. More so, the difference between hard_get_current and
> get_current is confusing. I further question things because I suspect
hard_get_current always works
get_current assumes %cr2 is loaded correctly
> do_page_fault, cpu_init" but all these functions call other functions
> that may very well use get_current. How is this going to work?
do_page_fault and cpu_init load %cr2
> Further, the preemptible kernel patch oopses with this patch (IOW, don't
> use 2.4.13-ac8 + preempt-kernel, unless you remove all these bits like I
> did :>). I think it may be because of:
You must ensure that you don't pre-empt until %cr2 is loaded. Obviously this
isn't a problem with the traditional low latency patch but if you pre-empt
very early in page fault handling then I suspect you might get the odd
surprise.
The reasoning behind all this is to fix the cache pessimal nature of the x86
stack layout - we had all task structs on the same cache colour and all
stacks aligned within pages (so every apache thread waiting at the same
point is on the same colour too and each wait queue entry on their stacks
is linked to entries all the same colour)
Alan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 10:55 ` Alan Cox
@ 2001-11-06 17:31 ` Michael Barabanov
0 siblings, 0 replies; 59+ messages in thread
From: Michael Barabanov @ 2001-11-06 17:31 UTC (permalink / raw)
To: Alan Cox; +Cc: Robert Love, manfred, linux-kernel, hpa, Victor Yodaiken
Here's my version of hard cpu id (RTLinux version):
extern inline int rtl_getcpuid(void)
{
	unsigned cpu;
	__asm__ (
		"str %%ax\n\t"
		"shr $5, %%eax\n\t"
		"sub $3, %%eax\n\t"
		: "=a"(cpu));
	return cpu;
}
No cr2 involved; extremely fast.
This takes advantage of the fact that TSS-CPU mapping is 1-1 in 2.4.
Michael.
Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > I too am confused. More so, the difference between hard_get_current and
> > get_current is confusing. I further question things because I suspect
>
> hard_get_current always works
> get_current assumes %cr2 is loaded correctly
>
> > do_page_fault, cpu_init" but all these functions call other functions
> > that may very well use get_current. How is this going to work?
>
> do_page_fault and cpu_init load %cr2
>
> > Further, the preemptible kernel patch oopses with this patch (IOW, don't
> > use 2.4.13-ac8 + preempt-kernel, unless you remove all these bits like I
> > did :>). I think it may be because of:
>
> You must ensure that you don't pre-empt until %cr2 is loaded. Obviously this
> isn't a problem with the traditional low latency patch but if you pre-empt
> very early in page fault handling then I suspect you might get the odd
> surprise.
>
> The reasoning behind all this is to fix the cache pessimal nature of the x86
> stack layout - we had all task structs on the same cache colour and all
> stacks aligned within pages (so every apache thread waiting at the same
> point is on the same colour too and each wait queue entry on their stacks
> is linked to entries all the same colour)
>
> Alan
^ permalink raw reply [flat|nested] 59+ messages in thread
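The shift and subtract in rtl_getcpuid() above can be checked with plain integer arithmetic. The sketch below is not from the thread: it assumes the 2.4 i386 GDT layout as recalled here (first TSS descriptor at GDT index 12, four GDT entries per CPU, selector = index << 3), and FIRST_TSS_ENTRY, ENTRIES_PER_CPU and both helper names are made up for illustration.

```c
#include <assert.h>

/* Assumed layout, not stated in the thread: TSS for CPU n lives at
 * GDT index 12 + 4*n, and a selector is the GDT index shifted left by 3. */
#define FIRST_TSS_ENTRY 12u
#define ENTRIES_PER_CPU 4u

/* Hypothetical helper: the value the task register would hold on a CPU. */
static unsigned tss_selector_for_cpu(unsigned cpu)
{
    return (FIRST_TSS_ENTRY + cpu * ENTRIES_PER_CPU) << 3;
}

/* Same arithmetic as the "shr $5" / "sub $3" pair in rtl_getcpuid():
 * each CPU owns 32 bytes of selector space, and CPU 0 starts at 96. */
static unsigned cpu_from_tss_selector(unsigned tr)
{
    assert(tr >= (FIRST_TSS_ENTRY << 3));
    return (tr >> 5) - 3;
}
```

Under these assumptions CPU 0 has selector 96 and CPU 1 has selector 128, so the round trip recovers the CPU number without touching %cr2 or the stack pointer.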
* Re: Using %cr2 to reference "current"
2001-11-06 8:01 ` Robert Love
2001-11-06 10:55 ` Alan Cox
@ 2001-11-06 14:14 ` Manfred Spraul
1 sibling, 0 replies; 59+ messages in thread
From: Manfred Spraul @ 2001-11-06 14:14 UTC (permalink / raw)
To: Robert Love; +Cc: linux-kernel, hpa
Robert Love wrote:
>
> Further, the preemptible kernel patch oopses with this patch (IOW, don't
> use 2.4.13-ac8 + preempt-kernel, unless you remove all these bits like I
> did :>). I think it may be because of:
>
Could you send me an oops?
I assume that a
	set_current(hard_get_current());
is missing somewhere.
The assumption is that get_current() is faster than hard_get_current(),
and that there are so many get_current() calls that the overhead for
the set_current() in __switch_to and do_page_fault is small.
> Manfred Spraul wrote:
> > error_code:
> > [...]
> > - GET_CURRENT(%ebx)
> > call *%edi
> > addl $8,%esp
> > + GET_CURRENT(%ebx)
> > The pointer to current was loaded into %ebx before the call to the error
> > handler, now that only happens after the call. As far as I can see the
> > load before the call is not required.
>
> this change but I am unsure. Would Manfred or someone knowledgeable in
> this mind letting me pick their brain?
>
I would be very surprised if that's a problem: the error handlers are C
functions, and they don't expect parameters in register %ebx.
--
Manfred
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 7:18 Using %cr2 to reference "current" H. Peter Anvin
2001-11-06 8:01 ` Robert Love
@ 2001-11-06 10:58 ` Alan Cox
2001-11-06 17:04 ` Linus Torvalds
2001-11-06 17:02 ` Using %cr2 to reference "current" Linus Torvalds
2 siblings, 1 reply; 59+ messages in thread
From: Alan Cox @ 2001-11-06 10:58 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-kernel
> Is using %cr2 really faster than the old implementation, or is there
> another reason? It seems that the alignment constraints on the stack
> still remain, since the %esp solution still remains in places...
The stack is no longer aligned. We allocate two pages and disturb the stack
by up to 1.5K. We slab the task structs.
> It might also be worth considering a segment-register based
> implementation instead. The reason we're not using %fs and %gs in the
> kernel anymore is because of the setup slowness, but perhaps using
> them (use %fs since it's much more likely to be NULL and thus faster
> to restore) would be faster than using %cr2?
It may be. Likewise it's not clear if %cr2 should hold current or a cpu ident
pointer (so you don't reload on switch of task). This needs more
benchmarking. It's in current -ac to verify the theory is correct not the
tuning.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 10:58 ` Alan Cox
@ 2001-11-06 17:04 ` Linus Torvalds
2001-11-06 17:46 ` Alan Cox
0 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2001-11-06 17:04 UTC (permalink / raw)
To: linux-kernel
In article <E1613vx-00005r-00@the-village.bc.nu>,
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
>It may be. Likewise it's not clear if %cr2 should hold current or a cpu ident
>pointer (so you don't reload on switch of task). This needs more
>benchmarking. It's in current -ac to verify the theory is correct not the
>tuning.
We pretty much know the _theory_ is not correct, just by virtue of
depending on non-architected behaviour. The only thing -ac can do is
test whether it works in practice. Which is a totally different thing.
Especially on x86 chips.
Linus
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 17:04 ` Linus Torvalds
@ 2001-11-06 17:46 ` Alan Cox
2001-11-06 17:59 ` Linus Torvalds
0 siblings, 1 reply; 59+ messages in thread
From: Alan Cox @ 2001-11-06 17:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
> We pretty much know the _theory_ is not correct, just by virtue of
> depending on non-architected behaviour. The only thing -ac can do is
> test whether it works in practice. Which is a totally different thing.
Yep
> Especially on x86 chips.
Well so far I've found one laptop that eats %cr2 on APM calls, and we have
some mystery cases. Peter's suggestion of using %fs or %gs looks more
promising at the moment
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 17:46 ` Alan Cox
@ 2001-11-06 17:59 ` Linus Torvalds
2001-11-06 18:14 ` Alan Cox
0 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2001-11-06 17:59 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel
On Tue, 6 Nov 2001, Alan Cox wrote:
>
> > Especially on x86 chips.
>
> Well so far I've found one laptop that eats %cr2 on APM calls, and we have
> some mystery cases.
Well, APM is going away, and it should be easy enough to work around it
(and I don't _think_ you can reasonably do the same in ACPI or SMM: SMM
will save the whole CPU state and has to do that anyway, and ACPI doesn't
actually get to touch things like %cr2).
So I'd be more nervous about future CPU's just not having the register
writable (or having only parts of it, or..)
> Peter's suggestion of using %fs or %gs looks more
> promising at the moment
The problem with using a segment register is that then you have to
save/restore it over system calls - pretty much whether the call needs it
or not. Ie you can pretty much _guarantee_ that any system call will be
slowed down by something on the order of 10-15 cycles (on a good day, some
CPU's are slower at it). Same goes for task switch etc.
Which is why I'd much rather just color using the high bits of %esp, and
spend a few more cycles inside "get_current()". I can guarantee you that
it won't slow down paths that don't even need current at all (unlike the
segment register approach), and even the paths that _do_ need current will
only be ~5 cycles slower (plus possibly the cache miss of doing the function
call, but the call-site itself will actually be slightly smaller than the
current in-lined 32-bit immediate and "andl").
Using high bits of %esp has zero impact on task-switch, and makes
"get_current" interrupt safe (ie switching tasks is totally atomic, as
it's the one single "movl ..,%esp" instruction that does the real switch
as far as the kernel is concerned).
It does require using an order-2 allocation, which the current VM will
allow anyway, but which is obviously nastier than an order-1.
Linus
^ permalink raw reply [flat|nested] 59+ messages in thread
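The message above does not spell out how a colour would be derived from the high bits of %esp, so the following is only one possible reading, written as portable C. Everything in it is an assumption made for illustration: a 16 KiB-aligned order-2 area, a 32-byte cache line, eight colours, the choice of address bits, and the helper names area_base(), colour_of() and current_from_sp().

```c
#include <assert.h>
#include <stdint.h>

#define AREA_SIZE  16384u  /* order-2 allocation: four 4 KiB pages */
#define CACHE_LINE 32u     /* assumed L1 cache line size */
#define NR_COLOURS 8u      /* hypothetical number of distinct offsets */

/* Start of the AREA_SIZE-aligned area containing this stack pointer. */
static uintptr_t area_base(uintptr_t sp)
{
    assert((AREA_SIZE & (AREA_SIZE - 1)) == 0);
    return sp & ~((uintptr_t)AREA_SIZE - 1);
}

/* Colour taken from the address bits just above the area alignment,
 * so neighbouring areas get different colours. */
static unsigned colour_of(uintptr_t base)
{
    return (unsigned)((base / AREA_SIZE) % NR_COLOURS);
}

/* Hypothetical get_current(): the task structure sits at a
 * colour-dependent offset from the start of its area. */
static uintptr_t current_from_sp(uintptr_t sp)
{
    uintptr_t base = area_base(sp);
    return base + (uintptr_t)colour_of(base) * CACHE_LINE;
}
```

In this reading the lookup still depends only on the stack pointer, which is what keeps the task switch atomic, while task structures in adjacent areas no longer land on the same cache colour. The extra shift, mask and add are the "few more cycles" traded for that.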
* Re: Using %cr2 to reference "current"
2001-11-06 17:59 ` Linus Torvalds
@ 2001-11-06 18:14 ` Alan Cox
2001-11-06 16:55 ` Marcelo Tosatti
` (2 more replies)
0 siblings, 3 replies; 59+ messages in thread
From: Alan Cox @ 2001-11-06 18:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, linux-kernel
> "get_current" interrupt safe (ie switching tasks is totally atomic, as
> it's the one single "movl ..,%esp" instruction that does the real switch
> as far as the kernel is concerned).
>
> It does require using an order-2 allocation, which the current VM will
> allow anyway, but which is obviously nastier than an order-1.
I've seen boxes dead in the water from 8K NFS (ie 16K order-2 allocations),
let alone the huge memory hit. Michael's rtlinux approach looks even more
interesting and I may have to play with that (using the TSS to ident the
cpu)
Our memory bloat is already pretty gross in 2.4 without adding 16K task
stacks to the oversized struct page, bootmem and excess double linked lists.
I also need to try sticking a pointer to the task struct at the top of the
stack and loading that - since that should be a cache line that isn't being
shared around or swapped between processors
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 18:14 ` Alan Cox
@ 2001-11-06 16:55 ` Marcelo Tosatti
2001-11-06 18:14 ` Linus Torvalds
2001-11-07 0:00 ` Martin Dalecki
2 siblings, 0 replies; 59+ messages in thread
From: Marcelo Tosatti @ 2001-11-06 16:55 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, linux-kernel
On Tue, 6 Nov 2001, Alan Cox wrote:
> > "get_current" interrupt safe (ie switching tasks is totally atomic, as
> > it's the one single "movl ..,%esp" instruction that does the real switch
> > as far as the kernel is concerned).
> >
> > It does require using an order-2 allocation, which the current VM will
> > allow anyway, but which is obviously nastier than an order-1.
>
> I've seen boxes dead in the water from 8K NFS (ie 16K order-2 allocations),
> let alone the huge memory hit. Michael's rtlinux approach looks even more
> interesting and I may have to play with that (using the TSS to ident the
> cpu)
Btw, I also want to see what intense "for-optimization" high-order
allocators are going to do to the current VM.
Think about the possible intensive pressure (and CPU wasted) caused by,
for example, SCSI code which _always_ tries to do order-1 allocations (or
bigger?) to allocate scatter/gather tables.
We want those allocations to fall back to order-0 allocations instead of
looping madly inside the VM freeing routines.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 18:14 ` Alan Cox
2001-11-06 16:55 ` Marcelo Tosatti
@ 2001-11-06 18:14 ` Linus Torvalds
2001-11-06 18:31 ` Alan Cox
2001-11-07 0:00 ` Martin Dalecki
2 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2001-11-06 18:14 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel
On Tue, 6 Nov 2001, Alan Cox wrote:
>
> Our memory bloat is already pretty gross in 2.4 without adding 16K task
> stacks to the oversized struct page, bootmem and excess double linked lists.
There are some people who think that the 5kB stack we have now is too
small ;(
> I also need to try sticking a pointer to the task struct at the top of the
> stack and loading that - since that should be a cache line that isn't being
> shared around or swapped between processors
That should work fairly well, and has the advantage that you can hide more
state there if you want (ie it allows us, on demand, to move hot state of
"struct task_struct" up there).
There is a subset of "struct task_struct" that is basically completely
local to the task, and could be advantageous to move around. Things like
 - need_resched/sigpending/process attributes
 - ptrace
 - processor
 - addr_limit
are all things that we don't actually _need_ to go all the way to the task
structure to fetch, and that we mostly need to modify anyway on task
switch (ie "need_resched" and "processor" both need to be written on
task-switch anyway, and are not touched by any other CPU)
So it would basically be a small per-CPU/thread area, not just the "struct
task_struct".
Linus
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 18:14 ` Linus Torvalds
@ 2001-11-06 18:31 ` Alan Cox
2001-11-06 22:38 ` Linus Torvalds
0 siblings, 1 reply; 59+ messages in thread
From: Alan Cox @ 2001-11-06 18:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, linux-kernel
> > Our memory bloat is already pretty gross in 2.4 without adding 16K task
> > stacks to the oversized struct page, bootmem and excess double linked lists.
>
> There are some people who think that the 5kB stack we have now is too
> small ;(
Yes but we don't want to let them win or next year 16K will be too small and
then they'll want 16K C++ stack objects. At the very least we should make
them have to use
	really_slow_vmalloc_and_switch_to_big_temporary_stack()
	really_slow_vfree_and_return_to_old_stack()
_and_ make them type function names that long.
Granted it's less of an issue in 2.5 because we can afford to finally make
DMA off the stack a crime (right now it's an offence but one that is violated
in too many places to be sure of killing them all off) - scsi for one does
it.
> That should work fairly well, and has the advantage that you can hide more
> state there if you want (ie it allows us, on demand, to move hot state of
> "struct task_struct" up there).
Sweet. Now that I'd completely missed. Task private state and task public
state splitting
> So it would basically be a small per-CPU/thread area, not just the "struct
> task_struct".
Yep
Alan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 18:31 ` Alan Cox
@ 2001-11-06 22:38 ` Linus Torvalds
0 siblings, 0 replies; 59+ messages in thread
From: Linus Torvalds @ 2001-11-06 22:38 UTC (permalink / raw)
To: linux-kernel
In article <E161B0f-0001Io-00@the-village.bc.nu>,
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
>> That should work fairly well, and has the advantage that you can hide more
>> state there if you want (ie it allows us, on demand, to move hot state of
>> "struct task_struct" up there).
>
>Sweet. Now that I'd completely missed. Task private state and task
>public state splitting
Yes. It would be a waste to have to bring in a cache-line into the L1
cache, and then only use 4 bytes of it.
So it should make sense to set this up somewhat like:
	struct local_task_struct {
		struct task_struct *tsk;
		.. other fields ..
	};
and then use the _exact_ existing infrastructure to get
"local_task_struct" instead of "task_struct", and let the compiler do
all the rest at a higher level. So we'd just rename "get_current()" to
"get_local_current()", and then do
	#define get_current() (get_local_current()->tsk)
and people who want to know about the local task struct can use that.
Linus
^ permalink raw reply [flat|nested] 59+ messages in thread
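The outline above can be turned into a small user-space sketch. It is not kernel code and makes several assumptions for illustration: the local area is placed at the base of an 8 KiB, 8 KiB-aligned stack block, the two extra fields are picked from the "hot state" list earlier in the thread, and the helpers take the stack pointer as an explicit argument (get_local_current_from(), get_current_from() and alloc_stack_block() are made-up names) because portable C cannot read %esp.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192u  /* assumed size and alignment of a stack block */

struct task_struct { int pid; };  /* stand-in for the real structure */

/* Small per-thread area, following the struct outlined in the message;
 * need_resched and processor stand in for ".. other fields ..". */
struct local_task_struct {
    struct task_struct *tsk;
    int need_resched;
    int processor;
};

/* Find the local area from any address inside the stack block. */
static struct local_task_struct *get_local_current_from(uintptr_t sp)
{
    return (struct local_task_struct *)(sp & ~((uintptr_t)THREAD_SIZE - 1));
}

/* Counterpart of "#define get_current() (get_local_current()->tsk)". */
#define get_current_from(sp) (get_local_current_from(sp)->tsk)

/* Allocate one aligned stack block and record its owning task at the base. */
static struct local_task_struct *alloc_stack_block(struct task_struct *tsk)
{
    struct local_task_struct *local = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
    assert(local != NULL);
    local->tsk = tsk;
    local->need_resched = 0;
    local->processor = 0;
    return local;
}
```

A caller would allocate a block for a task, then pass any address inside that block to get_current_from() to recover the task pointer. Fields such as need_resched are then reachable from the same cache line as the pointer, without touching the task structure itself.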
* Re: Using %cr2 to reference "current"
2001-11-06 18:14 ` Alan Cox
2001-11-06 16:55 ` Marcelo Tosatti
2001-11-06 18:14 ` Linus Torvalds
@ 2001-11-07 0:00 ` Martin Dalecki
2001-11-06 23:19 ` Alan Cox
2 siblings, 1 reply; 59+ messages in thread
From: Martin Dalecki @ 2001-11-07 0:00 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, linux-kernel
Alan Cox wrote:
>
> > "get_current" interrupt safe (ie switching tasks is totally atomic, as
> > it's the one single "movl ..,%esp" instruction that does the real switch
> > as far as the kernel is concerned).
> >
> > It does require using an order-2 allocation, which the current VM will
> > allow anyway, but which is obviously nastier than an order-1.
>
> I've seen boxes dead in the water from 8K NFS (ie 16K order-2 allocations),
> let alone the huge memory hit. Michael's rtlinux approach looks even more
> interesting and I may have to play with that (using the TSS to ident the
> cpu)
>
> Our memory bloat is already pretty gross in 2.4 without adding 16K task
> stacks to the oversized struct page, bootmem and excess double linked lists.
If we are talking about memory bloat, let's ask a question. Is somebody
there working seriously on changing the default function call conventions
on IA32 from stack parameter pushing to register passing throughout the
kernel? The implications for I-cache pressure in particular seem to be
quite significant, and apparently one of the areas where GCC got much
better is precisely this. The recent comparisons of gcc against the intel
compiler show as well that this may be really worth it.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 0:00 ` Martin Dalecki
@ 2001-11-06 23:19 ` Alan Cox
2001-11-07 0:43 ` Martin Dalecki
2001-11-07 14:00 ` Martin Dalecki
0 siblings, 2 replies; 59+ messages in thread
From: Alan Cox @ 2001-11-06 23:19 UTC (permalink / raw)
To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel
> If we are talking about memory bloat, let's ask a question. Is somebody
> there
> working seriously on changing the default function call conventions on
> IA32
That's pure noise
On a 256Mb machine you have 65536 page map entries. Those are 64 bytes but
it's not hard to get it down to 56 bytes (.5Mb saved) and probably to 48
bytes. We can probably also shave 8 bytes off each cached inode if not
more (the nfs changes in -ac are a big help there already) - that's typically
another 200K on a reasonable size box - and the new bootmem code can save a
chunk too
I'm not sure how much the code change for function call patterns would be
but I doubt it's so big for such little effort
Alan
^ permalink raw reply [flat|nested] 59+ messages in thread
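The page map figures quoted above follow from simple arithmetic, shown here as a short C sketch. The 4 KiB page size is the i386 value and is assumed rather than stated in the message; page_map_entries() and bytes_saved() are illustrative names.

```c
#include <assert.h>

#define PAGE_SIZE 4096ul  /* i386 page size, assumed for this sketch */

/* Number of page map entries needed to describe ram_bytes of memory. */
static unsigned long page_map_entries(unsigned long ram_bytes)
{
    return ram_bytes / PAGE_SIZE;
}

/* Memory saved by shrinking every page map entry from old_entry
 * bytes to new_entry bytes. */
static unsigned long bytes_saved(unsigned long ram_bytes,
                                 unsigned long old_entry,
                                 unsigned long new_entry)
{
    assert(new_entry <= old_entry);
    return page_map_entries(ram_bytes) * (old_entry - new_entry);
}
```

For 256 MB of RAM this gives 65536 entries; going from 64 to 56 bytes saves 524288 bytes (0.5 MB), and going to 48 bytes saves 1 MB.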
* Re: Using %cr2 to reference "current"
2001-11-06 23:19 ` Alan Cox
@ 2001-11-07 0:43 ` Martin Dalecki
2001-11-07 0:27 ` Alan Cox
2001-11-07 0:35 ` Jeff Garzik
2001-11-07 14:00 ` Martin Dalecki
1 sibling, 2 replies; 59+ messages in thread
From: Martin Dalecki @ 2001-11-07 0:43 UTC (permalink / raw)
To: Alan Cox; +Cc: dalecki, Linus Torvalds, linux-kernel
Alan Cox wrote:
>
> > If we are talking about memory bloat, let's ask a question. Is somebody
> > there
> > working seriously on changing the default function call conventions on
> > IA32
>
> That's pure noise
>
> On a 256Mb machine you have 65536 page map entries. Those are 64 bytes but
> it's not hard to get it down to 56 bytes (.5Mb saved) and probably to 48
> bytes. We can probably also shave 8 bytes off each cached inode if not
> more (the nfs changes in -ac are a big help there already) - that's typically
> another 200K on a reasonable size box - and the new bootmem code can save a
> chunk too
>
> I'm not sure how much the code change for function call patterns would be
> but I doubt it's so big for such little effort
Please count in the removal of the *very* sparse read_ahead array as
well (patch went to this list a long time ago).
It doesn't cost anything and saves a few pages depending on the
number of drivers you have loaded... (Well in comparison to the above
that's nit picking, but...)
And then there is the overloaded struct inode. It would be worth
quite a bit of memory to not overlay the private, filesystem
specific parts but to attach them by a pointer instead, in esp.
if you make this in a way where the private part would be used
without the public interface in drivers.
Currently the most ridiculous inode layout is deterministic for the
overall size in the compiled kernel.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 0:43 ` Martin Dalecki
@ 2001-11-07 0:27 ` Alan Cox
2001-11-07 0:35 ` Jeff Garzik
1 sibling, 0 replies; 59+ messages in thread
From: Alan Cox @ 2001-11-07 0:27 UTC (permalink / raw)
To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel
> Please count in the removal of the *very* sparse read_ahead array as
> well (patch went to this list a long time ago).
> It doesn't cost anything and saves a few pages depending on the
> number of drivers you have loaded... (Well in comparison to the above
> that's nit picking, but...)
Sounds quite believable. Several of the hashes are oversize too it seems
> And then there is the overloaded struct inode. It would be worth
> quite a bit of memory to not overlay the private, filesystem
> specific parts but to attach them by a pointer instead, in esp.
That's what -ac has started doing. Al Viro has done the worst case ones so
far.
Alan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 0:43 ` Martin Dalecki
2001-11-07 0:27 ` Alan Cox
@ 2001-11-07 0:35 ` Jeff Garzik
1 sibling, 0 replies; 59+ messages in thread
From: Jeff Garzik @ 2001-11-07 0:35 UTC (permalink / raw)
To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel
Martin Dalecki wrote:
> And then there is the overloaded struct inode. It would be worth
> quite a bit of memory to not overlay the private, filesystem
> specific parts but to attach them by a pointer instead, in esp.
> if you make this in a way where the private part would be used
> without the public interface in drivers.
I think there are plans for several filesystems to use the generic_ip
and generic_sbp members of the unions, instead of further adding to the
unions. FreeVxFS is an example of one such filesystem which already
does this.
--
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 23:19 ` Alan Cox
2001-11-07 0:43 ` Martin Dalecki
@ 2001-11-07 14:00 ` Martin Dalecki
2001-11-07 13:38 ` Alan Cox
1 sibling, 1 reply; 59+ messages in thread
From: Martin Dalecki @ 2001-11-07 14:00 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, linux-kernel
Alan Cox wrote:
> I'm not sure how much the code change for function call patterns would be
> but I doubt it's so big for such little effort
Let numbers talk to us, or allow me to quote the gloriously politically
incorrect Dave: "Numbers talk - bullshit walks!":
Without register passing, we have the following size situation:
   text    data     bss     dec     hex filename
1332132  260804  288080 1881016  1cb3b8 vmlinux
With the following options enabled we get:
-freg-struct-return -mrtd -mregparm=3
   text    data     bss     dec     hex filename
1302372  260804  288080 1851256  1c3f78 vmlinux
Quite significant difference if you ask me!!!
With the following options enabled we get:
-mrtd -mregparm=3
   text    data     bss     dec     hex filename
1302404  260804  288080 1851288  1c3f98 vmlinux
Here it's just a few bytes here and there, not really significant, because
the kernel apparently doesn't use structs as return values frequently.
With the following options enabled we get:
-mregparm=3
   text    data     bss     dec     hex filename
1303476  260804  288080 1852360  1c43c8 vmlinux
So apparently the -mrtd option is quite significant as well.
With the following options enabled we get:
-mregparm=2
   text    data     bss     dec     hex filename
1307876  260804  288080 1856760  1c54f8 vmlinux
As expected the influence here isn't too significant.
So the conclusion is that apparently the change in calling convention can
result in a saving of about 2.3% in code size. This may not sound great in
relative numbers, but for a compiler designer this would already sound
hilarious and in absolute numbers it's: 29760 bytes. Notwithstanding the
speed improvement...
Oh for completeness sake, the compiler used was:
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-99)
^ permalink raw reply [flat|nested] 59+ messages in thread
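The savings in the message above can be recomputed from the reported text sizes. The sketch below only does the subtraction and a ratio; text_saved() and saved_hundredths_of_percent() are illustrative names, and the input numbers are the ones posted, not new measurements.

```c
#include <assert.h>

/* Bytes of text saved going from the 'before' build to the 'after' build. */
static unsigned long text_saved(unsigned long before, unsigned long after)
{
    assert(after <= before);
    return before - after;
}

/* Saving relative to 'base', in hundredths of a percent, truncated. */
static unsigned long saved_hundredths_of_percent(unsigned long saved,
                                                 unsigned long base)
{
    assert(base != 0);
    return saved * 10000ul / base;
}
```

With the posted figures, -freg-struct-return -mrtd -mregparm=3 saves 29760 bytes of text, -mrtd -mregparm=3 saves 29728, -mregparm=3 alone saves 28656, and -mregparm=2 saves 24256. The largest saving is about 2.2% of the original text size, or roughly 2.3% when measured against the smaller image.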
* Re: Using %cr2 to reference "current"
2001-11-07 14:00 ` Martin Dalecki
@ 2001-11-07 13:38 ` Alan Cox
2001-11-07 14:59 ` Martin Dalecki
` (2 more replies)
0 siblings, 3 replies; 59+ messages in thread
From: Alan Cox @ 2001-11-07 13:38 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel
> With the following options enabled we get:
> -freg-struct-return -mrtd -mregparm=3
>
>    text    data     bss     dec     hex filename
> 1302372  260804  288080 1851256  1c3f78 vmlinux
>
> Quite significant difference if you ask me!!!
30K is nice to have but still a scratch on the surface compared with 500K 8)
> in a saving of about 2.3% in code size. This may not sound great in
> relative
> numbers, but for a compiler designer this would already sound hilarious
> and in
> absolute numbers it's: 29760 bytes. Notwithstanding the speed
> improvement...
The obvious question is - have you tried running the kernel built like that
with any asm fixups needed ?
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 13:38 ` Alan Cox
@ 2001-11-07 14:59 ` Martin Dalecki
2001-11-07 14:17 ` Alan Cox
2001-11-07 20:04 ` Using %cr2 to reference "current" Andrew Morton
2001-11-11 13:16 ` Martin Dalecki
2 siblings, 1 reply; 59+ messages in thread
From: Martin Dalecki @ 2001-11-07 14:59 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, linux-kernel
Alan Cox wrote:
>
> > With the following options enabled we get:
> > -freg-struct-return -mrtd -mregparm=3
> >
> >    text    data     bss     dec     hex filename
> > 1302372  260804  288080 1851256  1c3f78 vmlinux
> >
> > Quite significant difference if you ask me!!!
>
> 30K is nice to have but still a scratch on the surface compared with 500K 8)
>
> > in a saving of about 2.3% in code size. This may not sound great in
> > relative
> > numbers, but for a compiler designer this would already sound hilarious
> > and in
> > absolute numbers it's: 29760 bytes. Notwithstanding the speed
> > improvement...
>
> The obvious question is - have you tried running the kernel built like that
> with any asm fixups needed ?
Once a long time ago I tried already to do the fixups myself, and
got to the stage of init starting... It wasn't THAT difficult. However,
somehow encouraged by the compiler comparisons between gcc and intel's
free compiler, which uses register passing for anything local
to the actual code, where the speed gains are up to 20%, I'm currently
quite inclined to redo and finish the experiment.
BTW: it's not just asm fixups that have to be done for this
to work. For example all the c files with -fno-omit-frame-pointer
as an additional compilation flag have to be looked at seriously again.
And of course UML makes the debugging of at least this easier.
--
- phone: +49 214 8656 283
- job: eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort: ru_RU.KOI8-R
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 14:59 ` Martin Dalecki
@ 2001-11-07 14:17 ` Alan Cox
2001-11-07 14:34 ` Dirk Moerenhout
` (4 more replies)
0 siblings, 5 replies; 59+ messages in thread
From: Alan Cox @ 2001-11-07 14:17 UTC (permalink / raw)
To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel
> somehow encouraged by the compiler comparisons between gcc and intel's
> free compiler, which uses register passing for anything local
> to the actual code, where the speed gains are up to 20%, I'm currently
I was under the impression intel's compiler was profoundly non-free ?
> quite inclined to redo and finish the experiment.
> BTW: it's not just asm fixups that have to be done for this
> to work. For example all the c files with -fno-omit-frame-pointer
20% is a nice large number
Alan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 14:17 ` Alan Cox
@ 2001-11-07 14:34 ` Dirk Moerenhout
2001-11-07 14:54 ` Alan Cox
2001-11-07 14:39 ` Intel compiler [Re: Using %cr2 to reference "current"] Sebastian Heidl
` (3 subsequent siblings)
4 siblings, 1 reply; 59+ messages in thread
From: Dirk Moerenhout @ 2001-11-07 14:34 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel

> > somehow encouraged by the compiler comparisions between gcc and intel's
> > free compiler, which use the register passing for anything local
> > to the actual code, where the speed gains are up to 20% im currently
>
> I was under the impression intels compiler was profoundly non-free ?

Thought that too until a minute ago. Went to the Intel site and read the
information.

http://developer.intel.com/software/products/eval/
Gives details about _two_ ways to get it free. The known 30 day free trial
with support, but also a lesser-known "non commercial unsupported" option.
So for non-commercial use you can use it as much as you want, you just
don't get support.

Downloading it now to play some with it :-)

Dirk Moerenhout ///// System Administrator ///// Planet Internet NV
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current" 2001-11-07 14:34 ` Dirk Moerenhout @ 2001-11-07 14:54 ` Alan Cox 2001-11-07 15:32 ` David Howells 0 siblings, 1 reply; 59+ messages in thread From: Alan Cox @ 2001-11-07 14:54 UTC (permalink / raw) To: Dirk Moerenhout; +Cc: Alan Cox, linux-kernel > Thought that too untill a minute ago. Went to the Intel site and read the > information. > > http://developer.intel.com/software/products/eval/ > Gives details about _two_ ways to get it free. The known 30 day free trial Seems to be non free to me May well be non-fee non-free but its still most definitely non-free ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 14:54 ` Alan Cox
@ 2001-11-07 15:32 ` David Howells
0 siblings, 0 replies; 59+ messages in thread
From: David Howells @ 2001-11-07 15:32 UTC (permalink / raw)
To: Alan Cox; +Cc: Linux Torvalds, bcrl, linux-kernel

Instead of using %cr2, how about giving each CPU its own GDT (the GDT
doesn't need to contain many entries). Have one segment number point to a
CPU-specific data area that contains things like the current task pointer
for that CPU, the CPU number, etc, etc. This same segment number will be
used on all CPUs, but will be multiplexed via the per-CPU GDTs instead.

Then you can load up a segment register with this segment on entry to the
kernel, and then make CPU data accesses relative to that.

David
^ permalink raw reply [flat|nested] 59+ messages in thread
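The per-CPU GDT idea in the message above can be modelled in user space. This is only a sketch under stated assumptions, not kernel code: the "GDT" is a plain array, a segment-relative load is a table lookup, and the names `PERCPU_SEL`, `cpu_data`, `percpu()` and the table sizes are hypothetical. What it shows is the one property the proposal relies on: the same selector index resolves to a different data area on each CPU.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS      2   /* assumption: two CPUs in this model */
#define GDT_ENTRIES  8   /* assumption: a small per-CPU table */
#define PERCPU_SEL   5   /* hypothetical selector index, identical on every CPU */

struct task { int pid; };

/* The CPU-specific data area that the shared selector points at. */
struct cpu_data {
    struct task *current;
    int cpu_id;
};

/* Model of a descriptor: only the base address matters here. */
struct desc { void *base; };

static struct cpu_data cpu_area[NR_CPUS];
static struct desc gdt[NR_CPUS][GDT_ENTRIES];

/* Give every CPU its own table; entry PERCPU_SEL points at that CPU's area. */
void setup_percpu_gdts(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpu_area[cpu].cpu_id = cpu;
        cpu_area[cpu].current = NULL;
        gdt[cpu][PERCPU_SEL].base = &cpu_area[cpu];
    }
}

/* Stand-in for a segment-relative access executed on `cpu`. */
struct cpu_data *percpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return gdt[cpu][PERCPU_SEL].base;
}

struct task *get_current(int cpu)          { return percpu(cpu)->current; }
void set_current(int cpu, struct task *t)  { percpu(cpu)->current = t; }
```

In this model, calling `setup_percpu_gdts()` once and then `set_current(cpu, task)` at each context switch is enough for `get_current(cpu)` to return the right task without masking `%esp` or reading `%cr2`; whether a real segment load is cheaper than either is exactly what the thread is debating.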
* Intel compiler [Re: Using %cr2 to reference "current"] 2001-11-07 14:17 ` Alan Cox 2001-11-07 14:34 ` Dirk Moerenhout @ 2001-11-07 14:39 ` Sebastian Heidl 2001-11-07 22:05 ` lists 2001-11-07 15:36 ` Using %cr2 to reference "current" Martin Dalecki ` (2 subsequent siblings) 4 siblings, 1 reply; 59+ messages in thread From: Sebastian Heidl @ 2001-11-07 14:39 UTC (permalink / raw) To: Alan Cox; +Cc: linux-kernel On Wed, Nov 07, 2001 at 02:17:33PM +0000, Alan Cox wrote: > > somehow encouraged by the compiler comparisions between gcc and intel's > > free compiler, which use the register passing for anything local > > to the actual code, where the speed gains are up to 20% im currently > > I was under the impression intels compiler was profoundly non-free ? have a look: http://developer.intel.com/software/products/eval/ _sh_ ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Intel compiler [Re: Using %cr2 to reference "current"]
2001-11-07 14:39 ` Intel compiler [Re: Using %cr2 to reference "current"] Sebastian Heidl
@ 2001-11-07 22:05 ` lists
0 siblings, 0 replies; 59+ messages in thread
From: lists @ 2001-11-07 22:05 UTC (permalink / raw)
To: linux-kernel; +Cc: Sebastian Heidl

Just as another data point - a simple test, I ran intel compiler
on flops v2. Run 3 ways - gcc3, icc (v 5) and the beta 6 icc.
All run on dual p4 with 1 Gb mem on Rh 7.2

At least on this test the differences are quite dramatic.

Regards,

gene/

---------------------------------------------------------------------
Summary
------

gcc -DUNIX -O3 -march=i686 flops2.c
icc -xMKW -o flops2 -DUNIX -O3 flops2.c

FLOPS C Program (Double Precision), V2.0 18 Dec 1992

Module               MFLOPS
                gcc         icc 5         icc 6
             --------     ---------     ----------
  1          444.9410      439.4850       674.3180
  2          265.4815      362.3862       362.3862
  3          298.1843      604.0250      1270.6569
  4          337.7309     1224.8804      1373.8819
  5          392.7003     1138.6503      1131.7073
  6          391.7678     1334.0521      1422.2222
  7          163.5783      193.3900       193.5118
  8          395.7743     1317.3242      1372.6542

Iterations      = 512000000     512000000      512000000
NullTime (usec) =    0.0029        0.0000         0.0000
MFLOPS(1)       =  275.3542      416.9120       472.8952
MFLOPS(2)       =  264.7165      413.4297       448.2175
MFLOPS(3)       =  339.5966      714.7146       834.5651
MFLOPS(4)       =  362.1891     1071.8196      1367.5374

---------------------------------------------------------------------

On Wed, Nov 07, 2001 at 03:39:46PM +0100, Sebastian Heidl wrote:
> On Wed, Nov 07, 2001 at 02:17:33PM +0000, Alan Cox wrote:
> > > somehow encouraged by the compiler comparisions between gcc and intel's
> > > free compiler, which use the register passing for anything local
> > > to the actual code, where the speed gains are up to 20% im currently
> >
> > I was under the impression intels compiler was profoundly non-free ?
>
> have a look:
> http://developer.intel.com/software/products/eval/
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 14:17 ` Alan Cox
2001-11-07 14:34 ` Dirk Moerenhout
2001-11-07 14:39 ` Intel compiler [Re: Using %cr2 to reference "current"] Sebastian Heidl
@ 2001-11-07 15:36 ` Martin Dalecki
2001-11-08 14:08 ` Martin Dalecki
2001-11-13 16:49 ` Merge BUG in 2.4.15-pre4 serial.c Martin Dalecki
4 siblings, 0 replies; 59+ messages in thread
From: Martin Dalecki @ 2001-11-07 15:36 UTC (permalink / raw)
To: Alan Cox; +Cc: dalecki, Linus Torvalds, linux-kernel

Alan Cox wrote:
>
> > somehow encouraged by the compiler comparisions between gcc and intel's
> > free compiler, which use the register passing for anything local
> > to the actual code, where the speed gains are up to 20% im currently
>
> I was under the impression intels compiler was profoundly non-free ?

Well it's free in terms of money, read: download and "personal usage"
blabla. This doesn't deter me from having a look at it ;-).

>
> > quite inclined to do the redo and finish the experiment.
> > BTW.> It's not just asm fixpus that have to be done for this
> > to work. For example all the c files with -fno-omit-frame-pointer
>
> 20% is a nice large number.

Yes, I was impressed as well, and twiddling with compiler flags
actually indicates that the calling convention stuff is one of the
main contributors to this. BTW, the speed differences can go up to 40%
for floating point. OK, this is irrelevant for the kernel, but it shows
very well that there is still plenty of room for improvement.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 14:17 ` Alan Cox
` (2 preceding siblings ...)
2001-11-07 15:36 ` Using %cr2 to reference "current" Martin Dalecki
@ 2001-11-08 14:08 ` Martin Dalecki
2001-11-13 16:49 ` Merge BUG in 2.4.15-pre4 serial.c Martin Dalecki
4 siblings, 0 replies; 59+ messages in thread
From: Martin Dalecki @ 2001-11-08 14:08 UTC (permalink / raw)
To: Alan Cox; +Cc: dalecki, Linus Torvalds, linux-kernel

Alan Cox wrote:
>
> > somehow encouraged by the compiler comparisions between gcc and intel's
> > free compiler, which use the register passing for anything local
> > to the actual code, where the speed gains are up to 20% im currently
>
> I was under the impression intels compiler was profoundly non-free ?
>
> > quite inclined to do the redo and finish the experiment.
> > BTW.> It's not just asm fixpus that have to be done for this
> > to work. For example all the c files with -fno-omit-frame-pointer
>
> 20% is a nice large number

I just wanted to note that I already got the whole fixup to the stage
where the first schedule() occurs during kernel initialization...
printk and so on all seem to work nicely ;-).

Well, there were some errors which had to be fixed to get this far. For
example, the decompress_kernel function should have the attribute
asmlinkage, and boot/compressed/misc.c should not export anything else.

Further debugging will occur this evening...
^ permalink raw reply [flat|nested] 59+ messages in thread
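Why decompress_kernel needs asmlinkage once the tree is built with -mregparm=3 can be shown with a small sketch. It assumes that on i386 the kernel's asmlinkage expands to GCC's `regparm(0)` function attribute, which forces stack-based argument passing for that one function; the macro below is a local stand-in, and `entry_from_asm` and `ordinary_helper` are hypothetical names, not kernel symbols.

```c
#include <assert.h>

/*
 * Local stand-in for the kernel macro (assumption: on i386 it boils down
 * to regparm(0)). On other targets the attribute is left out entirely.
 */
#if defined(__i386__)
#define asmlinkage __attribute__((regparm(0)))
#else
#define asmlinkage /* nothing to override on this ABI */
#endif

/*
 * A function that hand-written assembly would call. The assembly pushes
 * its arguments on the stack, so this function must keep the stack-based
 * convention even if every other file is compiled with -mregparm=3.
 */
asmlinkage int entry_from_asm(int a, int b)
{
    return a + b;
}

/*
 * An ordinary C function. Under -mregparm=3 the compiler may pass its
 * arguments in registers; its call to entry_from_asm() still uses the
 * stack because the callee's declaration carries the attribute.
 */
int ordinary_helper(int a, int b)
{
    return entry_from_asm(a, b) * 2;
}
```

C callers pick up the right convention automatically from the declaration; only the assembly side has no way to follow a compiler flag, which is why each assembly entry point in the tree has to be annotated by hand.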
* Merge BUG in 2.4.15-pre4 serial.c
2001-11-07 14:17 ` Alan Cox
` (3 preceding siblings ...)
2001-11-08 14:08 ` Martin Dalecki
@ 2001-11-13 16:49 ` Martin Dalecki
2001-11-13 16:21 ` Russell King
4 siblings, 1 reply; 59+ messages in thread
From: Martin Dalecki @ 2001-11-13 16:49 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, linux-kernel

I have found the following code in serial.c around line 5565

#ifdef __i386__
	if (i == NR_PORTS) {
		for (i = 4; i < NR_PORTS; i++)
			if ((rs_table[i].type == PORT_UNKNOWN) &&
			    (rs_table[i].count == 0))
				break;
	}
#endif
	if (i == NR_PORTS) {
		for (i = 0; i < NR_PORTS; i++)
			if ((rs_table[i].type == PORT_UNKNOWN) &&
			    (rs_table[i].count == 0))
				break;
	}

This is supposedly the result of applying some patch twice.
Let me guess the first 8 lines of this can be deleted.

Regards!
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Merge BUG in 2.4.15-pre4 serial.c 2001-11-13 16:49 ` Merge BUG in 2.4.15-pre4 serial.c Martin Dalecki @ 2001-11-13 16:21 ` Russell King 2001-11-13 17:37 ` Martin Dalecki 0 siblings, 1 reply; 59+ messages in thread From: Russell King @ 2001-11-13 16:21 UTC (permalink / raw) To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel On Tue, Nov 13, 2001 at 05:49:24PM +0100, Martin Dalecki wrote: > I have found the following code in serial.c aorund line 5565 > > #ifdef __i386__ > if (i == NR_PORTS) { > for (i = 4; i < NR_PORTS; i++) > if ((rs_table[i].type == PORT_UNKNOWN) && > (rs_table[i].count == 0)) > break; > } > #endif > if (i == NR_PORTS) { > for (i = 0; i < NR_PORTS; i++) > if ((rs_table[i].type == PORT_UNKNOWN) && > (rs_table[i].count == 0)) > break; > } > > This is supposedly the result of applying some patch twice. > Let me guess the first 8 lines of this can be deleted. Look at it closer, in particular the for() loops. It's basically there so that on x86, we don't normally use ttyS0-3 for pcmcia and other similar ports, unless we run out of other ports to use. -- Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux http://www.arm.linux.org.uk/personal/aboutme.html ^ permalink raw reply [flat|nested] 59+ messages in thread
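The two loops discussed above are a two-pass search rather than a duplicated hunk. A stand-alone sketch of that logic, under stated assumptions: the table is shrunk to 8 entries, `struct port`, `find_free_port()` and `LEGACY_PORTS` are hypothetical names, and the `prefer_high` argument stands in for the `#ifdef __i386__` in serial.c.

```c
#include <assert.h>

#define NR_PORTS      8   /* assumption: small table for the example */
#define PORT_UNKNOWN  0
#define PORT_16550A   4   /* any non-zero "known" type would do here */
#define LEGACY_PORTS  4   /* ttyS0-3, the slots the first pass skips */

struct port {
    int type;
    int count;
};

static int port_is_free(const struct port *p)
{
    return p->type == PORT_UNKNOWN && p->count == 0;
}

/*
 * Return the index of a free slot for a newly registered port, or -1 if
 * the table is full. With prefer_high set, slots 4 and up are tried
 * first and the legacy slots are used only as a fallback.
 */
int find_free_port(const struct port *tbl, int prefer_high)
{
    int i = NR_PORTS;

    if (prefer_high) {
        /* Pass 1: skip the legacy slots. */
        for (i = LEGACY_PORTS; i < NR_PORTS; i++)
            if (port_is_free(&tbl[i]))
                break;
    }
    if (i == NR_PORTS) {
        /* Pass 2: nothing found (or pass 1 skipped), search everything. */
        for (i = 0; i < NR_PORTS; i++)
            if (port_is_free(&tbl[i]))
                break;
    }
    return i == NR_PORTS ? -1 : i;
}
```

With an empty table the x86-style call returns slot 4, and slot 0 is handed out only after slots 4 to 7 are taken, which matches the behaviour Russell King describes for PCMCIA and similar ports.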
* Re: Merge BUG in 2.4.15-pre4 serial.c
2001-11-13 16:21 ` Russell King
@ 2001-11-13 17:37 ` Martin Dalecki
2001-11-13 16:53 ` Russell King
2001-11-13 17:11 ` Alan Cox
0 siblings, 2 replies; 59+ messages in thread
From: Martin Dalecki @ 2001-11-13 17:37 UTC (permalink / raw)
To: Russell King; +Cc: dalecki, Alan Cox, Linus Torvalds, linux-kernel

Russell King wrote:
>
> On Tue, Nov 13, 2001 at 05:49:24PM +0100, Martin Dalecki wrote:
> > I have found the following code in serial.c aorund line 5565
> >
> > #ifdef __i386__
> > 	if (i == NR_PORTS) {
> > 		for (i = 4; i < NR_PORTS; i++)
> > 			if ((rs_table[i].type == PORT_UNKNOWN) &&
> > 			    (rs_table[i].count == 0))
> > 				break;
> > 	}
> > #endif
> > 	if (i == NR_PORTS) {
> > 		for (i = 0; i < NR_PORTS; i++)
> > 			if ((rs_table[i].type == PORT_UNKNOWN) &&
> > 			    (rs_table[i].count == 0))
> > 				break;
> > 	}
> >
> > This is supposedly the result of applying some patch twice.
> > Let me guess the first 8 lines of this can be deleted.
>
> Look at it closer, in particular the for() loops.
>
> It's basically there so that on x86, we don't normally use ttyS0-3
> for pcmcia and other similar ports, unless we run out of other ports
> to use.

Well, I still think that the 8 lines can be deleted. Once again: my famous
notebook is perfectly __i386__ and doesn't contain any devices served by
serial.c unless I configure IrDA. Pushing the port numbers artificially
behind doesn't make sense for me and makes some unknown setserial tricks
necessary for irtty setup.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Merge BUG in 2.4.15-pre4 serial.c 2001-11-13 17:37 ` Martin Dalecki @ 2001-11-13 16:53 ` Russell King 2001-11-13 18:05 ` Martin Dalecki 2001-11-13 17:11 ` Alan Cox 1 sibling, 1 reply; 59+ messages in thread From: Russell King @ 2001-11-13 16:53 UTC (permalink / raw) To: dalecki; +Cc: linux-kernel On Tue, Nov 13, 2001 at 06:37:54PM +0100, Martin Dalecki wrote: > Pushing the port numbers artificially behind doesn't make sense for me > and makes some setserial unknown tricks neccessary for irtty setup. The key words here are "for me". What setserial "unknown tricks" are you referring to? -- Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux http://www.arm.linux.org.uk/personal/aboutme.html ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Merge BUG in 2.4.15-pre4 serial.c
2001-11-13 16:53 ` Russell King
@ 2001-11-13 18:05 ` Martin Dalecki
0 siblings, 0 replies; 59+ messages in thread
From: Martin Dalecki @ 2001-11-13 18:05 UTC (permalink / raw)
To: Russell King; +Cc: dalecki, linux-kernel

Russell King wrote:
>
> On Tue, Nov 13, 2001 at 06:37:54PM +0100, Martin Dalecki wrote:
> > Pushing the port numbers artificially behind doesn't make sense for me
> > and makes some setserial unknown tricks neccessary for irtty setup.
>
> The key words here are "for me".
>
> What setserial "unknown tricks" are you referring to?

I refer to the problems with the serial driver described in the IrDA-HOWTO.
I think this may be precisely the culprit.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Merge BUG in 2.4.15-pre4 serial.c 2001-11-13 17:37 ` Martin Dalecki 2001-11-13 16:53 ` Russell King @ 2001-11-13 17:11 ` Alan Cox 2001-11-13 18:23 ` Martin Dalecki 1 sibling, 1 reply; 59+ messages in thread From: Alan Cox @ 2001-11-13 17:11 UTC (permalink / raw) To: dalecki; +Cc: Russell King, Alan Cox, Linus Torvalds, linux-kernel > Well I still think that the 8 lines can be deleted. Once again my famous > notbook is perfectly __i386__ and doesn't contain any devices served by > serial.c > unless I configure IrDA. Pushing the port numbers artificially behind > doesn't make sense for me and makes some setserial unknown tricks > neccessary Renumbering everyones serial ports by suprise seems to be a 2.5 thing ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Merge BUG in 2.4.15-pre4 serial.c 2001-11-13 17:11 ` Alan Cox @ 2001-11-13 18:23 ` Martin Dalecki 0 siblings, 0 replies; 59+ messages in thread From: Martin Dalecki @ 2001-11-13 18:23 UTC (permalink / raw) To: Alan Cox; +Cc: dalecki, Russell King, Linus Torvalds, linux-kernel Alan Cox wrote: > > > Well I still think that the 8 lines can be deleted. Once again my famous > > notbook is perfectly __i386__ and doesn't contain any devices served by > > serial.c > > unless I configure IrDA. Pushing the port numbers artificially behind > > doesn't make sense for me and makes some setserial unknown tricks > > neccessary > > Renumbering everyones serial ports by suprise seems to be a 2.5 thing OK that's an argument to which I fully agree. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current" 2001-11-07 13:38 ` Alan Cox 2001-11-07 14:59 ` Martin Dalecki @ 2001-11-07 20:04 ` Andrew Morton 2001-11-11 13:16 ` Martin Dalecki 2 siblings, 0 replies; 59+ messages in thread From: Andrew Morton @ 2001-11-07 20:04 UTC (permalink / raw) To: Alan Cox; +Cc: Martin Dalecki, linux-kernel Alan Cox wrote: > > > With the following options enabled we get: > > -freg-struct-return -mrtd -mregparm=3 > > > > text data bss dec hex filename > > 1302372 260804 288080 1851256 1c3f78 vmlinux > > > > Quite significant difference if you ask me!!! > > 30K is nice have but still a scratch on the surface compared with 500K 8) > It's a lot of L1 though. If this sort of change breaks the ability to build with conventional argument passing and no-omit-frame-pointer then the happy kgdb users of this world will be most aggrieved. - ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-07 13:38 ` Alan Cox
2001-11-07 14:59 ` Martin Dalecki
2001-11-07 20:04 ` Using %cr2 to reference "current" Andrew Morton
@ 2001-11-11 13:16 ` Martin Dalecki
2001-11-11 13:06 ` Keith Owens
2001-11-12 11:28 ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki
2 siblings, 2 replies; 59+ messages in thread
From: Martin Dalecki @ 2001-11-11 13:16 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, linux-kernel

Alan Cox wrote:
>
> > With the following options enabled we get:
> > -freg-struct-return -mrtd -mregparm=3
> >
> >    text    data     bss     dec     hex filename
> > 1302372  260804  288080 1851256  1c3f78 vmlinux
> >
> > Quite significant difference if you ask me!!!
>
> 30K is nice have but still a scratch on the surface compared with 500K 8)
>
> > in a saving of about 2.3% in code size. This may not sound grat in
> > relative
> > numbers, but for a compiler designer this would already sound hilarious
> > and in
> > absolute numbers it's: 29760 bytes. Not withstanding the speed
> > improvement...
>
> The obvious question is - have you tried running the kernel built like that
> with any asm fixups needed ?

I now have a nice kernel at home, compiled with -mregparm=3, up
and going. Full interactive session, full kernel compiles working,
X11 whatsup. Everything seems fine so far. However, I still have to
build an RPM-feature grade kernel and test it. Further, the
precise benchmarking will take some time as well. I think that I
will in esp. use the byte benchmark, since it is quite "kernel
intensive" in some parts.

Patch will follow on Monday (if nothing comes in between...).
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-11 13:16 ` Martin Dalecki
@ 2001-11-11 13:06 ` Keith Owens
2001-11-12 11:28 ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki
1 sibling, 0 replies; 59+ messages in thread
From: Keith Owens @ 2001-11-11 13:06 UTC (permalink / raw)
To: dalecki; +Cc: linux-kernel

On Sun, 11 Nov 2001 14:16:36 +0100,
Martin Dalecki <dalecki@evision-ventures.com> wrote:
>I have now a nice kernel at home, compiled with -mredparm=3 up
>... Patch will follow on monday

Compiling the kernel with mregparm is going to play havoc with binary
only modules (BOMs), interface mismatches all over the place. I know
we do not support BOMs but there is a big difference between not
supporting them and having them actively destroy the kernel because of
different calling sequences.

A new feature of kbuild 2.5 is defining which CONFIG options are
critical, any change to any critical config option forces a complete
kernel rebuild. Modutils 2.5 will also refuse to load a module if its
critical config options are different from the kernel. The current
list of critical options is

CONFIG_SMP
  UP modules in SMP kernel or vice versa just go splat. This replaces
  the modversions '_smp' prefix.

CONFIG_KBUILD_GCC_VERSION
  Inserting a module compiled with gcc 3.0.1 into a kernel compiled
  with gcc 3.0.2 is a recipe for disaster. Kernel and module must be
  built with the same compiler.

Any changes that affect the ABI for modules must be handled via config
options and those options must be on the critical list in 2.5. Please
add CONFIG_MREGPARM with a huge warning that, until kbuild 2.5 and
modutils 2.5 are available, inserting a BOM is likely to destroy a
kernel compiled with CONFIG_MREGPARM.
^ permalink raw reply [flat|nested] 59+ messages in thread
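The "refuse to load on a critical-option mismatch" rule described above can be illustrated with a short sketch. This is not how modutils 2.5 implements it; `struct build_info` and `critical_options_match()` are hypothetical, and the `regparm` field models the CONFIG_MREGPARM option that the message proposes adding.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record of the critical options a binary was built with. */
struct build_info {
    int smp;                  /* CONFIG_SMP: 0 or 1 */
    const char *gcc_version;  /* CONFIG_KBUILD_GCC_VERSION */
    int regparm;              /* proposed CONFIG_MREGPARM: 0 or 3 */
};

/*
 * Return 1 if the module may be loaded into the kernel, 0 otherwise.
 * Every critical option must match exactly; a single mismatch is
 * enough to refuse the load.
 */
int critical_options_match(const struct build_info *kernel,
                           const struct build_info *module)
{
    assert(kernel && module);
    return kernel->smp == module->smp
        && strcmp(kernel->gcc_version, module->gcc_version) == 0
        && kernel->regparm == module->regparm;
}
```

A module built with stack-based argument passing (`regparm == 0`) would be rejected by a kernel built with `regparm == 3`, which turns a silent calling-convention mismatch into a load-time error.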
* PATCH 2.4.14 mregparm=3 compilation fixes
2001-11-11 13:16 ` Martin Dalecki
2001-11-11 13:06 ` Keith Owens
@ 2001-11-12 11:28 ` Martin Dalecki
2001-11-12 16:10 ` Keith Owens
2001-11-12 16:42 ` Linus Torvalds
1 sibling, 2 replies; 59+ messages in thread
From: Martin Dalecki @ 2001-11-12 11:28 UTC (permalink / raw)
To: Alan Cox, Linus Torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

Hello out there!

The attached patch fixes compilation and running of the kernel
with -mregparm=3 on IA32. The fixes, excluding the change in
arch/i386/Makefile of course, apply to the stock kernel as well, so
Linus, please include it in 2.4.15 - it just won't hurt...

Well, the benchmarks I intended to do (i.e. the byte unix bench) were
not quite conclusive, so I include the results here just for reference.
They were done on a PIII Celeron notebook running at 700 MHz with 192 MB
of RAM.

- regparm3.report was gathered with the patch applied.
- report was gathered without the patch applied.

Maybe someone with more time and the proper infrastructure at hand
can provide some more fine-grained tests?

The patch itself turned out to be much smaller and simpler than I
expected. However, the space savings are quite significant,
especially for such a small change in the kernel...

BTW, the -pipe compiler option doesn't give any speed advantage on
systems where /tmp is on tmpfs any longer!

Have fun!

[-- Attachment #2: mregparm.patch --]
[-- Type: text/plain, Size: 4136 bytes --]

diff -ur linux-2.4.14-2/arch/i386/Makefile linux-mdcki/arch/i386/Makefile
--- linux-2.4.14-2/arch/i386/Makefile	Thu Apr 12 21:20:31 2001
+++ linux-mdcki/arch/i386/Makefile	Sat Nov 10 00:07:17 2001
@@ -21,7 +21,7 @@
 LDFLAGS=-e stext
 LINKFLAGS =-T $(TOPDIR)/arch/i386/vmlinux.lds $(LDFLAGS)
 
-CFLAGS += -pipe
+CFLAGS += -freg-struct-return -mregparm=3
 
 # prevent gcc from keeping the stack 16 byte aligned
 CFLAGS += $(shell if $(CC) -mpreferred-stack-boundary=2 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-mpreferred-stack-boundary=2"; fi)
diff -ur linux-2.4.14-2/arch/i386/boot/compressed/misc.c linux-mdcki/arch/i386/boot/compressed/misc.c
--- linux-2.4.14-2/arch/i386/boot/compressed/misc.c	Fri Oct 5 03:42:54 2001
+++ linux-mdcki/arch/i386/boot/compressed/misc.c	Sat Nov 10 00:02:08 2001
@@ -9,6 +9,7 @@
  * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
  */
 
+#include <linux/linkage.h>
 #include <linux/vmalloc.h>
 #include <linux/tty.h>
 #include <asm/io.h>
@@ -304,7 +305,7 @@
 	short b;
 	} stack_start = { & user_stack [STACK_SIZE] , __KERNEL_DS };
 
-void setup_normal_output_buffer(void)
+static void setup_normal_output_buffer(void)
 {
 #ifdef STANDARD_MEMORY_BIOS_CALL
 	if (EXT_MEM_K < 1024) error("Less than 2MB of memory.\n");
@@ -320,7 +321,7 @@
 	uch *high_buffer_start; int hcount;
 };
 
-void setup_output_buffer_if_we_run_high(struct moveparams *mv)
+static void setup_output_buffer_if_we_run_high(struct moveparams *mv)
 {
 	high_buffer_start = (uch *)(((ulg)&end) + HEAP_SIZE);
 #ifdef STANDARD_MEMORY_BIOS_CALL
@@ -342,7 +343,7 @@
 	mv->high_buffer_start = high_buffer_start;
 }
 
-void close_output_buffer_if_we_run_high(struct moveparams *mv)
+static void close_output_buffer_if_we_run_high(struct moveparams *mv)
 {
 	if (bytes_out > low_buffer_size) {
 		mv->lcount = low_buffer_size;
@@ -355,7 +356,7 @@
 }
 
 
-int decompress_kernel(struct moveparams *mv, void *rmode)
+asmlinkage int decompress_kernel(struct moveparams *mv, void *rmode)
 {
 	real_mode = rmode;
 
diff -ur linux-2.4.14-2/arch/i386/kernel/bluesmoke.c linux-mdcki/arch/i386/kernel/bluesmoke.c
--- linux-2.4.14-2/arch/i386/kernel/bluesmoke.c	Thu Oct 11 18:04:57 2001
+++ linux-mdcki/arch/i386/kernel/bluesmoke.c	Sat Nov 10 02:24:25 2001
@@ -100,11 +100,11 @@
 
 /*
  * Call the installed machine check handler for this CPU setup.
- */
-
+ */
+
 static void (*machine_check_vector)(struct pt_regs *, long error_code) = unexpected_machine_check;
 
-void do_machine_check(struct pt_regs * regs, long error_code)
+asmlinkage void do_machine_check(struct pt_regs * regs, long error_code)
 {
 	machine_check_vector(regs, error_code);
 }
diff -ur linux-2.4.14-2/arch/i386/math-emu/fpu_proto.h linux-mdcki/arch/i386/math-emu/fpu_proto.h
--- linux-2.4.14-2/arch/i386/math-emu/fpu_proto.h	Wed Dec 10 02:57:09 1997
+++ linux-mdcki/arch/i386/math-emu/fpu_proto.h	Sat Nov 10 02:31:22 2001
@@ -53,7 +53,7 @@
 extern void fst_i_(void);
 extern void fstp_i(void);
 /* fpu_entry.c */
-extern void math_emulate(long arg);
+asmlinkage extern void math_emulate(long arg);
 extern void math_abort(struct info *info, unsigned int signal);
 /* fpu_etc.c */
 extern void FPU_etc(void);
diff -ur linux-2.4.14-2/include/linux/kernel.h linux-mdcki/include/linux/kernel.h
--- linux-2.4.14-2/include/linux/kernel.h	Fri Nov 9 20:11:22 2001
+++ linux-mdcki/include/linux/kernel.h	Sun Nov 11 12:35:46 2001
@@ -51,7 +51,7 @@
 extern struct notifier_block *panic_notifier_list;
 NORET_TYPE void panic(const char * fmt, ...)
 	__attribute__ ((NORET_AND format (printf, 1, 2)));
-NORET_TYPE void do_exit(long error_code)
+asmlinkage NORET_TYPE void do_exit(long error_code)
 	ATTRIB_NORET;
 NORET_TYPE void complete_and_exit(struct completion *, long)
 	ATTRIB_NORET;
diff -ur linux-2.4.14-2/kernel/sched.c linux-mdcki/kernel/sched.c
--- linux-2.4.14-2/kernel/sched.c	Fri Nov 9 19:56:42 2001
+++ linux-mdcki/kernel/sched.c	Sat Nov 10 02:07:01 2001
@@ -515,7 +515,7 @@
 #endif /* CONFIG_SMP */
 }
 
-void schedule_tail(struct task_struct *prev)
+asmlinkage void schedule_tail(struct task_struct *prev)
 {
 	__schedule_tail(prev);
 }

[-- Attachment #3: regparm3.report --]
[-- Type: text/plain, Size: 3083 bytes --]

BYTE UNIX Benchmarks (Version 3.11)
System -- Linux kozaczek 2.4.14-mdcki #15 nie lis 11 12:35:45 CET 2001 i686 unknown
Start Benchmark Run: nie lis 11 14:40:32 CET 2001
1 interactive users.
Dhrystone 2 without register variables   1263066.6 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     1264480.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         3179144.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)         188804.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)            190760.8 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)              188823.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)             189990.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)            182915.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)           183937.8 lps   (10 secs, 6 samples)
System Call Overhead Test                 363784.1 lps   (10 secs, 6 samples)
Pipe Throughput Test                      415828.7 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test         196984.2 lps   (10 secs, 6 samples)
Process Creation Test                       3378.5 lps   (10 secs, 6 samples)
Execl Throughput Test                        619.3 lps   (9 secs, 6 samples)
File Read (10 seconds)                   1327798.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                   138593.0 KBps  (10 secs, 6 samples)
File Copy (10 seconds)                     19076.0 KBps  (10 secs, 6 samples)
File Read (30 seconds)                   1337240.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                   147663.0 KBps  (30 secs, 6 samples)
File Copy (30 seconds)                     14968.0 KBps  (30 secs, 6 samples)
C Compiler Test                              388.7 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)                1065.8 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)                 562.8 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                 287.0 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                 146.0 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places           28902.2 lpm   (60 secs, 6 samples)
Recursion Test--Tower of Hanoi             16393.7 lps   (10 secs, 6 samples)

INDEX VALUES
TEST                                      BASELINE     RESULT      INDEX

Arithmetic Test (type = double)             2541.7   183937.8       72.4
Dhrystone 2 without register variables     22366.3  1263066.6       56.5
Execl Throughput Test                         16.5      619.3       37.5
File Copy (30 seconds)                       179.0    14968.0       83.6
Pipe-based Context Switching Test           1318.5   196984.2      149.4
Shell scripts (8 concurrent)                   4.0      146.0       36.5
                                                                =========
SUM of 6 items                                                     435.9
AVERAGE                                                             72.6

[-- Attachment #4: report --]
[-- Type: text/plain, Size: 3077 bytes --]

BYTE UNIX Benchmarks (Version 3.11)
System -- Linux kozaczek 2.4.14-2 #1 pią lis 9 22:22:10 CET 2001 i686 unknown
Start Benchmark Run: nie lis 11 16:10:53 CET 2001
1 interactive users.
Dhrystone 2 without register variables   1263134.8 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     1263583.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         3177830.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)         189076.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)            190665.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)              188753.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)             190094.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)            182872.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)           183902.9 lps   (10 secs, 6 samples)
System Call Overhead Test                 360235.7 lps   (10 secs, 6 samples)
Pipe Throughput Test                      421456.7 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test         194915.8 lps   (10 secs, 6 samples)
Process Creation Test                       3605.4 lps   (10 secs, 6 samples)
Execl Throughput Test                        608.6 lps   (9 secs, 6 samples)
File Read (10 seconds)                   1294487.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                   138403.0 KBps  (10 secs, 6 samples)
File Copy (10 seconds)                     19158.0 KBps  (10 secs, 6 samples)
File Read (30 seconds)                   1278293.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                   147556.0 KBps  (30 secs, 6 samples)
File Copy (30 seconds)                     15129.0 KBps  (30 secs, 6 samples)
C Compiler Test                              388.8 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)                1063.2 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)                 563.1 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                 287.4 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                 145.7 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places           28576.1 lpm   (60 secs, 6 samples)
Recursion Test--Tower of Hanoi             16445.3 lps   (10 secs, 6 samples)

INDEX VALUES
TEST                                      BASELINE     RESULT      INDEX

Arithmetic Test (type = double)             2541.7   183902.9       72.4
Dhrystone 2 without register variables     22366.3  1263134.8       56.5
Execl Throughput Test                         16.5      608.6       36.9
File Copy (30 seconds)                       179.0    15129.0       84.5
Pipe-based Context Switching Test           1318.5   194915.8      147.8
Shell scripts (8 concurrent)                   4.0      145.7       36.4
                                                                =========
SUM of 6 items                                                     434.5
AVERAGE                                                             72.4
^ permalink raw reply [flat|nested] 59+ messages in thread
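To put numbers on "not quite conclusive", the two reports can be compared row by row as a relative change. A minimal sketch; `percent_change()` is a hypothetical helper, and the inputs are taken from the reports above (stock 2.4.14-2 as baseline, the -mregparm=3 build as patched).

```c
#include <assert.h>

/*
 * Relative change of `patched` against `baseline`, in percent.
 * Positive means the patched kernel scored higher on that test.
 */
double percent_change(double baseline, double patched)
{
    assert(baseline != 0.0);
    return (patched - baseline) / baseline * 100.0;
}
```

For the figures in the reports this gives roughly +1.0% for System Call Overhead (360235.7 to 363784.1 lps), roughly -6.3% for Process Creation (3605.4 to 3378.5 lps) and roughly +0.3% for the average index (72.4 to 72.6). The changes point in both directions and come from single runs, so they do not by themselves show a speed gain.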
* Re: PATCH 2.4.14 mregparm=3 compilation fixes 2001-11-12 11:28 ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki @ 2001-11-12 16:10 ` Keith Owens 2001-11-12 16:25 ` Christoph Hellwig 2001-11-12 17:56 ` Martin Dalecki 2001-11-12 16:42 ` Linus Torvalds 1 sibling, 2 replies; 59+ messages in thread From: Keith Owens @ 2001-11-12 16:10 UTC (permalink / raw) To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel On Mon, 12 Nov 2001 12:28:33 +0100, Martin Dalecki <dalecki@evision-ventures.com> wrote: >diff -ur linux-2.4.14-2/arch/i386/Makefile linux-mdcki/arch/i386/Makefile >--- linux-2.4.14-2/arch/i386/Makefile Thu Apr 12 21:20:31 2001 >+++ linux-mdcki/arch/i386/Makefile Sat Nov 10 00:07:17 2001 >@@ -21,7 +21,7 @@ > LDFLAGS=-e stext > LINKFLAGS =-T $(TOPDIR)/arch/i386/vmlinux.lds $(LDFLAGS) > >-CFLAGS += -pipe >+CFLAGS += -freg-struct-return -mregparm=3 > > # prevent gcc from keeping the stack 16 byte aligned > CFLAGS += $(shell if $(CC) -mpreferred-stack-boundary=2 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-mpreferred-stack-boundary=2"; fi) Setting mregparm must be a CONFIG_ option, with a huge warning that A) Changing CONFIG_MREGPARM requires make mrproper. B) Loading binary only modules into a kernel compiled with mregparm is even more likely to destroy your kernel. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: PATCH 2.4.14 mregparm=3 compilation fixes
2001-11-12 16:10 ` Keith Owens
@ 2001-11-12 16:25 ` Christoph Hellwig
2001-11-12 17:56 ` Martin Dalecki
1 sibling, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2001-11-12 16:25 UTC (permalink / raw)
To: Keith Owens, dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel

In article <4300.1005581402@ocs3.intra.ocs.com.au> you wrote:
> Setting mregparm must be a CONFIG_ option, with a huge warning that
>
> A) Changing CONFIG_MREGPARM requires make mrproper.

The above patch changes the kernel to always use mregparm - it should be
caught by the .flags dependencies anyway.

> B) Loading binary only modules into a kernel compiled with mregparm is
> even more likely to destroy your kernel.

Nope - people who use those are just doomed.

Christoph

--
Of course it doesn't work. We've performed a software upgrade.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: PATCH 2.4.14 mregparm=3 compilation fixes
2001-11-12 16:10 ` Keith Owens
2001-11-12 16:25 ` Christoph Hellwig
@ 2001-11-12 17:56 ` Martin Dalecki
1 sibling, 0 replies; 59+ messages in thread
From: Martin Dalecki @ 2001-11-12 17:56 UTC (permalink / raw)
To: Keith Owens; +Cc: dalecki, Alan Cox, Linus Torvalds, linux-kernel

Keith Owens wrote:
>
> On Mon, 12 Nov 2001 12:28:33 +0100,
> Martin Dalecki <dalecki@evision-ventures.com> wrote:
> >diff -ur linux-2.4.14-2/arch/i386/Makefile linux-mdcki/arch/i386/Makefile
> >--- linux-2.4.14-2/arch/i386/Makefile Thu Apr 12 21:20:31 2001
> >+++ linux-mdcki/arch/i386/Makefile Sat Nov 10 00:07:17 2001
> >@@ -21,7 +21,7 @@
> > LDFLAGS=-e stext
> > LINKFLAGS =-T $(TOPDIR)/arch/i386/vmlinux.lds $(LDFLAGS)
> >
> >-CFLAGS += -pipe
> >+CFLAGS += -freg-struct-return -mregparm=3
> >
> > # prevent gcc from keeping the stack 16 byte aligned
> > CFLAGS += $(shell if $(CC) -mpreferred-stack-boundary=2 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-mpreferred-stack-boundary=2"; fi)
>
> Setting mregparm must be a CONFIG_ option, with a huge warning that
>
> A) Changing CONFIG_MREGPARM requires make mrproper.
>
> B) Loading binary only modules into a kernel compiled with mregparm is
> even more likely to destroy your kernel.

Ehmm... In fact my feeling is that _this part_ of the patch
_should not_ be included in the mainstream kernel at all. It could
perhaps just be made the default (in 2.5, say) if it turns out that
the gains in performance, code size and so on are worth it, since I
haven't encountered any problems so far, even with a "distro RPM grade
kernel" containing USB, TCP and whatnot. GCC really has got better
over the last years! So in my opinion there is no real need for an
option at all. We already have enough of them.

The REST OF THE PATCH contains only pure, clear-cut bugfixes,
which should be applied STRAIGHT away. Those fixes do not influence
the current compilation output at all (with the exception of hiding
global symbols in misc.c that are not used externally). But they
enable somebody who knows what he is doing to add the above CFLAGS
for his system, to gain a significant amount of free space, for
example in a PROM, or to gain a bit of performance - supposedly.

I hope this makes my intentions clear. OK?

BTW: try it out, it doesn't interfere with any module handling.
However, I just don't share your objections about binary-only
modules, because I just don't care about them... In particular, my
nonexistent interest in computer games doesn't compel me to buy any
NVIDIA graphics cards. A plain nice old Mach64, which always was one
of the most UNIX-friendly VGA designs ever, does just fine for me ;-).
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: PATCH 2.4.14 mregparm=3 compilation fixes 2001-11-12 11:28 ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki 2001-11-12 16:10 ` Keith Owens @ 2001-11-12 16:42 ` Linus Torvalds 2001-11-12 18:51 ` Martin Dalecki 1 sibling, 1 reply; 59+ messages in thread From: Linus Torvalds @ 2001-11-12 16:42 UTC (permalink / raw) To: dalecki; +Cc: Alan Cox, linux-kernel On Mon, 12 Nov 2001, Martin Dalecki wrote: > > The attached patch is fixing compilation and running > of the kernel with -mregparm=3 on IA32. The fixes excluding > the change in arch/i386/Makefile of course apply to the stock kernel > as well, so Linus please include it in 2.4.15 - it just won't hurt... I certainly won't enable it in the stock kernel, considering the bad track record gcc has had with regparm under register pressure, but the "asmlinkage" parts look like real fixes. However, it's kind of sad to make some of the more timing-critical stuff (like schedule_tail) be asmlinkage - it might be worth it to do it the other way around, and make it FASTCALL() and change the assembly code to pass arguments in registers. That way, the calling convention is still the same on both regparm=3 and without, but instead of defaulting to the slow method we'd default to the fast one.. Linus ^ permalink raw reply [flat|nested] 59+ messages in thread
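Linus's point above is that a function called from assembly should have its calling convention pinned per function, so that it no longer depends on whether the whole kernel is built with -mregparm=3. The following is only a minimal userspace sketch of that idea, not the kernel's code: the macro names STACK_ARGS and REG_ARGS and both functions are hypothetical, loosely modelled on asmlinkage and FASTCALL(), and on anything other than 32-bit x86 the macros expand to nothing.

```c
#include <assert.h>

/*
 * Hypothetical macros for illustration only. GCC's regparm attribute is
 * meaningful on 32-bit x86; elsewhere both macros are empty.
 */
#if defined(__i386__)
#define STACK_ARGS __attribute__((regparm(0))) /* all arguments on the stack */
#define REG_ARGS   __attribute__((regparm(3))) /* first three in %eax, %edx, %ecx */
#else
#define STACK_ARGS
#define REG_ARGS
#endif

/* Pinned to stack arguments: what an assembly caller that pushes them expects. */
STACK_ARGS int sum_stack(int a, int b, int c);

/* Pinned to register arguments: an assembly caller would load the registers. */
REG_ARGS int sum_regs(int a, int b, int c);

STACK_ARGS int sum_stack(int a, int b, int c)
{
    return a + b + c;
}

REG_ARGS int sum_regs(int a, int b, int c)
{
    return a + b + c;
}

/* Both conventions compute the same value; only the argument passing differs. */
void check_same_result(void)
{
    assert(sum_stack(1, 2, 3) == sum_regs(1, 2, 3));
}
```

Because the attribute is attached to each declaration, C callers get the right convention automatically whatever the global compiler flags are; only the hand-written assembly callers have to match it.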
* Re: PATCH 2.4.14 mregparm=3 compilation fixes
2001-11-12 16:42 ` Linus Torvalds
@ 2001-11-12 18:51 ` Martin Dalecki
2001-11-12 20:05 ` Corsspatch patch-2.4.15-pre2 patch-2.4.15-pre3 Martin Dalecki
0 siblings, 1 reply; 59+ messages in thread
From: Martin Dalecki @ 2001-11-12 18:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: dalecki, Alan Cox, linux-kernel

Linus Torvalds wrote:
>
> On Mon, 12 Nov 2001, Martin Dalecki wrote:
> >
> > The attached patch is fixing compilation and running
> > of the kernel with -mregparm=3 on IA32. The fixes excluding
> > the change in arch/i386/Makefile of course apply to the stock kernel
> > as well, so Linus please include it in 2.4.15 - it just won't hurt...
>
> I certainly won't enable it in the stock kernel, considering the bad track
> record gcc has had with regparm under register pressure, but the
> "asmlinkage" parts look like real fixes.

Yes, that was always my intention. The chunk changing the CFLAGS was
left in the patch only for reference. I hoped I had made this clear
in my announcement, but apparently I failed ;-).

Despite this, I would like to make clear that I have already compiled
my own "RedHat 7.2" compatible kernel RPM set with the patch applied,
and haven't encountered any problems so far... Even an ORACLE DB just
started without noticing that anything had changed beneath it.
Since all this was done on my notebook, I can say that there were
no problems even with the "less mature" kernel parts like USB
handling, CardBus and so on (anybody please note: I didn't say
"immature", just "less mature", more like "fresh", no pun intended).

Apparently GCC has recently got much better at this stuff.
I'm using the RedHat GCC 2.96 brand, gcc-2.96-99... And I reiterate
that I'm happily running a whole kernel compiled with mregparm=3
without any anomalies so far.

> However, it's kind of sad to make some of the more timing-critical stuff
> (like schedule_tail) be asmlinkage - it might be worth it to do it the
> other way around, and make it FASTCALL() and change the assembly code to
> pass arguments in registers. That way, the calling convention is still the
> same on both regparm=3 and without, but instead of defaulting to the slow
> method we'd default to the fast one..

Yes, that's right. However, if you look closely you will notice that
asmlinkage is quite a bad name. Ideally there should be an asmlinkage
with mregparm=3, plus a syslinkage macro with mregparm=0 for the
system call entry points. But fixes are fixes, and with the current
semantics my patch really just fixes bugs (though not "tragic" ones).
So if I see this fix applied, I will make the improvements described
above in 2.5 ;-). They are not difficult anyway, just a bit
tedious... and they would affect a bit more of the code around there,
in particular the system call declarations, and we already have a lot
of those ;-).

So long...
^ permalink raw reply [flat|nested] 59+ messages in thread
* Corsspatch patch-2.4.15-pre2 patch-2.4.15-pre3
2001-11-12 18:51 ` Martin Dalecki
@ 2001-11-12 20:05 ` Martin Dalecki
2001-11-12 20:13 ` BUG BUG hunt the bugs!!! patch-2.4.15-pre5 Martin Dalecki
0 siblings, 1 reply; 59+ messages in thread
From: Martin Dalecki @ 2001-11-12 20:05 UTC (permalink / raw)
Cc: Linus Torvalds, Alan Cox, linux-kernel

Hello out there!

Doing a cross-patch between, ehmm, the pre-patches 2 and 3, I noticed
that patch-2.4.15-pre3 adds the call to sa1100_irda_init() TWICE.
This *may* work, but I don't think it is quite what the author
intended :-). So Linus/Alan, please watch out...

It's in the file linux/net/irda/irda_device.c. The following will be
there twice after pre3:

#ifdef CONFIG_SA1100_FIR
	sa1100_irda_init();
#endif
^ permalink raw reply [flat|nested] 59+ messages in thread
* BUG BUG hunt the bugs!!! patch-2.4.15-pre5
2001-11-12 20:05 ` Corsspatch patch-2.4.15-pre2 patch-2.4.15-pre3 Martin Dalecki
@ 2001-11-12 20:13 ` Martin Dalecki
0 siblings, 0 replies; 59+ messages in thread
From: Martin Dalecki @ 2001-11-12 20:13 UTC (permalink / raw)
Cc: Linus Torvalds, Alan Cox, linux-kernel

Hello out there!

Same symptom from patch-2.4.15-pre4:

diff -u --recursive --new-file v2.4.14/linux/net/irda/irda_device.c linux/net/irda/irda_device.c
--- v2.4.14/linux/net/irda/irda_device.c Sun Sep 23 11:41:02 2001
+++ linux/net/irda/irda_device.c Sun Nov 11 10:20:21 2001

bla bla bla...

@@ -124,6 +127,12 @@
 #ifdef CONFIG_WINBOND_FIR
 	w83977af_init();
 #endif
+#ifdef CONFIG_SA1100_FIR
+	sa1100_irda_init();
+#endif
+#ifdef CONFIG_SA1100_FIR
+	sa1100_irda_init();
+#endif
 #ifdef CONFIG_NSC_FIR
 	nsc_ircc_init();
 #endif
@@ -151,6 +160,12 @@
 #ifdef CONFIG_OLD_BELKIN
 	old_belkin_init();
 #endif
+#ifdef CONFIG_EP7211_IR
+	ep7211_ir_init();
+#endif
+#ifdef CONFIG_EP7211_IR
+	ep7211_ir_init();
+#endif
 	return 0;

You see the initialization done twice!
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current" 2001-11-06 7:18 Using %cr2 to reference "current" H. Peter Anvin 2001-11-06 8:01 ` Robert Love 2001-11-06 10:58 ` Alan Cox @ 2001-11-06 17:02 ` Linus Torvalds 2001-11-06 17:13 ` Benjamin LaHaise 2 siblings, 1 reply; 59+ messages in thread From: Linus Torvalds @ 2001-11-06 17:02 UTC (permalink / raw) To: linux-kernel In article <9s82rl$k51$1@cesium.transmeta.com>, H. Peter Anvin <hpa@zytor.com> wrote: > >Is using %cr2 really faster than the old implementation, or is there >another reason? It seems that the alignment constraints on the stack >still remains, since the %esp solution still remains in places... I think the _real_ issue with that patch is that %cr2 is by no means architecturally even guaranteed to work the way the patches want it to work. It's simply not a general-purpose register, and I don't see why it is assumed to be (a) fast (b) stable and (c) writable. I could well imagine a x86-compatible chip where %cr2 isn't even writable. In fact, reading the intel documentation, I see _nowhere_ a mention of %cr2 being writable at all - it all just says "contains the fault address". Similarly, there is _nothing_ that guarantees that the low bits of %cr2 are meaningful, writable, or even implemented. Which means that the whole approach is just depending on undocumented implementation behaviour. That's asking for trouble. Linus ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current" 2001-11-06 17:02 ` Using %cr2 to reference "current" Linus Torvalds @ 2001-11-06 17:13 ` Benjamin LaHaise 2001-11-06 17:49 ` Linus Torvalds 0 siblings, 1 reply; 59+ messages in thread From: Benjamin LaHaise @ 2001-11-06 17:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Tue, Nov 06, 2001 at 05:02:32PM +0000, Linus Torvalds wrote: > Which means that the whole approach is just depending on undocumented > implementation behaviour. That's asking for trouble. NetWare uses it and has for a long time. -ben ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 17:13 ` Benjamin LaHaise
@ 2001-11-06 17:49 ` Linus Torvalds
2001-11-06 18:19 ` Alan Cox
2001-11-06 18:42 ` Benjamin LaHaise
0 siblings, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2001-11-06 17:49 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: linux-kernel

On Tue, 6 Nov 2001, Benjamin LaHaise wrote:
>
> On Tue, Nov 06, 2001 at 05:02:32PM +0000, Linus Torvalds wrote:
> > Which means that the whole approach is just depending on undocumented
> > implementation behaviour. That's asking for trouble.
>
> NetWare uses it and has for a long time.

Does anybody know if WNT uses it?

Quite frankly, I don't see Intel worrying over-much about NetWare
compatibility. They've broken small OS's before (ie older versions of SCO
Xenix wouldn't boot on a Pentium MMU because of some changes to error
reporting, if I remember correctly).

That said, how expensive is loading %cr2 anyway? We can do all the same
tricks with a 16kB stack and just playing games with using the higher bits
as the "offset", ie things like

	/* Return "current" in %eax, trash %edx */
	do_get_current:
		movl $0x0003c000,%eax	// 4 bits at bit 14
		movl $-16384,%edx	// remove low 14 bits
		andl %esp,%eax
		andl %esp,%edx
		shrl $7,%eax		// color it by 128 bytes
		addl %edx,%eax
		ret

which is going to be ~5 cycles _without_ doing anything that is
undocumented (add a push/pop to not trash a register, that might be
worthwhile - it makes the function marginally slower but might make
callers happier).

Oh, and call using inline assembly, not a C call (so that gcc can take
advantage of better calling convention, and not think memory is trashed
etc). So

	static inline struct task_struct *get_current(void)
	{
		struct task_struct *tsk;
		asm("call do_get_current":"=a" (tsk)::"dx");
		return tsk;
	}

See? You don't have to play games with control registers.

(actually, entry.S seems to want the return value in %ebx, so change to
taste. Or you could have two different versions of the thing, or even
inline it for any place where that makes sense).

The above also allows you to keep fork with just one allocation, and makes
the stack larger (we steal 2kB for the coloring, but we'd use an order-2
allocation that at least SGI wants to do regardless).

The 2kB is, of course, tunable. The above is with a 128-byte cacheline and
16 colors - that may be overkill. 32-byte increments with 32 colors might
be more appropriate (I don't know what the effect of the P4 half-cacheline
thing is, I don't know if the CPU can have just a 64-byte block coherent,
or what.. But a 32-byte color is fine for _most_ CPU's).

The 32-byte by 32-color thing would just change the bitmasks to 0x0007c000
and the shift to 9 (bit 14+ shifted down to bit 5+).

Note that there are lots of advantages to using simple regular
instructions over using "special" instructions like "move from control
register". Historically, the special instructions tend to always become
slower, while the regular instructions become faster.

I would not be surprised if "mov %cr2,%reg" will break a netburst trace
cache entity, or even cause microcode to be executed. While I _guarantee_
that all future Intel CPU's will continue to be fast at mixtures of simple
arithmetic operations like "add" and "and".

(And I bet that the likelihood of Intel speeding up shifts in the next P4
derivative is a _lot_ higher than Intel speeding up "mov %cr2,xx"..)

Linus
^ permalink raw reply [flat|nested] 59+ messages in thread
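The address arithmetic in do_get_current can be checked in plain C, away from the real stack pointer. The sketch below is an illustration under stated assumptions rather than kernel code: the function name current_from_esp and its two parameters are hypothetical, the stack is taken to be 16 kB and 16 kB-aligned as in the message, and addresses are modelled as 32-bit integers.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE 16384u /* 16 kB stack, assumed aligned to its own size */

/*
 * Hypothetical helper mirroring the assembly sequence:
 *   color_bits - how many address bits above the stack select the color
 *   line_shift - log2 of the color granularity in bytes
 * With (4, 7) this is the 0x0003c000 / shrl $7 version; with (5, 5) it is
 * the 0x0007c000 / shift-by-9 version described in the message.
 */
uint32_t current_from_esp(uint32_t esp, unsigned color_bits, unsigned line_shift)
{
    assert(color_bits >= 1 && color_bits <= 8 && line_shift <= 14);

    uint32_t mask  = ((1u << color_bits) - 1u) << 14; /* bits 14 and up */
    uint32_t color = (esp & mask) >> (14 - line_shift); /* byte offset of the color */
    uint32_t base  = esp & ~(STACK_SIZE - 1u);          /* start of the 16 kB block */

    return base + color;
}
```

For a stack pointer of 0xc1234564, bits 14..17 hold 13, so the 16-color version yields 0xc1234000 + 13 * 128 = 0xc1234680. Every stack pointer inside the same 16 kB block gives the same result, which is what lets the task structure be found without a control register.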
* Re: Using %cr2 to reference "current"
2001-11-06 17:49 ` Linus Torvalds
@ 2001-11-06 18:19 ` Alan Cox
2001-11-09 21:52 ` Jamie Lokier
2001-11-06 18:42 ` Benjamin LaHaise
1 sibling, 1 reply; 59+ messages in thread
From: Alan Cox @ 2001-11-06 18:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Benjamin LaHaise, linux-kernel

> That said, how expensive is loading %cr2 anyway? We can do all the same
> tricks with a 16kB stack and just playing games with using the higher bits
> as the "offset", ie things like

So that's another 600K on my box vanished. I suspect the page faults will
outweigh it

> the stack larger (we steal 2kB for the coloring, but we'd use an order-2
> allocation that at least SGI wants to do regardless).

16K stack is serious "people who can't program" country.

> I would not be surprised if "mov %cr2,%reg" will break a netburst trace
> cache entity, or even cause microcode to be executed. While I _guarantee_
> that all future Intel CPU's will continue to be fast at mixtures of simple
> arithmetic operations like "add" and "and".

True enough, but then we can go to

	andl %%esp, %0
	movl (%%eax), %%eax

which doesn't really change the cost much, lets us colour the task structs
nicely, and lets us colour the stack somewhat by offsetting esp from the base
- and all in standard instructions

Alan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 18:19 ` Alan Cox
@ 2001-11-09 21:52 ` Jamie Lokier
0 siblings, 0 replies; 59+ messages in thread
From: Jamie Lokier @ 2001-11-09 21:52 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, Benjamin LaHaise, linux-kernel

Alan Cox wrote:
> True enough, but then we can go to
>
> 	andl %%esp, %0
> 	movl (%%eax), %%eax
>
> which doesn't really change the cost much, lets us colour the task structs
> nicely, and lets us colour the stack somewhat by offsetting esp from the base
> - and all in standard instructions

A variant lets you put the pointer at the top of the stack, where it can
sometimes share a cache line with the freshly pushed context:

	movl $0x1ffc,%0
	orl %esp,%0
	movl (%0), %0

This works because GCC keeps the stack aligned to 4 bytes at all times,
I believe.

Both this simple sequence and Alan's code suffer from the problem that
the pointer itself is not cache-coloured, but that is a lot better than
having the whole context and task state on the same colour.

This could perhaps be improved using Linus' idea of shifting upper
address bits to colour the pointer as well.

-- Jamie
^ permalink raw reply [flat|nested] 59+ messages in thread
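The two pointer-slot variants differ only in where inside the stack block the word holding the "current" pointer lives. The following is a small sketch of just that address computation, with assumptions stated up front: the helper names are hypothetical, the stack is taken to be 8 kB and 8 kB-aligned (which is what the 0x1ffc constant implies), and since Alan's sequence does not show its mask, a mask of ~8191 is assumed for it.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE 8192u /* 8 kB stack block, assumed aligned to its own size */

/* Assumed reading of Alan's variant: the pointer sits in the first word. */
uint32_t slot_at_base(uint32_t esp)
{
    return esp & ~(STACK_SIZE - 1u);
}

/* Jamie's variant: the pointer sits in the last 4-byte word (offset 0x1ffc). */
uint32_t slot_at_top(uint32_t esp)
{
    /*
     * OR-ing with 0x1ffc leaves the two low bits of esp untouched, so the
     * result is only a valid word address if esp is 4-byte aligned.
     */
    assert((esp & 3u) == 0);
    return esp | (STACK_SIZE - 4u);
}
```

In both cases the slot address is fixed for a given stack, which is the weakness Jamie points out: the word holding the pointer always lands on the same cache colour, even if the structure it points to is coloured.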
* Re: Using %cr2 to reference "current"
2001-11-06 17:49 ` Linus Torvalds
2001-11-06 18:19 ` Alan Cox
@ 2001-11-06 18:42 ` Benjamin LaHaise
2001-11-06 19:09 ` H. Peter Anvin
2001-11-06 19:16 ` Dave Jones
1 sibling, 2 replies; 59+ messages in thread
From: Benjamin LaHaise @ 2001-11-06 18:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel

On Tue, Nov 06, 2001 at 09:49:15AM -0800, Linus Torvalds wrote:
> That said, how expensive is loading %cr2 anyway? We can do all the same
> tricks with a 16kB stack and just playing games with using the higher bits
> as the "offset", ie things like

Here are some numbers:

read cr2 best: 11 av: 11.12
write cr2 cr2 best: 61 av: 64.42
read cr2 best: 11 av: 11.12
write cr2 cr2 best: 61 av: 65.01
read stk best: 10 av: 11.03
write cr2 stk best: 61 av: 64.95
read stk best: 10 av: 11.03
write cr2 stk best: 61 av: 65.23

Which come from insmod of the below two modules. I didn't test writing to
the stack register, but I expect it's similarly expensive as it affects the
call return stack and other behind the scenes dependencies. Suffice it to
say that reading %cr2 is essentially free on my box (athlon mp). Maybe
we should use it as a pointer into a per-cpu area to avoid writing it?

-ben

----teststk_k.c----
#define USE_STK 1
#include "testcr2_k.c"

----testcr2_k.c----
#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/errno.h>
#include <linux/init.h>

static inline long long rdtsc(void)
{
	unsigned int low,high;
	__asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high));
	return low + (((long long)high)<<32);
}

long dummy;

/* time one read of "current", via the stack arithmetic or via %cr2 */
long doit(void)
{
	long long start, end;
	long val;

	start = rdtsc();
#ifdef USE_STK
#define WHICH "stk"
	__asm__ __volatile__(
		"movl $0x0003c000,%%eax \n"	// 4 bits at bit 14
		"movl $-16384,%%edx \n"		// remove low 14 bits
		"andl %%esp,%%eax \n"
		"andl %%esp,%%edx \n"
		"shrl $7,%%eax \n"		// color it by 128 bytes
		"addl %%edx,%%eax \n"
		: "=a" (val) :: "edx");
#else
#define WHICH "cr2"
	__asm__ __volatile__("movl %%cr2,%0" : "=r" (val));
#endif
	val += 100;
	dummy = val;
	end = rdtsc();
	return end - start;
}

/* time one write to %cr2 */
long doit2(void)
{
	long long start, end;
	long val;

	start = rdtsc();
	val = dummy;
	/* val is an input here, so it must not be declared as an output */
	__asm__ __volatile__("movl %0,%%cr2" : : "r" (val));
	end = rdtsc();
	return end - start;
}

int test_init (void)
{
	long min = 1000000000, av = 0;
	int i;

	for (i=0; i<100; i++) {
		long dur = doit();
		if (dur < min)
			min = dur;
		av += dur;
	}
	printk("read " WHICH " best: %ld av: %ld.%02ld\n", min, av / 100, av % 100);

	min = 10000000;
	av = 0;
	for (i=0; i<100; i++) {
		long dur = doit2();
		if (dur < min)
			min = dur;
		av += dur;
	}
	printk("write cr2 " WHICH " best: %ld av: %ld.%02ld\n", min, av / 100, av % 100);

	/* fail the load on purpose so the module never stays resident */
	return -ENODEV;
}

void test_exit(void)
{
	return;
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
---snip---
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current" 2001-11-06 18:42 ` Benjamin LaHaise @ 2001-11-06 19:09 ` H. Peter Anvin 2001-11-06 19:16 ` Dave Jones 1 sibling, 0 replies; 59+ messages in thread From: H. Peter Anvin @ 2001-11-06 19:09 UTC (permalink / raw) To: linux-kernel Followup to: <20011106134234.A27718@redhat.com> By author: Benjamin LaHaise <bcrl@redhat.com> In newsgroup: linux.dev.kernel > > On Tue, Nov 06, 2001 at 09:49:15AM -0800, Linus Torvalds wrote: > > That said, how expensive is loading %cr2 anyway? We can do all the same > > tricks with a 16kB stack and just playing games with using the higher bits > > as the "offset", ie things like > > Here are some numbers: > > read cr2 best: 11 av: 11.12 > write cr2 cr2 best: 61 av: 64.42 > read cr2 best: 11 av: 11.12 > write cr2 cr2 best: 61 av: 65.01 > read stk best: 10 av: 11.03 > write cr2 stk best: 61 av: 64.95 > read stk best: 10 av: 11.03 > write cr2 stk best: 61 av: 65.23 > > Which come from insmod of the below two modules. I didn't test writing to > the stack register, but I expect it's similarly expensive as it affects the > call return stack and other behind the scenes dependancies. Suffice it to > say that reading %cr2 is essentially free on my box (athlon mp). Maybe > we should use it as a pointer into a per-cpu area to avoid writing it? > You still have to write it every time you take a page fault. You're adding 60-odd cycles to the page fault path at least. Not to mention any system which does microcoded reads of %cr2, which apparently the Athlon XP doesn't. -hpa -- <hpa@transmeta.com> at work, <hpa@zytor.com> in private! "Unix gives you enough rope to shoot yourself in the foot." http://www.zytor.com/~hpa/puzzle.txt <amsp@zytor.com> ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current"
2001-11-06 18:42 ` Benjamin LaHaise
2001-11-06 19:09 ` H. Peter Anvin
@ 2001-11-06 19:16 ` Dave Jones
2001-11-06 20:10 ` Ricky Beam
2001-11-06 23:09 ` Alan Cox
1 sibling, 2 replies; 59+ messages in thread
From: Dave Jones @ 2001-11-06 19:16 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: Linus Torvalds, linux-kernel

On Tue, 6 Nov 2001, Benjamin LaHaise wrote:
> Here are some numbers:
> Which come from insmod of the below two modules. I didn't test writing to
> the stack register, but I expect it's similarly expensive as it affects the
> call return stack and other behind the scenes dependencies. Suffice it to
> say that reading %cr2 is essentially free on my box (athlon mp). Maybe
> we should use it as a pointer into a per-cpu area to avoid writing it?

If this is done, it should perhaps be done only on certain x86s,
as some show the results go the other way. For example, the Cyrix III..

read stk best: 42 av: 42.60
read cr2 best: 61 av: 61.28

regards,
Dave.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current" 2001-11-06 19:16 ` Dave Jones @ 2001-11-06 20:10 ` Ricky Beam 2001-11-06 23:09 ` Alan Cox 1 sibling, 0 replies; 59+ messages in thread From: Ricky Beam @ 2001-11-06 20:10 UTC (permalink / raw) To: Dave Jones; +Cc: Benjamin LaHaise, Linux Kernel Mail List On Tue, 6 Nov 2001, Dave Jones wrote: >If this is done, it should perhaps be done on only on certain x86s, >as some show the results go the other way. For example, the Cyrix III.. And for some (P150) it makes no difference... read cr2 best: 25 av: 27.09 write cr2 cr2 best: 32 av: 34.39 read stk best: 26 av: 28.22 write cr2 stk best: 32 av: 33.04 --Ricky ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current" 2001-11-06 19:16 ` Dave Jones 2001-11-06 20:10 ` Ricky Beam @ 2001-11-06 23:09 ` Alan Cox 2001-11-06 23:15 ` Dave Jones 1 sibling, 1 reply; 59+ messages in thread From: Alan Cox @ 2001-11-06 23:09 UTC (permalink / raw) To: Dave Jones; +Cc: Benjamin LaHaise, Linus Torvalds, linux-kernel > If this is done, it should perhaps be done on only on certain x86s, > as some show the results go the other way. For example, the Cyrix III.. > > read stk best: 42 av: 42.60 > read cr2 best: 61 av: 61.28 Do we have many SMP Cyrix III's ? ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: Using %cr2 to reference "current" 2001-11-06 23:09 ` Alan Cox @ 2001-11-06 23:15 ` Dave Jones 0 siblings, 0 replies; 59+ messages in thread From: Dave Jones @ 2001-11-06 23:15 UTC (permalink / raw) To: Alan Cox; +Cc: Benjamin LaHaise, Linus Torvalds, linux-kernel On Tue, 6 Nov 2001, Alan Cox wrote: > > If this is done, it should perhaps be done on only on certain x86s, > > as some show the results go the other way. For example, the Cyrix III.. > Do we have many SMP Cyrix III's ? I wish :) Today no, tomorrow only VIA knows. I just used that as an example that it may not be a win everywhere. A better example perhaps was the P5 case Ricky posted, which as you know, are seen in the real world in SMP. regards, Dave. -- | Dave Jones. http://www.codemonkey.org.uk | SuSE Labs ^ permalink raw reply [flat|nested] 59+ messages in thread