linux-kernel.vger.kernel.org archive mirror
* Re: Intel P6 vs P7 system call performance
@ 2002-12-18 12:55 Terje Eggestad
  2002-12-18 20:14 ` H. Peter Anvin
  0 siblings, 1 reply; 268+ messages in thread
From: Terje Eggestad @ 2002-12-18 12:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, linux-kernel, hpa


what about:

int (*_vsyscall) (int, ...);
_vsyscall = (int (*)(int, ...)) mmap(NULL, getpagesize(), PROT_READ|PROT_EXEC,
		MAP_VSYSCALL, -1, 0);

or if you're afraid of running out of MAP_* flags:

fd = open("/dev/vsyscall", O_RDONLY);
_vsyscall = (int (*)(int, ...)) mmap(NULL, getpagesize(), PROT_READ|PROT_EXEC,
		MAP_SHARED, fd, 0);

Then you can leisurely map it in just after the programs text segment. 

TJ


On Tue, 2002-12-17 at 18:55, Linus Torvalds wrote: 
> On Tue, 17 Dec 2002, Matti Aarnio wrote:
> >
> > On Tue, Dec 17, 2002 at 09:07:21AM -0800, Linus Torvalds wrote:
> > > On Tue, 17 Dec 2002, Hugh Dickins wrote:
> > > > I thought that last page was intentionally left invalid?
> > >
> > > It was. But I thought it made sense to use, as it's the only really
> > > "special" page.
> >
> >   In couple of occasions I have caught myself from pre-decrementing
> >   a char pointer which "just happened" to be NULL.
> >
> >   Please keep the last page, as well as a few of the first pages as
> >   NULL-pointer poisons.
> 
> I think I have a good clean solution to this, that not only avoids the
> need for any hard-coded address _at_all_, but also solves Uli's problem
> quite cleanly.
> 
> Uli, how about I just add one new architecture-specific ELF AT flag, which
> is the "base of sysinfo page". Right now that page is all zeroes except
> for the system call trampoline at the beginning, but we might want to add
> other system information to the page in the future (it is readable, after
> all).
> 
> So we'd have an AT_SYSINFO entry, that with the current implementation
> would just get the value 0xfffff000. And then the glibc startup code could
> easily be backwards compatible with the suggestion I had in the previous
> email. Since we basically want to do an indirect jump anyway (because of
> the lack of absolute jumps in the instruction set), this looks like the
> natural way to do it.
> 
> That also allows the kernel to move around the SYSINFO page at will, and
> even makes it possible to avoid it altogether (ie this will solve the
> inevitable problems with UML - UML just wouldn't set AT_SYSINFO, so user
> level just wouldn't even _try_ to use it).
> 
> With that, there's nothing "special" about the vsyscall page, and I'd just
> go back to having the very last page unmapped (and have the vsyscall page
> in some other fixmap location that might even depend on kernel
> configuration).
> 
> Whaddaya think?
> 
> 		Linus
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



-- 
_________________________________________________________________________

Terje Eggestad                  mailto:terje.eggestad@scali.no
Scali Scalable Linux Systems    http://www.scali.com

Olaf Helsets Vei 6              tel:    +47 22 62 89 61 (OFFICE)
P.O.Box 150, Oppsal                     +47 975 31 574  (MOBILE)
N-0619 Oslo                     fax:    +47 22 62 89 51
NORWAY            
_________________________________________________________________________
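The AT_SYSINFO lookup Linus proposes above can be sketched in C as follows. This is an illustration, not glibc's code: it assumes the auxiliary vector is an array of (type, value) pairs ending in AT_NULL, the struct and function names are hypothetical, and the constant 32 is the value of AT_SYSINFO in <elf.h>. A result of 0 means no vsyscall page was offered (the UML case), so the caller stays with int $0x80.

```c
#include <stdint.h>

/* Values as in <elf.h>, redefined so the sketch builds anywhere. */
#define SKETCH_AT_NULL    0
#define SKETCH_AT_SYSINFO 32

/* Hypothetical auxv entry: a type tag and its value. */
struct auxv_entry {
    uintptr_t type;
    uintptr_t value;
};

/* Scan the vector up to AT_NULL. Returns the AT_SYSINFO value, or 0 when
   the kernel did not provide one, meaning "keep using int $0x80". */
uintptr_t find_sysinfo(const struct auxv_entry *auxv)
{
    for (; auxv->type != SKETCH_AT_NULL; auxv++) {
        if (auxv->type == SKETCH_AT_SYSINFO)
            return auxv->value;
    }
    return 0;
}
```

On current glibc, getauxval(AT_SYSINFO) from <sys/auxv.h> performs this scan on the real vector; a native x86-64 process is only given AT_SYSINFO_EHDR, so there it returns 0.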


* Re: Intel P6 vs P7 system call performance
@ 2003-01-10 18:08 Gabriel Paubert
  0 siblings, 0 replies; 268+ messages in thread
From: Gabriel Paubert @ 2003-01-10 18:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Jamie Lokier, Ulrich Drepper, davej, linux-kernel

Linus Torvalds wrote:
> It shouldn't matter.
>
> NT is only tested by "iret", and if somebody sets NT in user space they
> get exactly what they deserve.

Indeed. I realized after I sent the previous mail that I had missed the
flags save/restore in switch_to :-(

Still, does this mean that there is some micro optimization opportunity in
the lcall7/lcall27 handlers to remove the popfl? After all TF is now
handled by some magic in do_debug unless I miss (again) something,
NT has become irrelevant, and cld in SAVE_ALL takes care of DF.

In short something like the following (I just love patches which only
remove code):

===== entry.S 1.51 vs edited =====
--- 1.51/arch/i386/kernel/entry.S	Mon Jan  6 04:54:58 2003
+++ edited/entry.S	Fri Jan 10 18:57:42 2003
@@ -156,16 +156,6 @@
 	movl %edx,EIP(%ebp)	# Now we move them to their "normal" places
 	movl %ecx,CS(%ebp)	#

-	#
-	# Call gates don't clear TF and NT in eflags like
-	# traps do, so we need to do it ourselves.
-	# %eax already contains eflags (but it may have
-	# DF set, clear that also)
-	#
-	andl $~(DF_MASK | TF_MASK | NT_MASK),%eax
-	pushl %eax
-	popfl
-
 	andl $-8192, %ebp	# GET_THREAD_INFO
 	movl TI_EXEC_DOMAIN(%ebp), %edx	# Get the execution domain
 	call *4(%edx)		# Call the lcall7 handler for the domain
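For reference, the masking done by the removed andl/pushl/popfl lines, written as a C sketch. The bit values are the architectural EFLAGS positions (TF is bit 8, DF bit 10, NT bit 14), which is what the kernel's TF_MASK, DF_MASK and NT_MASK name; the function name is hypothetical.

```c
#include <stdint.h>

/* Architectural EFLAGS bits, matching the masks used in entry.S. */
#define TF_MASK 0x00000100u /* bit 8: trap flag (single-step) */
#define DF_MASK 0x00000400u /* bit 10: direction flag */
#define NT_MASK 0x00004000u /* bit 14: nested task */

/* Same computation as "andl $~(DF_MASK | TF_MASK | NT_MASK),%eax":
   clear the three flags a call gate leaves set, keep everything else. */
static inline uint32_t callgate_clean_eflags(uint32_t eflags)
{
    return eflags & ~(DF_MASK | TF_MASK | NT_MASK);
}
```

For example, 0x4702 (NT, DF, TF and IF set, plus the always-one bit 1) comes out as 0x0202, leaving only IF and bit 1.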


>>For example, set NT and then execute sysenter with garbage in %eax, the
>>kernel will try to return (-ENOSYS) with iret and kill the task. As long
>>as it only allows a task to kill itself, it's not a big deal. But NT is
>>not cleared across task switches unless I miss something, and that looks
>>very dangerous.
>
>
> It _is_ cleared by task-switching these days. Or rather, it's saved and
> restored, so the original NT setter will get it restored when resumed.

Yeah, sorry for the noise.

>
>
>>I'm no Ingo, unfortunately, but you'll need at least the following patch
>>(the second hunk is only a typo fix) to the iret exception recovery code,
>>which used push and pops to get the smallest possible code size.
>
>
> Good job.

That was too easy since I did originally suggest the push/pop sequence :-)

	Gabriel.




* Re: Intel P6 vs P7 system call performance
@ 2002-12-30 13:06 Manfred Spraul
  2002-12-30 14:54 ` Andi Kleen
  0 siblings, 1 reply; 268+ messages in thread
From: Manfred Spraul @ 2002-12-30 13:06 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

DaveJ wrote:

>On Sat, Dec 28, 2002 at 10:37:06PM +0200, Ville Herva wrote:
>
> > > SYSCALL is AMD.  SYSENTER is Intel, and is likely to be significantly
> > Now that Linus has killed the dragon and everybody seems happy with the
> > shiny new SYSENTER code, let me just add one more stupid question to this
> > thread: has anyone made benchmarks on SYSCALL/SYSENTER/INT80 on Athlon? Is
> > SYSCALL worth doing separately for Athlon (and perhaps Hammer/32-bit mode)?
>
>It's something I wondered about too. Even if it isn't a win for K7,
>it's possible that the K6 family may benefit from SYSCALL support.
>Maybe even the K5, if it was around that early? (too lazy to check PDFs)
>  
>

I looked at SYSCALL once, and noticed some problems:

- it doesn't even load ESP with a kernel value, a task gate for NMI is 
mandatory.
- SMP support is only possible with a per-cpu entry point with 
(boot-time) fixups to the address where the entry point can find the 
kernel stack.
- The AMD docs contain one odd sentence:
"The CS and SS registers must not be modified by the operating system 
between the execution of the SYSCALL and the corresponding SYSRET 
instruction".
Is SYSCALL+iretd permitted? That's needed for execve, iopl, task 
switches, signal delivery.
What about interrupts during SYSCALLs? NMI to taskgate?

Either that sentence is just wrong, or SYSCALL is unusable.

It's not supported by the K5 cpu:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/20734.pdf

--
    Manfred


* RE: Intel P6 vs P7 system call performance
@ 2002-12-22 15:45 Nakajima, Jun
  0 siblings, 0 replies; 268+ messages in thread
From: Nakajima, Jun @ 2002-12-22 15:45 UTC (permalink / raw)
  To: Mikael Pettersson, mingo, torvalds; +Cc: drepper, linux-kernel

Correct. Please look at Table B-1. Most MSRs are shared, but some MSRs are unique to each logical processor, to provide the x86 architectural state. The SYSENTER MSRs and the Machine Check register save state (IA32_MCG_XXX), for example, are unique.

Jun

> -----Original Message-----
> From: Mikael Pettersson [mailto:mikpe@csd.uu.se]
> Sent: Sunday, December 22, 2002 4:34 AM
> To: mingo@elte.hu; torvalds@transmeta.com
> Cc: drepper@redhat.com; Nakajima, Jun; linux-kernel@vger.kernel.org
> Subject: Re: Intel P6 vs P7 system call performance
> 
> On Sun, 22 Dec 2002 11:23:08 +0100 (CET), Ingo Molnar wrote:
> >while reviewing the sysenter trampoline code i started wondering about the
> >HT case. Don't HT boxes share the MSRs between logical CPUs? This pretty
> >much breaks the concept of per-logical-CPU sysenter trampolines. It also
> >makes context-switch time sysenter MSR writing impossible, so i really
> >hope this is not the case.
> 
> Some MSRs are shared, some aren't. One must always check this in
> the IA32 Volume 3 manual. The three SYSENTER MSRs are not shared.
> 
> However, no-one has yet proven that writing to these in the context
> switch path has acceptable performance -- remember, there is _no_
> a priori reason to assume _anything_ about performance on P4s,
> you really do need to measure things before taking design decisions.
> 
> Manfred had a version with fixed MSR values and the varying data
> in memory. Maybe that's actually faster.

* Re: Intel P6 vs P7 system call performance
@ 2002-12-22 12:33 Mikael Pettersson
  2002-12-22 16:00 ` Jamie Lokier
  0 siblings, 1 reply; 268+ messages in thread
From: Mikael Pettersson @ 2002-12-22 12:33 UTC (permalink / raw)
  To: mingo, torvalds; +Cc: drepper, jun.nakajima, linux-kernel

On Sun, 22 Dec 2002 11:23:08 +0100 (CET), Ingo Molnar wrote:
>while reviewing the sysenter trampoline code i started wondering about the
>HT case. Don't HT boxes share the MSRs between logical CPUs? This pretty
>much breaks the concept of per-logical-CPU sysenter trampolines. It also
>makes context-switch time sysenter MSR writing impossible, so i really
>hope this is not the case.

Some MSRs are shared, some aren't. One must always check this in
the IA32 Volume 3 manual. The three SYSENTER MSRs are not shared.

However, no-one has yet proven that writing to these in the context
switch path has acceptable performance -- remember, there is _no_
a priori reason to assume _anything_ about performance on P4s,
you really do need to measure things before taking design decisions.

Manfred had a version with fixed MSR values and the varying data
in memory. Maybe that's actually faster.
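Mikael's point about measuring can be made concrete with a small timing loop. This is a sketch, not a rigorous benchmark: the function name is hypothetical, it times raw getpid calls through syscall(2) with SYS_getpid (rather than getpid(), which some glibc versions cached) using CLOCK_MONOTONIC, and the average it reports will depend on the CPU, the kernel entry path in use (int $0x80, sysenter or syscall) and any mitigations enabled.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average wall-clock cost, in nanoseconds, of one raw getpid system call.
   Going through syscall() ensures every iteration really enters the kernel. */
double ns_per_getpid(long iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) * 1e9
                   + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / (double)iterations;
}
```

Running the same binary on the machines being compared (say, with ns_per_getpid(1000000)) gives numbers to argue from, which is exactly what the P4 versus PIII discussion lacked at first.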

* Re: Intel P6 vs P7 system call performance
@ 2002-12-19 18:46 billyrose
  0 siblings, 0 replies; 268+ messages in thread
From: billyrose @ 2002-12-19 18:46 UTC (permalink / raw)
  To: bart; +Cc: root, linux-kernel


> Not true. A ret(urn) is (sort of) equivalent to 'pop %eip'. The above
> code would actually jump to address 0xfffff000, but probably be slow
> since it confuses the branch prediction.
>
>
>Bart

that being the case, then the original code that Linus put forth:

        pushl $0xfffff000
        call *(%esp)
        add $4,%esp

would be the way to go as it is highly readable. actually, the code at
0xfffff000 could issue a ret $4 and eliminate the add after the call.
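The stack bookkeeping behind the ret $4 suggestion can be checked with a toy model. This sketch (all names hypothetical, addresses illustrative) replays pushl $0xfffff000; call *(%esp) in the caller and then ret $4 in the called page: the call reads its target before pushing the return address, and ret $4 pops that return address and then discards the pushed 0xfffff000, leaving %esp where the caller started, so no add $4,%esp is needed.

```c
#include <stdint.h>

/* Toy model of a 32-bit x86 stack: esp indexes mem[] (one slot per 4-byte
   word) and the stack grows downwards. Addresses are just numbers here. */
struct toy_stack {
    uint32_t mem[64];
    unsigned esp;               /* index of the top slot; 64 means empty */
};

static void push(struct toy_stack *s, uint32_t v) { s->mem[--s->esp] = v; }
static uint32_t pop(struct toy_stack *s) { return s->mem[s->esp++]; }

/* "ret $n": pop the return address, then drop n more bytes (n/4 slots). */
static uint32_t ret_imm(struct toy_stack *s, unsigned bytes)
{
    uint32_t target = pop(s);
    s->esp += bytes / 4;
    return target;
}

/* Replays the caller's "pushl $0xfffff000; call *(%esp)" and the page's
   "ret $4". Stores the call target in *call_target and returns the address
   the ret jumps back to. */
uint32_t replay_call_ret4(struct toy_stack *s, uint32_t return_addr,
                          uint32_t *call_target)
{
    push(s, 0xfffff000u);             /* pushl $0xfffff000                */
    *call_target = s->mem[s->esp];    /* call *(%esp): operand read first */
    push(s, return_addr);             /* ...then the return address       */
    return ret_imm(s, 4);             /* page: ret $4                     */
}
```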


* Re: Intel P6 vs P7 system call performance
@ 2002-12-19 16:10 billyrose
  0 siblings, 0 replies; 268+ messages in thread
From: billyrose @ 2002-12-19 16:10 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

Richard B. Johnson wrote:

> Because the number pushed onto the stack is a displacement, not
> an address, i.e., -4095. To have the address act as an address,
> you need to load a full-pointer, i.e. SEG:OFFSET (like the old
> 16-bit days). The offset is 32-bits and the segment is whatever
> the kernel has set up for __USER_CS (0x23). All the 'near' calls
> are calls to a signed displacement, same for jumps.

direct calls and jmps use a displacement; rets are _always_ absolute.

* Re: Intel P6 vs P7 system call performance
@ 2002-12-19 15:20 bart
  0 siblings, 0 replies; 268+ messages in thread
From: bart @ 2002-12-19 15:20 UTC (permalink / raw)
  To: root; +Cc: linux-kernel, billyrose

On 19 Dec, Richard B. Johnson wrote:
> On Thu, 19 Dec 2002 billyrose@billyrose.net wrote:

>> long_call:
>>         pushl $0xfffff000
>>         ret
>> 
> 
> Because the number pushed onto the stack is a displacement, not
> an address, i.e., -4095. To have the address act as an address,

Not true. A ret(urn) is (sort of) equivalent to 'pop %eip'. The above
code would actually jump to address 0xfffff000, but probably be slow
since it confuses the branch prediction.
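Bart's point can be illustrated with a toy return-stack-buffer model: every call pushes its return address onto a predictor stack, and every ret is predicted from the top of that stack. This is a simplification with hypothetical names (real return predictors have a small fixed depth and vary by microarchitecture), but it shows why the push/ret trick mispredicts twice while the paired call *(%esp) and ret predict correctly.

```c
#include <stdint.h>

/* Toy return-stack-buffer (RSB) predictor. A call pushes its return
   address; a ret is predicted to go to the address on top. Overflow is
   simply ignored in this sketch. */
#define RSB_DEPTH 16

struct rsb {
    uint32_t entry[RSB_DEPTH];
    int top;            /* number of valid entries */
    int mispredicts;
};

static void rsb_call(struct rsb *r, uint32_t return_addr)
{
    if (r->top < RSB_DEPTH)
        r->entry[r->top++] = return_addr;
}

/* A ret that really goes to 'actual'. It is a miss if the RSB is empty or
   its top entry names a different address. */
static void rsb_ret(struct rsb *r, uint32_t actual)
{
    if (r->top == 0 || r->entry[--r->top] != actual)
        r->mispredicts++;
}

/* "call long_call", where long_call does "pushl $0xfffff000; ret", and the
   code in the page later rets straight back to the original caller. */
int mispredicts_push_ret(uint32_t caller_ret)
{
    struct rsb r = {0};
    rsb_call(&r, caller_ret);      /* call long_call               */
    rsb_ret(&r, 0xfffff000u);      /* long_call: ret into the page */
    rsb_ret(&r, caller_ret);       /* page: ret back to the caller */
    return r.mispredicts;
}

/* "pushl $0xfffff000; call *(%esp)", and the page returns normally. */
int mispredicts_call_ret(uint32_t caller_ret)
{
    struct rsb r = {0};
    rsb_call(&r, caller_ret);      /* call *(%esp)                   */
    rsb_ret(&r, caller_ret);       /* page: ret, matched by the call */
    return r.mispredicts;
}
```

For any return address, mispredicts_push_ret() reports 2 and mispredicts_call_ret() reports 0.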

Bart

-- 
Bart Hartgers - TUE Eindhoven 
http://plasimo.phys.tue.nl/bart/contact.html

* Re: Intel P6 vs P7 system call performance
@ 2002-12-19 14:57 bart
  0 siblings, 0 replies; 268+ messages in thread
From: bart @ 2002-12-19 14:57 UTC (permalink / raw)
  To: billyrose; +Cc: root, linux-kernel

On 19 Dec, billyrose@billyrose.net wrote:
> long_call:
>         pushl $0xfffff000
>         ret
> 

A ret(urn) to an address that wasn't put on the stack by a call
severely confuses the branch prediction on many processors.



* Re: Intel P6 vs P7 system call performance
@ 2002-12-19 14:40 billyrose
  2002-12-19 15:11 ` Richard B. Johnson
  0 siblings, 1 reply; 268+ messages in thread
From: billyrose @ 2002-12-19 14:40 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

Richard B. Johnson wrote:

> The target, i.e., the label 'goto' would be the reserved page for the
> system call. The whole purpose was to minimize the number of CPU cycles
> necessary to call 0xfffff000 and return. The system call does not have
> issue a 'far' return, it can do anything it requires. The page at
> 0xfffff000 is mapped into every process and is in that process CS space
> already.

that being the case, why push %cs and reload it without reason as the
code is mapped into every process?

therefore, would it not suffice to use:

        ...
        long_call(); //call to $0xfffff000 via near ret
        //code at $0xfffff000 returns directly here when a ret is issued
        ...

long_call:
        pushl $0xfffff000
        ret


* Re: Intel P6 vs P7 system call performance
@ 2002-12-19 13:55 bart
  2002-12-19 19:37 ` Linus Torvalds
  0 siblings, 1 reply; 268+ messages in thread
From: bart @ 2002-12-19 13:55 UTC (permalink / raw)
  To: davej
  Cc: torvalds, lk, hpa, terje.eggestad, drepper, matti.aarnio, hugh,
	mingo, linux-kernel

On 19 Dec, Dave Jones wrote:
> On Thu, Dec 19, 2002 at 02:22:36PM +0100, bart@etpmod.phys.tue.nl wrote:
>  > > However, there's another issue, namely process startup cost. I personally 
>  > > want it to be as light as at all possible. I hate doing an "strace" on 
>  > > user processes and seeing tons and tons of crapola showing up. Just for 
>  > So why not map the magic page at 0xffffe000 at some other address as
>  > well? 
>  > Static binaries can just directly jump/call into the magic page.
> 
> .. and explode nicely when you try to run them on an older kernel
> without the new syscall magick. This is what Linus' first
> proof-of-concept code did.


True, but unless I really don't get it, compatibility of a new static
binary with an old kernel is going to break anyway. 
My point was that the double-mapped page trick adds no overhead in the
case of a static binary, and just one extra mmap in case of a shared
binary.

Bart

> 
> 		Dave
> 


* Re: Intel P6 vs P7 system call performance
@ 2002-12-19 13:22 bart
  2002-12-19 13:38 ` Dave Jones
  2002-12-19 19:29 ` H. Peter Anvin
  0 siblings, 2 replies; 268+ messages in thread
From: bart @ 2002-12-19 13:22 UTC (permalink / raw)
  To: torvalds
  Cc: lk, hpa, terje.eggestad, drepper, matti.aarnio, hugh, davej,
	mingo, linux-kernel

On 18 Dec, Linus Torvalds wrote:
> 
> On Wed, 18 Dec 2002, Jamie Lokier wrote:
>> 
>> That said, you always need the page at 0xffffe000 mapped anyway, so
>> that sysexit can jump to a fixed address (which is fastest).
> 
> Yes. This is important. There _needs_ to be some fixed address at least as 
> far as the kernel is concerned (it might move around between reboots or 
> something like that, but it needs to be something the kernel knows about 
> intimately and doesn't need lots of dynamic lookup).
> 
> However, there's another issue, namely process startup cost. I personally 
> want it to be as light as at all possible. I hate doing an "strace" on 
> user processes and seeing tons and tons of crapola showing up. Just for 

So why not map the magic page at 0xffffe000 at some other address as
well? 

Static binaries can just directly jump/call into the magic page.

Shared binaries do some kind of mmap("/proc/self/mem") magic to put a
copy of the page at an address that is convenient for them. Shared
binaries have to do a lot of mmap-ing anyway, so the overhead should be
negligible.





* Re: Intel P6 vs P7 system call performance
@ 2002-12-18 23:51 billyrose
  2002-12-19 13:10 ` Richard B. Johnson
  0 siblings, 1 reply; 268+ messages in thread
From: billyrose @ 2002-12-18 23:51 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

Richard B. Johnson wrote:
> The number of CPU clocks necessary to make the 'far' or
> full-pointer call by pushing the segment register, the offset,
> then issuing a 'lret' is 33 clocks on a Pentium II.
>
> longcall clocks = 46
> call clocks = 13
> actual full-pointer call clocks = 33

this is not correct. the assumed target (of a _far_ call) would issue a far
return, and only an offset would be left on the stack to return to (oops). the
code segment of the original caller needs to be pushed to create the seg:off
pair, and hence a far return would land back at the original calling routine.
this is a very convoluted way of making the original call far, as simply
calling far in the first place should issue much faster. OTOH, if you are
making a workaround to an already existing piece of code, this works
beautifully (with the additional seg pushed on the stack).

b.

* RE: Intel P6 vs P7 system call performance
@ 2002-12-18  1:30 Nakajima, Jun
  2002-12-18  1:54 ` Ulrich Drepper
  0 siblings, 1 reply; 268+ messages in thread
From: Nakajima, Jun @ 2002-12-18  1:30 UTC (permalink / raw)
  To: Ulrich Drepper, Linus Torvalds
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa

AMD (at least Athlon, as far as I know) supports sysenter/sysexit. We tested it on an Athlon box as well, and it worked fine. And sysenter/sysexit was better than int/iret too (about 40% faster) there. 

Jun

> -----Original Message-----
> From: Ulrich Drepper [mailto:drepper@redhat.com]
> Sent: Tuesday, December 17, 2002 11:19 AM
> To: Linus Torvalds
> Cc: Matti Aarnio; Hugh Dickins; Dave Jones; Ingo Molnar; linux-
> kernel@vger.kernel.org; hpa@transmeta.com
> Subject: Re: Intel P6 vs P7 system call performance
> 
> Linus Torvalds wrote:
> 
> > In the meantime, I do agree with you that the TLS approach should work
> > too, and might be better. It will allow all six arguments to be used if we
> > just find a good calling convention.
> 
> If you push out the AT_* patch I'll hack the glibc bits (probably the
> TLS variant).  Won't take too long, you'll get results this afternoon.
> 
> What about AMD's instruction?  Is it as flawed as sysenter?  If not and
> %ebp is available I really should use the TLS method.
> 
> --
> --------------.                        ,-.            444 Castro Street
> Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
> Red Hat         `--' drepper at redhat.com `---------------------------
> 

* Re: Intel P6 vs P7 system call performance
@ 2002-12-17 16:32 Manfred Spraul
  2002-12-17 17:13 ` Richard B. Johnson
  0 siblings, 1 reply; 268+ messages in thread
From: Manfred Spraul @ 2002-12-17 16:32 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

>
>
>   pushl %ebp
>   movl $0xfffff000, %ebp
>   call *%ebp
>   popl %ebp
>  
>

You could avoid clobbering a register with something like

pushl $0xfffff000
call *(%esp)
addl $4,%esp

--
    Manfred


* Re: Intel P6 vs P7 system call performance
@ 2002-12-17 16:14 John Reiser
  0 siblings, 0 replies; 268+ messages in thread
From: John Reiser @ 2002-12-17 16:14 UTC (permalink / raw)
  To: linux-kernel

Ulrich Drepper wrote:
[snip]
  >    pushl %ebp
  >    movl $0xfffff000, %ebp
  >    call *%ebp
  >    popl %ebp

This does not work for mmap64 [syscall 192], which passes a parameter in %ebp.
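The conflict John points out follows from the i386 Linux system-call convention: the number goes in %eax, and arguments one to six go in %ebx, %ecx, %edx, %esi, %edi and %ebp. mmap2 (number 192 on i386) takes six arguments, so its page offset arrives in %ebp. A small C sketch of that mapping, with hypothetical function names:

```c
#include <stddef.h>

/* i386 Linux system calls: number in %eax, arguments 1..6 in these
   registers, in this order. */
static const char *const i386_arg_regs[6] = {
    "ebx", "ecx", "edx", "esi", "edi", "ebp"
};

/* Register that carries 0-based argument 'index', or NULL if none does. */
const char *syscall_arg_register(size_t index)
{
    return index < 6 ? i386_arg_regs[index] : NULL;
}

/* A stub that reuses %ebp to hold the trampoline address (as in the quoted
   sequence) loses argument 6, so it only suits calls with <= 5 arguments. */
int ebp_stub_usable(size_t nargs)
{
    return nargs <= 5;
}
```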

-- 
John Reiser, jreiser@BitWagon.com


* Re: Intel P6 vs P7 system call performance
@ 2002-12-17 16:01 John Reiser
  0 siblings, 0 replies; 268+ messages in thread
From: John Reiser @ 2002-12-17 16:01 UTC (permalink / raw)
  To: linux-kernel

On Mon, 16 Dec 2002, Linus Torvalds wrote [regarding vsyscall implementation]:
 > The good news is that the kernel part really looks pretty clean.

Where is the CPU serializing instruction which must be executed before return
to user mode, so that kernel accesses to hardware devices are guaranteed to
complete before any subsequent user access begins?  (Otherwise a read/write
by the user to a memory-mapped device page can appear out-of-order with respect
to the kernel accesses in a preceding syscall.)  The only generally useful
serializing instructions are IRET and CPUID; only IRET is implemented universally.



* Re: Intel P6 vs P7 system call performance
@ 2002-12-15  8:43 scott thomason
  0 siblings, 0 replies; 268+ messages in thread
From: scott thomason @ 2002-12-15  8:43 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Saturday 14 December 2002 11:48 am, Mike Dresser wrote:
> On Sat, 14 Dec 2002, Dave Jones wrote:
> > Note that there are more factors at play than raw cpu speed in a
> > kernel compile. Your time here is slightly faster than my 2.8Ghz
> > P4-HT for example.  My guess is you have faster disk(s) than I
> > do, as most of the time mine seems to be waiting for something to
> > do.
>
> Quantum Fireball AS's in that machine.  My main comment was that
> his Athlon MP at 1.8 was half or less the speed of a single P4.
> Even with compiler changes, I wouldn't think it would make THAT
> much of a difference?

I've been doing a lot of benchmarking with "contest" lately, and one
thing I can state emphatically is that the kernel that you are
running while performing a compile can be a large factor, especially
if you are maxing out the machine with a large "make -jN". Some
kernel versions vary enormously in their ability to handle I/O load
(an area I've been paying close attention to). Sounds like you have
some decent SMP hardware, and probably a good chunk of memory to go
with it, so you might want to experiment with these kernels, which
have given good I/O performance in my tests:

linux-2.4.19-rmap14c
linux-2.4.19-rmap15a
linux-2.4.18-rml-O1 (slow at creating tarballs, fast everywhere else)

And if you don't mind bleeding edge, just go with a more recent
2.5 kernel that you can make work. You simply can't get comparable
performance out of 2.4.

I've attached some contest numbers for tests I've run to-date below. 
Please note that while I use contest as the benchmarking tool, I use 
qmail compiles as the actual load, not kernel compiles (I don't have 
the patience--qmail compiles take about 35-40% of the time of a kernel 
compile. Now if we can get Con to work on speeding up "Killing the 
load process..." <g>).
---scott

sorry for the html table to text pasting conversion :(

kernel                              noload  process_load  ctar_load  xtar_load  read_load  list_load  mem_load
linux-2.4.18                         16.73         22.61     244.52      78.84     108.52      18.58     53.12
linux-2.4.18-ac3                     19.01         25.64      99.52      94.23     314.29      23.34    119.95
linux-2.4.18-rc1-akpm-low-latency    16.69         21.92     335.62      79.10     122.34      18.39    104.80
linux-2.4.18-rc4-aa1                 16.43         93.85     179.12     100.29      46.64      17.15     96.91
linux-2.4.18-rmap12h                 18.84         24.72     143.12      95.11     298.85      23.17    121.22
linux-2.4.18-rml-O1                  16.83         31.42     266.28      77.98      77.15      18.18     63.03
linux-2.4.18-rml-preempt             16.93         21.87     334.08      84.22     116.30      18.46     60.30
linux-2.4.18-rml-preempt+lockbreak   16.85         22.42     271.52      74.37     229.96      19.57     45.21
linux-2.4.19                         16.99         22.42     261.69     103.61     163.55      18.44     66.16
linux-2.4.19-ac4                     19.08         30.32     176.03      89.38     288.53      22.79    102.09
linux-2.4.19-akpm-low-latency        16.90         21.87     230.92     111.37     179.63      18.36     87.47
linux-2.4.19-ck14                        -             -          -          -          -          -    176.41
linux-2.4.19-rc5-aa1                 18.37         27.18     931.45     154.94     372.73      22.01    125.92
linux-2.4.19-rmap14c                 17.84         24.56      74.81      76.73     121.86      20.57    165.10
linux-2.4.19-rmap15                  18.27         24.09      71.32      77.05     146.68      18.99    102.56
linux-2.4.19-rmap15-splitactive      17.28         23.09      69.16      79.49     140.15      20.27    129.84
linux-2.4.19-rmap15a                 17.10         23.00      62.44      78.12     138.96      18.46    133.32
linux-2.4.19-rml-O1                  16.61         25.45     314.24      90.43     124.27      18.32     72.90
linux-2.4.19-rml-preempt             16.88         21.80     238.80      86.46     155.89      18.45     56.74
linux-2.4.20                         16.62         21.84     191.12     101.06     100.35      18.22     70.47
linux-2.4.20-aa1                     18.23         29.03     331.96     137.70      96.88      22.22    143.22
linux-2.4.20-ac1                     20.24         28.41     776.73     138.35     221.55      22.06    171.13
linux-2.4.20-rc2-aa1                 18.44         28.39     255.79     156.30      86.78      21.98    139.04
linux-2.5.49                         17.66         22.39      36.73      26.85      19.91      20.29     57.34
linux-2.5.50                         17.80         24.19      32.81      25.87      21.43      21.17     45.96

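One way to read the table is to divide each loaded time by the same kernel's noload time. The sketch below does this for two rows copied from the table; the struct and function names are hypothetical, and this simple ratio is not necessarily the same figure contest itself reports.

```c
#include <stdio.h>

struct contest_row {
    const char *kernel;
    double noload, ctar_load, read_load;
};

/* Two rows copied from the table above. */
static const struct contest_row rows[] = {
    { "linux-2.4.20", 16.62, 191.12, 100.35 },
    { "linux-2.5.50", 17.80,  32.81,  21.43 },
};

/* Factor by which the loaded compile took longer than the unloaded one. */
double slowdown(double loaded, double noload)
{
    return loaded / noload;
}

void print_slowdowns(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-14s ctar x%.1f  read x%.1f\n", rows[i].kernel,
               slowdown(rows[i].ctar_load, rows[i].noload),
               slowdown(rows[i].read_load, rows[i].noload));
}
```

For these rows it prints about 11.5 (ctar) and 6.0 (read) for linux-2.4.20, against about 1.8 and 1.2 for linux-2.5.50, which is the I/O-load gap the message describes.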
* Re: Intel P6 vs P7 system call performance
@ 2002-12-15  4:06 Albert D. Cahalan
  2002-12-15 22:01 ` Pavel Machek
  0 siblings, 1 reply; 268+ messages in thread
From: Albert D. Cahalan @ 2002-12-15  4:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: hpa, terje.eggestad


H. Peter Anvin writes:

> As far as I know, though, the SYSENTER patch didn't deal with several of
> the corner cases introduced by the generally weird SYSENTER instruction
> (such as the fact that V86 tasks can execute it despite the fact there
> is in general no way to resume execution of the V86 task afterwards.)
>
> In practice this means that vsyscalls is pretty much the only sensible
> way to do this.  Also note that INT 80h will need to be supported
> indefinitely.
>
> Personally, I wonder if it's worth the trouble, when x86-64 takes care
> of the issue anyway :)

There is another way:

Have apps enter kernel mode via Intel's purposely undefined
instruction, plus a few bytes of padding and identification.
Require that this not cross a page boundry. When it faults,
write the SYSENTER, INT 0x80, or SYSCALL as needed. Leave
the page marked clean so it doesn't need to hit swap; if it
gets paged in again it gets patched again.
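The rewrite step of Albert's scheme might look like this in C. It is only a sketch under stated assumptions: the call site is taken to be a UD2 placeholder (0F 0B) followed by padding, with no identification bytes; the names are hypothetical; and only the opcode swap is modelled (any register setup a SYSENTER site would need is not). The encodings are the IA-32 ones: int $0x80 is CD 80, sysenter is 0F 34, syscall is 0F 05.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IA-32 encodings of the placeholder and the candidate entry instructions. */
static const uint8_t UD2[]      = { 0x0F, 0x0B }; /* guaranteed to fault */
static const uint8_t INT80[]    = { 0xCD, 0x80 }; /* int $0x80 */
static const uint8_t SYSENTER[] = { 0x0F, 0x34 };
static const uint8_t SYSCALL[]  = { 0x0F, 0x05 };

enum entry_kind { ENTRY_INT80, ENTRY_SYSENTER, ENTRY_SYSCALL };

/* Rewrite a faulting call site in place: check that it still holds the
   UD2 placeholder, write the chosen 2-byte instruction, and fill the rest
   of the site with NOPs (0x90). Returns 0 on success, -1 if the site is
   too short or does not look like a placeholder. */
int patch_entry_site(uint8_t *site, size_t len, enum entry_kind kind)
{
    const uint8_t *insn = kind == ENTRY_SYSENTER ? SYSENTER
                        : kind == ENTRY_SYSCALL  ? SYSCALL
                        : INT80;

    if (len < 2 || memcmp(site, UD2, 2) != 0)
        return -1;
    memcpy(site, insn, 2);
    memset(site + 2, 0x90, len - 2);
    return 0;
}
```

Because the check looks for UD2 first, a second fault on an already patched site is rejected rather than patched twice.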



* Re: Intel P6 vs P7 system call performance
@ 2002-12-13 21:52 Margit Schubert-While
  0 siblings, 0 replies; 268+ messages in thread
From: Margit Schubert-While @ 2002-12-13 21:52 UTC (permalink / raw)
  To: linux-kernel

Hmm Apples & Oranges

diff hanoi.c hanoi2.c
17a18
 > void  mov();
51c52
<               mov(disk,1,3);
---
 >               (void)mov(disk,1,3);
58c59
< mov(n,f,t)
---
 > void mov(n,f,t)
67,69c68,70
<       mov(n-1,f,o);
<       mov(1,f,t);
<       mov(n-1,o,t);
---
 >       (void)mov(n-1,f,o);
 >       (void)mov(1,f,t);
 >       (void)mov(n-1,o,t);


cc -O3 -march=i686 -mcpu=i686 -fomit-frame-pointer  hanoi.c -o hanoi
cc -O3 -march=i686 -mcpu=i686 -fomit-frame-pointer  hanoi2.c -o hanoi2
./hanoi 10
536837 loops
./hanoi 10
538709 loops
./hanoi2 10
850127 loops
./hanoi2 10
852651 loops

Huu ?

Margit 
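For reference, a minimal sketch of the kind of recursive routine being timed, with mov() declared void as in hanoi2.c. It is not the UnixBench source: the spare-peg arithmetic and the move counter are additions for illustration. Solving n disks this way takes 2^n - 1 moves, so a 10-disk tower takes 1023.

```c
/* Move counter added for this sketch; the benchmark itself counts loops. */
static unsigned long moves;

/* Move n disks (n >= 1) from peg f to peg t; pegs are numbered 1, 2, 3. */
void mov(int n, int f, int t)
{
    int o;

    if (n == 1) {
        moves++;
        return;
    }
    o = 6 - f - t;          /* the remaining peg, since 1 + 2 + 3 = 6 */
    mov(n - 1, f, o);
    mov(1, f, t);
    mov(n - 1, o, t);
}

unsigned long count_moves(int disks)
{
    moves = 0;
    mov(disks, 1, 3);
    return moves;
}
```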


* Re: Intel P6 vs P7 system call performance
@ 2002-12-13 19:32 Dieter Nützel
  0 siblings, 0 replies; 268+ messages in thread
From: Dieter Nützel @ 2002-12-13 19:32 UTC (permalink / raw)
  To: Margit Schubert-While; +Cc: Linux Kernel List

> Well, in the 2.4.x kernels, the P4 gets compiled as a I686 with NO special
> treatment :-) (Not even prefetch, because of an ifdef bug)
> The P3 at least gets one level of prefetch and the AMD's get special compile
> options(arch=k6,athlon), full prefetch and SSE.
>
> >From Mike Hayward
> >Dual Pentium 4 Xeon 2.4Ghz 2.4.19 kernel 33661.9 lps (10 secs, 6 samples)
>
> Hmm, P4 2.4Ghz , also gcc -O3 -march=i686
>
> margit:/disk03/bytebench-3.1/src # ./hanoi 10
> 576264 loops
> margit:/disk03/bytebench-3.1/src # ./hanoi 10
> 571001 loops
> margit:/disk03/bytebench-3.1/src # ./hanoi 10
> 571133 loops
> margit:/disk03/bytebench-3.1/src # ./hanoi 10
> 570517 loops
> margit:/disk03/bytebench-3.1/src # ./hanoi 10
> 571019 loops
> margit:/disk03/bytebench-3.1/src # ./hanoi 10
> 582688 loops

Apples and oranges? ;-)

dual AMD Athlon MP 1900+, 1.6 GHz
(but single threaded app)
2.4.20-aa1
gcc-2.95.3

unixbench-4.1.0/src> gcc -O -mcpu=k6 -march=i686 -fomit-frame-pointer -mpreferred-stack-boundary=2 -malign-functions=4 -o hanoi hanoi.c
unixbench-4.1.0/src> sync
unixbench-4.1.0/src> ./hanoi 10                                                            
565338 loops
unixbench-4.1.0/src> ./hanoi 10
565379 loops
unixbench-4.1.0/src> ./hanoi 10
565448 loops
unixbench-4.1.0/src> ./hanoi 10
565218 loops
unixbench-4.1.0/src> ./hanoi 10
565148 loops
unixbench-4.1.0/src> ./hanoi 10
565136 loops

You should run "./Run hanoi"...

Recursion Test--Tower of Hanoi            58404.5 lps   (19.3 secs, 3 samples)

Regards,
	Dieter
-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel at hamburg.de (replace at with @)

* Re: Intel P6 vs P7 system call performance
@ 2002-12-13 17:51 Margit Schubert-While
  0 siblings, 0 replies; 268+ messages in thread
From: Margit Schubert-While @ 2002-12-13 17:51 UTC (permalink / raw)
  To: linux-kernel

Well, in the 2.4.x kernels, the P4 gets compiled as a I686 with NO special
treatment :-) (Not even prefetch, because of an ifdef bug)
The P3 at least gets one level of prefetch and the AMD's get special compile
options(arch=k6,athlon), full prefetch and SSE.

 >From Mike Hayward
 >Dual Pentium 4 Xeon 2.4Ghz 2.4.19 kernel 33661.9 lps (10 secs, 6 samples)

Hmm, P4 2.4Ghz , also gcc -O3 -march=i686

margit:/disk03/bytebench-3.1/src # ./hanoi 10
576264 loops
margit:/disk03/bytebench-3.1/src # ./hanoi 10
571001 loops
margit:/disk03/bytebench-3.1/src # ./hanoi 10
571133 loops
margit:/disk03/bytebench-3.1/src # ./hanoi 10
570517 loops
margit:/disk03/bytebench-3.1/src # ./hanoi 10
571019 loops
margit:/disk03/bytebench-3.1/src # ./hanoi 10
582688 loops

Margit 


* Re: Intel P6 vs P7 system call performance
@ 2002-12-11 12:48 Terje Eggestad
  2002-12-11 18:50 ` H. Peter Anvin
  0 siblings, 1 reply; 268+ messages in thread
From: Terje Eggestad @ 2002-12-11 12:48 UTC
  To: H. Peter Anvin; +Cc: linux-kernel, Dave Jones

It gets even worse with Hammer. When you run Hammer in compatibility mode
(a 32-bit app on a 64-bit OS), sysenter is an illegal instruction.
Since Intel doesn't implement syscall, there is no portable sys*
instruction for 32-bit apps. You could argue that libc hides this from you
and that libc just needs to test the host at startup (do I get a SIGILL if
I try to do getpid() with sysenter? With syscall? If so, we use int 0x80 for
syscalls). But not all programs are dynamically linked.
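A minimal sketch of the startup test described above, assuming a POSIX system: install a temporary SIGILL handler, try the fast entry once, and fall back to int 0x80 if it traps. The names probe_entry(), attempt_trapping() and attempt_working() are hypothetical, and the two attempt callbacks only stand in for the real instruction. An actual probe would issue something like getpid() via sysenter or syscall in inline assembly, which is left out here so the sketch runs anywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Which system-call entry method the startup probe settled on. */
enum syscall_entry { ENTRY_INT80, ENTRY_FAST };

static sigjmp_buf probe_env;

/* SIGILL handler: jump back into probe_entry() instead of dying. */
static void probe_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Run `attempt` once with a temporary SIGILL handler installed.
   Returns ENTRY_FAST if it completed, ENTRY_INT80 if it trapped. */
static enum syscall_entry probe_entry(void (*attempt)(void))
{
    struct sigaction sa, old;
    enum syscall_entry result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return ENTRY_INT80;              /* cannot probe safely: be conservative */

    if (sigsetjmp(probe_env, 1) == 0) {  /* 1 = restore the signal mask on jump */
        attempt();
        result = ENTRY_FAST;
    } else {
        result = ENTRY_INT80;            /* arrived here from probe_sigill() */
    }

    sigaction(SIGILL, &old, NULL);       /* put the caller's handler back */
    return result;
}

/* Hypothetical stand-ins so the sketch runs anywhere: a real probe would
   execute the candidate instruction with inline assembly. */
static void attempt_trapping(void) { raise(SIGILL); }
static void attempt_working(void)  { }
```

Passing 1 to sigsetjmp() matters: SIGILL is blocked while the handler runs, and restoring the saved mask on the jump lets a later probe trap again instead of leaving the signal pending.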

Too bad, really. I tried the sysenter patch once, and the gain (on PIII
and Athlon) was significant.

Fortunately, the 64-bit libc for Hammer uses syscall.


PS:  rdtsc on P4 is also painfully slow!!!

TJ

On Mon, 2002-12-09 at 20:46, H. Peter Anvin wrote: 
> Followup to:  <20021209193649.GC10316@suse.de>
> By author:    Dave Jones <davej@codemonkey.org.uk>
> In newsgroup: linux.dev.kernel
> >
> > On Mon, Dec 09, 2002 at 05:48:45PM +0000, Linus Torvalds wrote:
> > 
> >  > P4's really suck at system calls.  A 2.8GHz P4 does a simple system call
> >  > a lot _slower_ than a 500MHz PIII. 
> >  > 
> >  > The P4 has problems with some other things too, but the "int + iret"
> >  > instruction combination is absolutely the worst I've seen.  A 1.2GHz
> >  > Athlon will be 5-10 times faster than the fastest P4 on system call
> >  > overhead. 
> > 
> > Time to look into an alternative like SYSCALL perhaps ?
> > 
> 
> SYSCALL is AMD.  SYSENTER is Intel, and is likely to be significantly
> faster.  Unfortunately SYSENTER is also extremely braindamaged, in
> that it destroys *both* the EIP and the ESP beyond recovery, and
> because it's allowed in V86 and 16-bit modes (where it will cause
> permanent data loss) which means that it needs to be able to be turned
> off for things like DOSEMU and WINE to work correctly.
> 
> 	-hpa



-- 
_________________________________________________________________________

Terje Eggestad                  mailto:terje.eggestad@scali.no
Scali Scalable Linux Systems    http://www.scali.com

Olaf Helsets Vei 6              tel:    +47 22 62 89 61 (OFFICE)
P.O.Box 150, Oppsal                     +47 975 31 574  (MOBILE)
N-0619 Oslo                     fax:    +47 22 62 89 51
NORWAY            
_________________________________________________________________________


* Intel P6 vs P7 system call performance
@ 2002-12-09  8:30 Mike Hayward
  2002-12-09 15:40 ` erich
                   ` (2 more replies)
  0 siblings, 3 replies; 268+ messages in thread
From: Mike Hayward @ 2002-12-09  8:30 UTC
  To: linux-kernel

I have been benchmarking Pentium 4 boxes against my Pentium III laptop
with the exact same kernel and executables, as well as with custom-compiled
kernels.  The Pentium III has a much lower clock rate, yet I have
noticed that system call performance (and hence I/O performance) is up
to an order of magnitude higher on my Pentium III laptop.  1k-block I/O
reads/writes are anemic on the Pentium 4, for example, so I'm trying
to figure out why and thought someone might have an idea.

Notice below that the system call overhead is much higher on the
Pentium 4, even though its CPU runs at more than twice the clock rate and
the system has DDR RAM, a 400 MHz FSB, etc.  I even get pretty remarkable
syscall/I/O performance on my Pentium III laptop compared with an otherwise
idle dual Xeon.

See how the performance is nearly opposite of what one would expect:

----------------------------------------------------------------------
basic sys call performance iterated for 10 secs:

        while (1) {
                close(dup(0));
                getpid();
                getuid();
                umask(022);
                iter++;
        }

M-Pentium III 850MHz Sys Call Rate   433741.8 lps
  Pentium 4     2GHz Sys Call Rate   233637.8 lps
  Xeon x 2    2.4GHz Sys Call Rate   207684.2 lps
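For reproducing the syscall number outside BYTE, here is a self-contained C sketch of the same loop, timed with clock_gettime(CLOCK_MONOTONIC) instead of BYTE's 10-second alarm. The function names and the batching of 1000 loops between clock reads are assumptions of this sketch, not BYTE's code, so absolute figures will not match the tables exactly.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* One pass of the loop quoted above: dup, close, getpid, getuid and
   umask, i.e. up to five system calls (fewer if libc caches getpid()). */
static void syscall_mix(void)
{
    close(dup(0));      /* still a syscall pair even if fd 0 is closed */
    (void)getpid();
    (void)getuid();
    umask(022);
}

static double seconds_between(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Repeat the mix for about `duration` seconds and return loops per second,
   the same "lps" unit the BYTE tables use. */
static double syscall_lps(double duration)
{
    struct timespec start, now;
    unsigned long loops = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        for (int i = 0; i < 1000; i++)   /* amortise the cost of reading the clock */
            syscall_mix();
        loops += 1000;
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (seconds_between(&start, &now) < duration);

    return (double)loops / seconds_between(&start, &now);
}
```

Calling syscall_lps(10.0) gives a rate comparable in spirit to the "Sys Call Rate" column. On glibc older than 2.17, clock_gettime() also needs -lrt at link time.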

----------------------------------------------------------------------
1k read sys calls iterated for 10 secs (all buffered reads, no disk):

M-Pentium III 850MHz File Read      1492961.0 KBps (~1.49M io/s)
  Pentium 4     2GHz File Read      1088629.0 KBps (~1.09M io/s)
  Xeon x 2    2.4GHz File Read       686892.0 KBps (~0.69M io/s)

Any ideas?  Not sure I want to upgrade to the P7 architecture if this
is right, since for me system calls are probably more important than
raw cpu computational power.
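Dividing each machine's nominal clock by its System Call Overhead rate gives a rough cycles-per-loop figure. This is back-of-the-envelope arithmetic on the numbers in this message, assuming the quoted clock rates are the actual core clocks; each loop makes several system calls plus a little user-space work, so it is only an approximate per-iteration cost.

```latex
\begin{aligned}
\text{Pentium III: } & \frac{850 \times 10^{6}\ \text{cycles/s}}{433741.8\ \text{loops/s}} \approx 1960\ \text{cycles/loop} \\
\text{Pentium 4: }   & \frac{2.0 \times 10^{9}\ \text{cycles/s}}{233637.8\ \text{loops/s}} \approx 8560\ \text{cycles/loop} \\
\text{Xeon: }        & \frac{2.4 \times 10^{9}\ \text{cycles/s}}{207684.2\ \text{loops/s}} \approx 11560\ \text{cycles/loop}
\end{aligned}
```

So per loop the P4 burns roughly 4.4 times, and the Xeon roughly 5.9 times, as many cycles as the Pentium III.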

- Mike

--- Mobile Pentium III 850 MHz ---

  BYTE UNIX Benchmarks (Version 3.11)
  System -- Linux flux.loup.net 2.4.7-10 #1 Thu Sep 6 17:27:27 EDT 2001 i686 unknown
  Start Benchmark Run: Thu Nov  8 07:55:04 PST 2001
   1 interactive users.
Dhrystone 2 without register variables   1652556.1 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     1513809.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         3770106.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)        230897.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)           230586.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)             230916.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)            232229.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)           222990.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)          224339.4 lps   (10 secs, 6 samples)
System Call Overhead Test                433741.8 lps   (10 secs, 6 samples)
Pipe Throughput Test                     499465.5 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test        229029.2 lps   (10 secs, 6 samples)
Process Creation Test                      8696.6 lps   (10 secs, 6 samples)
Execl Throughput Test                      1089.8 lps   (9 secs, 6 samples)
File Read  (10 seconds)                  1492961.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                  157663.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                   32516.0 KBps  (10 secs, 6 samples)
File Read  (30 seconds)                  1507645.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                  161130.0 KBps  (30 secs, 6 samples)
File Copy  (30 seconds)                   20155.0 KBps  (30 secs, 6 samples)
C Compiler Test                             491.2 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)               1315.2 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)                694.4 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                357.1 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                180.4 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places          46831.0 lpm   (60 secs, 6 samples)
Recursion Test--Tower of Hanoi            20954.1 lps   (10 secs, 6 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7   224339.4       88.3
Dhrystone 2 without register variables       22366.3  1652556.1       73.9
Execl Throughput Test                           16.5     1089.8       66.0
File Copy  (30 seconds)                        179.0    20155.0      112.6
Pipe-based Context Switching Test             1318.5   229029.2      173.7
Shell scripts (8 concurrent)                     4.0      180.4       45.1
                                                                 =========
     SUM of  6 items                                                 559.6
     AVERAGE                                                          93.3
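Judging from the Pentium III rows, each INDEX is simply RESULT divided by BASELINE (224339.4 / 2541.7 ≈ 88.3), and AVERAGE is SUM divided by the six items (559.6 / 6 ≈ 93.3). A small C sketch of that arithmetic follows; the formula is inferred from the printed numbers rather than taken from BYTE's source, and struct index_row, index_of() and index_average() are hypothetical names.

```c
#include <stddef.h>

/* One row of the BYTE "INDEX VALUES" table. */
struct index_row {
    const char *test;
    double baseline;
    double result;
};

/* INDEX appears to be RESULT / BASELINE (inferred from the rows above). */
static double index_of(const struct index_row *r)
{
    return r->result / r->baseline;
}

/* Mean of the per-test indexes, as on the "AVERAGE" line. */
static double index_average(const struct index_row *rows, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += index_of(&rows[i]);
    return n ? sum / (double)n : 0.0;
}

/* Pentium III rows copied from the table above. */
static const struct index_row p3_rows[] = {
    { "Arithmetic Test (type = double)",         2541.7,  224339.4 },
    { "Dhrystone 2 without register variables", 22366.3, 1652556.1 },
    { "Execl Throughput Test",                     16.5,    1089.8 },
    { "File Copy  (30 seconds)",                  179.0,   20155.0 },
    { "Pipe-based Context Switching Test",       1318.5,  229029.2 },
    { "Shell scripts (8 concurrent)",               4.0,     180.4 },
};
```

index_average(p3_rows, 6) comes out at about 93.27, matching the printed AVERAGE of 93.3.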

--- Desktop Pentium 4 2.0 GHz w/ 266 MHz DDR ---

  BYTE UNIX Benchmarks (Version 3.11)
  System -- Linux gw2 2.4.19 #1 Mon Dec 9 05:31:23 GMT-7 2002 i686 unknown
  Start Benchmark Run: Mon Dec  9 05:45:47 GMT-7 2002
   1 interactive users.
Dhrystone 2 without register variables   2910759.3 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     2928495.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         9252565.4 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)        498894.3 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)           473452.0 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)             498956.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)            498932.0 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)           451138.8 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)          451106.8 lps   (10 secs, 6 samples)
System Call Overhead Test                233637.8 lps   (10 secs, 6 samples)
Pipe Throughput Test                     437441.1 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test        167229.2 lps   (10 secs, 6 samples)
Process Creation Test                      9407.2 lps   (10 secs, 6 samples)
Execl Throughput Test                      2158.8 lps   (10 secs, 6 samples)
File Read  (10 seconds)                  1088629.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                  472315.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                   10569.0 KBps  (10 secs, 6 samples)
File Read  (120 seconds)                 1089526.0 KBps  (120 secs, 6 samples)
File Write (120 seconds)                 467028.0 KBps  (120 secs, 6 samples)
File Copy  (120 seconds)                   3541.0 KBps  (120 secs, 6 samples)
C Compiler Test                             973.9 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)               2590.8 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)               1359.6 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                696.4 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                352.1 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places          99120.4 lpm   (60 secs, 6 samples)
Recursion Test--Tower of Hanoi            44857.5 lps   (10 secs, 6 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7   451106.8      177.5
Dhrystone 2 without register variables       22366.3  2910759.3      130.1
Execl Throughput Test                           16.5     2158.8      130.8
File Copy  (120 seconds)                       179.0     3541.0       19.7
Pipe-based Context Switching Test             1318.5   167229.2      126.8
Shell scripts (8 concurrent)                     4.0      352.1       88.0
                                                                 =========
     SUM of  6 items                                                 673.0
     AVERAGE                                                         112.1


--- Pentium 4 Xeon 2.4 GHz x 2 w/ 2.4.19 ---

  BYTE UNIX Benchmarks (Version 3.11)
  System -- Linux brent-xeon 2.4.19-kel #5 SMP Wed Sep 25 03:15:13 GMT 2002 i686 unknown
  Start Benchmark Run: Thu Oct 10 03:48:07 MDT 2002
   0 interactive users.
Dhrystone 2 without register variables   2200821.4 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     2233296.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         7366670.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)        399261.4 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)           361354.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)             364200.0 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)            345292.9 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)           539907.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)          537355.5 lps   (10 secs, 6 samples)
System Call Overhead Test                207684.2 lps   (10 secs, 6 samples)
Pipe Throughput Test                     283868.3 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test         98205.6 lps   (10 secs, 6 samples)
Process Creation Test                      5395.9 lps   (10 secs, 6 samples)
Execl Throughput Test                      1612.9 lps   (9 secs, 6 samples)
File Read  (10 seconds)                  686892.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                  272217.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                   56415.0 KBps  (10 secs, 6 samples)
File Read  (30 seconds)                  681181.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                  272351.0 KBps  (30 secs, 6 samples)
File Copy  (30 seconds)                   20611.0 KBps  (30 secs, 6 samples)
C Compiler Test                             873.5 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)               2970.1 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)               1294.2 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                845.2 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                409.2 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places           no measured results
Recursion Test--Tower of Hanoi            33661.9 lps   (10 secs, 6 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7   537355.5      211.4
Dhrystone 2 without register variables       22366.3  2200821.4       98.4
Execl Throughput Test                           16.5     1612.9       97.8
File Copy  (30 seconds)                        179.0    20611.0      115.1
Pipe-based Context Switching Test             1318.5    98205.6       74.5
Shell scripts (8 concurrent)                     4.0      409.2      102.3
                                                                 =========
     SUM of  6 items                                                 699.5
     AVERAGE                                                         116.6

* Re: Intel P6 vs P7 system call performance
@ 2002-12-09  7:01 Samium Gromoff
  0 siblings, 0 replies; 268+ messages in thread
From: Samium Gromoff @ 2002-12-09  7:01 UTC
  To: linux-kernel


  As for the dual Xeon vs. the single-CPU P4, a possible reason is SMP
overhead, because a single thread cannot benefit from multiple CPUs...

cheers, Samium Gromoff


Thread overview: 268+ messages
2002-12-18 12:55 Intel P6 vs P7 system call performance Terje Eggestad
2002-12-18 20:14 ` H. Peter Anvin
2002-12-18 20:25   ` Richard B. Johnson
2002-12-18 20:26     ` H. Peter Anvin
2002-12-18 22:28   ` Jamie Lokier
2002-12-18 22:37     ` Linus Torvalds
2002-12-18 22:57       ` Linus Torvalds
2002-12-20  0:53         ` Daniel Jacobowitz
2002-12-20  1:47           ` Linus Torvalds
2002-12-20  2:37             ` Daniel Jacobowitz
2002-12-18 22:39     ` H. Peter Anvin
  -- strict thread matches above, loose matches on Subject: below --
2003-01-10 18:08 Gabriel Paubert
2002-12-30 13:06 Manfred Spraul
2002-12-30 14:54 ` Andi Kleen
2002-12-22 15:45 Nakajima, Jun
2002-12-22 12:33 Mikael Pettersson
2002-12-22 16:00 ` Jamie Lokier
2002-12-19 18:46 billyrose
2002-12-19 16:10 billyrose
2002-12-19 15:20 bart
2002-12-19 14:57 bart
2002-12-19 14:40 billyrose
2002-12-19 15:11 ` Richard B. Johnson
2002-12-19 13:55 bart
2002-12-19 19:37 ` Linus Torvalds
2002-12-19 22:10   ` Jamie Lokier
2002-12-19 22:16     ` H. Peter Anvin
2002-12-19 22:22     ` Linus Torvalds
2002-12-19 22:26       ` H. Peter Anvin
2002-12-19 22:49         ` Linus Torvalds
2002-12-19 23:30           ` Linus Torvalds
2002-12-22 11:08       ` James H. Cloos Jr.
2002-12-22 18:49         ` Linus Torvalds
2002-12-22 19:07           ` Ulrich Drepper
2002-12-22 19:34             ` Linus Torvalds
2002-12-22 19:51               ` Ulrich Drepper
2002-12-22 20:50                 ` James H. Cloos Jr.
2002-12-22 20:56                   ` Ulrich Drepper
2002-12-22 19:17           ` Ulrich Drepper
2002-12-20 10:08   ` Ulrich Drepper
2002-12-20 12:06     ` Jamie Lokier
2002-12-20 16:47       ` Linus Torvalds
2002-12-20 23:38         ` Jamie Lokier
2002-12-20 23:50           ` H. Peter Anvin
2002-12-21  0:09           ` Linus Torvalds
2002-12-21 17:18             ` Jamie Lokier
2002-12-21 19:39               ` Linus Torvalds
2002-12-22  2:18                 ` Jamie Lokier
2002-12-22  3:11                   ` Linus Torvalds
2002-12-22 10:13                     ` Ingo Molnar
2002-12-22 15:32                       ` Jamie Lokier
2002-12-22 18:53                       ` Linus Torvalds
2002-12-23  5:03                         ` Linus Torvalds
2002-12-23  7:14                           ` Ulrich Drepper
2002-12-23 23:27                           ` Petr Vandrovec
2002-12-24  0:22                             ` Stephen Rothwell
2002-12-24  4:10                               ` Linus Torvalds
2002-12-24  8:05                                 ` Rogier Wolff
2002-12-24 18:51                                   ` Linus Torvalds
2002-12-24 21:10                                     ` Rogier Wolff
2002-12-27 16:14                                 ` Kai Henningsen
2002-12-24 19:36                       ` Linus Torvalds
2002-12-24 20:20                         ` Ingo Molnar
2002-12-24 20:27                           ` Linus Torvalds
2002-12-24 20:31                         ` Ingo Molnar
2002-12-24 20:39                           ` Linus Torvalds
2002-12-28  2:05                             ` H. Peter Anvin
2002-12-28  2:04                           ` H. Peter Anvin
2002-12-26  7:47                         ` Pavel Machek
2003-01-10 11:30                         ` Gabriel Paubert
2003-01-10 17:11                           ` Linus Torvalds
2002-12-22 10:23                     ` Ingo Molnar
2002-12-19 13:22 bart
2002-12-19 13:38 ` Dave Jones
2002-12-19 14:22   ` Jamie Lokier
2002-12-19 16:56     ` Dave Jones
2002-12-19 19:29 ` H. Peter Anvin
2002-12-18 23:51 billyrose
2002-12-19 13:10 ` Richard B. Johnson
2002-12-18  1:30 Nakajima, Jun
2002-12-18  1:54 ` Ulrich Drepper
2002-12-18  3:36   ` H. Peter Anvin
2002-12-18  4:05     ` Linus Torvalds
2002-12-18  4:36       ` H. Peter Anvin
2002-12-18  4:07     ` Linus Torvalds
2002-12-18  4:40       ` Stephen Rothwell
2002-12-18  4:52         ` Linus Torvalds
2002-12-18  4:53         ` Andrew Morton
2002-12-18 19:12         ` Andrew Morton
2002-12-18 23:45       ` Pavel Machek
2002-12-20  3:05         ` Alan Cox
2002-12-20  4:03           ` Stephen Rothwell
2002-12-18  6:00   ` Brian Gerst
2002-12-17 16:32 Manfred Spraul
2002-12-17 17:13 ` Richard B. Johnson
2002-12-17 17:19   ` Richard B. Johnson
2002-12-17 17:37     ` Mikael Pettersson
2002-12-17 16:14 John Reiser
2002-12-17 16:01 John Reiser
     [not found] <20021209193649.GC10316@suse.de.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0212161639310.1623-100000@penguin.transmeta.com.suse.lists.linux.kernel>
2002-12-17  8:56   ` Andi Kleen
2002-12-17 16:57     ` Linus Torvalds
2002-12-18  5:25       ` Brian Gerst
2002-12-18  6:06         ` Linus Torvalds
2002-12-21 11:24           ` Ingo Molnar
2002-12-21 17:28             ` Jamie Lokier
2002-12-21 16:07         ` Christian Leber
2002-12-15  8:43 scott thomason
2002-12-15  4:06 Albert D. Cahalan
2002-12-15 22:01 ` Pavel Machek
2002-12-16  7:33   ` Albert D. Cahalan
2002-12-16 11:17     ` Pavel Machek
2002-12-16 17:54       ` Mark Mielke
2002-12-16 16:07         ` Jonah Sherman
2002-12-17  4:10           ` David Schwartz
2002-12-17  8:02         ` Helge Hafting
2002-12-16 19:55       ` H. Peter Anvin
2002-12-13 21:52 Margit Schubert-While
2002-12-13 19:32 Dieter Nützel
2002-12-13 17:51 Margit Schubert-While
2002-12-11 12:48 Terje Eggestad
2002-12-11 18:50 ` H. Peter Anvin
2002-12-12  9:42   ` Terje Eggestad
2002-12-12 10:06     ` Arjan van de Ven
2002-12-12 10:31       ` Terje Eggestad
2002-12-12 19:03       ` H. Peter Anvin
2002-12-12 20:36     ` Mark Mielke
2002-12-12 20:56       ` J.A. Magallon
2002-12-12 20:12         ` Zac Hansen
2002-12-13  9:21         ` Terje Eggestad
2002-12-13 15:58           ` Ville Herva
2002-12-13 21:57             ` Terje Eggestad
2002-12-13 22:53               ` H. Peter Anvin
2002-12-12 20:56       ` Vojtech Pavlik
2002-12-09  8:30 Mike Hayward
2002-12-09 15:40 ` erich
2002-12-09 17:48 ` Linus Torvalds
2002-12-09 19:36   ` Dave Jones
2002-12-09 19:46     ` H. Peter Anvin
2002-12-28 20:37       ` Ville Herva
2002-12-29  2:05         ` Christian Leber
2002-12-30 18:22           ` Christian Leber
2002-12-30 21:22             ` Linus Torvalds
2002-12-30 11:29         ` Dave Jones
2002-12-17  0:47     ` Linus Torvalds
2002-12-17  1:03       ` Dave Jones
2002-12-17  2:36         ` Linus Torvalds
2002-12-17  5:55           ` Linus Torvalds
2002-12-17  6:09             ` Linus Torvalds
2002-12-17  6:18               ` Linus Torvalds
2002-12-19 14:03                 ` Shuji YAMAMURA
2002-12-17  6:19               ` GrandMasterLee
2002-12-17  6:43               ` dean gaudet
2002-12-17 16:50                 ` Linus Torvalds
2002-12-17 19:11                 ` H. Peter Anvin
2002-12-17 21:39                   ` Benjamin LaHaise
2002-12-17 21:41                     ` H. Peter Anvin
2002-12-17 21:53                       ` Benjamin LaHaise
2002-12-18 23:53                 ` Pavel Machek
2002-12-19 22:18                   ` H. Peter Anvin
2002-12-19 22:21                     ` Pavel Machek
2002-12-19 22:23                       ` H. Peter Anvin
2002-12-19 22:26                         ` Pavel Machek
2002-12-19 22:30                           ` H. Peter Anvin
2002-12-19 22:34                             ` Pavel Machek
2002-12-19 22:36                               ` H. Peter Anvin
2002-12-17 19:12               ` H. Peter Anvin
2002-12-17 19:26                 ` Martin J. Bligh
2002-12-17 20:51                   ` Alan Cox
2002-12-17 20:16                     ` H. Peter Anvin
2002-12-17 20:49                 ` Alan Cox
2002-12-17 20:12                   ` H. Peter Anvin
2002-12-17  9:45             ` Andre Hedrick
2002-12-17 12:40               ` Dave Jones
2002-12-17 23:18                 ` Andre Hedrick
2002-12-17 15:12               ` Alan Cox
2002-12-18 23:55                 ` Pavel Machek
2002-12-19 22:17                   ` H. Peter Anvin
2002-12-17 10:53             ` Ulrich Drepper
2002-12-17 11:17               ` dada1
2002-12-17 17:33                 ` Ulrich Drepper
2002-12-17 17:06               ` Linus Torvalds
2002-12-17 17:55                 ` Ulrich Drepper
2002-12-17 18:01                   ` Linus Torvalds
2002-12-17 19:23                   ` Alan Cox
2002-12-17 18:48                     ` Ulrich Drepper
2002-12-17 19:19                       ` H. Peter Anvin
2002-12-17 19:44                       ` Alan Cox
2002-12-17 19:52                         ` Richard B. Johnson
2002-12-17 19:54                           ` H. Peter Anvin
2002-12-17 19:58                           ` Linus Torvalds
2002-12-18  7:20                             ` Kai Henningsen
2002-12-17 18:49                     ` Linus Torvalds
2002-12-17 19:09                       ` Ross Biro
2002-12-17 21:34                       ` Benjamin LaHaise
2002-12-17 21:36                         ` H. Peter Anvin
2002-12-17 21:50                           ` Benjamin LaHaise
2002-12-18 23:59               ` Pavel Machek
2002-12-17 16:12             ` Hugh Dickins
2002-12-17 16:33               ` Richard B. Johnson
2002-12-17 17:47                 ` Linus Torvalds
2002-12-17 16:54               ` Hugh Dickins
2002-12-17 17:07               ` Linus Torvalds
2002-12-17 17:19                 ` Matti Aarnio
2002-12-17 17:55                   ` Linus Torvalds
2002-12-17 18:24                     ` Linus Torvalds
2002-12-17 18:33                       ` Ulrich Drepper
2002-12-17 18:30                     ` Ulrich Drepper
2002-12-17 19:04                       ` Linus Torvalds
2002-12-17 19:19                         ` Ulrich Drepper
2002-12-17 19:28                         ` Linus Torvalds
2002-12-17 19:32                           ` H. Peter Anvin
2002-12-17 19:44                             ` Linus Torvalds
2002-12-17 19:53                           ` Ulrich Drepper
2002-12-17 20:01                             ` Linus Torvalds
2002-12-17 20:17                               ` Ulrich Drepper
2002-12-18  4:15                                 ` Linus Torvalds
2002-12-18  4:15                               ` Linus Torvalds
2002-12-18  4:39                                 ` H. Peter Anvin
2002-12-18  4:49                                   ` Linus Torvalds
2002-12-18  6:38                                     ` Linus Torvalds
2002-12-18 13:17                                 ` Richard B. Johnson
2002-12-18 13:40                                 ` Horst von Brand
2002-12-18 13:47                                   ` Sean Neakums
2002-12-18 14:10                                     ` Horst von Brand
2002-12-18 14:51                                       ` dada1
2002-12-18 19:12                                       ` Mark Mielke
2002-12-18 15:52                                   ` Alan Cox
2002-12-18 16:41                                   ` Dave Jones
2002-12-18 18:41                                     ` Horst von Brand
2002-12-17 19:26                       ` Alan Cox
2002-12-17 18:57                         ` Ulrich Drepper
2002-12-17 19:10                           ` Linus Torvalds
2002-12-17 19:21                             ` H. Peter Anvin
2002-12-17 19:37                               ` Linus Torvalds
2002-12-17 19:43                                 ` H. Peter Anvin
2002-12-17 20:07                                   ` Matti Aarnio
2002-12-17 20:10                                     ` H. Peter Anvin
2002-12-17 19:59                                 ` Matti Aarnio
2002-12-17 20:06                                 ` Ulrich Drepper
2002-12-17 20:35                                   ` Daniel Jacobowitz
2002-12-18  0:20                                   ` Linus Torvalds
2002-12-18  0:38                                     ` Ulrich Drepper
2002-12-18  7:41                                 ` Kai Henningsen
2002-12-18 13:00                                 ` Rogier Wolff
2002-12-17 19:47                             ` Dave Jones
2002-12-18 12:57                             ` Rogier Wolff
2002-12-19  0:14                               ` Pavel Machek
2002-12-17 21:38                           ` Benjamin LaHaise
2002-12-17 21:41                             ` H. Peter Anvin
2002-12-17 18:39                     ` Jeff Dike
2002-12-17 19:05                       ` Linus Torvalds
2002-12-18  5:34                     ` Jeremy Fitzhardinge
2002-12-18  5:38                       ` H. Peter Anvin
2002-12-18 15:50                       ` Alan Cox
2002-12-18 23:51             ` Pavel Machek
2002-12-13 15:45 ` William Lee Irwin III
2002-12-13 16:49   ` Mike Hayward
2002-12-14  0:55     ` GrandMasterLee
2002-12-14  4:41       ` Mike Dresser
2002-12-14  4:53         ` Mike Dresser
2002-12-14 10:01           ` Dave Jones
2002-12-14 17:48             ` Mike Dresser
2002-12-14 18:36             ` GrandMasterLee
2002-12-15  2:03               ` J.A. Magallon
2002-12-15 21:59   ` Pavel Machek
2002-12-15 22:37     ` William Lee Irwin III
2002-12-15 22:43       ` Pavel Machek
2002-12-09  7:01 Samium Gromoff
