linux-kernel.vger.kernel.org archive mirror
* Intel P6 vs P7 system call performance
@ 2002-12-09  8:30 Mike Hayward
  2002-12-09 15:40 ` erich
                   ` (2 more replies)
  0 siblings, 3 replies; 176+ messages in thread
From: Mike Hayward @ 2002-12-09  8:30 UTC (permalink / raw)
  To: linux-kernel

I have been benchmarking Pentium 4 boxes against my Pentium III laptop
with the exact same kernel and executables as well as custom compiled
kernels.  The Pentium III has a much lower clock rate and I have
noticed that system call performance (and hence io performance) is up
to an order of magnitude higher on my Pentium III laptop.  1k block IO
reads/writes are anemic on the Pentium 4, for example, so I'm trying
to figure out why and thought someone might have an idea.

Notice below that the System Call overhead is much higher on the
Pentium 4 even though the cpu runs at more than twice the speed and
the system has DDRAM, a 400 MHz FSB, etc.  I even get pretty
remarkable syscall/io performance on my Pentium III laptop vs. an
otherwise idle dual Xeon.

See how the performance is nearly opposite of what one would expect:

----------------------------------------------------------------------
basic sys call performance iterated for 10 secs:

        while (1) {
                close(dup(0));
                getpid();
                getuid();
                umask(022);
                iter++;
        }

M-Pentium III 850Mhz Sys Call Rate   433741.8
  Pentium 4     2Ghz Sys Call Rate   233637.8
  Xeon x 2    2.4Ghz Sys Call Rate   207684.2

----------------------------------------------------------------------
1k read sys calls iterated for 10 secs (all buffered reads, no disk):

M-Pentium III 850Mhz File Read      1492961.0 (~149 io/s)
  Pentium 4     2Ghz File Read      1088629.0 (~108 io/s)
  Xeon x 2    2.4Ghz File Read       686892.0 (~ 69 io/s)

Any ideas?  Not sure I want to upgrade to the P7 architecture if this
is right, since for me system calls are probably more important than
raw cpu computational power.

- Mike

--- Mobile Pentium III 850 MHz ---

  BYTE UNIX Benchmarks (Version 3.11)
  System -- Linux flux.loup.net 2.4.7-10 #1 Thu Sep 6 17:27:27 EDT 2001 i686 unknown
  Start Benchmark Run: Thu Nov  8 07:55:04 PST 2001
   1 interactive users.
Dhrystone 2 without register variables   1652556.1 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     1513809.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         3770106.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)        230897.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)           230586.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)             230916.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)            232229.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)           222990.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)          224339.4 lps   (10 secs, 6 samples)
System Call Overhead Test                433741.8 lps   (10 secs, 6 samples)
Pipe Throughput Test                     499465.5 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test        229029.2 lps   (10 secs, 6 samples)
Process Creation Test                      8696.6 lps   (10 secs, 6 samples)
Execl Throughput Test                      1089.8 lps   (9 secs, 6 samples)
File Read  (10 seconds)                  1492961.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                  157663.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                   32516.0 KBps  (10 secs, 6 samples)
File Read  (30 seconds)                  1507645.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                  161130.0 KBps  (30 secs, 6 samples)
File Copy  (30 seconds)                   20155.0 KBps  (30 secs, 6 samples)
C Compiler Test                             491.2 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)               1315.2 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)                694.4 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                357.1 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                180.4 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places          46831.0 lpm   (60 secs, 6 samples)
Recursion Test--Tower of Hanoi            20954.1 lps   (10 secs, 6 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7   224339.4       88.3
Dhrystone 2 without register variables       22366.3  1652556.1       73.9
Execl Throughput Test                           16.5     1089.8       66.0
File Copy  (30 seconds)                        179.0    20155.0      112.6
Pipe-based Context Switching Test             1318.5   229029.2      173.7
Shell scripts (8 concurrent)                     4.0      180.4       45.1
                                                                 =========
     SUM of  6 items                                                 559.6
     AVERAGE                                                          93.3

--- Desktop Pentium 4 2.0 GHz w/ 266 MHz DDR ---

  BYTE UNIX Benchmarks (Version 3.11)
  System -- Linux gw2 2.4.19 #1 Mon Dec 9 05:31:23 GMT-7 2002 i686 unknown
  Start Benchmark Run: Mon Dec  9 05:45:47 GMT-7 2002
   1 interactive users.
Dhrystone 2 without register variables   2910759.3 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     2928495.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         9252565.4 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)        498894.3 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)           473452.0 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)             498956.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)            498932.0 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)           451138.8 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)          451106.8 lps   (10 secs, 6 samples)
System Call Overhead Test                233637.8 lps   (10 secs, 6 samples)
Pipe Throughput Test                     437441.1 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test        167229.2 lps   (10 secs, 6 samples)
Process Creation Test                      9407.2 lps   (10 secs, 6 samples)
Execl Throughput Test                      2158.8 lps   (10 secs, 6 samples)
File Read  (10 seconds)                  1088629.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                  472315.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                   10569.0 KBps  (10 secs, 6 samples)
File Read  (120 seconds)                 1089526.0 KBps  (120 secs, 6 samples)
File Write (120 seconds)                 467028.0 KBps  (120 secs, 6 samples)
File Copy  (120 seconds)                   3541.0 KBps  (120 secs, 6 samples)
C Compiler Test                             973.9 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)               2590.8 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)               1359.6 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                696.4 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                352.1 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places          99120.4 lpm   (60 secs, 6 samples)
Recursion Test--Tower of Hanoi            44857.5 lps   (10 secs, 6 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7   451106.8      177.5
Dhrystone 2 without register variables       22366.3  2910759.3      130.1
Execl Throughput Test                           16.5     2158.8      130.8
File Copy  (120 seconds)                       179.0     3541.0       19.7
Pipe-based Context Switching Test             1318.5   167229.2      126.8
Shell scripts (8 concurrent)                     4.0      352.1       88.0
                                                                 =========
     SUM of  6 items                                                 673.0
     AVERAGE                                                         112.1


--- Pentium 4 Xeon 2.4 GHz x 2 w/ 2.4.19 ---

  BYTE UNIX Benchmarks (Version 3.11)
  System -- Linux brent-xeon 2.4.19-kel #5 SMP Wed Sep 25 03:15:13 GMT 2002 i686 unknown
  Start Benchmark Run: Thu Oct 10 03:48:07 MDT 2002
   0 interactive users.
Dhrystone 2 without register variables   2200821.4 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     2233296.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         7366670.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)        399261.4 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)           361354.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)             364200.0 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)            345292.9 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)           539907.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)          537355.5 lps   (10 secs, 6 samples)
System Call Overhead Test                207684.2 lps   (10 secs, 6 samples)
Pipe Throughput Test                     283868.3 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test         98205.6 lps   (10 secs, 6 samples)
Process Creation Test                      5395.9 lps   (10 secs, 6 samples)
Execl Throughput Test                      1612.9 lps   (9 secs, 6 samples)
File Read  (10 seconds)                  686892.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                  272217.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                   56415.0 KBps  (10 secs, 6 samples)
File Read  (30 seconds)                  681181.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                  272351.0 KBps  (30 secs, 6 samples)
File Copy  (30 seconds)                   20611.0 KBps  (30 secs, 6 samples)
C Compiler Test                             873.5 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)               2970.1 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)               1294.2 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                845.2 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                409.2 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places           no measured results
Recursion Test--Tower of Hanoi            33661.9 lps   (10 secs, 6 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7   537355.5      211.4
Dhrystone 2 without register variables       22366.3  2200821.4       98.4
Execl Throughput Test                           16.5     1612.9       97.8
File Copy  (30 seconds)                        179.0    20611.0      115.1
Pipe-based Context Switching Test             1318.5    98205.6       74.5
Shell scripts (8 concurrent)                     4.0      409.2      102.3
                                                                 =========
     SUM of  6 items                                                 699.5
     AVERAGE                                                         116.6

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-09  8:30 Intel P6 vs P7 system call performance Mike Hayward
@ 2002-12-09 15:40 ` erich
  2002-12-09 17:48 ` Linus Torvalds
  2002-12-13 15:45 ` William Lee Irwin III
  2 siblings, 0 replies; 176+ messages in thread
From: erich @ 2002-12-09 15:40 UTC (permalink / raw)
  To: Mike Hayward; +Cc: linux-kernel


Mike Hayward <hayward@loup.net> wrote:

> I have been benchmarking Pentium 4 boxes against my Pentium III laptop
> with the exact same kernel and executables as well as custom compiled
> kernels.  The Pentium III has a much lower clock rate and I have
> noticed that system call performance (and hence io performance) is up
> to an order of magnitude higher on my Pentium III laptop.  1k block IO
> reads/writes are anemic on the Pentium 4, for example, so I'm trying
> to figure out why and thought someone might have an idea.
> 
> Notice below that the System Call overhead is much higher on the
> Pentium 4 even though the cpu runs more than twice the speed and the
> system has DDRAM, a 400 Mhz FSB, etc.  I even get pretty remarkable
> syscall/io performance on my Pentium III laptop vs. an otherwise idle
> dual Xeon.
> 
> See how the performance is nearly opposite of what one would expect:
...
> M-Pentium III 850Mhz Sys Call Rate   433741.8
>   Pentium 4     2Ghz Sys Call Rate   233637.8
>   Xeon x 2    2.4Ghz Sys Call Rate   207684.2
...[other benchmark deleted]...
> Any ideas?  Not sure I want to upgrade to the P7 architecture if this
> is right, since for me system calls are probably more important than
> raw cpu computational power.

You're assuming that ALL operations in a P4 are linearly faster than
a P-III.  This is definitely not the case.

A P4 has a much longer pipeline than the P-III (in many cases,
considerably longer than the diagrams imply), and in particular
it has much higher latency when handling mode transitions.

The results you got don't surprise me whatsoever.  In fact the raw
system call transition instructions are likely 5x slower on the
P4.

--
    Erich Stefan Boleyn     <erich@uruk.org>     http://www.uruk.org/
"Reality is truly stranger than fiction; Probably why fiction is so popular"

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-09  8:30 Intel P6 vs P7 system call performance Mike Hayward
  2002-12-09 15:40 ` erich
@ 2002-12-09 17:48 ` Linus Torvalds
  2002-12-09 19:36   ` Dave Jones
  2002-12-13 15:45 ` William Lee Irwin III
  2 siblings, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-09 17:48 UTC (permalink / raw)
  To: linux-kernel

In article <200212090830.gB98USW05593@flux.loup.net>,
Mike Hayward  <hayward@loup.net> wrote:
>
>I have been benchmarking Pentium 4 boxes against my Pentium III laptop
>with the exact same kernel and executables as well as custom compiled
>kernels.  The Pentium III has a much lower clock rate and I have
>noticed that system call performance (and hence io performance) is up
>to an order of magnitude higher on my Pentium III laptop.  1k block IO
>reads/writes are anemic on the Pentium 4, for example, so I'm trying
>to figure out why and thought someone might have an idea.

P4's really suck at system calls.  A 2.8GHz P4 does a simple system call
a lot _slower_ than a 500MHz PIII. 

The P4 has problems with some other things too, but the "int + iret"
instruction combination is absolutely the worst I've seen.  A 1.2GHz
Athlon will be 5-10 times faster than the fastest P4 on system call
overhead. 

HOWEVER, the P4 is really good at a lot of other things. On average, a
P4 tends to perform quite well on most loads, and hyperthreading (if you
have a Xeon or one of the newer desktop CPUs) also tends to work quite
well to smooth things out in real life.

In short: the P4 architecture excels at some things, and it sucks at
others. It _mostly_ tends to excel more than suck.

			Linus

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-09 17:48 ` Linus Torvalds
@ 2002-12-09 19:36   ` Dave Jones
  2002-12-09 19:46     ` H. Peter Anvin
  2002-12-17  0:47     ` Linus Torvalds
  0 siblings, 2 replies; 176+ messages in thread
From: Dave Jones @ 2002-12-09 19:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Mon, Dec 09, 2002 at 05:48:45PM +0000, Linus Torvalds wrote:

 > P4's really suck at system calls.  A 2.8GHz P4 does a simple system call
 > a lot _slower_ than a 500MHz PIII. 
 > 
 > The P4 has problems with some other things too, but the "int + iret"
 > instruction combination is absolutely the worst I've seen.  A 1.2GHz
 > Athlon will be 5-10 times faster than the fastest P4 on system call
 > overhead. 

Time to look into an alternative like SYSCALL perhaps ?

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-09 19:36   ` Dave Jones
@ 2002-12-09 19:46     ` H. Peter Anvin
  2002-12-28 20:37       ` Ville Herva
  2002-12-17  0:47     ` Linus Torvalds
  1 sibling, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-09 19:46 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20021209193649.GC10316@suse.de>
By author:    Dave Jones <davej@codemonkey.org.uk>
In newsgroup: linux.dev.kernel
>
> On Mon, Dec 09, 2002 at 05:48:45PM +0000, Linus Torvalds wrote:
> 
>  > P4's really suck at system calls.  A 2.8GHz P4 does a simple system call
>  > a lot _slower_ than a 500MHz PIII. 
>  > 
>  > The P4 has problems with some other things too, but the "int + iret"
>  > instruction combination is absolutely the worst I've seen.  A 1.2GHz
>  > Athlon will be 5-10 times faster than the fastest P4 on system call
>  > overhead. 
> 
> Time to look into an alternative like SYSCALL perhaps ?
> 

SYSCALL is AMD.  SYSENTER is Intel, and is likely to be significantly
faster.  Unfortunately SYSENTER is also extremely braindamaged: it
destroys *both* the EIP and the ESP beyond recovery, and it is allowed
in V86 and 16-bit modes (where it will cause permanent loss of state),
which means it needs to be possible to turn it off for things like
DOSEMU and WINE to work correctly.
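For context, the user-side convention that copes with this clobbering (a sketch of the approach Linux later adopted for its vsyscall page; the label names here are illustrative, cf. the SYSENTER_RETURN define mentioned later in this thread) saves everything SYSENTER destroys before executing it:

```asm
# Illustrative i386 user-side stub (not from this thread's patches).
# SYSENTER loads EIP/ESP from MSRs and saves no return link, so
# userspace must stash everything needed to resume: the kernel's
# SYSEXIT jumps back to a fixed address, and %ebp carries the user
# stack pointer for the kernel to restore.
__sysenter_stub:
        push    %ecx            # sysexit clobbers %ecx (new ESP)
        push    %edx            # sysexit clobbers %edx (new EIP)
        push    %ebp
        mov     %esp, %ebp      # kernel restores user ESP from %ebp
        sysenter                # EIP/ESP replaced; no return address saved
SYSENTER_RETURN:                # fixed address the kernel sysexits to
        pop     %ebp
        pop     %edx
        pop     %ecx
        ret
```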

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-09  8:30 Intel P6 vs P7 system call performance Mike Hayward
  2002-12-09 15:40 ` erich
  2002-12-09 17:48 ` Linus Torvalds
@ 2002-12-13 15:45 ` William Lee Irwin III
  2002-12-13 16:49   ` Mike Hayward
  2002-12-15 21:59   ` Pavel Machek
  2 siblings, 2 replies; 176+ messages in thread
From: William Lee Irwin III @ 2002-12-13 15:45 UTC (permalink / raw)
  To: Mike Hayward; +Cc: linux-kernel

On Mon, Dec 09, 2002 at 01:30:28AM -0700, Mike Hayward wrote:
> Any ideas?  Not sure I want to upgrade to the P7 architecture if this
> is right, since for me system calls are probably more important than
> raw cpu computational power.

This is the same for me. I'm extremely uninterested in the P-IV for my
own use because of this.


Bill

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-13 15:45 ` William Lee Irwin III
@ 2002-12-13 16:49   ` Mike Hayward
  2002-12-14  0:55     ` GrandMasterLee
  2002-12-15 21:59   ` Pavel Machek
  1 sibling, 1 reply; 176+ messages in thread
From: Mike Hayward @ 2002-12-13 16:49 UTC (permalink / raw)
  To: wli; +Cc: linux-kernel

Hi Bill,

 > On Mon, Dec 09, 2002 at 01:30:28AM -0700, Mike Hayward wrote:
 > > Any ideas?  Not sure I want to upgrade to the P7 architecture if this
 > > is right, since for me system calls are probably more important than
 > > raw cpu computational power.
 > 
 > This is the same for me. I'm extremely uninterested in the P-IV for my
 > own use because of this.

I've also noticed that recursive algorithms like the Tower of Hanoi
solver I run are most likely very hard to branch-predict.  Both the
code and data no doubt fit entirely in the L2 cache.  The AMD
processor below costs much less and runs at a significantly lower
clock rate (on a machine with only a 100 MHz memory bus) than the
Xeon, yet dramatically outperforms it with the same executable,
compiled with gcc -march=i686 -O3.  Maybe with a better Pentium 4
optimizing compiler the P4 and Xeon could improve a few percent, but I
doubt they'll ever see the AMD numbers.

Recursion Test--Tower of Hanoi

Uni  AMD XP 1800            2.4.18 kernel  46751.6 lps   (10 secs, 6 samples)
Dual Pentium 4 Xeon 2.4Ghz  2.4.19 kernel  33661.9 lps   (10 secs, 6 samples)

- Mike

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-13 16:49   ` Mike Hayward
@ 2002-12-14  0:55     ` GrandMasterLee
  2002-12-14  4:41       ` Mike Dresser
  0 siblings, 1 reply; 176+ messages in thread
From: GrandMasterLee @ 2002-12-14  0:55 UTC (permalink / raw)
  To: Mike Hayward; +Cc: wli, linux-kernel

On Fri, 2002-12-13 at 10:49, Mike Hayward wrote:
> Hi Bill,
> 
>  > On Mon, Dec 09, 2002 at 01:30:28AM -0700, Mike Hayward wrote:
>  > > Any ideas?  Not sure I want to upgrade to the P7 architecture if this
>  > > is right, since for me system calls are probably more important than
>  > > raw cpu computational power.
>  > 
>  > This is the same for me. I'm extremely uninterested in the P-IV for my
>  > own use because of this.
> 
> I've also noticed that algorithms like the recursive one I run which
> simulates solving the Tower of Hanoi problem are most likely very hard
> to do branch prediction on.  Both the code and data no doubt fit
> entirely in the L2 cache.  The AMD processor below is a much lower
> cost and significantly lower clock rate (and on a machine with only a
> 100Mhz Memory bus) than the Xeon, yet dramatically outperforms it with
> the same executable, compiled with gcc -march=i686 -O3.  Maybe with a
> better Pentium 4 optimizing compiler the P4 and Xeon could improve a
> few percent, but I doubt it'll ever see the AMD numbers.
What GCC were you using?  I'd use 3.2 or 3.2.1 myself, with
-march=pentium4 and -mcpu=pentium4, to see if there *is* any difference
there.  On my quad P4 Xeon 1.6 GHz with 1M L3 cache, I can compile a
kernel in about 35 seconds.  Mind you, that's my own config, not
*everything*.  On a dual Athlon MP at 1.8 GHz, I get about 5 mins or so.
Both are running with make -jX, where X is the saturation value.


> Recursion Test--Tower of Hanoi
> 
> Uni  AMD XP 1800            2.4.18 kernel  46751.6 lps   (10 secs, 6 samples)
> Dual Pentium 4 Xeon 2.4Ghz  2.4.19 kernel  33661.9 lps   (10 secs, 6 samples)
> 
> - Mike
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
GrandMasterLee <masterlee@digitalroadkill.net>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-14  0:55     ` GrandMasterLee
@ 2002-12-14  4:41       ` Mike Dresser
  2002-12-14  4:53         ` Mike Dresser
  0 siblings, 1 reply; 176+ messages in thread
From: Mike Dresser @ 2002-12-14  4:41 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: linux-kernel

On 13 Dec 2002, GrandMasterLee wrote:

> there. On my quad P4 Xeon 1.6Ghz with 1M L3 cache, I can compile a
> kernel in about 35 seconds. Mind you that's my own config, not
> *everything*. On a dual athlon MP at 1.8 Ghz, I get about 5 mins or so.
> Both are running with make -jx where X is the saturation value.

Something seems odd about the Athlon MP time.  I've got a Celeron 533
with slow disks that does a pretty standard make dep; make of 2.4.20 in
7m05s, which is not that much different considering it's a third the
speed and has one CPU instead of two.

The single P4/2.53 in another machine can haul down in 3m17s

Guess our kernel .config's or version must vary greatly.

Mike


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-14  4:41       ` Mike Dresser
@ 2002-12-14  4:53         ` Mike Dresser
  2002-12-14 10:01           ` Dave Jones
  0 siblings, 1 reply; 176+ messages in thread
From: Mike Dresser @ 2002-12-14  4:53 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: linux-kernel

On Fri, 13 Dec 2002, Mike Dresser wrote:

> The single P4/2.53 in another machine can haul down in 3m17s
>
Amend that to 2m19s; I forgot to kill a background backup that was
moving files around at about 20 meg a second.

Mike


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-14  4:53         ` Mike Dresser
@ 2002-12-14 10:01           ` Dave Jones
  2002-12-14 17:48             ` Mike Dresser
  2002-12-14 18:36             ` GrandMasterLee
  0 siblings, 2 replies; 176+ messages in thread
From: Dave Jones @ 2002-12-14 10:01 UTC (permalink / raw)
  To: Mike Dresser; +Cc: GrandMasterLee, linux-kernel

On Fri, Dec 13, 2002 at 11:53:51PM -0500, Mike Dresser wrote:
 > On Fri, 13 Dec 2002, Mike Dresser wrote:
 > 
 > > The single P4/2.53 in another machine can haul down in 3m17s
 > >
 > Amend that to 2m19s, forgot to kill a background backup that was moving
 > files around at about 20 meg a second.

Note that there are more factors at play in a kernel compile than raw
cpu speed.  Your time here is slightly faster than my 2.8 GHz P4-HT,
for example.  My guess is you have faster disk(s) than I do, as most of
the time mine seems to be waiting for something to do.

Note also that this is compiling stock 2.4.20 with the default
configuration.  The minute you change any options, we're comparing
apples to oranges.

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-14 10:01           ` Dave Jones
@ 2002-12-14 17:48             ` Mike Dresser
  2002-12-14 18:36             ` GrandMasterLee
  1 sibling, 0 replies; 176+ messages in thread
From: Mike Dresser @ 2002-12-14 17:48 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

On Sat, 14 Dec 2002, Dave Jones wrote:

> Note that there are more factors at play than raw cpu speed in a
> kernel compile. Your time here is slightly faster than my 2.8Ghz P4-HT for
> example.  My guess is you have faster disk(s) than I do, as most of
> the time mine seems to be waiting for something to do.

Quantum Fireball AS's in that machine.  My main comment was that his
Athlon MP at 1.8 GHz was half or less the speed of a single P4.  Even
with compiler changes, I wouldn't think it would make THAT much of a
difference?

Mike


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-14 10:01           ` Dave Jones
  2002-12-14 17:48             ` Mike Dresser
@ 2002-12-14 18:36             ` GrandMasterLee
  2002-12-15  2:03               ` J.A. Magallon
  1 sibling, 1 reply; 176+ messages in thread
From: GrandMasterLee @ 2002-12-14 18:36 UTC (permalink / raw)
  To: Dave Jones; +Cc: Mike Dresser, linux-kernel

On Sat, 2002-12-14 at 04:01, Dave Jones wrote:
> On Fri, Dec 13, 2002 at 11:53:51PM -0500, Mike Dresser wrote:
>  > On Fri, 13 Dec 2002, Mike Dresser wrote:
>  > 
>  > > The single P4/2.53 in another machine can haul down in 3m17s
>  > >
>  > Amend that to 2m19s, forgot to kill a background backup that was moving
>  > files around at about 20 meg a second.



> Note that there are more factors at play than raw cpu speed in a
> kernel compile. Your time here is slightly faster than my 2.8Ghz P4-HT for
> example.  My guess is you have faster disk(s) than I do, as most of
> the time mine seems to be waiting for something to do.

An easy way to level the playing field would be to use /dev/shm to build
your kernel in.  That way it's all in memory.  If you've got a machine
with 512M, it's easily accomplished.

> *note also that this is compiling stock 2.4.20 with default configuration.
> The minute you change any options, we're comparings apples to oranges.
> 
> 		Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-14 18:36             ` GrandMasterLee
@ 2002-12-15  2:03               ` J.A. Magallon
  0 siblings, 0 replies; 176+ messages in thread
From: J.A. Magallon @ 2002-12-15  2:03 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: linux-kernel


On 2002.12.14 GrandMasterLee wrote:
>On Sat, 2002-12-14 at 04:01, Dave Jones wrote:
>> On Fri, Dec 13, 2002 at 11:53:51PM -0500, Mike Dresser wrote:
>>  > On Fri, 13 Dec 2002, Mike Dresser wrote:
>>  > 
>>  > > The single P4/2.53 in another machine can haul down in 3m17s
>>  > >
>>  > Amend that to 2m19s, forgot to kill a background backup that was moving
>>  > files around at about 20 meg a second.
>
>
>
>> Note that there are more factors at play than raw cpu speed in a
>> kernel compile. Your time here is slightly faster than my 2.8Ghz P4-HT for
>> example.  My guess is you have faster disk(s) than I do, as most of
>> the time mine seems to be waiting for something to do.
>
>An easy way to level the playing field would be to use /dev/shm to build
>your kernel in. That way it's all in memory. If you've got a maching
>with 512M, then it's easily accomplished.
>

tmpfs does not guarantee that everything stays in RAM; it can be paged
out as well.  An easier way is to fill your page cache with the kernel
tree, like

werewolf:/usr/src/linux# grep -v -r "" *

and then build, so no disk reads will be needed.

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam1 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-13 15:45 ` William Lee Irwin III
  2002-12-13 16:49   ` Mike Hayward
@ 2002-12-15 21:59   ` Pavel Machek
  2002-12-15 22:37     ` William Lee Irwin III
  1 sibling, 1 reply; 176+ messages in thread
From: Pavel Machek @ 2002-12-15 21:59 UTC (permalink / raw)
  To: William Lee Irwin III, Mike Hayward, linux-kernel

Hi!

> > Any ideas?  Not sure I want to upgrade to the P7 architecture if this
> > is right, since for me system calls are probably more important than
> > raw cpu computational power.
> 
> This is the same for me. I'm extremely uninterested in the P-IV for my
> own use because of this.

Well, then you should fix the kernel so that syscalls are done via
sysenter (or whatever it is called).
								Pavel
-- 
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-15 21:59   ` Pavel Machek
@ 2002-12-15 22:37     ` William Lee Irwin III
  2002-12-15 22:43       ` Pavel Machek
  0 siblings, 1 reply; 176+ messages in thread
From: William Lee Irwin III @ 2002-12-15 22:37 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Mike Hayward, linux-kernel

At some point in the past, I wrote:
>> This is the same for me. I'm extremely uninterested in the P-IV for my
>> own use because of this.

On Sun, Dec 15, 2002 at 10:59:51PM +0100, Pavel Machek wrote:
> Well, then you should fix the kernel so that syscalls are done by
> sysenter (or how is it called).
> 								Pavel

ABI is immutable. I actually run apps at home.

sysenter is also unusable for low-level loss-of-state reasons mentioned
elsewhere in this thread.


Nice try, though.


Bill

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-15 22:37     ` William Lee Irwin III
@ 2002-12-15 22:43       ` Pavel Machek
  0 siblings, 0 replies; 176+ messages in thread
From: Pavel Machek @ 2002-12-15 22:43 UTC (permalink / raw)
  To: William Lee Irwin III, Pavel Machek, Mike Hayward, linux-kernel

Hi!

> >> This is the same for me. I'm extremely uninterested in the P-IV for my
> >> own use because of this.
> 
> > Well, then you should fix the kernel so that syscalls are done by
> > sysenter (or how is it called).
> > 								Pavel
> 
> ABI is immutable. I actually run apps at home.

Perhaps that one killer app can be recompiled?

> sysenter is also unusable for low-level loss-of-state reasons mentioned
> elsewhere in this thread.

Well, disabling v86 may well be worth it :-).
								Pavel
-- 
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A., and free speech in Czech Republic.


* Re: Intel P6 vs P7 system call performance
  2002-12-09 19:36   ` Dave Jones
  2002-12-09 19:46     ` H. Peter Anvin
@ 2002-12-17  0:47     ` Linus Torvalds
  2002-12-17  1:03       ` Dave Jones
  1 sibling, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17  0:47 UTC (permalink / raw)
  To: Dave Jones, Ingo Molnar; +Cc: linux-kernel


On Mon, 9 Dec 2002, Dave Jones wrote:
> 
> Time to look into an alternative like SYSCALL perhaps ?

Well, here's a very raw first try at using intel sysenter/sysexit.

It does actually work; I've done a "hello world" program that used 
sysenter to enter the kernel, but kernel exit requires knowing where to 
return to (the SYSENTER_RETURN define in entry.S), and I didn't set up a 
fixmap entry for this yet, so I don't yet have a good value to return to.

But this, together with a fixmap entry that is user-readable (and thus
executable) that contains the "sysenter" instruction (and enough setup so
that %ebp points to the stack we want to return with), and together with
some debugging should get you there.

WARNING! I may be setting up the stack slightly incorrectly, since this
also hurls chunks when debugging. Dunno. Ingo, care to take a look?

Btw, that per-CPU sysenter entry-point is really clever of me, but it's 
not strictly NMI-safe. There's a single-instruction window between having 
started "sysenter" and having a valid kernel stack, and if an NMI comes in 
at that point, the NMI will now have a bogus stack pointer.

That NMI problem is pretty fundamentally unfixable due to the stupid
sysenter semantics, but we could just make the NMI handlers be real
careful about it and fix it up if it happens.

Most of the diff here is actually moving around some of the segments, 
since sysenter/sysexit wants them in one particular order. The setup code 
to initialize sysenter is itself pretty trivial.

		Linus

----
===== arch/i386/kernel/sysenter.c 1.1 vs edited =====
--- 1.1/arch/i386/kernel/sysenter.c	Sat Dec 14 04:38:56 2002
+++ edited/arch/i386/kernel/sysenter.c	2002-12-16 16:37:32.000000000 -0800
@@ -0,0 +1,52 @@
+/*
+ * linux/arch/i386/kernel/sysenter.c
+ *
+ * (C) Copyright 2002 Linus Torvalds
+ *
+ * This file contains the needed initializations to support sysenter.
+ */
+
+#include <linux/init.h>
+#include <linux/smp.h>
+#include <linux/thread_info.h>
+#include <linux/gfp.h>
+
+#include <asm/cpufeature.h>
+#include <asm/msr.h>
+
+extern asmlinkage void sysenter_entry(void);
+
+static void __init enable_sep_cpu(void *info)
+{
+	unsigned long page = __get_free_page(GFP_ATOMIC);
+	int cpu = get_cpu();
+	unsigned long *esp0_ptr = &(init_tss + cpu)->esp0;
+	unsigned long rel32;
+
+	rel32 = (unsigned long) sysenter_entry - (page+11);
+
+	
+	*(short *) (page+0) = 0x258b;		/* movl xxxxx,%esp */
+	*(long **) (page+2) = esp0_ptr;
+	*(char *)  (page+6) = 0xe9;		/* jmp rl32 */
+	*(long *)  (page+7) = rel32;
+
+	wrmsr(0x174, __KERNEL_CS, 0);		/* SYSENTER_CS_MSR */
+	wrmsr(0x175, page+PAGE_SIZE, 0);	/* SYSENTER_ESP_MSR */
+	wrmsr(0x176, page, 0);			/* SYSENTER_EIP_MSR */
+
+	printk("Enabling SEP on CPU %d\n", cpu);
+	put_cpu();	
+}
+
+static int __init sysenter_setup(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_SEP))
+		return 0;
+
+	enable_sep_cpu(NULL);
+	smp_call_function(enable_sep_cpu, NULL, 1, 1);
+	return 0;
+}
+
+__initcall(sysenter_setup);
===== arch/i386/kernel/Makefile 1.30 vs edited =====
--- 1.30/arch/i386/kernel/Makefile	Sat Dec 14 04:38:56 2002
+++ edited/arch/i386/kernel/Makefile	Mon Dec 16 13:43:57 2002
@@ -29,6 +29,7 @@
 obj-$(CONFIG_PROFILING)		+= profile.o
 obj-$(CONFIG_EDD)             	+= edd.o
 obj-$(CONFIG_MODULES)		+= module.o
+obj-y				+= sysenter.o
 
 EXTRA_AFLAGS   := -traditional
 
===== arch/i386/kernel/entry.S 1.41 vs edited =====
--- 1.41/arch/i386/kernel/entry.S	Fri Dec  6 09:43:43 2002
+++ edited/arch/i386/kernel/entry.S	Mon Dec 16 16:17:47 2002
@@ -94,7 +94,7 @@
 	movl %edx, %ds; \
 	movl %edx, %es;
 
-#define RESTORE_ALL	\
+#define RESTORE_REGS	\
 	popl %ebx;	\
 	popl %ecx;	\
 	popl %edx;	\
@@ -104,14 +104,25 @@
 	popl %eax;	\
 1:	popl %ds;	\
 2:	popl %es;	\
-	addl $4, %esp;	\
-3:	iret;		\
 .section .fixup,"ax";	\
-4:	movl $0,(%esp);	\
+3:	movl $0,(%esp);	\
 	jmp 1b;		\
-5:	movl $0,(%esp);	\
+4:	movl $0,(%esp);	\
 	jmp 2b;		\
-6:	pushl %ss;	\
+.previous;		\
+.section __ex_table,"a";\
+	.align 4;	\
+	.long 1b,3b;	\
+	.long 2b,4b;	\
+.previous
+
+
+#define RESTORE_ALL	\
+	RESTORE_REGS	\
+	addl $4, %esp;	\
+1:	iret;		\
+.section .fixup,"ax";   \
+2:	pushl %ss;	\
 	popl %ds;	\
 	pushl %ss;	\
 	popl %es;	\
@@ -120,11 +131,11 @@
 .previous;		\
 .section __ex_table,"a";\
 	.align 4;	\
-	.long 1b,4b;	\
-	.long 2b,5b;	\
-	.long 3b,6b;	\
+	.long 1b,2b;	\
 .previous
 
+
+
 ENTRY(lcall7)
 	pushfl			# We get a different stack layout with call
 				# gates, which has to be cleaned up later..
@@ -219,6 +230,39 @@
 	cli
 	jmp need_resched
 #endif
+
+#define SYSENTER_RETURN 0
+
+	# sysenter call handler stub
+	ALIGN
+ENTRY(sysenter_entry)
+	sti
+	pushl $(__USER_DS)
+	pushl %ebp
+	pushfl
+	pushl $(__USER_CS)
+	pushl $SYSENTER_RETURN
+
+	pushl %eax
+	SAVE_ALL
+	GET_THREAD_INFO(%ebx)
+	cmpl $(NR_syscalls), %eax
+	jae syscall_badsys
+
+	testb $_TIF_SYSCALL_TRACE,TI_FLAGS(%ebx)
+	jnz syscall_trace_entry
+	call *sys_call_table(,%eax,4)
+	movl %eax,EAX(%esp)
+	cli
+	movl TI_FLAGS(%ebx), %ecx
+	testw $_TIF_ALLWORK_MASK, %cx
+	jne syscall_exit_work
+	RESTORE_REGS
+	movl 4(%esp),%edx
+	movl 16(%esp),%ecx
+	sti
+	sysexit
+
 
 	# system call handler stub
 	ALIGN
===== arch/i386/kernel/head.S 1.18 vs edited =====
--- 1.18/arch/i386/kernel/head.S	Thu Dec  5 18:56:49 2002
+++ edited/arch/i386/kernel/head.S	Mon Dec 16 14:14:44 2002
@@ -414,8 +414,8 @@
 	.quad 0x0000000000000000	/* 0x0b reserved */
 	.quad 0x0000000000000000	/* 0x13 reserved */
 	.quad 0x0000000000000000	/* 0x1b reserved */
-	.quad 0x00cffa000000ffff	/* 0x23 user 4GB code at 0x00000000 */
-	.quad 0x00cff2000000ffff	/* 0x2b user 4GB data at 0x00000000 */
+	.quad 0x0000000000000000	/* 0x20 unused */
+	.quad 0x0000000000000000	/* 0x28 unused */
 	.quad 0x0000000000000000	/* 0x33 TLS entry 1 */
 	.quad 0x0000000000000000	/* 0x3b TLS entry 2 */
 	.quad 0x0000000000000000	/* 0x43 TLS entry 3 */
@@ -425,22 +425,25 @@
 
 	.quad 0x00cf9a000000ffff	/* 0x60 kernel 4GB code at 0x00000000 */
 	.quad 0x00cf92000000ffff	/* 0x68 kernel 4GB data at 0x00000000 */
-	.quad 0x0000000000000000	/* 0x70 TSS descriptor */
-	.quad 0x0000000000000000	/* 0x78 LDT descriptor */
+	.quad 0x00cffa000000ffff	/* 0x73 user 4GB code at 0x00000000 */
+	.quad 0x00cff2000000ffff	/* 0x7b user 4GB data at 0x00000000 */
+
+	.quad 0x0000000000000000	/* 0x80 TSS descriptor */
+	.quad 0x0000000000000000	/* 0x88 LDT descriptor */
 
 	/* Segments used for calling PnP BIOS */
-	.quad 0x00c09a0000000000	/* 0x80 32-bit code */
-	.quad 0x00809a0000000000	/* 0x88 16-bit code */
-	.quad 0x0080920000000000	/* 0x90 16-bit data */
-	.quad 0x0080920000000000	/* 0x98 16-bit data */
+	.quad 0x00c09a0000000000	/* 0x90 32-bit code */
+	.quad 0x00809a0000000000	/* 0x98 16-bit code */
 	.quad 0x0080920000000000	/* 0xa0 16-bit data */
+	.quad 0x0080920000000000	/* 0xa8 16-bit data */
+	.quad 0x0080920000000000	/* 0xb0 16-bit data */
 	/*
 	 * The APM segments have byte granularity and their bases
 	 * and limits are set at run time.
 	 */
-	.quad 0x00409a0000000000	/* 0xa8 APM CS    code */
-	.quad 0x00009a0000000000	/* 0xb0 APM CS 16 code (16 bit) */
-	.quad 0x0040920000000000	/* 0xb8 APM DS    data */
+	.quad 0x00409a0000000000	/* 0xb8 APM CS    code */
+	.quad 0x00009a0000000000	/* 0xc0 APM CS 16 code (16 bit) */
+	.quad 0x0040920000000000	/* 0xc8 APM DS    data */
 
 #if CONFIG_SMP
 	.fill (NR_CPUS-1)*GDT_ENTRIES,8,0 /* other CPU's GDT */
===== include/asm-i386/segment.h 1.2 vs edited =====
--- 1.2/include/asm-i386/segment.h	Mon Aug 12 10:56:27 2002
+++ edited/include/asm-i386/segment.h	Mon Dec 16 14:08:09 2002
@@ -9,8 +9,8 @@
  *   2 - reserved
  *   3 - reserved
  *
- *   4 - default user CS		<==== new cacheline
- *   5 - default user DS
+ *   4 - unused			<==== new cacheline
+ *   5 - unused
  *
  *  ------- start of TLS (Thread-Local Storage) segments:
  *
@@ -25,16 +25,18 @@
  *
  *  12 - kernel code segment		<==== new cacheline
  *  13 - kernel data segment
- *  14 - TSS
- *  15 - LDT
- *  16 - PNPBIOS support (16->32 gate)
- *  17 - PNPBIOS support
- *  18 - PNPBIOS support
+ *  14 - default user CS
+ *  15 - default user DS
+ *  16 - TSS
+ *  17 - LDT
+ *  18 - PNPBIOS support (16->32 gate)
  *  19 - PNPBIOS support
  *  20 - PNPBIOS support
- *  21 - APM BIOS support
- *  22 - APM BIOS support
- *  23 - APM BIOS support 
+ *  21 - PNPBIOS support
+ *  22 - PNPBIOS support
+ *  23 - APM BIOS support
+ *  24 - APM BIOS support
+ *  25 - APM BIOS support 
  */
 #define GDT_ENTRY_TLS_ENTRIES	3
 #define GDT_ENTRY_TLS_MIN	6
@@ -42,10 +44,10 @@
 
 #define TLS_SIZE (GDT_ENTRY_TLS_ENTRIES * 8)
 
-#define GDT_ENTRY_DEFAULT_USER_CS	4
+#define GDT_ENTRY_DEFAULT_USER_CS	14
 #define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS * 8 + 3)
 
-#define GDT_ENTRY_DEFAULT_USER_DS	5
+#define GDT_ENTRY_DEFAULT_USER_DS	15
 #define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS * 8 + 3)
 
 #define GDT_ENTRY_KERNEL_BASE	12
@@ -56,14 +58,14 @@
 #define GDT_ENTRY_KERNEL_DS		(GDT_ENTRY_KERNEL_BASE + 1)
 #define __KERNEL_DS (GDT_ENTRY_KERNEL_DS * 8)
 
-#define GDT_ENTRY_TSS			(GDT_ENTRY_KERNEL_BASE + 2)
-#define GDT_ENTRY_LDT			(GDT_ENTRY_KERNEL_BASE + 3)
+#define GDT_ENTRY_TSS			(GDT_ENTRY_KERNEL_BASE + 4)
+#define GDT_ENTRY_LDT			(GDT_ENTRY_KERNEL_BASE + 5)
 
-#define GDT_ENTRY_PNPBIOS_BASE		(GDT_ENTRY_KERNEL_BASE + 4)
-#define GDT_ENTRY_APMBIOS_BASE		(GDT_ENTRY_KERNEL_BASE + 9)
+#define GDT_ENTRY_PNPBIOS_BASE		(GDT_ENTRY_KERNEL_BASE + 6)
+#define GDT_ENTRY_APMBIOS_BASE		(GDT_ENTRY_KERNEL_BASE + 11)
 
 /*
- * The GDT has 21 entries but we pad it to cacheline boundary:
+ * The GDT has 23 entries but we pad it to cacheline boundary:
  */
 #define GDT_ENTRIES 24
 



* Re: Intel P6 vs P7 system call performance
  2002-12-17  0:47     ` Linus Torvalds
@ 2002-12-17  1:03       ` Dave Jones
  2002-12-17  2:36         ` Linus Torvalds
  0 siblings, 1 reply; 176+ messages in thread
From: Dave Jones @ 2002-12-17  1:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, linux-kernel

On Mon, Dec 16, 2002 at 04:47:00PM -0800, Linus Torvalds wrote:

Cool, new toys 8-) I'll have a play with this tomorrow.
After a quick glance, one thing jumped out at me.

 > +static int __init sysenter_setup(void)
 > +{
 > +	if (!boot_cpu_has(X86_FEATURE_SEP))
 > +		return 0;
 > +
 > +	enable_sep_cpu(NULL);
 > +	smp_call_function(enable_sep_cpu, NULL, 1, 1);
 > +	return 0;

I'm sure I recall seeing errata on at least 1 CPU re sysenter.
If we do decide to go this route, we'll need to blacklist ones
with any really icky problems.

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


* Re: Intel P6 vs P7 system call performance
  2002-12-17  1:03       ` Dave Jones
@ 2002-12-17  2:36         ` Linus Torvalds
  2002-12-17  5:55           ` Linus Torvalds
  0 siblings, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17  2:36 UTC (permalink / raw)
  To: Dave Jones; +Cc: Ingo Molnar, linux-kernel



On Tue, 17 Dec 2002, Dave Jones wrote:
>
> I'm sure I recall seeing errata on at least 1 CPU re sysenter.
> If we do decide to go this route, we'll need to blacklist ones
> with any really icky problems.

The erratum is something like "all P6's report SEP, but it doesn't
actually _work_ on anything before the third stepping".

However, that should _not_ be handled by magic sysenter-specific code.
That's what the per-vendor cpu feature fixups are there for, so that these
kinds of bugs get fixed in _one_ place (initialization) and not in all the
users of the feature flags.

In fact, we already have that code in the proper place, namely
arch/i386/kernel/cpu/intel.c:

        /* SEP CPUID bug: Pentium Pro reports SEP but doesn't have it */
        if ( c->x86 == 6 && c->x86_model < 3 && c->x86_mask < 3 )
                clear_bit(X86_FEATURE_SEP, c->x86_capability);

so the stuff I sent out should work on everything.

(Modulo the missing syscall page I already mentioned and potential bugs
in the code itself, of course ;)

		Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17  2:36         ` Linus Torvalds
@ 2002-12-17  5:55           ` Linus Torvalds
  2002-12-17  6:09             ` Linus Torvalds
                               ` (4 more replies)
  0 siblings, 5 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17  5:55 UTC (permalink / raw)
  To: Dave Jones; +Cc: Ingo Molnar, linux-kernel, hpa


On Mon, 16 Dec 2002, Linus Torvalds wrote:
>
> (Modulo the missing syscall page I already mentioned and potential bugs
> in the code itself, of course ;)

Ok, I did the vsyscall page too, and tried to make it do the right thing
(but I didn't bother to test it on a non-SEP machine).

I'm pushing the changes out right now, but basically it boils down to the
fact that with these changes, user space can instead of doing an

	int $0x80

instruction for a system call just do a

	call 0xfffff000

instead. The vsyscall page will be set up to use sysenter if the CPU
supports it, and if it doesn't, it will just do the old "int $0x80"
instead (and it could use the AMD syscall instruction if it wants to).
User mode shouldn't know or care; the calling convention is the same as it
ever was.

On my P4 machine, a "getppid()" is 641 cycles with sysenter/sysexit, and
something like 1761 cycles with the old "int 0x80/iret" approach. That's a
noticeable improvement, but I have to say that I'm a bit disappointed in
the P4 still, it shouldn't be even that much.

As a comparison, an Athlon will do a full int/iret faster than a P4 does a
sysenter/sysexit. Pathetic. But it's better than it used to be.

Whatever. The code is extremely simple, and while I'm sure there are
things I've missed I'd love to hear if this works for anybody else. I'm
appending the (extremely stupid) test-program I used to test it.

The way I did this, things like system call restarting etc _should_ all
work fine even with "sysenter", simply by virtue of both sysenter and "int
0x80" being two-byte opcodes. But it might be interesting to verify that a
recompiled glibc (or even just a preload) really works with this on a
"whole system" testbed rather than just testing one system call (and not
even caring about the return value) a million times.

The good news is that the kernel part really looks pretty clean.

		Linus

---
#include <time.h>
#include <sys/time.h>
#include <asm/unistd.h>
#include <sys/stat.h>
#include <stdio.h>

#define rdtsc() ({ unsigned long a,d; asm volatile("rdtsc":"=a" (a), "=d" (d)); a; })

int main()
{
	int i, ret;
	unsigned long start, end;

	start = rdtsc();
	for (i = 0; i < 1000000; i++) {
		asm volatile("call 0xfffff000"
			:"=a" (ret)
			:"0" (__NR_getppid));
	}
	end = rdtsc();
	printf("%f cycles\n", (end - start) / 1000000.0);

	start = rdtsc();
	for (i = 0; i < 1000000; i++) {
		asm volatile("int $0x80"
			:"=a" (ret)
			:"0" (__NR_getppid));
	}
	end = rdtsc();
	printf("%f cycles\n", (end - start) / 1000000.0);
}




* Re: Intel P6 vs P7 system call performance
  2002-12-17  5:55           ` Linus Torvalds
@ 2002-12-17  6:09             ` Linus Torvalds
  2002-12-17  6:18               ` Linus Torvalds
                                 ` (3 more replies)
  2002-12-17  9:45             ` Andre Hedrick
                               ` (3 subsequent siblings)
  4 siblings, 4 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17  6:09 UTC (permalink / raw)
  To: Dave Jones; +Cc: Ingo Molnar, linux-kernel, hpa



On Mon, 16 Dec 2002, Linus Torvalds wrote:
>
> On my P4 machine, a "getppid()" is 641 cycles with sysenter/sysexit, and
> something like 1761 cycles with the old "int 0x80/iret" approach. That's a
> noticeable improvement, but I have to say that I'm a bit disappointed in
> the P4 still, it shouldn't be even that much.

On a slightly more real system call (gettimeofday - which actually matters
in real life) the difference is still visible, but less so - because the
system call itself takes more of the time, and the kernel entry overhead
is thus not as clear.

For gettimeofday(), the results on my P4 are:

	sysenter:	1280.425844 cycles
	int/iret:	2415.698224 cycles
			1135.272380 cycles diff
	factor 1.886637

ie sysenter makes that system call almost twice as fast.

It's not as good as a pure user-mode solution using tsc could be, but
we've seen the kinds of complexities that has with multi-CPU systems, and
they are so painful that I suspect the sysenter approach is a lot more
palatable even if it doesn't allow for the absolute best theoretical
numbers.

			Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17  6:09             ` Linus Torvalds
@ 2002-12-17  6:18               ` Linus Torvalds
  2002-12-19 14:03                 ` Shuji YAMAMURA
  2002-12-17  6:19               ` GrandMasterLee
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17  6:18 UTC (permalink / raw)
  To: Dave Jones; +Cc: Ingo Molnar, linux-kernel, hpa



On Mon, 16 Dec 2002, Linus Torvalds wrote:
>
> For gettimeofday(), the results on my P4 are:
>
> 	sysenter:	1280.425844 cycles
> 	int/iret:	2415.698224 cycles
> 			1135.272380 cycles diff
> 	factor 1.886637
>
> ie sysenter makes that system call almost twice as fast.

Final comparison for the evening: a PIII looks very different, since the
system call overhead is much smaller to begin with. On a PIII, the above
ends up looking like

   gettimeofday() testing:
	sysenter:	561.697236 cycles
	int/iret:	686.170463 cycles
			124.473227 cycles diff
	factor 1.221602

ie the speedup is much less because the original int/iret numbers aren't
nearly as embarrassing as the P4 ones. It's still a win, though.

		Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17  6:09             ` Linus Torvalds
  2002-12-17  6:18               ` Linus Torvalds
@ 2002-12-17  6:19               ` GrandMasterLee
  2002-12-17  6:43               ` dean gaudet
  2002-12-17 19:12               ` H. Peter Anvin
  3 siblings, 0 replies; 176+ messages in thread
From: GrandMasterLee @ 2002-12-17  6:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa

On Tue, 2002-12-17 at 00:09, Linus Torvalds wrote:
> On Mon, 16 Dec 2002, Linus Torvalds wrote:
> >
> > On my P4 machine, a "getppid()" is 641 cycles with sysenter/sysexit, and
> > something like 1761 cycles with the old "int 0x80/iret" approach. That's a
> > noticeable improvement, but I have to say that I'm a bit disappointed in
> > the P4 still, it shouldn't be even that much.
> 
> On a slightly more real system call (gettimeofday - which actually matters
> in real life) the difference is still visible, but less so - because the
> system call itself takes more of the time, and the kernel entry overhead
> is thus not as clear.
> 
> For gettimeofday(), the results on my P4 are:
> 
> 	sysenter:	1280.425844 cycles
> 	int/iret:	2415.698224 cycles
> 			1135.272380 cycles diff
> 	factor 1.886637
> 
> ie sysenter makes that system call almost twice as fast.


I'm curious whether this is one of the dual non-Xeon P4's (say, 2.4 GHz+?)
or one of the Xeons? There seems to be some perceived disparity in how
each performs. I think the biggest differences on the Xeons are the
stepping and the cache (pipeline too?), but not too much else.

[...]
> 			Linus
> 



* Re: Intel P6 vs P7 system call performance
  2002-12-17  6:09             ` Linus Torvalds
  2002-12-17  6:18               ` Linus Torvalds
  2002-12-17  6:19               ` GrandMasterLee
@ 2002-12-17  6:43               ` dean gaudet
  2002-12-17 16:50                 ` Linus Torvalds
                                   ` (2 more replies)
  2002-12-17 19:12               ` H. Peter Anvin
  3 siblings, 3 replies; 176+ messages in thread
From: dean gaudet @ 2002-12-17  6:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa

On Mon, 16 Dec 2002, Linus Torvalds wrote:

> It's not as good as a pure user-mode solution using tsc could be, but
> we've seen the kinds of complexities that has with multi-CPU systems, and
> they are so painful that I suspect the sysenter approach is a lot more
> palatable even if it doesn't allow for the absolute best theoretical
> numbers.

don't many of the multi-CPU problems with tsc go away because you've got a
per-cpu physical page for the vsyscall?

i.e. per-cpu tsc epoch and scaling can be set on that page.

the only trouble i know of is what happens when an interrupt occurs and
the task is rescheduled on another cpu... in theory you could test %eip
against 0xfffffxxx and "rollback" (or complete) any incomplete
gettimeofday call prior to saving a task's state.  but i bet that test is
undesirable on all interrupt paths.

-dean



* Re: Intel P6 vs P7 system call performance
  2002-12-17  5:55           ` Linus Torvalds
  2002-12-17  6:09             ` Linus Torvalds
@ 2002-12-17  9:45             ` Andre Hedrick
  2002-12-17 12:40               ` Dave Jones
  2002-12-17 15:12               ` Alan Cox
  2002-12-17 10:53             ` Ulrich Drepper
                               ` (2 subsequent siblings)
  4 siblings, 2 replies; 176+ messages in thread
From: Andre Hedrick @ 2002-12-17  9:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa


Linus,

Are you serious about moving off the banging we currently do on 0x80?
If so, I have a P4 development board with LEDs to monitor all the lower
io ports and can decode for you.

On Mon, 16 Dec 2002, Linus Torvalds wrote:

> 
> On Mon, 16 Dec 2002, Linus Torvalds wrote:
> >
> > (Modulo the missing syscall page I already mentioned and potential bugs
> > in the code itself, of course ;)
> 
> Ok, I did the vsyscall page too, and tried to make it do the right thing
> (but I didn't bother to test it on a non-SEP machine).
> 
> I'm pushing the changes out right now, but basically it boils down to the
> fact that with these changes, user space can instead of doing an
> 
> 	int $0x80
> 
> instruction for a system call just do a
> 
> 	call 0xfffff000
> 
> instead. The vsyscall page will be set up to use sysenter if the CPU
> supports it, and if it doesn't, it will just do the old "int $0x80"
> instead (and it could use the AMD syscall instruction if it wants to).
> User mode shouldn't know or care, the calling convention is the same as it
> ever was.
> 
> On my P4 machine, a "getppid()" is 641 cycles with sysenter/sysexit, and
> something like 1761 cycles with the old "int 0x80/iret" approach. That's a
> noticeable improvement, but I have to say that I'm a bit disappointed in
> the P4 still, it shouldn't be even that much.
> 
> As a comparison, an Athlon will do a full int/iret faster than a P4 does a
> sysenter/sysexit. Pathetic. But it's better than it used to be.
> 
> Whatever. The code is extremely simple, and while I'm sure there are
> things I've missed I'd love to hear if this works for anybody else. I'm
> appending the (extremely stupid) test-program I used to test it.
> 
> The way I did this, things like system call restarting etc _should_ all
> work fine even with "sysenter", simply by virtue of both sysenter and "int
> 0x80" being two-byte opcodes. But it might be interesting to verify that a
> recompiled glibc (or even just a preload) really works with this on a
> "whole system" testbed rather than just testing one system call (and not
> even caring about the return value) a million times.
> 
> The good news is that the kernel part really looks pretty clean.
> 
> 		Linus
> 
> ---
> #include <time.h>
> #include <sys/time.h>
> #include <asm/unistd.h>
> #include <sys/stat.h>
> #include <stdio.h>
> 
> #define rdtsc() ({ unsigned long a,d; asm volatile("rdtsc":"=a" (a), "=d" (d)); a; })
> 
> int main()
> {
> 	int i, ret;
> 	unsigned long start, end;
> 
> 	start = rdtsc();
> 	for (i = 0; i < 1000000; i++) {
> 		asm volatile("call 0xfffff000"
> 			:"=a" (ret)
> 			:"0" (__NR_getppid));
> 	}
> 	end = rdtsc();
> 	printf("%f cycles\n", (end - start) / 1000000.0);
> 
> 	start = rdtsc();
> 	for (i = 0; i < 1000000; i++) {
> 		asm volatile("int $0x80"
> 			:"=a" (ret)
> 			:"0" (__NR_getppid));
> 	}
> 	end = rdtsc();
> 	printf("%f cycles\n", (end - start) / 1000000.0);
> }
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Andre Hedrick
LAD Storage Consulting Group



* Re: Intel P6 vs P7 system call performance
  2002-12-17  5:55           ` Linus Torvalds
  2002-12-17  6:09             ` Linus Torvalds
  2002-12-17  9:45             ` Andre Hedrick
@ 2002-12-17 10:53             ` Ulrich Drepper
  2002-12-17 11:17               ` dada1
                                 ` (2 more replies)
  2002-12-17 16:12             ` Hugh Dickins
  2002-12-18 23:51             ` Pavel Machek
  4 siblings, 3 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 10:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa

Linus Torvalds wrote:

> Ok, I did the vsyscall page too, and tried to make it do the right thing
> (but I didn't bother to test it on a non-SEP machine).
> 
> But it might be interesting to verify that a
> recompiled glibc (or even just a preload) really works with this on a
> "whole system" testbed rather than just testing one system call (and not
> even caring about the return value) a million times.


I've created a modified glibc which uses the syscall code for almost
everything.  There are a few int $0x80 left here and there but mostly it
is a centralized change.

The result: all works as expected.  Nice.

On my test machine your little test program performs the syscalls
roughly twice as fast (HT P4, pretty new).  Your numbers are perhaps for
the P4 Xeons.  Anyway, when measuring some more involved code (I ran my
thread benchmark) I got only about 3% performance increase.  It's doing
a fair amount of system calls.  But again, the good news is your code
survived even this stress test.


The problem with the current solution is the instruction set of the x86.
In your test code you simply use call 0xfffff000 and it magically works.
But this is only the case because your program is linked statically.

For the libc DSO I had to play some dirty tricks.  The x86 CPU has no
absolute call.  The variant with an immediate parameter is a relative
call.  Only when calling through a register or memory location is it
possible to jump to an absolute address.  To be clear, if I have

    call 0xfffff000

in a DSO which is loaded at address 0x80000000 the jump ends at
0x7fffffff.  The problem is that the static linker doesn't know the load
address.  We could of course have the dynamic linker fix up the
addresses, but this is plain stupid.  It would mean fixing up a lot of
places and making the pages they touch non-sharable.

Instead I've changed the syscall handling to effectively do

   pushl %ebp
   movl $0xfffff000, %ebp
   call *%ebp
   popl %ebp

An alternative is to store the address in a memory location.  But since
%ebx is used for a syscall parameter it is necessary to address the
memory relative to the stack pointer which would mean loading the stack
address with 0xfffff000 before making the syscall.  Not much better than
the code sequence above.

Anyway, it's still an improvement.  But now the question comes up: how
does ld.so detect that the kernel supports these syscalls so that it can
use an appropriate DSO?  This brings up again the idea of the read-only
page(s) mapped into all processes (you remember).


Anyway, it works nicely.  If you need more testing let me know.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------



* Re: Intel P6 vs P7 system call performance
  2002-12-17 10:53             ` Ulrich Drepper
@ 2002-12-17 11:17               ` dada1
  2002-12-17 17:33                 ` Ulrich Drepper
  2002-12-17 17:06               ` Linus Torvalds
  2002-12-18 23:59               ` Pavel Machek
  2 siblings, 1 reply; 176+ messages in thread
From: dada1 @ 2002-12-17 11:17 UTC (permalink / raw)
  To: Ulrich Drepper, Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa

> For the libc DSO I had to play some dirty tricks.  The x86 CPU has no
> absolute call.  The variant with an immediate parameter is a relative
> jump.  Only when jumping through a register or memory location is it
> possible to jump to an absolute address.  To be clear, if I have
>
>     call 0xfffff000
>
> in a DSO which is loaded at address 0x80000000 the jumps ends at
> 0x7fffffff.  The problem is that the static linker doesn't know the load
> address.  We could of course have the dynamic linker fix up the
> addresses but this is plain stupid.  It would mean fixing up a lot of
> places and making of those pages covered non-sharable.
>

You could have only one routine that would need a relocation/patch at
the dynamic linking stage:

absolute_syscall:
    jmp  0xfffff000

Then all syscall routines could use:

getpid:
    ...
    call absolute_syscall
    ...
instead of "call 0xfffff000"


If the kernel doesn't support the 0xfffff000 page, you could patch
absolute_syscall (if it resides in a .data section) with:
    absolute_syscall:
            int 0x80
            ret
(3 bytes instead of 5 bytes)

See you



* Re: Intel P6 vs P7 system call performance
  2002-12-17  9:45             ` Andre Hedrick
@ 2002-12-17 12:40               ` Dave Jones
  2002-12-17 23:18                 ` Andre Hedrick
  2002-12-17 15:12               ` Alan Cox
  1 sibling, 1 reply; 176+ messages in thread
From: Dave Jones @ 2002-12-17 12:40 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Linus Torvalds, Ingo Molnar, linux-kernel, hpa

On Tue, Dec 17, 2002 at 01:45:52AM -0800, Andre Hedrick wrote:
 
 > Are you serious about moving of the banging we currently do on 0x80?
 > If so, I have a P4 development board with leds to monitor all the lower io
 > ports and can decode for you.

INT 0x80 != IO port 0x80

8-)

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


* Re: Intel P6 vs P7 system call performance
  2002-12-17  9:45             ` Andre Hedrick
  2002-12-17 12:40               ` Dave Jones
@ 2002-12-17 15:12               ` Alan Cox
  2002-12-18 23:55                 ` Pavel Machek
  1 sibling, 1 reply; 176+ messages in thread
From: Alan Cox @ 2002-12-17 15:12 UTC (permalink / raw)
  To: Andre Hedrick
  Cc: Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa

On Tue, 2002-12-17 at 09:45, Andre Hedrick wrote:
> 
> Linus,
> 
> Are you serious about moving of the banging we currently do on 0x80?
> If so, I have a P4 development board with leds to monitor all the lower io
> ports and can decode for you.

Different thing - int 0x80 syscall, not I/O port 0x80. I've done I/O
port 0x80 (it's very easy), but it requires that we set up some udelay
constants with an initial safety value right at boot (which we should do
anyway - we udelay before it is initialised)



* Re: Intel P6 vs P7 system call performance
  2002-12-17  5:55           ` Linus Torvalds
                               ` (2 preceding siblings ...)
  2002-12-17 10:53             ` Ulrich Drepper
@ 2002-12-17 16:12             ` Hugh Dickins
  2002-12-17 16:33               ` Richard B. Johnson
                                 ` (2 more replies)
  2002-12-18 23:51             ` Pavel Machek
  4 siblings, 3 replies; 176+ messages in thread
From: Hugh Dickins @ 2002-12-17 16:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, Ulrich Drepper, linux-kernel, hpa

On Mon, 16 Dec 2002, Linus Torvalds wrote:
> 
> Ok, I did the vsyscall page too, and tried to make it do the right thing
> (but I didn't bother to test it on a non-SEP machine).
> 
> I'm pushing the changes out right now, but basically it boils down to the
> fact that with these changes, user space can instead of doing an
> 
> 	int $0x80
> 
> instruction for a system call just do a
> 
> 	call 0xfffff000

I thought that last page was intentionally left invalid?

So that, for example, *(char *)MAP_FAILED will give SIGSEGV;
whereas now I can read a 0 there (and perhaps you should be
using get_zeroed_page rather than __get_free_page?).

I cannot name anything which relies on that page being invalid,
but I think it would be safer to keep it that way; though I guess
it's more compatibility pain to use the next page down (or could a
segment limit be used? I forget the granularity restrictions).

Hugh



* Re: Intel P6 vs P7 system call performance
  2002-12-17 16:12             ` Hugh Dickins
@ 2002-12-17 16:33               ` Richard B. Johnson
  2002-12-17 17:47                 ` Linus Torvalds
  2002-12-17 16:54               ` Hugh Dickins
  2002-12-17 17:07               ` Linus Torvalds
  2 siblings, 1 reply; 176+ messages in thread
From: Richard B. Johnson @ 2002-12-17 16:33 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Dave Jones, Ingo Molnar, Ulrich Drepper,
	linux-kernel, hpa

On Tue, 17 Dec 2002, Hugh Dickins wrote:

> On Mon, 16 Dec 2002, Linus Torvalds wrote:
> > 
> > Ok, I did the vsyscall page too, and tried to make it do the right thing
> > (but I didn't bother to test it on a non-SEP machine).
> > 
> > I'm pushing the changes out right now, but basically it boils down to the
> > fact that with these changes, user space can instead of doing an
> > 
> > 	int $0x80
> > 
> > instruction for a system call just do a
> > 
> > 	call 0xfffff000
> 

So you are going to do a system-call off a trap instead of an interrupt.
The difference in performance should be practically nothing. There is
also going to be additional overhead in returning from the trap since
the IP and caller's segment were not saved by the initial trap. I don't
see how you can possibly claim any improvement in performance. Further,
it doesn't make any sense. We don't call physical addresses from a
virtual address anyway, so there will be additional translation that
must take some time. With the current page-table translation you
would need to put your system-call entry point at 0xfffff000 - 0xc0000000
= 0x3ffff000 and there might not even be any RAM there. This guarantees
that you are going to have to set up a special PTE, resulting in
additional overhead.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.




* Re: Intel P6 vs P7 system call performance
  2002-12-17  6:43               ` dean gaudet
@ 2002-12-17 16:50                 ` Linus Torvalds
  2002-12-17 19:11                 ` H. Peter Anvin
  2002-12-18 23:53                 ` Pavel Machek
  2 siblings, 0 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 16:50 UTC (permalink / raw)
  To: dean gaudet; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa



On Mon, 16 Dec 2002, dean gaudet wrote:
>
> don't many of the multi-CPU problems with tsc go away because you've got a
> per-cpu physical page for the vsyscall?

No.

The per-cpu page is _inside_ the kernel, and is only pointed at by the
SYSENTER_EIP_MSR, and not accessible from user space. It's not virtually
mapped to the same address at all.

The userspace vsyscall page is shared on the whole system, and has to be
so, because anything else is a disaster from a TLB standpoint (two threads
running on different CPU's have the same page tables, so it's basically
impossible to sanely do per-cpu TLB mappings with a hw-filled TLB like the
x86).

		Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17 16:12             ` Hugh Dickins
  2002-12-17 16:33               ` Richard B. Johnson
@ 2002-12-17 16:54               ` Hugh Dickins
  2002-12-17 17:07               ` Linus Torvalds
  2 siblings, 0 replies; 176+ messages in thread
From: Hugh Dickins @ 2002-12-17 16:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, Ulrich Drepper, linux-kernel, hpa

On Tue, 17 Dec 2002, Hugh Dickins wrote:
> whereas now I can read a 0 there (and perhaps you should be
> using get_zeroed_page rather than __get_free_page?).

Sorry, yes, you are using get_zeroed_page for the one that needs it.

Hugh



* Re: Intel P6 vs P7 system call performance
  2002-12-17 10:53             ` Ulrich Drepper
  2002-12-17 11:17               ` dada1
@ 2002-12-17 17:06               ` Linus Torvalds
  2002-12-17 17:55                 ` Ulrich Drepper
  2002-12-18 23:59               ` Pavel Machek
  2 siblings, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 17:06 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa



On Tue, 17 Dec 2002, Ulrich Drepper wrote:
>
> The problem with the current solution is the instruction set of the x86.
>  In your test code you simply use call 0xfffff000 and it magically work.
>  But this is only the case because your program is linked statically.

Yeah, it's not very convenient. I didn't find any real alternatives,
though, and you can always just put 0xfffff000 in memory or registers and
jump to that. In fact, I suspect that if you actually want to use it in
glibc, then at least in the short term that's what you need to do anyway,
since you probably don't want to have a glibc that only works with very
recent kernels.

So I was actually assuming that the glibc code would look more like
something like this:

	old_fashioned:
		int $0x80
		ret

	unsigned long system_call_ptr = old_fashioned;

	/* .. startup .. */
	if (kernel_version > xxx)
		system_call_ptr = 0xfffff000;


	/* ... usage ... */
		call *system_call_ptr;

since you cannot depend on the 0xfffff000 on older kernels anyway..

> Instead I've changed the syscall handling to effectively do
>
>    pushl %ebp
>    movl $0xfffff000, %ebp
>    call *%ebp
>    popl %ebp

The above will work, but then you'd have limited yourself to a binary that
_only_ works on new kernels. So I'd suggest the memory indirection
instead.

		Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17 16:12             ` Hugh Dickins
  2002-12-17 16:33               ` Richard B. Johnson
  2002-12-17 16:54               ` Hugh Dickins
@ 2002-12-17 17:07               ` Linus Torvalds
  2002-12-17 17:19                 ` Matti Aarnio
  2 siblings, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 17:07 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, Ingo Molnar, Ulrich Drepper, linux-kernel, hpa



On Tue, 17 Dec 2002, Hugh Dickins wrote:
>
> I thought that last page was intentionally left invalid?

It was. But I thought it made sense to use, as it's the only really
"special" page.

But yes, we should decide on this quickly - it's easy to change right now,
but..

		Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17 17:07               ` Linus Torvalds
@ 2002-12-17 17:19                 ` Matti Aarnio
  2002-12-17 17:55                   ` Linus Torvalds
  0 siblings, 1 reply; 176+ messages in thread
From: Matti Aarnio @ 2002-12-17 17:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Dave Jones, Ingo Molnar, Ulrich Drepper,
	linux-kernel, hpa

On Tue, Dec 17, 2002 at 09:07:21AM -0800, Linus Torvalds wrote:
> On Tue, 17 Dec 2002, Hugh Dickins wrote:
> > I thought that last page was intentionally left invalid?
> 
> It was. But I thought it made sense to use, as it's the only really
> "special" page.

  On a couple of occasions I have caught myself pre-decrementing
  a char pointer which "just happened" to be NULL.

  Please keep the last page, as well as a few of the first pages as
  NULL-pointer poisons.

> But yes, we should decide on this quickly - it's easy to change right now,
> but..
> 
> 		Linus

/Matti Aarnio


* Re: Intel P6 vs P7 system call performance
  2002-12-17 11:17               ` dada1
@ 2002-12-17 17:33                 ` Ulrich Drepper
  0 siblings, 0 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 17:33 UTC (permalink / raw)
  To: dada1; +Cc: Linus Torvalds, Dave Jones, Ingo Molnar, linux-kernel, hpa

dada1 wrote:

> You could have only one routine that would need a relocation / patch at
> dynamic linking stage :

That's a horrible way to deal with this in DSOs.  There is no writable
and executable segment, and one would have to be created, which means
enormous additional setup costs and higher memory requirements.  I'm not
going to use any code modification.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------



* Re: Intel P6 vs P7 system call performance
  2002-12-17 16:33               ` Richard B. Johnson
@ 2002-12-17 17:47                 ` Linus Torvalds
  0 siblings, 0 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 17:47 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Hugh Dickins, Dave Jones, Ingo Molnar, Ulrich Drepper,
	linux-kernel, hpa




On Tue, 17 Dec 2002, Richard B. Johnson wrote:
> On Mon, 16 Dec 2002, Linus Torvalds wrote:
> >
> > instruction for a system call just do a
> >
> > 	call 0xfffff000
>
> So you are going to do a system-call off a trap instead of an interrupt.

No no. The kernel maps a magic read-only page at 0xfffff000, and there's
no trap involved. The code at that address is kernel-generated for the CPU
in question, and it will do whatever is most convenient.

No traps. They're slow as hell.

		Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17 17:06               ` Linus Torvalds
@ 2002-12-17 17:55                 ` Ulrich Drepper
  2002-12-17 18:01                   ` Linus Torvalds
  2002-12-17 19:23                   ` Alan Cox
  0 siblings, 2 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 17:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa

Linus Torvalds wrote:

> Yeah, it's not very convenient. I didn't find any real alternatives,
> though, and you can always just put 0xfffff000 in memory or registers and
> jump to that.

Putting the value into memory myself is not possible.  In a DSO I have
to address memory indirectly.  But all registers (except %ebp, and maybe
it'll be used some day) are used at the time of the call.

But there is a way: if I'm using

   #define makesyscall(name) \
        movl $__NR_##name, %eax; \
        call *0xfffff000-__NR_##name(%eax)

and you'd put at address 0xfffff000 the address of the entry point the
wrappers wouldn't have any problems finding it.


> In fact, I suspect that if you actually want to use it in
> glibc, then at least in the short term that's what you need to do anyway,
> sinc eyou probably don't want to have a glibc that only works with very
> recent kernels.

That's a compilation option.  We might want to do dynamic testing and
yes, a simple pointer indirection is adequate.

But still, the problem is detecting capable kernels.  You said not long
ago that comparing kernel versions is wrong.  And I agree.  It doesn't
cover backports or anything.  But there is no alternative at the moment.

If you don't like the process-global page thingy (anymore) the
alternative would be a sysconf() system call.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------



* Re: Intel P6 vs P7 system call performance
  2002-12-17 17:19                 ` Matti Aarnio
@ 2002-12-17 17:55                   ` Linus Torvalds
  2002-12-17 18:24                     ` Linus Torvalds
                                       ` (3 more replies)
  0 siblings, 4 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 17:55 UTC (permalink / raw)
  To: Ulrich Drepper, Matti Aarnio
  Cc: Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel, hpa



On Tue, 17 Dec 2002, Matti Aarnio wrote:
>
> On Tue, Dec 17, 2002 at 09:07:21AM -0800, Linus Torvalds wrote:
> > On Tue, 17 Dec 2002, Hugh Dickins wrote:
> > > I thought that last page was intentionally left invalid?
> >
> > It was. But I thought it made sense to use, as it's the only really
> > "special" page.
>
>   In couple of occasions I have caught myself from pre-decrementing
>   a char pointer which "just happened" to be NULL.
>
>   Please keep the last page, as well as a few of the first pages as
>   NULL-pointer poisons.

I think I have a good clean solution to this, that not only avoids the
need for any hard-coded address _at_all_, but also solves Uli's problem
quite cleanly.

Uli, how about I just add one new architecture-specific ELF AT flag, which
is the "base of sysinfo page". Right now that page is all zeroes except
for the system call trampoline at the beginning, but we might want to add
other system information to the page in the future (it is readable, after
all).

So we'd have an AT_SYSINFO entry, that with the current implementation
would just get the value 0xfffff000. And then the glibc startup code could
easily be backwards compatible with the suggestion I had in the previous
email. Since we basically want to do an indirect jump anyway (because of
the lack of absolute jumps in the instruction set), this looks like the
natural way to do it.

That also allows the kernel to move around the SYSINFO page at will, and
even makes it possible to avoid it altogether (ie this will solve the
inevitable problems with UML - UML just wouldn't set AT_SYSINFO, so user
level just wouldn't even _try_ to use it).

With that, there's nothing "special" about the vsyscall page, and I'd just
go back to having the very last page unmapped (and have the vsyscall page
in some other fixmap location that might even depend on kernel
configuration).

Whaddaya think?

		Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17 17:55                 ` Ulrich Drepper
@ 2002-12-17 18:01                   ` Linus Torvalds
  2002-12-17 19:23                   ` Alan Cox
  1 sibling, 0 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 18:01 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa



On Tue, 17 Dec 2002, Ulrich Drepper wrote:
>
> If you don't like the process-global page thingy (anymore) the
> alternative would be a sysconf() system call.

Well, we do _have_ the process-global thingy now - it's the vsyscall page.
It's not settable by the process, but it's useful for information.
Together with an elf AT_ entry pointing to it, it's certainly sufficient
for this usage, and it should also be sufficient for "future use" (ie we
can add future system information in the page later: bitmaps of features
at offset "start + 128" for example).

		Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17 17:55                   ` Linus Torvalds
@ 2002-12-17 18:24                     ` Linus Torvalds
  2002-12-17 18:33                       ` Ulrich Drepper
  2002-12-17 18:30                     ` Ulrich Drepper
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 18:24 UTC (permalink / raw)
  To: Ulrich Drepper, Matti Aarnio
  Cc: Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel, hpa



On Tue, 17 Dec 2002, Linus Torvalds wrote:
>
> Uli, how about I just add one new architecture-specific ELF AT flag, which
> is the "base of sysinfo page". Right now that page is all zeroes except
> for the system call trampoline at the beginning, but we might want to add
> other system information to the page in the future (it is readable, after
> all).

Here's the suggested (totally untested as of yet) patch:

 - it moves the system call page to 0xffffe000 instead, leaving an
   unmapped page at the very top of the address space. So trying to
   dereference -1 will cause a SIGSEGV.

 - it adds the AT_SYSINFO elf entry on x86 that points to the system page.

Thus glibc startup should be able to just do

	ptr = default_int80_syscall;
	if (AT_SYSINFO entry found)
		ptr = value(AT_SYSINFO)

and then you can just do a

	call *ptr

to do a system call regardless of kernel version. This also allows the
kernel to later move the page around as it sees fit.

The advantage of using an AT_SYSINFO entry is that

 - no new system call needed to figure anything out
 - backwards compatibility (ie old kernels automatically detected)
 - I think glibc already parses the AT entries at startup anyway

so it _looks_ like a perfect way to do this.

		Linus

----
===== arch/i386/kernel/entry.S 1.42 vs edited =====
--- 1.42/arch/i386/kernel/entry.S	Mon Dec 16 21:39:04 2002
+++ edited/arch/i386/kernel/entry.S	Tue Dec 17 10:13:16 2002
@@ -232,7 +232,7 @@
 #endif

 /* Points to after the "sysenter" instruction in the vsyscall page */
-#define SYSENTER_RETURN 0xfffff007
+#define SYSENTER_RETURN 0xffffe007

 	# sysenter call handler stub
 	ALIGN
===== include/asm-i386/elf.h 1.3 vs edited =====
--- 1.3/include/asm-i386/elf.h	Thu Oct 17 00:48:55 2002
+++ edited/include/asm-i386/elf.h	Tue Dec 17 10:12:58 2002
@@ -100,6 +100,12 @@

 #define ELF_PLATFORM  (system_utsname.machine)

+/*
+ * Architecture-neutral AT_ values in 0-17, leave some room
+ * for more of them, start the x86-specific ones at 32.
+ */
+#define AT_SYSINFO	32
+
 #ifdef __KERNEL__
 #define SET_PERSONALITY(ex, ibcs2) set_personality((ibcs2)?PER_SVR4:PER_LINUX)

@@ -115,6 +121,11 @@
 extern void dump_smp_unlazy_fpu(void);
 #define ELF_CORE_SYNC dump_smp_unlazy_fpu
 #endif
+
+#define ARCH_DLINFO					\
+do {							\
+		NEW_AUX_ENT(AT_SYSINFO, 0xffffe000);	\
+} while (0)

 #endif

===== include/asm-i386/fixmap.h 1.9 vs edited =====
--- 1.9/include/asm-i386/fixmap.h	Mon Dec 16 21:39:04 2002
+++ edited/include/asm-i386/fixmap.h	Tue Dec 17 10:11:31 2002
@@ -42,8 +42,8 @@
  * task switches.
  */
 enum fixed_addresses {
-	FIX_VSYSCALL,
 	FIX_HOLE,
+	FIX_VSYSCALL,
 #ifdef CONFIG_X86_LOCAL_APIC
 	FIX_APIC_BASE,	/* local (CPU) APIC) -- required for SMP or not */
 #endif



* Re: Intel P6 vs P7 system call performance
  2002-12-17 17:55                   ` Linus Torvalds
  2002-12-17 18:24                     ` Linus Torvalds
@ 2002-12-17 18:30                     ` Ulrich Drepper
  2002-12-17 19:04                       ` Linus Torvalds
  2002-12-17 19:26                       ` Alan Cox
  2002-12-17 18:39                     ` Jeff Dike
  2002-12-18  5:34                     ` Jeremy Fitzhardinge
  3 siblings, 2 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 18:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa

Linus Torvalds wrote:

> Uli, how about I just add one new architecture-specific ELF AT flag, which
> is the "base of sysinfo page". Right now that page is all zeroes except
> for the system call trampoline at the beginning, but we might want to add
> other system information to the page in the future (it is readable, after
> all).
> 
> So we'd have an AT_SYSINFO entry, that with the current implementation
> would just get the value 0xfffff000. And then the glibc startup code could
> easily be backwards compatible with the suggestion I had in the previous
> email. Since we basically want to do an indirect jump anyway (because of
> the lack of absolute jumps in the instruction set), this looks like the
> natural way to do it.

Yes, I definitely think that a new AT_* value is in order, and it's a
nice way to determine the address.

But it won't eliminate the problem by itself.  Remember: the x86 (unlike x86-64)
has no PC-relative data addressing mode.  I.e., in a DSO to find a
memory location with an address I need a base register which isn't
available anymore at the time the call is made.

You have to assume that all the registers, including %ebp, are used at
the time of the call.  This makes it impossible to find a memory
location in a DSO without text relocation (i.e., making parts of the
code writable, at least for a moment).  This is time consuming and not
resource friendly.

There is one way around this and maybe it is what should be done: we
have the TLS memory available.  And since this vsyscall stuff gets
introduced after TLS is functional, it is a possibility.

The address received in AT_SYSINFO is stored in a word in the TCB
(thread control block).  Then the code to call through this is a variant
of what I posted earlier

  movl $__NR_##name, %eax
  call *%gs:-__NR_##name+TCB_OFFSET(%eax)

In case the vsyscall stuff is not available we jump to something like

   int $0x80
   ret

The address of this code is the default value of the TCB word.


There is another thing we might want to consider.  The above code jumps
to 0xfffff000 or whatever address is specified.  I.e., the
demultiplexing happens in the kernel.  Do we want to do this at
userlevel?  This would allow almost no-cost determination of those
syscalls which can be handled at userlevel (getpid, getppid, ...).

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------



* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:24                     ` Linus Torvalds
@ 2002-12-17 18:33                       ` Ulrich Drepper
  0 siblings, 0 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 18:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa

Linus Torvalds wrote:

> Thus glibc startup should be able to just do
> 
> 	ptr = default_int80_syscall;
> 	if (AT_SYSINFO entry found)
> 		ptr = value(AT_SYSINFO)
> 
> and then you can just do a
> 
> 	call *ptr

This won't work exactly as I just wrote it, but something similar I can
make work.  I think the use of the TCB is the best thing to do.
Replicating the info in all new threads' TCBs doesn't cost much, and
with NPTL it's even lower cost since we reuse old TCBs.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------



* Re: Intel P6 vs P7 system call performance
  2002-12-17 17:55                   ` Linus Torvalds
  2002-12-17 18:24                     ` Linus Torvalds
  2002-12-17 18:30                     ` Ulrich Drepper
@ 2002-12-17 18:39                     ` Jeff Dike
  2002-12-17 19:05                       ` Linus Torvalds
  2002-12-18  5:34                     ` Jeremy Fitzhardinge
  3 siblings, 1 reply; 176+ messages in thread
From: Jeff Dike @ 2002-12-17 18:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, linux-kernel, hpa

torvalds@transmeta.com said:
> That also allows the kernel to move around the SYSINFO page at will,
> and even makes it possible to avoid it altogether (ie this will solve
> the inevitable problems with UML - UML just wouldn't set AT_SYSINFO,
> so user level just wouldn't even _try_ to use it). 

Why shouldn't I just set it to where UML provides its own SYSINFO page?

				Jeff



* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:23                   ` Alan Cox
@ 2002-12-17 18:48                     ` Ulrich Drepper
  2002-12-17 19:19                       ` H. Peter Anvin
  2002-12-17 19:44                       ` Alan Cox
  2002-12-17 18:49                     ` Linus Torvalds
  1 sibling, 2 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 18:48 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa

Alan Cox wrote:

> Is there any reason you can't just keep the linker out of the entire
> mess by generating
> 
> 	.byte whatever
> 	.dword 0xFFFF0000
> 
> instead of call ?

There is no such instruction.  Unless you know about some secret
undocumented opcode...

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------



* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:23                   ` Alan Cox
  2002-12-17 18:48                     ` Ulrich Drepper
@ 2002-12-17 18:49                     ` Linus Torvalds
  2002-12-17 19:09                       ` Ross Biro
  2002-12-17 21:34                       ` Benjamin LaHaise
  1 sibling, 2 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 18:49 UTC (permalink / raw)
  To: Alan Cox
  Cc: Ulrich Drepper, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa



On 17 Dec 2002, Alan Cox wrote:
>
> Is there any reason you can't just keep the linker out of the entire
> mess by generating
>
> 	.byte whatever
> 	.dword 0xFFFF0000
>
> instead of call ?

Alan, the problem is that there _is_ no such instruction as a "call
absolute".

There is only a "call relative" or "call indirect-absolute". So you either
have to indirect through memory or a register, or you have to fix up the
call at link-time.

Yeah, I know it sounds strange, but it makes sense. Absolute calls are
actually very unusual, and using relative calls is _usually_ the right
thing to do. It's only in cases like this that we really want to call a
specific address.

			Linus



* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:26                       ` Alan Cox
@ 2002-12-17 18:57                         ` Ulrich Drepper
  2002-12-17 19:10                           ` Linus Torvalds
  2002-12-17 21:38                           ` Benjamin LaHaise
  0 siblings, 2 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 18:57 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List, hpa

Alan Cox wrote:

> getppid changes and so I think has to go to kernel (unless we go around
> patching user pages on process exit [ick]).

But this is exactly what I expect to happen.  If you want to implement
gettimeofday() at user-level you need to modify the page.  Some of the
information the kernel has to keep for the thread group can be stored in
this place and eventually be used by some userlevel code executed by
jumping to 0xfffff000 or whatever the address is.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------



* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:30                     ` Ulrich Drepper
@ 2002-12-17 19:04                       ` Linus Torvalds
  2002-12-17 19:19                         ` Ulrich Drepper
  2002-12-17 19:28                         ` Linus Torvalds
  2002-12-17 19:26                       ` Alan Cox
  1 sibling, 2 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 19:04 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa



On Tue, 17 Dec 2002, Ulrich Drepper wrote:
>
> But it will eliminate the problem.  Remember: the x86 (unlike x86-64)
> has no PC-relative data addressing mode.  I.e., in a DSO to find a
> memory location with an address I need a base register which isn't
> available anymore at the time the call is made.

Actually, I see a more serious problem with the current "syscall"
interface: it doesn't allow six-argument system calls AT ALL, since it
needs %ebp to hold the stack pointer.

So a six-argument system call _has_ to use "int $0x80" anyway, which to
some degree simplifies your problem: you can only use the indirect call
approach for things where %ebp will be free for use anyway.

So then you can use %ebp as the indirection, playing %ebp games, since
that register is guaranteed never to be used by a system call through
this interface (it wasn't guaranteed before, but since sysenter really
needs something to hold the stack pointer I made %ebp do that, so
there's no way we can ever use %ebp for system call arguments on x86).

So you _can_ do something like this:

	syscall_with_5_args:
		pushl %ebx
		pushl %esi
		pushl %edi
		pushl %ebp
		movl syscall_ptr + GOT,%ebp	# uses DSO ptr in %ebx or whatever
		movl $__NR_xxxxxx,%eax
		movl 20(%esp),%ebx
		movl 24(%esp),%ecx
		movl 28(%esp),%edx
		movl 32(%esp),%esi
		movl 36(%esp),%edi
		call *%ebp
		.. test for errno if needed ..
		popl %ebp
		popl %edi
		popl %esi
		popl %ebx
		ret

> You have to assume that all the registers, including %ebp, are used at
> the time of the call.

See above for why that isn't possible right now anyway.

Hmm.. Which system calls have all six parameters? I'll have to see if I
can find any way to make those use the new interface too.

In the meantime, I do agree with you that the TLS approach should work
too, and might be better. It will allow all six arguments to be used if we
just find a good calling convention (too bad sysenter is such a pig of an
instruction, it's really not very well designed since it loses
information).

			Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:39                     ` Jeff Dike
@ 2002-12-17 19:05                       ` Linus Torvalds
  0 siblings, 0 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 19:05 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel, hpa



On Tue, 17 Dec 2002, Jeff Dike wrote:
>
> torvalds@transmeta.com said:
> > That also allows the kernel to move around the SYSINFO page at will,
> > and even makes it possible to avoid it altogether (ie this will solve
> > the inevitable problems with UML - UML just wouldn't set AT_SYSINFO,
> > so user level just wouldn't even _try_ to use it).
>
> Why shouldn't I just set it to where UML provides its own SYSINFO page?

Sure, that works fine too.

		Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:49                     ` Linus Torvalds
@ 2002-12-17 19:09                       ` Ross Biro
  2002-12-17 21:34                       ` Benjamin LaHaise
  1 sibling, 0 replies; 176+ messages in thread
From: Ross Biro @ 2002-12-17 19:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Ulrich Drepper, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa


It doesn't make sense to me to use a specially formatted page forced 
into user space to tell libraries how to do system calls.  Perhaps each 
executable personality in the kernel should export a special shared 
library in its own native format that contains the necessary 
information.  That way we don't have to worry as much about code or 
values changing sizes or locations.

We would have the chicken/egg problem with how the special shared 
library gets loaded in the first place.  For that we either support a 
legacy syscall method (i.e. int 0x80 on x86) which should only be used 
by ld.so or the equivalent or magically force the library into user 
space at a known address.

    Ross


Linus Torvalds wrote:

>On 17 Dec 2002, Alan Cox wrote:
>  
>
>>Is there any reason you can't just keep the linker out of the entire
>>mess by generating
>>
>>	.byte whatever
>>	.dword 0xFFFF0000
>>
>>instead of call ?
>>    
>>
>
>Alan, the problem is that there _is_ no such instruction as a "call
>absolute".
>
>There is only a "call relative" or "call indirect-absolute". So you either
>have to indirect through memory or a register, or you have to fix up the
>call at link-time.
>
>Yeah, I know it sounds strange, but it makes sense. Absolute calls are
>actually very unusual, and using relative calls is _usually_ the right
>thing to do. It's only in cases like this that we really want to call a
>specific address.
>
>			Linus
>




^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:57                         ` Ulrich Drepper
@ 2002-12-17 19:10                           ` Linus Torvalds
  2002-12-17 19:21                             ` H. Peter Anvin
                                               ` (2 more replies)
  2002-12-17 21:38                           ` Benjamin LaHaise
  1 sibling, 3 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 19:10 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Alan Cox, Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa



On Tue, 17 Dec 2002, Ulrich Drepper wrote:
>
> But this is exactly what I expect to happen.  If you want to implement
> gettimeofday() at user-level you need to modify the page.

Note that I really don't think we ever want to do the user-level
gettimeofday(). The complexity just argues against it, it's better to try
to make system calls be cheap enough that you really don't care.

sysenter helps a bit there.

If we'd need to modify the page, we couldn't share one page between all
processes, and we couldn't make it global in the TLB. So modifying the
info page is something we should avoid at all cost - it's not totally
unlikely that the overheads implied by per-thread pages would drown out
the wins from trying to be clever.

The advantage of the current static fixmap is that it's _extremely_
streamlined. The only overhead is literally the system entry itself, which
while a bit too high on a P4 is not that bad in general (and hopefully
Intel will fix the stupidities that cause the P4 to be slow at kernel
entry. Somebody already mentioned that apparently the newer P4 cores are
actually faster at system calls than mine is).

			Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17  6:43               ` dean gaudet
  2002-12-17 16:50                 ` Linus Torvalds
@ 2002-12-17 19:11                 ` H. Peter Anvin
  2002-12-17 21:39                   ` Benjamin LaHaise
  2002-12-18 23:53                 ` Pavel Machek
  2 siblings, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 19:11 UTC (permalink / raw)
  To: dean gaudet; +Cc: Linus Torvalds, Dave Jones, Ingo Molnar, linux-kernel

dean gaudet wrote:
> On Mon, 16 Dec 2002, Linus Torvalds wrote:
> 
>>It's not as good as a pure user-mode solution using tsc could be, but
>>we've seen the kinds of complexities that has with multi-CPU systems, and
>>they are so painful that I suspect the sysenter approach is a lot more
>>palatable even if it doesn't allow for the absolute best theoretical
>>numbers.
> 
> don't many of the multi-CPU problems with tsc go away because you've got a
> per-cpu physical page for the vsyscall?
> 
> i.e. per-cpu tsc epoch and scaling can be set on that page.
> 
> the only trouble i know of is what happens when an interrupt occurs and
> the task is rescheduled on another cpu... in theory you could test %eip
> against 0xfffffxxx and "rollback" (or complete) any incomplete
> gettimeofday call prior to saving a task's state.  but i bet that test is
> undesirable on all interrupt paths.
> 

Exactly.  This is a real problem.

	-hpa


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17  6:09             ` Linus Torvalds
                                 ` (2 preceding siblings ...)
  2002-12-17  6:43               ` dean gaudet
@ 2002-12-17 19:12               ` H. Peter Anvin
  2002-12-17 19:26                 ` Martin J. Bligh
  2002-12-17 20:49                 ` Alan Cox
  3 siblings, 2 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 19:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, linux-kernel

Linus Torvalds wrote:
> 
> It's not as good as a pure user-mode solution using tsc could be, but
> we've seen the kinds of complexities that has with multi-CPU systems, and
> they are so painful that I suspect the sysenter approach is a lot more
> palatable even if it doesn't allow for the absolute best theoretical
> numbers.
> 

The complexity only applies to nonsynchronized TSCs though, I would
assume.  I believe x86-64 uses a vsyscall using the TSC when it can
provide synchronized TSCs, and if it can't it puts a normal system call
inside the vsyscall in question.

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:48                     ` Ulrich Drepper
@ 2002-12-17 19:19                       ` H. Peter Anvin
  2002-12-17 19:44                       ` Alan Cox
  1 sibling, 0 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 19:19 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Alan Cox, Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List

Ulrich Drepper wrote:
> Alan Cox wrote:
> 
> 
>>Is there any reason you can't just keep the linker out of the entire
>>mess by generating
>>
>>	.byte whatever
>>	.dword 0xFFFF0000
>>
>>instead of call ?
> 
> 
> There is no such instruction.  Unless you know about some secret
> undocumented opcode...
> 

Well, there is lcall $0xffff0000, $USER_CS... (no, I'm most definitely
*not* suggesting it.)

	-hpa


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:04                       ` Linus Torvalds
@ 2002-12-17 19:19                         ` Ulrich Drepper
  2002-12-17 19:28                         ` Linus Torvalds
  1 sibling, 0 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 19:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa

Linus Torvalds wrote:

> In the meantime, I do agree with you that the TLS approach should work
> too, and might be better. It will allow all six arguments to be used if we
> just find a good calling conventions 

If you push out the AT_* patch I'll hack the glibc bits (probably the
TLS variant).  Won't take too long; you'll get results this afternoon.

What about AMD's instruction?  Is it as flawed as sysenter?  If not and
%ebp is available I really should use the TLS method.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:10                           ` Linus Torvalds
@ 2002-12-17 19:21                             ` H. Peter Anvin
  2002-12-17 19:37                               ` Linus Torvalds
  2002-12-17 19:47                             ` Dave Jones
  2002-12-18 12:57                             ` Rogier Wolff
  2 siblings, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 19:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Alan Cox, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List

Linus Torvalds wrote:
> 
> On Tue, 17 Dec 2002, Ulrich Drepper wrote:
> 
>>But this is exactly what I expect to happen.  If you want to implement
>>gettimeofday() at user-level you need to modify the page.
> 
> Note that I really don't think we ever want to do the user-level
> gettimeofday(). The complexity just argues against it, it's better to try
> to make system calls be cheap enough that you really don't care.
> 

Let's see... it works fine on UP and on *most* SMP, and on the ones
where it doesn't work you just fill in a system call into the vsyscall
slot.  It just means that gettimeofday() needs a different vsyscall slot.

	-hpa


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 17:55                 ` Ulrich Drepper
  2002-12-17 18:01                   ` Linus Torvalds
@ 2002-12-17 19:23                   ` Alan Cox
  2002-12-17 18:48                     ` Ulrich Drepper
  2002-12-17 18:49                     ` Linus Torvalds
  1 sibling, 2 replies; 176+ messages in thread
From: Alan Cox @ 2002-12-17 19:23 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa

On Tue, 2002-12-17 at 17:55, Ulrich Drepper wrote:
> But there is a way: if I'm using
> 
>    #define makesyscall(name) \
>         movl $__NR_##name, $eax; \
>         call 0xfffff000-__NR_##name($eax)
> 
> and you'd put at address 0xfffff000 the address of the entry point the
> wrappers wouldn't have any problems finding it.

Is there any reason you can't just keep the linker out of the entire
mess by generating

	.byte whatever
	.dword 0xFFFF0000

instead of call ?



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:30                     ` Ulrich Drepper
  2002-12-17 19:04                       ` Linus Torvalds
@ 2002-12-17 19:26                       ` Alan Cox
  2002-12-17 18:57                         ` Ulrich Drepper
  1 sibling, 1 reply; 176+ messages in thread
From: Alan Cox @ 2002-12-17 19:26 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Linus Torvalds, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List, hpa

On Tue, 2002-12-17 at 18:30, Ulrich Drepper wrote:
> demultiplexing happens in the kernel.  Do we want to do this at
> userlevel?  This would allow almost no-cost determination of those
> syscalls which can be handled at userlevel (getpid, getppid, ...).

getppid changes and so I think has to go to kernel (unless we go around
patching user pages on process exit [ick]). getpid can already be done
by reading it once at startup time and caching the data.



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:12               ` H. Peter Anvin
@ 2002-12-17 19:26                 ` Martin J. Bligh
  2002-12-17 20:51                   ` Alan Cox
  2002-12-17 20:49                 ` Alan Cox
  1 sibling, 1 reply; 176+ messages in thread
From: Martin J. Bligh @ 2002-12-17 19:26 UTC (permalink / raw)
  To: H. Peter Anvin, Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, linux-kernel

>> It's not as good as a pure user-mode solution using tsc could be, but
>> we've seen the kinds of complexities that has with multi-CPU systems, and
>> they are so painful that I suspect the sysenter approach is a lot more
>> palatable even if it doesn't allow for the absolute best theoretical
>> numbers.
> 
> The complexity only applies to nonsynchronized TSCs though, I would
> assume.  I believe x86-64 uses a vsyscall using the TSC when it can
> provide synchronized TSCs, and if it can't it puts a normal system call
> inside the vsyscall in question.

You can't use the TSC to do gettimeofday on boxes where they aren't 
syncronised anyway though. That's nothing to do with vsyscalls, you just
need a different time source (eg the legacy stuff or HPET/cyclone).

M.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:04                       ` Linus Torvalds
  2002-12-17 19:19                         ` Ulrich Drepper
@ 2002-12-17 19:28                         ` Linus Torvalds
  2002-12-17 19:32                           ` H. Peter Anvin
  2002-12-17 19:53                           ` Ulrich Drepper
  1 sibling, 2 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 19:28 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa



On Tue, 17 Dec 2002, Linus Torvalds wrote:
>
> Hmm.. Which system calls have all six parameters? I'll have to see if I
> can find any way to make those use the new interface too.

The only ones I found from a quick grep are
 - sys_recvfrom
 - sys_sendto
 - sys_mmap2()
 - sys_ipc()

and none of them are of a kind where the system call entry itself is the
biggest performance issue (and sys_ipc() is deprecated anyway), so it's
probably acceptable to just use the old interface for them.

One other alternative is to change the calling convention for the
new-style system call, and not have arguments in registers at all. We
could make the interface something like

 - %eax contains system call number
 - %edx contains pointer to argument block
 - call *syscallptr	// trashes all registers

and then the old "compatibility" function would be something like

	movl 0(%edx),%ebx
	movl 4(%edx),%ecx
	movl 12(%edx),%esi
	movl 16(%edx),%edi
	movl 20(%edx),%ebp
	movl 8(%edx),%edx
	int $0x80
	ret

while the "sysenter" interface would do the loads from kernel space.

That would make some things easier, but the problem with this approach is
that if you have a single-argument system call, and you just pass in the
stack pointer offset in %edx directly, then the system call stubs will
always load 6 arguments, and if we're just at the end of the stack it
won't actually _work_. So part of the calling convention would have to be
the guarantee that there is stack-space available (should always be true
in practice, of course).

			Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:28                         ` Linus Torvalds
@ 2002-12-17 19:32                           ` H. Peter Anvin
  2002-12-17 19:44                             ` Linus Torvalds
  2002-12-17 19:53                           ` Ulrich Drepper
  1 sibling, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 19:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, linux-kernel

Linus Torvalds wrote:
> 
> On Tue, 17 Dec 2002, Linus Torvalds wrote:
> 
>>Hmm.. Which system calls have all six parameters? I'll have to see if I
>>can find any way to make those use the new interface too.
> 
> 
> The only ones I found from a quick grep are
>  - sys_recvfrom
>  - sys_sendto
>  - sys_mmap2()
>  - sys_ipc()
> 
> and none of them are of a kind where the system call entry itself is the
> biggest performance issue (and sys_ipc() is deprecated anyway), so it's
> probably acceptable to just use the old interface for them.
> 

recvfrom() and sendto() can also be implemented as sendmsg()/recvmsg()
if one really wants to.

What one can also do is that a sixth argument, if one exists, is passed
on the stack (i.e. in (%ebp), since %ebp contains the stack pointer.)

	-hpa


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:21                             ` H. Peter Anvin
@ 2002-12-17 19:37                               ` Linus Torvalds
  2002-12-17 19:43                                 ` H. Peter Anvin
                                                   ` (4 more replies)
  0 siblings, 5 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 19:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ulrich Drepper, Alan Cox, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List



On Tue, 17 Dec 2002, H. Peter Anvin wrote:
>
> Let's see... it works fine on UP and on *most* SMP, and on the ones
> where it doesn't work you just fill in a system call into the vsyscall
> slot.  It just means that gettimeofday() needs a different vsyscall slot.

The thing is, gettimeofday() isn't _that_ special. It's just not worth a
vsyscall of it's own, I feel. Where do you stop? Do we do getpid() too?
Just because we can?

This is especially true since the people who _really_ might care about
gettimeofday() are exactly the people who wouldn't be able to use the fast
user-space-only version.

How much do you think gettimeofday() really matters on a desktop? Sure, X
apps do gettimeofday() calls, but they do a whole lot more of _other_
calls, and gettimeofday() is really far far down in the noise for them.
The people who really call for gettimeofday() as a performance thing seem
to be database people who want it as a timestamp. But those are the same
people who also want NUMA machines which don't necessarily have
synchronized clocks.

		Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:37                               ` Linus Torvalds
@ 2002-12-17 19:43                                 ` H. Peter Anvin
  2002-12-17 20:07                                   ` Matti Aarnio
  2002-12-17 19:59                                 ` Matti Aarnio
                                                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 19:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Alan Cox, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List

Linus Torvalds wrote:
> 
> The thing is, gettimeofday() isn't _that_ special. It's just not worth a
> vsyscall of it's own, I feel. Where do you stop? Do we do getpid() too?
> Just because we can?
>

getpid() could be implemented in userspace, but not via vsyscalls
(instead it could be passed in the ELF data area at process start.)

"Because we can and it's relatively easy" is a pretty good argument in
my opinion.

> This is especially true since the people who _really_ might care about
> gettimeofday() are exactly the people who wouldn't be able to use the fast
> user-space-only version.
> 
> How much do you think gettimeofday() really matters on a desktop? Sure, X
> apps do gettimeofday() calls, but they do a whole lot more of _other_
> calls, and gettimeofday() is really far far down in the noise for them.
> The people who really call for gettimeofday() as a performance thing seem
> to be database people who want it as a timestamp. But those are the same
> people who also want NUMA machines which don't necessarily have
> synchronized clocks.
> 

I think this is really an overstatement.  Timestamping etc. (and heck,
even databases) are actually perfectly usable even on smaller machines
these days.  Sure, DB vendors like to boast of their 128-way NUMA
machines, but I suspect the bulk of them run on single- and
dual-processor machines (by count, not necessarily by data volume.)

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:48                     ` Ulrich Drepper
  2002-12-17 19:19                       ` H. Peter Anvin
@ 2002-12-17 19:44                       ` Alan Cox
  2002-12-17 19:52                         ` Richard B. Johnson
  1 sibling, 1 reply; 176+ messages in thread
From: Alan Cox @ 2002-12-17 19:44 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa

On Tue, 2002-12-17 at 18:48, Ulrich Drepper wrote:
> Alan Cox wrote:
> 
> > Is there any reason you can't just keep the linker out of the entire
> > mess by generating
> > 
> > 	.byte whatever
> > 	.dword 0xFFFF0000
> > 
> > instead of call ?
> 
> There is no such instruction.  Unless you know about some secret
> undocumented opcode...

No, I'd forgotten how broken x86 was.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:32                           ` H. Peter Anvin
@ 2002-12-17 19:44                             ` Linus Torvalds
  0 siblings, 0 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 19:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ulrich Drepper, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, linux-kernel



On Tue, 17 Dec 2002, H. Peter Anvin wrote:
>
> What one can also do is that a sixth argument, if one exists, is passed
> on the stack (i.e. in (%ebp), since %ebp contains the stack pointer.)

I like this. I will make it so. It will allow the old calling conventions
and has none of the stack size issues that my "memory block" approach had.

Also, this will all be done inside the wrapper and is thus entirely
invisible to the caller. Good, this solves the six-arg case nicely.

		Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:10                           ` Linus Torvalds
  2002-12-17 19:21                             ` H. Peter Anvin
@ 2002-12-17 19:47                             ` Dave Jones
  2002-12-18 12:57                             ` Rogier Wolff
  2 siblings, 0 replies; 176+ messages in thread
From: Dave Jones @ 2002-12-17 19:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Alan Cox, Matti Aarnio, Hugh Dickins, Ingo Molnar,
	Linux Kernel Mailing List, hpa

On Tue, Dec 17, 2002 at 11:10:20AM -0800, Linus Torvalds wrote:
 > Intel will fix the stupidities that cause the P4 to be slow at kernel
 > entry. Somebody already mentioned that apparently the newer P4 cores are
 > actually faster at system calls than mine is).

My HT Northwood returns slightly better results than your xeon,
but the syscall stuff still completely trounces it.

(19:38:46:davej@tetrachloride:davej)$ ./a.out 
440.107164 cycles
1152.596084 cycles

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:44                       ` Alan Cox
@ 2002-12-17 19:52                         ` Richard B. Johnson
  2002-12-17 19:54                           ` H. Peter Anvin
  2002-12-17 19:58                           ` Linus Torvalds
  0 siblings, 2 replies; 176+ messages in thread
From: Richard B. Johnson @ 2002-12-17 19:52 UTC (permalink / raw)
  To: Alan Cox
  Cc: Ulrich Drepper, Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa

On 17 Dec 2002, Alan Cox wrote:

> On Tue, 2002-12-17 at 18:48, Ulrich Drepper wrote:
> > Alan Cox wrote:
> > 
> > > Is there any reason you can't just keep the linker out of the entire
> > > mess by generating
> > > 
> > > 	.byte whatever
> > > 	.dword 0xFFFF0000
> > > 
> > > instead of call ?
> > 
> > There is no such instruction.  Unless you know about some secret
> > undocumented opcode...
> 
> No I'd forgotten how broken x86 was
> 

You can call intersegment with a full pointer. I don't know how
expensive that is. Since USER_CS is a fixed value in Linux, it
can be hard-coded

		.byte 0x9a
		.dword 0xfffff000
		.word USER_CS

No, I didn't try this; I'm just looking at the manual. I don't know
what USER_CS is (I didn't look in the kernel). The book says the
pointer is 16:32, which means it's a dword followed by a word.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:28                         ` Linus Torvalds
  2002-12-17 19:32                           ` H. Peter Anvin
@ 2002-12-17 19:53                           ` Ulrich Drepper
  2002-12-17 20:01                             ` Linus Torvalds
  1 sibling, 1 reply; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 19:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa

Linus Torvalds wrote:
> 

> The only ones I found from a quick grep are
>  - sys_recvfrom
>  - sys_sendto
>  - sys_mmap2()
>  - sys_ipc()

All but mmap2 do not use 6 parameters.  They are implemented via the
sys_ipc multiplexer, which takes the stack pointer as an argument and
then uses it to locate the parameters.


-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:52                         ` Richard B. Johnson
@ 2002-12-17 19:54                           ` H. Peter Anvin
  2002-12-17 19:58                           ` Linus Torvalds
  1 sibling, 0 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 19:54 UTC (permalink / raw)
  To: root
  Cc: Alan Cox, Ulrich Drepper, Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List

Richard B. Johnson wrote:
> 
> You can call intersegment with a full pointer. I don't know how
> expensive that is. Since USER_CS is a fixed value in Linux, it
> can be hard-coded
> 
> 		.byte 0x9a
> 		.dword 0xfffff000
> 		.word USER_CS
> 
> No. I didn't try this, I'm just looking at the manual. I don't know
> what the USER_CS is (didn't look in the kernel) The book says the
> pointer is 16:32 which means that it's a dword, followed by a word.
> 

It's quite expensive (not as expensive as INT, but not that far from
it), and you also push CS onto the stack.

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:52                         ` Richard B. Johnson
  2002-12-17 19:54                           ` H. Peter Anvin
@ 2002-12-17 19:58                           ` Linus Torvalds
  2002-12-18  7:20                             ` Kai Henningsen
  1 sibling, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 19:58 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Alan Cox, Ulrich Drepper, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa



On Tue, 17 Dec 2002, Richard B. Johnson wrote:
>
> You can call intersegment with a full pointer. I don't know how
> expensive that is.

It's so expensive as to not be worth it; it's cheaper to load a register
or something, i.e. you can do

	pushl $0xfffff000
	call *(%esp)

faster than doing a far call.

		Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:37                               ` Linus Torvalds
  2002-12-17 19:43                                 ` H. Peter Anvin
@ 2002-12-17 19:59                                 ` Matti Aarnio
  2002-12-17 20:06                                 ` Ulrich Drepper
                                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 176+ messages in thread
From: Matti Aarnio @ 2002-12-17 19:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Ulrich Drepper, Alan Cox, Matti Aarnio,
	Hugh Dickins, Dave Jones, Ingo Molnar, Linux Kernel Mailing List

On Tue, Dec 17, 2002 at 11:37:04AM -0800, Linus Torvalds wrote:
> On Tue, 17 Dec 2002, H. Peter Anvin wrote:
> > Let's see... it works fine on UP and on *most* SMP, and on the ones
> > where it doesn't work you just fill in a system call into the vsyscall
> > slot.  It just means that gettimeofday() needs a different vsyscall slot.
> 
> The thing is, gettimeofday() isn't _that_ special. It's just not worth a
> vsyscall of it's own, I feel. Where do you stop? Do we do getpid() too?
> Just because we can?

  clone()   -- which doesn't really like anybody using stack-pointer ?

  (I do use  gettimeofday() a _lot_, but I have my own userspace
   mapped shared segment thingamajingie doing it..  And I write
   code that runs on lots of systems, not only at Linux. )

> 		Linus

/Matti Aarnio

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:53                           ` Ulrich Drepper
@ 2002-12-17 20:01                             ` Linus Torvalds
  2002-12-17 20:17                               ` Ulrich Drepper
  2002-12-18  4:15                               ` Linus Torvalds
  0 siblings, 2 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-17 20:01 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa


How about this diff? It does both the 6-parameter thing _and_ the
AT_SYSINFO addition. Untested, since I have to run off and watch my kids
do their winter program ;)

		Linus

-----
===== arch/i386/kernel/entry.S 1.42 vs edited =====
--- 1.42/arch/i386/kernel/entry.S	Mon Dec 16 21:39:04 2002
+++ edited/arch/i386/kernel/entry.S	Tue Dec 17 11:59:13 2002
@@ -232,7 +232,7 @@
 #endif

 /* Points to after the "sysenter" instruction in the vsyscall page */
-#define SYSENTER_RETURN 0xfffff007
+#define SYSENTER_RETURN 0xffffe007

 	# sysenter call handler stub
 	ALIGN
@@ -243,6 +243,21 @@
 	pushfl
 	pushl $(__USER_CS)
 	pushl $SYSENTER_RETURN
+
+/*
+ * Load the potential sixth argument from user stack.
+ * Careful about security.
+ */
+	cmpl $0xc0000000,%ebp
+	jae syscall_badsys
+1:	movl (%ebp),%ebp
+.section .fixup,"ax"
+2:	xorl %ebp,%ebp
+.previous
+.section __ex_table,"a"
+	.align 4
+	.long 1b,2b
+.previous

 	pushl %eax
 	SAVE_ALL
===== arch/i386/kernel/sysenter.c 1.1 vs edited =====
--- 1.1/arch/i386/kernel/sysenter.c	Mon Dec 16 21:39:04 2002
+++ edited/arch/i386/kernel/sysenter.c	Tue Dec 17 11:39:39 2002
@@ -48,14 +48,14 @@
 		0xc3			/* ret */
 	};
 	static const char sysent[] = {
-		0x55,			/* push %ebp */
 		0x51,			/* push %ecx */
 		0x52,			/* push %edx */
+		0x55,			/* push %ebp */
 		0x89, 0xe5,		/* movl %esp,%ebp */
 		0x0f, 0x34,		/* sysenter */
+		0x5d,			/* pop %ebp */
 		0x5a,			/* pop %edx */
 		0x59,			/* pop %ecx */
-		0x5d,			/* pop %ebp */
 		0xc3			/* ret */
 	};
 	unsigned long page = get_zeroed_page(GFP_ATOMIC);
===== include/asm-i386/elf.h 1.3 vs edited =====
--- 1.3/include/asm-i386/elf.h	Thu Oct 17 00:48:55 2002
+++ edited/include/asm-i386/elf.h	Tue Dec 17 10:12:58 2002
@@ -100,6 +100,12 @@

 #define ELF_PLATFORM  (system_utsname.machine)

+/*
+ * Architecture-neutral AT_ values in 0-17, leave some room
+ * for more of them, start the x86-specific ones at 32.
+ */
+#define AT_SYSINFO	32
+
 #ifdef __KERNEL__
 #define SET_PERSONALITY(ex, ibcs2) set_personality((ibcs2)?PER_SVR4:PER_LINUX)

@@ -115,6 +121,11 @@
 extern void dump_smp_unlazy_fpu(void);
 #define ELF_CORE_SYNC dump_smp_unlazy_fpu
 #endif
+
+#define ARCH_DLINFO					\
+do {							\
+		NEW_AUX_ENT(AT_SYSINFO, 0xffffe000);	\
+} while (0)

 #endif

===== include/asm-i386/fixmap.h 1.9 vs edited =====
--- 1.9/include/asm-i386/fixmap.h	Mon Dec 16 21:39:04 2002
+++ edited/include/asm-i386/fixmap.h	Tue Dec 17 10:11:31 2002
@@ -42,8 +42,8 @@
  * task switches.
  */
 enum fixed_addresses {
-	FIX_VSYSCALL,
 	FIX_HOLE,
+	FIX_VSYSCALL,
 #ifdef CONFIG_X86_LOCAL_APIC
 	FIX_APIC_BASE,	/* local (CPU) APIC) -- required for SMP or not */
 #endif


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:37                               ` Linus Torvalds
  2002-12-17 19:43                                 ` H. Peter Anvin
  2002-12-17 19:59                                 ` Matti Aarnio
@ 2002-12-17 20:06                                 ` Ulrich Drepper
  2002-12-17 20:35                                   ` Daniel Jacobowitz
  2002-12-18  0:20                                   ` Linus Torvalds
  2002-12-18  7:41                                 ` Kai Henningsen
  2002-12-18 13:00                                 ` Rogier Wolff
  4 siblings, 2 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 20:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Alan Cox, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List

Linus Torvalds wrote:

> The thing is, gettimeofday() isn't _that_ special. It's just not worth a
> vsyscall of its own, I feel. Where do you stop? Do we do getpid() too?

This is why I'd say make no distinction at all.  Have the first
nr_syscalls * 8 bytes starting at 0xfffff000 as a jump table.  We can
transfer to a different slot for each syscall.  Each slot then could be
a PC-relative jump to the common sysenter code or to some special code
sequence which is also in the global page.

If we don't do this now and it seems desirable in future we either have
to introduce a second ABI for the vsyscall stuff (ugly!) or you'll have
to do the demultiplexing yourself in the code starting at 0xfffff000.


-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:43                                 ` H. Peter Anvin
@ 2002-12-17 20:07                                   ` Matti Aarnio
  2002-12-17 20:10                                     ` H. Peter Anvin
  0 siblings, 1 reply; 176+ messages in thread
From: Matti Aarnio @ 2002-12-17 20:07 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linus Torvalds, Linux Kernel Mailing List

(cutting down To:/Cc:)

On Tue, Dec 17, 2002 at 11:43:57AM -0800, H. Peter Anvin wrote:
> Linus Torvalds wrote:
> > The thing is, gettimeofday() isn't _that_ special. It's just not worth a
> > vsyscall of its own, I feel. Where do you stop? Do we do getpid() too?
> > Just because we can?
> 
> getpid() could be implemented in userspace, but not via vsyscalls
> (instead it could be passed in the ELF data area at process start.)

  After fork() or clone()  ?
  If we had only spawn(), and some separate way to start threads...

...
> 	-hpa

/Matti Aarnio

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 20:07                                   ` Matti Aarnio
@ 2002-12-17 20:10                                     ` H. Peter Anvin
  0 siblings, 0 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 20:10 UTC (permalink / raw)
  To: Matti Aarnio; +Cc: Linus Torvalds, Linux Kernel Mailing List

Matti Aarnio wrote:
> (cutting down To:/Cc:)
> 
> On Tue, Dec 17, 2002 at 11:43:57AM -0800, H. Peter Anvin wrote:
> 
>>Linus Torvalds wrote:
>>
>>>The thing is, gettimeofday() isn't _that_ special. It's just not worth a
>>>vsyscall of its own, I feel. Where do you stop? Do we do getpid() too?
>>>Just because we can?
>>
>>getpid() could be implemented in userspace, but not via vsyscalls
>>(instead it could be passed in the ELF data area at process start.)
> 
> 
>   After fork() or clone()  ?
>   If we had only spawn(), and some separate way to start threads...
> 

fork() and clone() would have to return the self-pid as an auxiliary
return value.  This, of course, is getting rather fuggly.

Anything that cares caches getpid() anyway.

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 20:49                 ` Alan Cox
@ 2002-12-17 20:12                   ` H. Peter Anvin
  0 siblings, 0 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 20:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List

Alan Cox wrote:
> On Tue, 2002-12-17 at 19:12, H. Peter Anvin wrote:
> 
>>The complexity only applies to nonsynchronized TSCs though, I would
>>assume.  I believe x86-64 uses a vsyscall using the TSC when it can
>>provide synchronized TSCs, and if it can't it puts a normal system call
>>inside the vsyscall in question.
> 
> 
> For x86-64 there is the hpet timer, which is a lot saner but I don't
> think we can mmap it
> 

It's only necessary, though, when TSC isn't usable.  TSC is psycho fast
when it's available.  Just about anything is saner than the old 8254
(or whatever it is called) timer chip, though...

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 20:51                   ` Alan Cox
@ 2002-12-17 20:16                     ` H. Peter Anvin
  0 siblings, 0 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 20:16 UTC (permalink / raw)
  To: Alan Cox
  Cc: Martin J. Bligh, Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List

Alan Cox wrote:
> On Tue, 2002-12-17 at 19:26, Martin J. Bligh wrote:
> 
>>>>It's not as good as a pure user-mode solution using tsc could be, but
>>
>>You can't use the TSC to do gettimeofday on boxes where they aren't 
>>synchronised anyway though. That's nothing to do with vsyscalls, you just
>>need a different time source (eg the legacy stuff or HPET/cyclone).
> 
> 
> Ditto all the laptops and the like. With code provided by the kernel we
> can cheat however. If we know the fastest the CPU can go (ie full speed
> on spudstop/powernow etc) we can tell the tsc value at which we have to
> query the kernel to get time to any given accuracy, so allowing limited
> caching
> 
> Ditto by knowing the worst case drift on summit
> 

Clever.  I like it :)

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 20:01                             ` Linus Torvalds
@ 2002-12-17 20:17                               ` Ulrich Drepper
  2002-12-18  4:15                                 ` Linus Torvalds
  2002-12-18  4:15                               ` Linus Torvalds
  1 sibling, 1 reply; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-17 20:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa

Linus Torvalds wrote:

> ===== arch/i386/kernel/sysenter.c 1.1 vs edited =====
> --- 1.1/arch/i386/kernel/sysenter.c	Mon Dec 16 21:39:04 2002
> +++ edited/arch/i386/kernel/sysenter.c	Tue Dec 17 11:39:39 2002
> @@ -48,14 +48,14 @@
>  		0xc3			/* ret */
>  	};
>  	static const char sysent[] = {
> -		0x55,			/* push %ebp */
>  		0x51,			/* push %ecx */
>  		0x52,			/* push %edx */
> +		0x55,			/* push %ebp */
>  		0x89, 0xe5,		/* movl %esp,%ebp */
>  		0x0f, 0x34,		/* sysenter */
> +		0x5d,			/* pop %ebp */
>  		0x5a,			/* pop %edx */
>  		0x59,			/* pop %ecx */
> -		0x5d,			/* pop %ebp */
>  		0xc3			/* ret */

Instead of duplicating the push/pop %ebp just use the first one by using

  movl 12(%ebp), %ebp

in the kernel code, or remove the first.  The latter is better, smaller code.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 20:06                                 ` Ulrich Drepper
@ 2002-12-17 20:35                                   ` Daniel Jacobowitz
  2002-12-18  0:20                                   ` Linus Torvalds
  1 sibling, 0 replies; 176+ messages in thread
From: Daniel Jacobowitz @ 2002-12-17 20:35 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Tue, Dec 17, 2002 at 12:06:29PM -0800, Ulrich Drepper wrote:
> Linus Torvalds wrote:
> 
> > The thing is, gettimeofday() isn't _that_ special. It's just not worth a
> > vsyscall of its own, I feel. Where do you stop? Do we do getpid() too?
> 
> This is why I'd say make no distinction at all.  Have the first
> nr_syscalls * 8 bytes starting at 0xfffff000 as a jump table.  We can
> transfer to a different slot for each syscall.  Each slot then could be
> a PC-relative jump to the common sysenter code or to some special code
> sequence which is also in the global page.
> 
> If we don't do this now and it seems desirable in future we either have
> to introduce a second ABI for the vsyscall stuff (ugly!) or you'll have
> to do the demultiplexing yourself in the code starting at 0xfffff000.

But what does this do to things like PTRACE_SYSCALL?  And do we care...
I suppose not if we keep the syscall trace checks on every kernel entry
path.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:12               ` H. Peter Anvin
  2002-12-17 19:26                 ` Martin J. Bligh
@ 2002-12-17 20:49                 ` Alan Cox
  2002-12-17 20:12                   ` H. Peter Anvin
  1 sibling, 1 reply; 176+ messages in thread
From: Alan Cox @ 2002-12-17 20:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List

On Tue, 2002-12-17 at 19:12, H. Peter Anvin wrote:
> The complexity only applies to nonsynchronized TSCs though, I would
> assume.  I believe x86-64 uses a vsyscall using the TSC when it can
> provide synchronized TSCs, and if it can't it puts a normal system call
> inside the vsyscall in question.

For x86-64 there is the hpet timer, which is a lot saner but I don't
think we can mmap it



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:26                 ` Martin J. Bligh
@ 2002-12-17 20:51                   ` Alan Cox
  2002-12-17 20:16                     ` H. Peter Anvin
  0 siblings, 1 reply; 176+ messages in thread
From: Alan Cox @ 2002-12-17 20:51 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: H. Peter Anvin, Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List

On Tue, 2002-12-17 at 19:26, Martin J. Bligh wrote:
> >> It's not as good as a pure user-mode solution using tsc could be, but
> You can't use the TSC to do gettimeofday on boxes where they aren't 
> synchronised anyway though. That's nothing to do with vsyscalls, you just
> need a different time source (eg the legacy stuff or HPET/cyclone).

Ditto all the laptops and the like. With code provided by the kernel we
can cheat however. If we know the fastest the CPU can go (ie full speed
on spudstop/powernow etc) we can tell the tsc value at which we have to
query the kernel to get time to any given accuracy, so allowing limited
caching

Ditto by knowing the worst case drift on summit

Alan
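
Alan's bound can be made concrete: given the fastest clock the CPU can possibly run at, any TSC delta below a fixed threshold guarantees the cached kernel time is still within the requested accuracy. A minimal sketch, with names of my own choosing:

```c
#include <assert.h>
#include <stdint.h>

/* The cached gettimeofday result is safe to reuse as long as, even at
 * the CPU's maximum possible clock rate (full SpeedStep/PowerNow speed),
 * fewer than accuracy_usec microseconds can have elapsed since the last
 * real query of the kernel. */
static int cache_still_accurate(uint64_t now_tsc, uint64_t last_query_tsc,
				uint64_t max_cycles_per_usec,
				uint64_t accuracy_usec)
{
	return (now_tsc - last_query_tsc) < max_cycles_per_usec * accuracy_usec;
}
```

The same shape works for the summit case by folding the worst-case drift into the threshold.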


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:49                     ` Linus Torvalds
  2002-12-17 19:09                       ` Ross Biro
@ 2002-12-17 21:34                       ` Benjamin LaHaise
  2002-12-17 21:36                         ` H. Peter Anvin
  1 sibling, 1 reply; 176+ messages in thread
From: Benjamin LaHaise @ 2002-12-17 21:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Ulrich Drepper, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa

On Tue, Dec 17, 2002 at 10:49:31AM -0800, Linus Torvalds wrote:
> There is only a "call relative" or "call indirect-absolute". So you either
> have to indirect through memory or a register, or you have to fix up the
> call at link-time.
> 
> Yeah, I know it sounds strange, but it makes sense. Absolute calls are
> actually very unusual, and using relative calls is _usually_ the right
> thing to do. It's only in cases like this that we really want to call a
> specific address.

The stubs I used for the vsyscall bits just did an absolute jump to 
the vsyscall page, which would then do a ret to the original calling 
userspace code (since that provided library symbols for the user to 
bind against).

		-ben
-- 
"Do you seek knowledge in time travel?"

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 21:34                       ` Benjamin LaHaise
@ 2002-12-17 21:36                         ` H. Peter Anvin
  2002-12-17 21:50                           ` Benjamin LaHaise
  0 siblings, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 21:36 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Linus Torvalds, Alan Cox, Ulrich Drepper, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List

Benjamin LaHaise wrote:
>
> The stubs I used for the vsyscall bits just did an absolute jump to 
> the vsyscall page, which would then do a ret to the original calling 
> userspace code (since that provided library symbols for the user to 
> bind against).
> 

What kind of "absolute jumps" were these?

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 18:57                         ` Ulrich Drepper
  2002-12-17 19:10                           ` Linus Torvalds
@ 2002-12-17 21:38                           ` Benjamin LaHaise
  2002-12-17 21:41                             ` H. Peter Anvin
  1 sibling, 1 reply; 176+ messages in thread
From: Benjamin LaHaise @ 2002-12-17 21:38 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Alan Cox, Linus Torvalds, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List, hpa

On Tue, Dec 17, 2002 at 10:57:29AM -0800, Ulrich Drepper wrote:
> But this is exactly what I expect to happen.  If you want to implement
> gettimeofday() at user-level you need to modify the page.  Some of the
> information the kernel has to keep for the thread group can be stored in
> this place and eventually be used by some userlevel code executed by
> jumping to 0xfffff000 or whatever the address is.

You don't actually need to modify the page, rather the data for the user 
level gettimeofday needs to be in a shared page and some register (like 
%tr) must expose the current cpu number to index into the data.  Either 
way, it's an internal implementation detail for the kernel to take care 
of, with multiple potential solutions.

		-ben
-- 
"Do you seek knowledge in time travel?"

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:11                 ` H. Peter Anvin
@ 2002-12-17 21:39                   ` Benjamin LaHaise
  2002-12-17 21:41                     ` H. Peter Anvin
  0 siblings, 1 reply; 176+ messages in thread
From: Benjamin LaHaise @ 2002-12-17 21:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dean gaudet, Linus Torvalds, Dave Jones, Ingo Molnar,
	linux-kernel

On Tue, Dec 17, 2002 at 11:11:19AM -0800, H. Peter Anvin wrote:
> > against 0xfffffxxx and "rollback" (or complete) any incomplete
> > gettimeofday call prior to saving a task's state.  but i bet that test is
> > undesirable on all interrupt paths.
> > 
> 
> Exactly.  This is a real problem.

No, just take the number of context switches before and after the attempt 
to read the time of day.

		-ben
-- 
"Do you seek knowledge in time travel?"

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 21:38                           ` Benjamin LaHaise
@ 2002-12-17 21:41                             ` H. Peter Anvin
  0 siblings, 0 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 21:41 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Ulrich Drepper, Alan Cox, Linus Torvalds, Matti Aarnio,
	Hugh Dickins, Dave Jones, Ingo Molnar, Linux Kernel Mailing List

Benjamin LaHaise wrote:
> On Tue, Dec 17, 2002 at 10:57:29AM -0800, Ulrich Drepper wrote:
> 
>>But this is exactly what I expect to happen.  If you want to implement
>>gettimeofday() at user-level you need to modify the page.  Some of the
>>information the kernel has to keep for the thread group can be stored in
>>this place and eventually be used by some userlevel code executed by
>>jumping to 0xfffff000 or whatever the address is.
> 
> 
> You don't actually need to modify the page, rather the data for the user 
> level gettimeofday needs to be in a shared page and some register (like 
> %tr) must expose the current cpu number to index into the data.  Either 
> way, it's an internal implementation detail for the kernel to take care 
> of, with multiple potential solutions.
> 

That's not the problem... the problem is that the userland code can get
preempted at any time and rescheduled on another CPU.

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 21:39                   ` Benjamin LaHaise
@ 2002-12-17 21:41                     ` H. Peter Anvin
  2002-12-17 21:53                       ` Benjamin LaHaise
  0 siblings, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-17 21:41 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: dean gaudet, Linus Torvalds, Dave Jones, Ingo Molnar,
	linux-kernel

Benjamin LaHaise wrote:
> On Tue, Dec 17, 2002 at 11:11:19AM -0800, H. Peter Anvin wrote:
> 
>>>against 0xfffffxxx and "rollback" (or complete) any incomplete
>>>gettimeofday call prior to saving a task's state.  but i bet that test is
>>>undesirable on all interrupt paths.
>>>
>>
>>Exactly.  This is a real problem.
> 
> 
> No, just take the number of context switches before and after the attempt 
> to read the time of day.
> 

How do you do that from userspace, atomically?  A counter in the shared
page?

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 21:36                         ` H. Peter Anvin
@ 2002-12-17 21:50                           ` Benjamin LaHaise
  0 siblings, 0 replies; 176+ messages in thread
From: Benjamin LaHaise @ 2002-12-17 21:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Alan Cox, Ulrich Drepper, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List

On Tue, Dec 17, 2002 at 01:36:53PM -0800, H. Peter Anvin wrote:
> Benjamin LaHaise wrote:
> >
> > The stubs I used for the vsyscall bits just did an absolute jump to 
> > the vsyscall page, which would then do a ret to the original calling 
> > userspace code (since that provided library symbols for the user to 
> > bind against).
> > 
> 
> What kind of "absolute jumps" were this?

It was a far jump (ljmp $__USER_CS,<address>).

		-ben
-- 
"Do you seek knowledge in time travel?"

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 21:41                     ` H. Peter Anvin
@ 2002-12-17 21:53                       ` Benjamin LaHaise
  0 siblings, 0 replies; 176+ messages in thread
From: Benjamin LaHaise @ 2002-12-17 21:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dean gaudet, Linus Torvalds, Dave Jones, Ingo Molnar,
	linux-kernel

On Tue, Dec 17, 2002 at 01:41:55PM -0800, H. Peter Anvin wrote:
> > No, just take the number of context switches before and after the attempt 
> > to read the time of day.

> How do you do that from userspace, atomically?  A counter in the shared
> page?

Yup.  You need some shared data for the TSC offset anyway, so 
moving the context switch counter onto such a page won't be much of 
a problem.  Using the %tr trick to get the CPU number would allow for 
some of these data structures to be per-cpu without incurring any LOCK 
overhead, too.

		-ben
-- 
"Do you seek knowledge in time travel?"
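
The counter scheme Ben describes amounts to a retry loop over a kernel-updated shared page; a minimal sketch, where the field names and the assumption that the kernel bumps `switches` on every context switch are mine:

```c
#include <assert.h>
#include <stdint.h>

struct shared_page {
	volatile uint32_t switches;	/* assumed: kernel increments on context switch */
	uint64_t tsc_base;		/* TSC value at the last kernel update */
	uint64_t usec_base;		/* wall-clock microseconds at tsc_base */
	uint64_t cycles_per_usec;
};

/* Retry until the counter is unchanged across the computation, so the
 * TSC sample and the bases are known to belong to the same epoch (no
 * reschedule onto a CPU with a different TSC in between). */
static uint64_t read_usec(const struct shared_page *sp, uint64_t now_tsc)
{
	uint32_t before;
	uint64_t usec;

	do {
		before = sp->switches;
		usec = sp->usec_base + (now_tsc - sp->tsc_base) / sp->cycles_per_usec;
		/* a real loop would re-read the TSC before retrying */
	} while (sp->switches != before);
	return usec;
}
```

No LOCK prefix is needed on the read side; only the kernel writes the page.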

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 12:40               ` Dave Jones
@ 2002-12-17 23:18                 ` Andre Hedrick
  0 siblings, 0 replies; 176+ messages in thread
From: Andre Hedrick @ 2002-12-17 23:18 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linus Torvalds, Ingo Molnar, linux-kernel, hpa


Okay I will go back to my storage cave, call when you need something.

Got some meat tenderizer for the shoe leather to make choking it down
easier?

Cheers,

On Tue, 17 Dec 2002, Dave Jones wrote:

> On Tue, Dec 17, 2002 at 01:45:52AM -0800, Andre Hedrick wrote:
>  
>  > Are you serious about moving of the banging we currently do on 0x80?
>  > If so, I have a P4 development board with leds to monitor all the lower io
>  > ports and can decode for you.
> 
> INT 0x80 != IO port 0x80
> 
> 8-)
> 
> 		Dave
> 
> -- 
> | Dave Jones.        http://www.codemonkey.org.uk
> | SuSE Labs
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 20:06                                 ` Ulrich Drepper
  2002-12-17 20:35                                   ` Daniel Jacobowitz
@ 2002-12-18  0:20                                   ` Linus Torvalds
  2002-12-18  0:38                                     ` Ulrich Drepper
  1 sibling, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-18  0:20 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: H. Peter Anvin, Alan Cox, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List



On Tue, 17 Dec 2002, Ulrich Drepper wrote:
>
> This is why I'd say make no distinction at all.  Have the first
> nr_syscalls * 8 bytes starting at 0xfffff000 as a jump table.

No, the way sysenter works, the table approach just sucks up dcache space
(the kernel cannot know which sysenter is the one that the user uses
anyway, so the jump table would have to just add back some index and we'd
be back exactly where we started).

I'll keep it the way it is now.

		Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18  0:20                                   ` Linus Torvalds
@ 2002-12-18  0:38                                     ` Ulrich Drepper
  0 siblings, 0 replies; 176+ messages in thread
From: Ulrich Drepper @ 2002-12-18  0:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Alan Cox, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List

Linus Torvalds wrote:

> No, the way sysenter works, the table approach just sucks up dcache space
> (the kernel cannot know which sysenter is the one that the user uses
> anyway, so the jump table would have to just add back some index and we'd
> be back exactly where we started)
> 
> I'll keep it the way it is now.

I won't argue since honestly, not doing it is much easier for me.  But I
want to be sure I'm clear.

What I suggested is to have the first part of the global page be


   .p2align 3
   jmp sysenter_label
   .p2align 3
   jmp sysenter_label
   ...
   .p2align
   jmp userlevel_gettimeofday

sysenter_label:
  the usual sysenter code

userlevel_gettimeofday:
  whatever necessary


All this would be in the global page.  There is only one sysenter call.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 20:17                               ` Ulrich Drepper
@ 2002-12-18  4:15                                 ` Linus Torvalds
  0 siblings, 0 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-18  4:15 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa


On Tue, 17 Dec 2002, Ulrich Drepper wrote:

> > -		0x55,			/* push %ebp */
> > +		0x55,			/* push %ebp */
> > +		0x5d,			/* pop %ebp */
> > -		0x5d,			/* pop %ebp */
>
> Instead of duplicating the push/pop %ebp just use the first one by using

No, it's not duplicating it. Look closer. It's just _moving_ it, so that
the old %ebp value will naturally be pointed to by %esp, which is what we
want.

Anyway, I reverted the %ebp games from my kernel, because they are
fundamentally not restartable and thus not really a good idea. Besides, it
might be wrong to try to optimize the fast system calls to handle six
arguments too, if that makes the (much more common) other system
calls slower. So the six-argument case might as well just continue to use
"int 0x80".

		Linus



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 20:01                             ` Linus Torvalds
  2002-12-17 20:17                               ` Ulrich Drepper
@ 2002-12-18  4:15                               ` Linus Torvalds
  2002-12-18  4:39                                 ` H. Peter Anvin
                                                   ` (2 more replies)
  1 sibling, 3 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-18  4:15 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Matti Aarnio, Hugh Dickins, Dave Jones, Ingo Molnar, linux-kernel,
	hpa


On Tue, 17 Dec 2002, Linus Torvalds wrote:
>
> How about this diff? It does both the 6-parameter thing _and_ the
> AT_SYSINFO addition.

The 6-parameter thing is broken. It's clever, but playing games with %ebp
is not going to work with restarting of the system call - we need to
restart with the proper %ebp.

I pushed out the AT_SYSINFO stuff, but we're back to the "needs to use
'int $0x80' for system calls that take 6 arguments" drawing board.

The only sane way I see to fix the %ebp problem is to actually expand the
kernel "struct ptregs" to have separate "ebp" and "arg6" fields (so that
we can re-start with the right ebp, and have arg6 as the right argument on
the stack). That would work but is not really worth it.

		Linus



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18  4:15                               ` Linus Torvalds
@ 2002-12-18  4:39                                 ` H. Peter Anvin
  2002-12-18  4:49                                   ` Linus Torvalds
  2002-12-18 13:17                                 ` Richard B. Johnson
  2002-12-18 13:40                                 ` Horst von Brand
  2 siblings, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-18  4:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, linux-kernel

Linus Torvalds wrote:
> On Tue, 17 Dec 2002, Linus Torvalds wrote:
> 
>>How about this diff? It does both the 6-parameter thing _and_ the
>>AT_SYSINFO addition.
> 
> 
> The 6-parameter thing is broken. It's clever, but playing games with %ebp
> is not going to work with restarting of the system call - we need to
> restart with the proper %ebp.
> 

This confuses me -- there seems to be no reason this shouldn't work as 
long as %esp == %ebp on sysexit.  The SYSEXIT-trashed GPRs seem like a 
bigger problem.

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18  4:39                                 ` H. Peter Anvin
@ 2002-12-18  4:49                                   ` Linus Torvalds
  2002-12-18  6:38                                     ` Linus Torvalds
  0 siblings, 1 reply; 176+ messages in thread
From: Linus Torvalds @ 2002-12-18  4:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ulrich Drepper, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, linux-kernel



On Tue, 17 Dec 2002, H. Peter Anvin wrote:
>
> This confuses me -- there seems to be no reason this shouldn't work as
> long as %esp == %ebp on sysexit.  The SYSEXIT-trashed GPRs seem like a
> bigger problem.

The thing is, the argument save area == the kernel stack frame. This is
part of the reason why Linux has very fast system calls - there is
absolutely _zero_ extraneous setup. No argument fetching and marshalling,
it's all part of just setting up the regular kernel stack.

So to get the right argument in arg6, the argument _needs_ to be saved in
the %ebp entry on the kernel stack. Which means that on return from the
system call (which may not actually be through a "sysenter" at all, if
signals happen it will go through the generic paths), %ebp will have been
updated as part of the kernel stack unwinding.

Which is ok for a regular fast system call (ebp will get restored
immediately), but it is NOT ok for the system call restart case, since in
that case we want %ebp to contain the old stack pointer, not the sixth
argument.

If we just save the stack pointer value (== the initial %ebp value), the
right thing will get restored, but then system calls will see the stack
pointer value as arg6 - because of the 1:1 relationship between arguments
and stack save.

		Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 17:55                   ` Linus Torvalds
                                       ` (2 preceding siblings ...)
  2002-12-17 18:39                     ` Jeff Dike
@ 2002-12-18  5:34                     ` Jeremy Fitzhardinge
  2002-12-18  5:38                       ` H. Peter Anvin
  2002-12-18 15:50                       ` Alan Cox
  3 siblings, 2 replies; 176+ messages in thread
From: Jeremy Fitzhardinge @ 2002-12-18  5:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel List, H. Peter Anvin

On Tue, 2002-12-17 at 09:55, Linus Torvalds wrote:
> Uli, how about I just add one new architecture-specific ELF AT flag, which
> is the "base of sysinfo page". Right now that page is all zeroes except
> for the system call trampoline at the beginning, but we might want to add
> other system information to the page in the future (it is readable, after
> all).

The P4 optimisation guide promises horrible things if you write within
2k of a cached instruction from another CPU (it dumps the whole trace
cache, it seems), so you'd need to be careful about mixing mutable data
and the syscall code in that page.

Immutable data should be fine.
        
        J


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18  5:34                     ` Jeremy Fitzhardinge
@ 2002-12-18  5:38                       ` H. Peter Anvin
  2002-12-18 15:50                       ` Alan Cox
  1 sibling, 0 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-18  5:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Ulrich Drepper, Matti Aarnio, Hugh Dickins,
	Dave Jones, Ingo Molnar, Linux Kernel List

Jeremy Fitzhardinge wrote:
> On Tue, 2002-12-17 at 09:55, Linus Torvalds wrote:
> 
>>Uli, how about I just add one new architecture-specific ELF AT flag, which
>>is the "base of sysinfo page". Right now that page is all zeroes except
>>for the system call trampoline at the beginning, but we might want to add
>>other system information to the page in the future (it is readable, after
>>all).
> 
> 
> The P4 optimisation guide promises horrible things if you write within
> 2k of a cached instruction from another CPU (it dumps the whole trace
> cache, it seems), so you'd need to be careful about mixing mutable data
> and the syscall code in that page.
> 
> Immutable data should be fine.
>         

Yes, you really want to use a second page.

	-hpa




^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18  4:49                                   ` Linus Torvalds
@ 2002-12-18  6:38                                     ` Linus Torvalds
  0 siblings, 0 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-18  6:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ulrich Drepper, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, linux-kernel



On Tue, 17 Dec 2002, Linus Torvalds wrote:
>
> Which is ok for a regular fast system call (ebp will get restored
> immediately), but it is NOT ok for the system call restart case, since in
> that case we want %ebp to contain the old stack pointer, not the sixth
> argument.

I came up with an absolutely wonderfully _disgusting_ solution for this.

The thing to realize about how to solve this is that since "sysenter"
loses track of EIP, there's no real reason to try to return directly
after the "sysenter" instruction anyway. The return point is totally
arbitrary, after all.

Now, couple this with the fact that system call restarting will always
just subtract two from the "return point" aka saved EIP value (that's the
size of an "int 0x80" instruction), and what you can do is to make the
kernel point the sysexit return point not at just past the "sysenter", but
instead make it point to just past a totally unrelated 2-byte jump
instruction.

With that in mind, I made the sysenter trampoline look like this:

        static const char sysent[] = {
                0x51,                   /* push %ecx */
                0x52,                   /* push %edx */
                0x55,                   /* push %ebp */
                0x89, 0xe5,             /* movl %esp,%ebp */
                0x0f, 0x34,             /* sysenter */
        /* System call restart point is here! (SYSENTER_RETURN - 2) */
                0xeb, 0xfa,             /* jmp to "movl %esp,%ebp" */
        /* System call normal return point is here! (SYSENTER_RETURN in entry.S) */
                0x5d,                   /* pop %ebp */
                0x5a,                   /* pop %edx */
                0x59,                   /* pop %ecx */
                0xc3                    /* ret */
        };

which does the right thing for a "restarted" system call (i.e. when it
restarts, it won't re-do just the sysenter instruction, it will really
restart at the backwards jump, and thus re-run the "movl %esp,%ebp"
too).

Which means that now the kernel can happily trash %ebp as part of the
sixth argument setup, since system call restarting will re-initialize it
to point to the user-level stack that we need in %ebp because otherwise it
gets totally lost.

I'm a disgusting pig, and proud of it to boot.

			Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:58                           ` Linus Torvalds
@ 2002-12-18  7:20                             ` Kai Henningsen
  0 siblings, 0 replies; 176+ messages in thread
From: Kai Henningsen @ 2002-12-18  7:20 UTC (permalink / raw)
  To: linux-kernel

torvalds@transmeta.com (Linus Torvalds)  wrote on 17.12.02 in <Pine.LNX.4.44.0212171157050.1095-100000@home.transmeta.com>:

> On Tue, 17 Dec 2002, Richard B. Johnson wrote:
> >
> > You can call intersegment with a full pointer. I don't know how
> > expensive that is.
>
> It's so expensive as to not be worth it, it's cheaper to load a register
> or something, i.e. you can do
>
> 	pushl $0xfffff000
> 	call *(%esp)
>
> faster than doing a far call.

Hmm ...

How expensive would it be to have a special virtual DSO built into ld.so  
which exported this (and any other future entry points), to be linked  
against like any other DSO? That way, the *actual* interface would only be  
between the kernel and ld.so.

MfG Kai

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:37                               ` Linus Torvalds
                                                   ` (2 preceding siblings ...)
  2002-12-17 20:06                                 ` Ulrich Drepper
@ 2002-12-18  7:41                                 ` Kai Henningsen
  2002-12-18 13:00                                 ` Rogier Wolff
  4 siblings, 0 replies; 176+ messages in thread
From: Kai Henningsen @ 2002-12-18  7:41 UTC (permalink / raw)
  To: linux-kernel

torvalds@transmeta.com (Linus Torvalds)  wrote on 17.12.02 in <Pine.LNX.4.44.0212171132530.1095-100000@home.transmeta.com>:

> On Tue, 17 Dec 2002, H. Peter Anvin wrote:
> >
> > Let's see... it works fine on UP and on *most* SMP, and on the ones
> > where it doesn't work you just fill in a system call into the vsyscall
> > slot.  It just means that gettimeofday() needs a different vsyscall slot.
>
> The thing is, gettimeofday() isn't _that_ special. It's just not worth a
> vsyscall of it's own, I feel. Where do you stop? Do we do getpid() too?
> Just because we can?

It's special enough that while programming under DOS, I had my own special  
routine which just took the BIOS ticker from low memory for a lot of  
things - even to decide if calling the actual time-of-day syscall was  
useful or if I should expect to get the same value back as last time.

That was a *serious* performance improvement. (Of course, DOS syscalls are  
S-L-O-W ...)

These days, the equivalent does call gettimeofday(). It's still probably  
the most-used syscall by far. (Hmm - maybe I can get some numbers for  
that? Must see if I get time today.) And *that* is why optimizing this one  
call makes sense.

> This is especially true since the people who _really_ might care about
> gettimeofday() are exactly the people who wouldn't be able to use the fast
> user-space-only version.

Say what? Why wouldn't I be able to use it? Right now, I know of no SMP  
installation that's even in the planning ...

> How much do you think gettimeofday() really matters on a desktop? Sure, X

Why desktop? We use the same kind of thing in the server, and it's much  
more important there. Client performance is uninteresting - clients mostly  
wait anyway.

> The people who really call for gettimeofday() as a performance thing seem
> to be database people who want it as a timestamp. But those are the same

Not database, but otherwise on the nail.

> people who also want NUMA machines which don't necessarily have
> synchronized clocks.

Nope, no interest in those. SMP *might* become interesting, but I don't  
think we'd ever want to care about weird stuff like NUMA ... at least not  
for the next five years or so.

We don't shovel nearly as much data around as the database guys.

MfG Kai

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:10                           ` Linus Torvalds
  2002-12-17 19:21                             ` H. Peter Anvin
  2002-12-17 19:47                             ` Dave Jones
@ 2002-12-18 12:57                             ` Rogier Wolff
  2002-12-19  0:14                               ` Pavel Machek
  2 siblings, 1 reply; 176+ messages in thread
From: Rogier Wolff @ 2002-12-18 12:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Alan Cox, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, Linux Kernel Mailing List, hpa

On Tue, Dec 17, 2002 at 11:10:20AM -0800, Linus Torvalds wrote:
> 
> 
> On Tue, 17 Dec 2002, Ulrich Drepper wrote:
> >
> > But this is exactly what I expect to happen.  If you want to implement
> > gettimeofday() at user-level you need to modify the page.
> 
> Note that I really don't think we ever want to do the user-level
> gettimeofday(). The complexity just argues against it, it's better to try
> to make system calls be cheap enough that you really don't care.

I'd say that this should not be "fixed" from userspace, but from the
kernel. Thus if the kernel knows that the "gettimeofday" can be made
faster by doing it completely in userspace, then that system call
should be "patched" by the kernel to do it faster for everybody.

Next, someone might find a faster (full-userspace) way to do some
"reads"(*). Then it might pay to check for that specific
filedescriptor in userspace, and only call into the kernel for the
other filedescriptors. The idea is that the kernel knows best when
optimizations are possible.

Thus that ONE magic address is IMHO not the right way to do it. By
demultiplexing the stuff in userspace, you can do "sane" things with
specific syscalls. 

So for example, the code at 0xffff8000 would be:
	mov $0x00,%eax
	int $0x80
	ret

(in the case where sysenter & friends is not available)

Moving the "load syscall number into the register" into the
kernel-modifiable area does not cost a thing, but because we have
demultiplexed the code, we can easily replace the gettimeofday call
with something that (when it's easy) doesn't require the 600-cycle
call into kernel mode.

The "syscall _NR" would then become: 

	call	0xffff8000 + _NR * 0x80

allowing for up to 0x80 bytes of "patchup code" or "do it quickly"
code, but also for a jump to some other "magic page", that has more
extensive code.

(Oh, I'm showing a base of 0xffff8000, a bit lower than previous
suggestions, allowing for a per-syscall entry point and up to 0x80
bytes of fixup or "do it really quickly" code.)

P.S. People might argue that using this large "stride" would have a
larger cache-footprint. I think that all "where it matters" programs
will have a very small working-set of system calls. It might pay to
use a stride of say 0xa0 to spread the different
system-call-code-points over different cache-lines whenever possible.

		Roger. 


(*) I was trying to pick a particularly unlikely case, but I can even
see a case where this is useful. For example, a kernel might be
compiled with "high performance pipes", which would move most of the
pipe reads and writes into userspace, through a shared-memory window. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* The Worlds Ecosystem is a stable system. Stable systems may experience *
* excursions from the stable situation. We are currently in such an      * 
* excursion: The stable situation does not include humans. ***************

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17 19:37                               ` Linus Torvalds
                                                   ` (3 preceding siblings ...)
  2002-12-18  7:41                                 ` Kai Henningsen
@ 2002-12-18 13:00                                 ` Rogier Wolff
  4 siblings, 0 replies; 176+ messages in thread
From: Rogier Wolff @ 2002-12-18 13:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Ulrich Drepper, Alan Cox, Matti Aarnio,
	Hugh Dickins, Dave Jones, Ingo Molnar, Linux Kernel Mailing List

On Tue, Dec 17, 2002 at 11:37:04AM -0800, Linus Torvalds wrote:
> How much do you think gettimeofday() really matters on a desktop? Sure, X
> apps do gettimeofday() calls, but they do a whole lot more of _other_
> calls, and gettimeofday() is really far far down in the noise for them.
> The people who really call for gettimeofday() as a performance thing seem
> to be database people who want it as a timestamp. But those are the same
> people who also want NUMA machines which don't necessarily have
> synchronized clocks.

Once the kernel provides the right infrastructure, doing it may become
so easy that it can be tried and implemented and benchmarked with so
little effort that it would simply stick.

			Roger. 


-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* The Worlds Ecosystem is a stable system. Stable systems may experience *
* excursions from the stable situation. We are currently in such an      * 
* excursion: The stable situation does not include humans. ***************

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18  4:15                               ` Linus Torvalds
  2002-12-18  4:39                                 ` H. Peter Anvin
@ 2002-12-18 13:17                                 ` Richard B. Johnson
  2002-12-18 13:40                                 ` Horst von Brand
  2 siblings, 0 replies; 176+ messages in thread
From: Richard B. Johnson @ 2002-12-18 13:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ulrich Drepper, Matti Aarnio, Hugh Dickins, Dave Jones,
	Ingo Molnar, linux-kernel, hpa

On Tue, 17 Dec 2002, Linus Torvalds wrote:

> 
> On Tue, 17 Dec 2002, Linus Torvalds wrote:
> >
> > How about this diff? It does both the 6-parameter thing _and_ the
> > AT_SYSINFO addition.
> 
> The 6-parameter thing is broken. It's clever, but playing games with %ebp
> is not going to work with restarting of the system call - we need to
> restart with the proper %ebp.
> 
> I pushed out the AT_SYSINFO stuff, but we're back to the "needs to use
> 'int $0x80' for system calls that take 6 arguments" drawing board.
> 
> The only sane way I see to fix the %ebp problem is to actually expand the
> kernel "struct ptregs" to have separate "ebp" and "arg6" fields (so that
> we can re-start with the right ebp, and have arg6 as the right argument on
> the stack). That would work but is not really worth it.
> 
> 		Linus
> 

How about, for the new interface, a single parameter: a pointer to a
descriptor (structure)?  For a typical call like getpid(), it's just
one dereference.  The pointer register can be EAX on Intel, a register
normally available in a 'C' call.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18  4:15                               ` Linus Torvalds
  2002-12-18  4:39                                 ` H. Peter Anvin
  2002-12-18 13:17                                 ` Richard B. Johnson
@ 2002-12-18 13:40                                 ` Horst von Brand
  2002-12-18 13:47                                   ` Sean Neakums
                                                     ` (2 more replies)
  2 siblings, 3 replies; 176+ messages in thread
From: Horst von Brand @ 2002-12-18 13:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

[Extremely interesting new syscall mechanism thread elided]

What happened to "feature freeze"?
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18 13:40                                 ` Horst von Brand
@ 2002-12-18 13:47                                   ` Sean Neakums
  2002-12-18 14:10                                     ` Horst von Brand
  2002-12-18 15:52                                   ` Alan Cox
  2002-12-18 16:41                                   ` Dave Jones
  2 siblings, 1 reply; 176+ messages in thread
From: Sean Neakums @ 2002-12-18 13:47 UTC (permalink / raw)
  To: linux-kernel

commence  Horst von Brand quotation:

> [Extremely interesting new syscall mechanism thread elided]
>
> What happened to "feature freeze"?

How are system calls a new feature?  Or is optimizing an existing
feature not allowed by your definition of "feature freeze"?

-- 
 /                          |
[|] Sean Neakums            |  Questions are a burden to others;
[|] <sneakums@zork.net>     |      answers a prison for oneself.
 \                          |

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18 13:47                                   ` Sean Neakums
@ 2002-12-18 14:10                                     ` Horst von Brand
  2002-12-18 14:51                                       ` dada1
  2002-12-18 19:12                                       ` Mark Mielke
  0 siblings, 2 replies; 176+ messages in thread
From: Horst von Brand @ 2002-12-18 14:10 UTC (permalink / raw)
  To: linux-kernel

Sean Neakums <sneakums@zork.net> said:
> commence  Horst von Brand quotation:
> 
> > [Extremely interesting new syscall mechanism thread elided]
> >
> > What happened to "feature freeze"?
> 
> How are system calls a new feature?  Or is optimizing an existing
> feature not allowed by your definition of "feature freeze"?

This "optimizing" is very much userspace-visible, and a radical change in
an interface this fundamental counts as a new feature in my book.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18 14:10                                     ` Horst von Brand
@ 2002-12-18 14:51                                       ` dada1
  2002-12-18 19:12                                       ` Mark Mielke
  1 sibling, 0 replies; 176+ messages in thread
From: dada1 @ 2002-12-18 14:51 UTC (permalink / raw)
  To: linux-kernel, Horst von Brand

From: "Horst von Brand" <vonbrand@inf.utfsm.cl>
> > How are system calls a new feature?  Or is optimizing an existing
> > feature not allowed by your definition of "feature freeze"?
>
> This "optimizing" is very much userspace-visible, and a radical change in
> an interface this fundamental counts as a new feature in my book.

Since int 0x80 is supported and will be supported for the next 20 years,
I don't think this is a radical change.
It's not userspace-visible at all.
You are free to use the old way of calling the kernel...


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18  5:34                     ` Jeremy Fitzhardinge
  2002-12-18  5:38                       ` H. Peter Anvin
@ 2002-12-18 15:50                       ` Alan Cox
  1 sibling, 0 replies; 176+ messages in thread
From: Alan Cox @ 2002-12-18 15:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Ulrich Drepper, Matti Aarnio, Hugh Dickins,
	Dave Jones, Ingo Molnar, Linux Kernel Mailing List,
	H. Peter Anvin

On Wed, 2002-12-18 at 05:34, Jeremy Fitzhardinge wrote:
> The P4 optimisation guide promises horrible things if you write within
> 2k of a cached instruction from another CPU (it dumps the whole trace
> cache, it seems), so you'd need to be careful about mixing mutable data
> and the syscall code in that page.

The PIII errata promise worse things with SMP and code modified while
another cpu runs it, and seem to mark them WONTFIX, so there is another
dragon to beware of.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18 13:40                                 ` Horst von Brand
  2002-12-18 13:47                                   ` Sean Neakums
@ 2002-12-18 15:52                                   ` Alan Cox
  2002-12-18 16:41                                   ` Dave Jones
  2 siblings, 0 replies; 176+ messages in thread
From: Alan Cox @ 2002-12-18 15:52 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Wed, 2002-12-18 at 13:40, Horst von Brand wrote:
> [Extremely interesting new syscall mechanism thread elided]
> 
> What happened to "feature freeze"?

I'm wondering that. 2.5.49 was usable for devel work, no kernel since
has been. It's stopped IDE getting touched until January.

Linus, you are doing the slow slide into a second round of development
work again, just like mid 2.3, just like 1.3.60, ...


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18 13:40                                 ` Horst von Brand
  2002-12-18 13:47                                   ` Sean Neakums
  2002-12-18 15:52                                   ` Alan Cox
@ 2002-12-18 16:41                                   ` Dave Jones
  2002-12-18 16:49                                     ` Freezing.. (was Re: Intel P6 vs P7 system call performance) Linus Torvalds
  2002-12-18 18:41                                     ` Intel P6 vs P7 system call performance Horst von Brand
  2 siblings, 2 replies; 176+ messages in thread
From: Dave Jones @ 2002-12-18 16:41 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Linus Torvalds, linux-kernel

On Wed, Dec 18, 2002 at 10:40:24AM -0300, Horst von Brand wrote:
 > [Extremely interesting new syscall mechanism thread elided]
 > 
 > What happened to "feature freeze"?

*bites lip* it's fairly low impact *duck*.
Given the wins seem to be fairly impressive across the board, spending
a few days on getting this right isn't going to push 2.6 back any
noticeable amount of time.

This stuff is mostly a case of "it either works, or it doesn't".
And right now, corner cases like apm aside, it seems to be holding up
so far. This isn't as far-reaching as it sounds. There are still
drivers being turned upside down which are changing things in a
lot bigger ways than this [1].

		Dave

[1] Myself being one of the guilty parties there, wrt agp.

-- 
| Dave Jones.        http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 16:41                                   ` Dave Jones
@ 2002-12-18 16:49                                     ` Linus Torvalds
  2002-12-18 16:56                                       ` Larry McVoy
                                                         ` (4 more replies)
  2002-12-18 18:41                                     ` Intel P6 vs P7 system call performance Horst von Brand
  1 sibling, 5 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-18 16:49 UTC (permalink / raw)
  To: Dave Jones; +Cc: Horst von Brand, linux-kernel, Alan Cox, Andrew Morton



On Wed, 18 Dec 2002, Dave Jones wrote:
> On Wed, Dec 18, 2002 at 10:40:24AM -0300, Horst von Brand wrote:
>  > [Extremely interesting new syscall mechanism thread elided]
>  >
>  > What happened to "feature freeze"?
>
> *bites lip* it's fairly low impact *duck*.

However, it's a fair question.

I've been wondering how to formalize patch acceptance at code freeze, but
it might be a good idea to start talking about some way to maybe put
brakes on patches earlier, i.e. some kind of "required approval process".

I think the system call thing is very localized and thus not a big issue,
but in general we do need to have something in place.

I just don't know what that "something" should be. Any ideas? I thought
about having the code freeze require buy-in from three of four people (me,
Alan, Dave and Andrew come to mind) for a patch to go in, but that's
probably too draconian for now. Or is it (maybe start with "needs approval
by two" and switch to three when going into code freeze)?

			Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 16:49                                     ` Freezing.. (was Re: Intel P6 vs P7 system call performance) Linus Torvalds
@ 2002-12-18 16:56                                       ` Larry McVoy
  2002-12-18 16:58                                       ` Dave Jones
                                                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 176+ messages in thread
From: Larry McVoy @ 2002-12-18 16:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Horst von Brand, linux-kernel, Alan Cox,
	Andrew Morton

> I've been wondering how to formalize patch acceptance at code freeze, but
> it might be a good idea to start talking about some way to maybe put
> brakes on patches earlier, ie some kind of "required approval process".

We went through this here for the bk-3.0 release.  We're a much smaller 
team so this may not work at all for you, but it was very successful 
for us, so much so that we are looking at formalizing it in BK.  But
you can apply the same process outside of BK just fine.

We created a well known spot for pending patches; all reviewers need access
to that spot.  Here's the README from that directory:

    There should be the following subdirectories here

	    ready/		-> waiting on review 
	    done/		-> in the tree
	    rejected/	-> no good


    In the ready/ subdirectory, for each repository which has changes that
    want to be in bk-3.0 but are not, I want:

	    ready/atrev -> /home/bk/wscott/bk-3.0-atrev
	    ready/atrev.RTI
	    ready/atrev.REVIEWED

    The first is a symlink to the location of the repository.

    The second is an RTI request which describes what is in the repo and why
    it should go in.

    The third contains the review comments in the form

	    lm (approved|not approved)
		    review comments
	    wscott (approved|not approved)
		    review comments
	    etc.

    Once the REVIEWED file contains enough approvals, in the judgement
    of the gatekeeper, then he pulls the repo into the bk-3.0 tree and moves
    the 3 files from ready/* to done/*

The things which worked very well were:

    a) extremely simple.  As we added developers they understood right away
       what the process was.
    b) centralized location.  Anyone could be bored and go do a review.
    c) tight control on the tree.



We're thinking about formalizing this in the context of BK as follows:

NAME
	bk queue - manage the queue of pending changes

DESCRIPTION
	bk queue is used to manage a queue of changes to a repository.
	It is typically used on integration repositories where tighter
	controls on change are desirable.  

	In all commands, if no URL is specified, the implied URL is the
	parent of the current repository, if any.  The URL "." means this
	repository.

	XXX - need a large paragraph on the importance of not circulating
	changesets which are in review state.  They'll come back.

	bk queue [-n<name>] [-R<rti>] [<URL>]
	    This is like a bk push but wants a "request to integrate"
	    (RTI) which is sent with the changes.  It also wants a name
	    for the set of changes.  All pending changesets are pushed.
	    If no name is given, the user is prompted for one.	If no
	    RTI is given, the user is prompted for one.

	bk queue -l [-n<name>] [<URL>]
	    Lists the set of pending changes in the queue like so:
	    <name> <date> <user> <state>

	    Values for the <state> field:
		unreviewed - nobody has looked at it yet
		reviewed by <reviewer> on <date> - obvious
		accepted - it is in accepted state but not integrated
		rejected - reviewed and rejected

	    Note that if there are multiple reviewers of a change, there
	    will be multiple lines in the listing for that change.

	    If the <name> arg is present then restrict the listing to 
	    that name.  If the <name> arg is present more than once,
	    restrict the listing to the set of named changes.

	    Could also have a -s<state> option which restricts the listing
	    to those changes in <state> state.

	bk queue -pR [-o<file>] <name> [<URL>]
	    Retrieves and displays the RTI for change <name>.  
	    If <file> is specified, put the form there.

	bk queue -pr [-o<file>] [-u<user>] <name> [<URL>]
	    Retrieves and displays the review form[s] for change <name>.
	    If a user is specified, retrieve that users' review only.
	    If <file> is specified, put the form there.

	bk queue -uR [<rti>] [<URL>]
	    Replaces any existing RTI with the specified RTI.  If no RTI
	    is specified, it prompts you for one like bk setup does.

	bk queue -ur [<review>] [<URL>]
	    Adds or replaces any existing review form with the specified
	    review.  If no review is specified, it prompts you for one
	    like bk setup does.  You may only replace your own reviews.  

	bk queue -O[<owner>] [<URL>]
	    Sets the owner of the repository to <owner>.  Only the owner
	    may update the repository.  Only the current owner can change
	    the ownership.  If no owner is specified and there is an owner
	    and the caller is the owner, then delete the owner.
	    (This is nothing more than a pre-{incoming,commit}-owner trigger)

	bk queue -d<name> [-f] [<URL>]
	    Delete the named change from the queue.  This deletes EVERYTHING,
	    the patch, rti, reviews, everything.  Only the submitter of the
	    change may delete the change unless the -f option is supplied.

	bk queue -U<name> [-R<rti>] [<URL>]
	    Replace the changes in the queue <name> with the set of
	    changesets in the current repository.  If the <rti> is
	    present, replace the current RTI form with the specified form.
	    All reviews, if any, are updated with a note that indicates
	    the existing review was against changes which have been replaced.

GUI
	This is a command line tool; Bryan gets to do bk queuetool
	using these interfaces.

TODO
	- how do we merge?  
	- define a format for the RTI
	- define a format for reviews
	- should the RTI & review files be KV files?
	- should the {name/RTI/REVIEWS} live as part of the repo and be
	  propagated?  I think yes for upstream propagation, no for
	  downstream.  Hard to say.
	- need a way to add a queue item with no changes, i.e., an RFE which
	  needs to be in the tree but there are no changes yet.

FILES
	BitKeeper/queue/<name>/CSETS - changeset keys for change <name>
	BitKeeper/queue/<name>/RTI - RTI for change <name>
	BitKeeper/queue/<name>/PATCH - BK patch for change <name>
	BitKeeper/queue/<name>/RESYNC - exploded patch for change <name>
	BitKeeper/queue/<name>/review.user - review by user for change <name>
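As a rough sketch of the layout above (everything here is illustrative: the change name `fix-foo`, the reviewer `lm`, and the file contents are made up, and no actual `bk` command is run, since `bk queue` is only a proposal):

```shell
# Sketch of the proposed on-disk queue layout for a hypothetical change
# named "fix-foo".  This only creates plain files to show the structure;
# in the proposal the real files would be written by bk queue itself.
mkdir -p BitKeeper/queue/fix-foo

echo "user@host|ChangeSet|20021209120000|00001" > BitKeeper/queue/fix-foo/CSETS
printf 'What: fix foo\nWhy: it was broken\n'    > BitKeeper/queue/fix-foo/RTI
: > BitKeeper/queue/fix-foo/PATCH               # the BK patch itself
echo "accepted: looks good to me"               > BitKeeper/queue/fix-foo/review.lm

ls BitKeeper/queue/fix-foo
```

Under this layout, the `-d<name>` form would amount to removing the whole `BitKeeper/queue/<name>/` directory, which matches the note that deleting a change deletes everything: patch, RTI, and reviews.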
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 16:49                                     ` Freezing.. (was Re: Intel P6 vs P7 system call performance) Linus Torvalds
  2002-12-18 16:56                                       ` Larry McVoy
@ 2002-12-18 16:58                                       ` Dave Jones
  2002-12-18 17:41                                         ` Linus Torvalds
  2002-12-18 17:06                                       ` Eli Carter
                                                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 176+ messages in thread
From: Dave Jones @ 2002-12-18 16:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Horst von Brand, linux-kernel, Alan Cox, Andrew Morton

On Wed, Dec 18, 2002 at 08:49:37AM -0800, Linus Torvalds wrote:

 > >  > What happened to "feature freeze"?
 > > *bites lip* it's fairly low impact *duck*.
 > However, it's a fair question.

Indeed. Were you merging something like preempt at this stage, I'd be wondering
if you'd broken out the eggnog a little too soon.

 > I just don't know what that "something" should be. Any ideas? I thought
 > about the code freeze require buy-in from three of four people (me, Alan,
 > Dave and Andrew come to mind) for a patch to go in, but that's probably
 > too draconian for now. Or is it (maybe start with "needs approval by two"
 > and switch it to three when going into code freeze)?

You'd likely need an odd number of folks in this cabal^Winner circle
though, or would you just do it and be damned if you got an equal
number of 'aye's and 'nay's ? 8-)

Other than that, it reminds me of the way the gcc folks work, with a
number of people reviewing patches before acceptance [not that this
doesn't happen on l-k already], and at least 1 approval from someone
prepared to approve submissions.

The approval process does seem to be quite a lot of work though.
I think it was rth last year at OLS who told me that at that time
he'd been doing more approving of other people's stuff than coding himself.

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 16:49                                     ` Freezing.. (was Re: Intel P6 vs P7 system call performance) Linus Torvalds
  2002-12-18 16:56                                       ` Larry McVoy
  2002-12-18 16:58                                       ` Dave Jones
@ 2002-12-18 17:06                                       ` Eli Carter
  2002-12-18 17:08                                       ` Andrew Morton
  2002-12-18 18:25                                       ` John Alvord
  4 siblings, 0 replies; 176+ messages in thread
From: Eli Carter @ 2002-12-18 17:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Horst von Brand, linux-kernel, Alan Cox,
	Andrew Morton

Linus Torvalds wrote:
> 
> On Wed, 18 Dec 2002, Dave Jones wrote:
> 
>>On Wed, Dec 18, 2002 at 10:40:24AM -0300, Horst von Brand wrote:
>> > [Extremely interesting new syscall mechanism thread elided]
>> >
>> > What happened to "feature freeze"?
>>
>>*bites lip* it's fairly low impact *duck*.
> 
> 
> However, it's a fair question.
> 
> I've been wondering how to formalize patch acceptance at code freeze, but
> it might be a good idea to start talking about some way to maybe put
> brakes on patches earlier, ie some kind of "required approval process".
> 
> I think the system call thing is very localized and thus not a big issue,
> but in general we do need to have something in place.
> 
> I just don't know what that "something" should be. Any ideas? I thought
> about the code freeze require buy-in from three of four people (me, Alan,
> Dave and Andrew come to mind) for a patch to go in, but that's probably
> too draconian for now. Or is it (maybe start with "needs approval by two"
> and switch it to three when going into code freeze)?

Well, Linus, you're not the most conservative when it comes to freezes. 
   (Hey! Watch it with those thunderbolts!)  Alan, on the other hand, I 
would trust to be pretty conservative.
I'm afraid I haven't followed Dave & Andrew well enough in that light.

But my question is... if 2 are required, and say, Dave is as slushy on 
freezes as you are, then have we gained much?

Perhaps 2 of 4 approve with no dissenting votes?

If Dave and Andrew are relatively conservative on freezes, then this 
concern is sufficiently addressed already.

Food for thought from a relative nobody. ;)

Eli
--------------------. "If it ain't broke now,
Eli Carter           \                  it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 16:49                                     ` Freezing.. (was Re: Intel P6 vs P7 system call performance) Linus Torvalds
                                                         ` (2 preceding siblings ...)
  2002-12-18 17:06                                       ` Eli Carter
@ 2002-12-18 17:08                                       ` Andrew Morton
  2002-12-18 18:25                                       ` John Alvord
  4 siblings, 0 replies; 176+ messages in thread
From: Andrew Morton @ 2002-12-18 17:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Horst von Brand, linux-kernel, Alan Cox

Linus Torvalds wrote:
> 
> On Wed, 18 Dec 2002, Dave Jones wrote:
> > On Wed, Dec 18, 2002 at 10:40:24AM -0300, Horst von Brand wrote:
> >  > [Extremely interesting new syscall mechanism thread elided]
> >  >
> >  > What happened to "feature freeze"?
> >
> > *bites lip* it's fairly low impact *duck*.
> 
> However, it's a fair question.
> 
> I've been wondering how to formalize patch acceptance at code freeze, but
> it might be a good idea to start talking about some way to maybe put
> brakes on patches earlier, ie some kind of "required approval process".
> 
> I think the system call thing is very localized and thus not a big issue,
> but in general we do need to have something in place.
> 
> I just don't know what that "something" should be. Any ideas? I thought
> about the code freeze require buy-in from three of four people (me, Alan,
> Dave and Andrew come to mind) for a patch to go in, but that's probably
> too draconian for now. Or is it (maybe start with "needs approval by two"
> and switch it to three when going into code freeze)?
> 

It does sound a little bureaucratic for this point in development.

The first thing we need is a set of widely-understood guidelines.
Such as:

Only
	- bugfixes
	- speedups
	- previously-agreed-to or in-progress features
	- totally new things (new drivers, new filesystems)

Once everyone understands this framework then it becomes easy to
decide what to drop, what not.

So right now, sysenter is "in".  Later, even "speedups" falls off
the list and sysenter would at that stage be "out".

Can it be that simple?

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 16:58                                       ` Dave Jones
@ 2002-12-18 17:41                                         ` Linus Torvalds
  2002-12-18 18:03                                           ` Jeff Garzik
                                                             ` (2 more replies)
  0 siblings, 3 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-18 17:41 UTC (permalink / raw)
  To: Dave Jones; +Cc: Horst von Brand, linux-kernel, Alan Cox, Andrew Morton



On Wed, 18 Dec 2002, Dave Jones wrote:
>
>  > I just don't know what that "something" should be. Any ideas? I thought
>  > about the code freeze require buy-in from three of four people (me, Alan,
>  > Dave and Andrew come to mind) for a patch to go in, but that's probably
>  > too draconian for now. Or is it (maybe start with "needs approval by two"
>  > and switch it to three when going into code freeze)?
>
> You'd likely need an odd number of folks in this cabal^Winner circle
> though, or would you just do it and be damned if you got an equal
> number of 'aye's and 'nay's ? 8-)

Quite frankly, I wouldn't expect a lot of dissent.

I suspect a group approach has very little inherent disagreement, and to
me the main result of having an "approval process" is really just to slow
things down and make people think about what they submit.  The actual
approval itself is secondary (it _looks_ like a primary objective, but in
real life it's just the _existence_ of rules that make more of a
difference).

> The approval process does seem to be quite a lot of work though.
> I think it was rth last year at OLS who told me that at that time
> he'd been doing more approving of other people's stuff than coding himself.

I heartily disagree with the approval process for development, just
because it gets so much in the way and just annoys people. But for
stabilization, that's exactly what you want. So I think gcc is using the
approval process much too much, but apparently it works for them.

And I think it could work for the kernel too, especially the stable
releases and for the process of getting there. I just don't really know
how to set it up well.

		Linus


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 17:41                                         ` Linus Torvalds
@ 2002-12-18 18:03                                           ` Jeff Garzik
  2002-12-18 18:09                                             ` Mike Dresser
  2002-12-18 19:08                                           ` Alan Cox
  2002-12-18 19:50                                           ` Oliver Xymoron
  2 siblings, 1 reply; 176+ messages in thread
From: Jeff Garzik @ 2002-12-18 18:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Horst von Brand, linux-kernel, Alan Cox,
	Andrew Morton

Linus Torvalds wrote:
> On Wed, 18 Dec 2002, Dave Jones wrote:
>>The approval process does seem to be quite a lot of work though.
>>I think it was rth last year at OLS who told me that at that time
>>he'd been doing more approving of other people's stuff than coding himself.
> 
> 
> I heartily disagree with the approval process for development, just
> because it gets so much in the way and just annoys people. But for
> stabilization, that's exactly what you want. So I think gcc is using the
> approval process much too much, but apparently it works for them.


gcc's approval process looks a lot like the Linux approval process. 
Dave's description of rth's work sounds a lot like the Linus Role in 
Linux...  with the exception I guess that there are multiple peer Linii 
in gcc, and they read every patch <runs for cover>  More seriously, gcc 
appears to be "post the patch to gcc-patches, hope someone applies it" 
which is a lot more like Linux than some think :)

	Jeff




^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 18:03                                           ` Jeff Garzik
@ 2002-12-18 18:09                                             ` Mike Dresser
  2002-12-23 12:34                                               ` Kai Henningsen
  0 siblings, 1 reply; 176+ messages in thread
From: Mike Dresser @ 2002-12-18 18:09 UTC (permalink / raw)
  To: linux-kernel

On Wed, 18 Dec 2002, Jeff Garzik wrote:

> Linux...  with the exception I guess that there are multiple peer Linii

Perhaps this is the solution.  Would someone please obtain a DNA sample
from Linus?

Mike


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 16:49                                     ` Freezing.. (was Re: Intel P6 vs P7 system call performance) Linus Torvalds
                                                         ` (3 preceding siblings ...)
  2002-12-18 17:08                                       ` Andrew Morton
@ 2002-12-18 18:25                                       ` John Alvord
  4 siblings, 0 replies; 176+ messages in thread
From: John Alvord @ 2002-12-18 18:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Horst von Brand, linux-kernel, Alan Cox,
	Andrew Morton

On Wed, 18 Dec 2002 08:49:37 -0800 (PST), Linus Torvalds
<torvalds@transmeta.com> wrote:

>
>
>On Wed, 18 Dec 2002, Dave Jones wrote:
>> On Wed, Dec 18, 2002 at 10:40:24AM -0300, Horst von Brand wrote:
>>  > [Extremely interesting new syscall mechanism thread elided]
>>  >
>>  > What happened to "feature freeze"?
>>
>> *bites lip* it's fairly low impact *duck*.
>
>However, it's a fair question.
>
>I've been wondering how to formalize patch acceptance at code freeze, but
>it might be a good idea to start talking about some way to maybe put
>brakes on patches earlier, ie some kind of "required approval process".
>
>I think the system call thing is very localized and thus not a big issue,
>but in general we do need to have something in place.
>
>I just don't know what that "something" should be. Any ideas? I thought
>about the code freeze require buy-in from three of four people (me, Alan,
>Dave and Andrew come to mind) for a patch to go in, but that's probably
>too draconian for now. Or is it (maybe start with "needs approval by two"
>and switch it to three when going into code freeze)?
>
>			Linus

I think there should be a distinction between changes which make an
API change/new function/user interface change, versus bug fixes,
adapting to new APIs, documentation, etc.

john alvord

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18 16:41                                   ` Dave Jones
  2002-12-18 16:49                                     ` Freezing.. (was Re: Intel P6 vs P7 system call performance) Linus Torvalds
@ 2002-12-18 18:41                                     ` Horst von Brand
  1 sibling, 0 replies; 176+ messages in thread
From: Horst von Brand @ 2002-12-18 18:41 UTC (permalink / raw)
  To: Dave Jones, Horst von Brand, Linus Torvalds, linux-kernel

Dave Jones <davej@codemonkey.org.uk> said:
> On Wed, Dec 18, 2002 at 10:40:24AM -0300, Horst von Brand wrote:
>  > [Extremely interesting new syscall mechanism thread elided]
>  > 
>  > What happened to "feature freeze"?

> *bites lip* it's fairly low impact *duck*.
> Given the wins seem to be fairly impressive across the board, spending
> a few days on getting this right isn't going to push 2.6 back any
> noticeable amount of time.

Ever hear Larry McVoy's [I think, please correct me if wrong] standard
rant of how $UNIX_FROM_BIG_VENDOR sucks, one "almost unnoticeable
performance impact" feature at a time? 

Similarly, Fred Brooks tells in "The Mythical Man Month" how schedules
don't slip by months, they slip a day at a time...

> This stuff is mostly of the case "it either works, or it doesn't".
> And right now, corner cases like apm aside, it seems to be holding up
> so far. This isn't as far reaching as it sounds. There are still
> drivers being turned upside down which are changing things in a
> lot bigger ways than this[1]
> 
> 		Dave
> 
> [1] Myself being one of the guilty parties there, wrt agp.

Fixing a broken feature is in for me. Adding new features is supposed to be
out until 2.7 opens.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 17:41                                         ` Linus Torvalds
  2002-12-18 18:03                                           ` Jeff Garzik
@ 2002-12-18 19:08                                           ` Alan Cox
  2002-12-18 19:23                                             ` Larry McVoy
  2002-12-19  5:34                                             ` Timothy D. Witham
  2002-12-18 19:50                                           ` Oliver Xymoron
  2 siblings, 2 replies; 176+ messages in thread
From: Alan Cox @ 2002-12-18 19:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Horst von Brand, linux-kernel, Alan Cox,
	Andrew Morton

> And I think it could work for the kernel too, especially the stable
> releases and for the process of getting there. I just don't really know
> how to set it up well.

A start might be

1.	Ack large patches you don't want with "Not for 2.6" instead
	of ignoring them. I'm bored of seeing the 18th resend of 
	this and that wildly bogus patch. 

	Then people know the status

2.	Apply patches only after they have been approved by the maintainer
	of that code area.

	Where it is core code run it past Andrew, Al and other people
	with extremely good taste.

3.	Anything which changes core stuff and needs new tools, setup
	etc please just say NO to for now. Modules was a mistake (hindsight
	I grant is a great thing), but it's done. We don't want any more.


4.	Violate 1-3 when appropriate as always, but preferably not too
	often and after consulting the good taste department 8)

Alan

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18 14:10                                     ` Horst von Brand
  2002-12-18 14:51                                       ` dada1
@ 2002-12-18 19:12                                       ` Mark Mielke
  1 sibling, 0 replies; 176+ messages in thread
From: Mark Mielke @ 2002-12-18 19:12 UTC (permalink / raw)
  To: Horst von Brand; +Cc: linux-kernel

On Wed, Dec 18, 2002 at 11:10:50AM -0300, Horst von Brand wrote:
> Sean Neakums <sneakums@zork.net> said:
> > How are system calls a new feature?  Or is optimizing an existing
> > feature not allowed by your definition of "feature freeze"?
> This "optimizing" is very much userspace-visible, and a radical change in
> an interface this fundamental counts as a new feature in my book.

Since operating systems like WIN32 are at least published to take
advantage of SYSENTER, it may not be in Linux's interest to
purposefully use a slower interface until 2.8 (how long will that be
until people can use it?). The last thing I want to read about in a
technical journal is how WIN32 has lower system call overhead than
Linux on IA-32 architectures. That might just be selfish of me for
the Linux community... :-)

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 19:08                                           ` Alan Cox
@ 2002-12-18 19:23                                             ` Larry McVoy
  2002-12-18 19:30                                               ` Alan Cox
  2002-12-19  5:34                                             ` Timothy D. Witham
  1 sibling, 1 reply; 176+ messages in thread
From: Larry McVoy @ 2002-12-18 19:23 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Dave Jones, Horst von Brand, linux-kernel,
	Andrew Morton

Make it async.  So anyone can review stuff and record their feelings in a
centralized place.  We have a spare machine set up, kernel.bkbits.net, 
that could be used as a dumping grounds for patches and reviews if
master.kernel.org is too locked down.

If you force the review process into a "push" model where patches are 
sent to someone, then you are stuck waiting for them to review it and
it may or may not happen.  Do the reviews in a centralized place where
everyone can see them and add their own comments.

On Wed, Dec 18, 2002 at 02:08:02PM -0500, Alan Cox wrote:
> > And I think it could work for the kernel too, especially the stable
> > releases and for the process of getting there. I just don't really know
> > how to set it up well.
> 
> A start might be
> 
> 1.	Ack large patches you don't want with "Not for 2.6" instead
> 	of ignoring them. I'm bored of seeing the 18th resend of 
> 	this and that wildly bogus patch. 
> 
> 	Then people know the status
> 
> 2.	Apply patches only after they have been approved by the maintainer
> 	of that code area.
> 
> 	Where it is core code run it past Andrew, Al and other people
> 	with extremely good taste.
> 
> 3.	Anything which changes core stuff and needs new tools, setup
> 	etc please just say NO to for now. Modules was a mistake (hindsight
> 	I grant is a great thing), but it's done. We don't want any more
> 
> 
> 4.	Violate 1-3 when appropriate as always, but preferably not too
> 	often and after consulting the good taste department 8)
> 
> Alan
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 19:23                                             ` Larry McVoy
@ 2002-12-18 19:30                                               ` Alan Cox
  2002-12-18 19:33                                                 ` Larry McVoy
  0 siblings, 1 reply; 176+ messages in thread
From: Alan Cox @ 2002-12-18 19:30 UTC (permalink / raw)
  To: Larry McVoy
  Cc: Alan Cox, Linus Torvalds, Dave Jones, Horst von Brand,
	linux-kernel, Andrew Morton

We've got one - it's called linux-kernel.

Alan

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 19:30                                               ` Alan Cox
@ 2002-12-18 19:33                                                 ` Larry McVoy
  2002-12-18 19:42                                                   ` Alan Cox
  0 siblings, 1 reply; 176+ messages in thread
From: Larry McVoy @ 2002-12-18 19:33 UTC (permalink / raw)
  To: Alan Cox
  Cc: Larry McVoy, Linus Torvalds, Dave Jones, Horst von Brand,
	linux-kernel, Andrew Morton

On Wed, Dec 18, 2002 at 02:30:48PM -0500, Alan Cox wrote:
> We've got one - it's called linux-kernel.

Huh?  That's like saying "we don't need a bug database, we have a mailing
list".  That's patently wrong and so is your statement.  If you want 
reviews you need some place to store them.  A mailing list isn't storage.

You'll do it however you want of course, but you are being stupid about it.
Why is that?
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 19:33                                                 ` Larry McVoy
@ 2002-12-18 19:42                                                   ` Alan Cox
  2002-12-18 19:45                                                     ` Larry McVoy
  0 siblings, 1 reply; 176+ messages in thread
From: Alan Cox @ 2002-12-18 19:42 UTC (permalink / raw)
  To: Larry McVoy
  Cc: Alan Cox, Larry McVoy, Linus Torvalds, Dave Jones,
	Horst von Brand, linux-kernel, Andrew Morton

> On Wed, Dec 18, 2002 at 02:30:48PM -0500, Alan Cox wrote:
> > We've got one - it's called linux-kernel.
> 
> Huh?  That's like saying "we don't need a bug database, we have a mailing
> list".  That's patently wrong and so is your statement.  If you want 
> reviews you need some place to store them.  A mailing list isn't storage.
> 
> You'll do it however you want of course, but you are being stupid about it.
> Why is that?

We've got a bug database (bugzilla), we've got a system for seeing what opinion
appears to be -kernel-list



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 19:42                                                   ` Alan Cox
@ 2002-12-18 19:45                                                     ` Larry McVoy
  2002-12-18 20:39                                                       ` John Bradford
  0 siblings, 1 reply; 176+ messages in thread
From: Larry McVoy @ 2002-12-18 19:45 UTC (permalink / raw)
  To: Alan Cox
  Cc: Larry McVoy, Linus Torvalds, Dave Jones, Horst von Brand,
	linux-kernel, Andrew Morton

On Wed, Dec 18, 2002 at 02:42:51PM -0500, Alan Cox wrote:
> > On Wed, Dec 18, 2002 at 02:30:48PM -0500, Alan Cox wrote:
> > > We've got one - it's called linux-kernel.
> > 
> > Huh?  That's like saying "we don't need a bug database, we have a mailing
> > list".  That's patently wrong and so is your statement.  If you want 
> > reviews you need some place to store them.  A mailing list isn't storage.
> > 
> > You'll do it however you want of course, but you are being stupid about it.
> > Why is that?
> 
> We've got a bug database (bugzilla), we've got a system for seeing what opinion
> appears to be -kernel-list

And exactly how is your statement different than

    "we have a system for seeing what bugs appear to be -kernel-list"

?
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 17:41                                         ` Linus Torvalds
  2002-12-18 18:03                                           ` Jeff Garzik
  2002-12-18 19:08                                           ` Alan Cox
@ 2002-12-18 19:50                                           ` Oliver Xymoron
  2 siblings, 0 replies; 176+ messages in thread
From: Oliver Xymoron @ 2002-12-18 19:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Horst von Brand, linux-kernel, Alan Cox,
	Andrew Morton

On Wed, Dec 18, 2002 at 09:41:15AM -0800, Linus Torvalds wrote:
> 
> > The approval process does seem to be quite a lot of work though.
> > I think it was rth last year at OLS who told me that at that time
> > he'd been doing more approving of other people's stuff than coding himself.
> 
> I heartily disagree with the approval process for development, just
> because it gets so much in the way and just annoys people. But for
> stabilization, that's exactly what you want. So I think gcc is using the
> approval process much too much, but apparently it works for them.
> 
> And I think it could work for the kernel too, especially the stable
> releases and for the process of getting there. I just don't really know
> how to set it up well.

Actually, I think Marcelo's got the stable process pretty well
figured out without any of this committee business. And given that his
credibility as 2.4 maintainer depends on his holding to the mandate to
make the kernel stable, he probably doesn't have too hard a time
holding the line. As benevolent dictator, you're simply not beholden
to such expectations and I doubt the committee approach would work for
long either.

So perhaps you should throw out a date for 'code freeze' and then plan to
hand off to the 2.6 maintainer at that date. 

The other piece that will help is if the timeline for 2.7 shows up
around then and is short enough so that people won't despair of ever
getting their big feature in.

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 19:45                                                     ` Larry McVoy
@ 2002-12-18 20:39                                                       ` John Bradford
  2002-12-18 22:08                                                         ` Larry McVoy
  0 siblings, 1 reply; 176+ messages in thread
From: John Bradford @ 2002-12-18 20:39 UTC (permalink / raw)
  To: Larry McVoy; +Cc: alan, lm, torvalds, davej, vonbrand, linux-kernel, akpm

> On Wed, Dec 18, 2002 at 02:42:51PM -0500, Alan Cox wrote:
> > > On Wed, Dec 18, 2002 at 02:30:48PM -0500, Alan Cox wrote:
> > > > We've got one - it's called linux-kernel.
> > > 
> > > Huh?  That's like saying "we don't need a bug database, we have a mailing
> > > list".  That's patently wrong and so is your statement.  If you want 
> > > reviews you need some place to store them.  A mailing list isn't storage.
> > > 
> > > You'll do it however you want of course, but you are being
> > > stupid about it.
> > > Why is that?
> > 
> > We've got a bug database (bugzilla), we've got a system for seeing
> > what opinion appears to be -kernel-list
> 
> And exactly how is your statement different than
> 
>     "we have a system for seeing what bugs appear to be -kernel-list"
> 
> ?

This forthcoming BK-related flamewar falls into category 1, i.e. it is
not a 2.6 feature :-)

John.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* RE: Freezing.. (was Re: Intel P6 vs P7 system call performance)
@ 2002-12-18 22:00 Nakajima, Jun
  0 siblings, 0 replies; 176+ messages in thread
From: Nakajima, Jun @ 2002-12-18 22:00 UTC (permalink / raw)
  To: Linus Torvalds, Dave Jones
  Cc: Horst von Brand, linux-kernel, Alan Cox, Andrew Morton,
	Saxena, Sunil, Mallick, Asit K

BTW, in terms of validation, I think we might want to compare the results from LTP (http://ltp.sourceforge.net/), for example, by having it run on the two setups (sysenter/sysexit and int/iret). 

Jun

> -----Original Message-----
> From: Linus Torvalds [mailto:torvalds@transmeta.com]
> Sent: Wednesday, December 18, 2002 8:50 AM
> To: Dave Jones
> Cc: Horst von Brand; linux-kernel@vger.kernel.org; Alan Cox; Andrew Morton
> Subject: Freezing.. (was Re: Intel P6 vs P7 system call performance)
> 
> 
> 
> On Wed, 18 Dec 2002, Dave Jones wrote:
> > On Wed, Dec 18, 2002 at 10:40:24AM -0300, Horst von Brand wrote:
> >  > [Extremely interesting new syscall mechanism tread elided]
> >  >
> >  > What happened to "feature freeze"?
> >
> > *bites lip* it's fairly low impact *duck*.
> 
> However, it's a fair question.
> 
> I've been wondering how to formalize patch acceptance at code freeze, but
> it might be a good idea to start talking about some way to maybe put
> brakes on patches earlier, ie some kind of "required approval process".
> 
> I think the system call thing is very localized and thus not a big issue,
> but in general we do need to have something in place.
> 
> I just don't know what that "something" should be. Any ideas? I thought
> about the code freeze require buy-in from three of four people (me, Alan,
> Dave and Andrew come to mind) for a patch to go in, but that's probably
> too draconian for now. Or is it (maybe start with "needs approval by two"
> and switch it to three when going into code freeze)?
> 
> 			Linus
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 20:39                                                       ` John Bradford
@ 2002-12-18 22:08                                                         ` Larry McVoy
  2002-12-18 22:37                                                           ` John Bradford
                                                                             ` (2 more replies)
  0 siblings, 3 replies; 176+ messages in thread
From: Larry McVoy @ 2002-12-18 22:08 UTC (permalink / raw)
  To: John Bradford
  Cc: Larry McVoy, alan, torvalds, davej, vonbrand, linux-kernel, akpm

> > And exactly how is your statement different than
> > 
> >     "we have a system for seeing what bugs appear to be -kernel-list"
> > ?
> 
> This forthcoming BK-related flamewar falls into category 1, i.e. is
> not a 2.6 feature :-)

I don't understand why BK is part of the conversation.  It has nothing to
do with it.  If every time I post to this list the assumption is that it's
"time to beat larry up about BK" then it's time for me to get off the list.

I can understand it when we're discussing BK; other than that, it's pretty
friggin lame.  If that's what was behind your posts, Alan, there is an
easy procmail fix for that.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 22:08                                                         ` Larry McVoy
@ 2002-12-18 22:37                                                           ` John Bradford
  2002-12-19  1:09                                                             ` Alan Cox
  2002-12-19  0:08                                                           ` Alan Cox
  2002-12-19 13:17                                                           ` Stephen Satchell
  2 siblings, 1 reply; 176+ messages in thread
From: John Bradford @ 2002-12-18 22:37 UTC (permalink / raw)
  To: Larry McVoy; +Cc: lm, alan, torvalds, davej, vonbrand, linux-kernel, akpm

> > This forthcoming BK-related flamewar falls into category 1, i.e. is
> > not a 2.6 feature :-)
> 
> I don't understand why BK is part of the conversation.  It has nothing to
> do with it.  If every time I post to this list the assumption is that it's
> "time to beat larry up about BK" then it's time for me to get off
> the list.
> I can understand it when we're discussing BK; other than that, it's pretty
> friggin lame.  If that's what was behind your posts, Alan, there is an
> easy procmail fix for that.

My interpretation was that that was what he meant.  If I was wrong, I
apologise.

I was trying to point out in an amusing way that a repeat of the BK
flamewar we've seen on LKML was inappropriate.

John.


* Re: Intel P6 vs P7 system call performance
  2002-12-17  5:55           ` Linus Torvalds
                               ` (3 preceding siblings ...)
  2002-12-17 16:12             ` Hugh Dickins
@ 2002-12-18 23:51             ` Pavel Machek
  4 siblings, 0 replies; 176+ messages in thread
From: Pavel Machek @ 2002-12-18 23:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Ingo Molnar, linux-kernel, hpa

Hi!

> > (Modulo the missing syscall page I already mentioned and potential bugs
> > in the code itself, of course ;)
> 
> Ok, I did the vsyscall page too, and tried to make it do the right thing
> (but I didn't bother to test it on a non-SEP machine).
> 
> I'm pushing the changes out right now, but basically it boils down to the
> fact that with these changes, user space can instead of doing an
> 
> 	int $0x80
> 
> instruction for a system call just do a
> 
> 	call 0xfffff000
> 
> instead. The vsyscall page will be set up to use sysenter if the CPU
> supports it, and if it doesn't, it will just do the old "int $0x80"
> instead (and it could use the AMD syscall instruction if it wants to).
> User mode shouldn't know or care, the calling convention is the same as it
> ever was.

Perhaps it makes sense to define that gettimeofday is done by

	call 0xfffff100,

NOW? So we can add vsyscalls later?
								Pavel
-- 
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?


* Re: Intel P6 vs P7 system call performance
  2002-12-17  6:43               ` dean gaudet
  2002-12-17 16:50                 ` Linus Torvalds
  2002-12-17 19:11                 ` H. Peter Anvin
@ 2002-12-18 23:53                 ` Pavel Machek
  2002-12-19 22:18                   ` H. Peter Anvin
  2 siblings, 1 reply; 176+ messages in thread
From: Pavel Machek @ 2002-12-18 23:53 UTC (permalink / raw)
  To: dean gaudet; +Cc: Linus Torvalds, Dave Jones, Ingo Molnar, linux-kernel, hpa

Hi!

> > It's not as good as a pure user-mode solution using tsc could be, but
> > we've seen the kinds of complexities that has with multi-CPU systems, and
> > they are so painful that I suspect the sysenter approach is a lot more
> > palatable even if it doesn't allow for the absolute best theoretical
> > numbers.
> 
> don't many of the multi-CPU problems with tsc go away because you've got a
> per-cpu physical page for the vsyscall?
> 
> i.e. per-cpu tsc epoch and scaling can be set on that page.

Problem is that the CPUs' TSCs can randomly drift +/- 100 clocks or so
relative to each other... Not nice at all.
								Pavel
-- 
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?


* Re: Intel P6 vs P7 system call performance
  2002-12-17 15:12               ` Alan Cox
@ 2002-12-18 23:55                 ` Pavel Machek
  2002-12-19 22:17                   ` H. Peter Anvin
  0 siblings, 1 reply; 176+ messages in thread
From: Pavel Machek @ 2002-12-18 23:55 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andre Hedrick, Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List, hpa

Hi!

> > Are you serious about moving of the banging we currently do on 0x80?
> > If so, I have a P4 development board with leds to monitor all the lower io
> > ports and can decode for you.
> 
> Different thing - int 0x80 syscall not i/o port 80. I've done I/O port
> 80 (its very easy), but requires we set up some udelay constants with an
> initial safety value right at boot (which we should do - we udelay
> before it is initialised)

Actually that would be nice -- I have leds on 0x80 too ;-).
								Pavel
-- 
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?


* Re: Intel P6 vs P7 system call performance
  2002-12-17 10:53             ` Ulrich Drepper
  2002-12-17 11:17               ` dada1
  2002-12-17 17:06               ` Linus Torvalds
@ 2002-12-18 23:59               ` Pavel Machek
  2 siblings, 0 replies; 176+ messages in thread
From: Pavel Machek @ 2002-12-18 23:59 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linus Torvalds, Dave Jones, Ingo Molnar, linux-kernel, hpa

Hi!

> I've created a modified glibc which uses the syscall code for almost
> everything.  There are a few int $0x80 left here and there but mostly it
> is a centralized change.
> 
> The result: all works as expected.  Nice.
> 
> On my test machine your little test program performs the syscalls on
> roughly twice as fast (HT P4, pretty new).  Your numbers are perhaps for
> the P4 Xeons.  Anyway, when measuring some more involved code (I ran my
> thread benchmark) I got only about 3% performance increase.  It's doing
> a fair amount of system calls.  But again, the good news is your code
> survived even this stress test.
> 
> 
> The problem with the current solution is the instruction set of the x86.
>  In your test code you simply use call 0xfffff000 and it magically work.
>  But this is only the case because your program is linked statically.
> 
> For the libc DSO I had to play some dirty tricks.  The x86 CPU has no
> absolute call.  The variant with an immediate parameter is a relative
> jump.  Only when jumping through a register or memory location is it
> possible to jump to an absolute address.  To be clear, if I have
> 
>     call 0xfffff000
> 
> in a DSO which is loaded at address 0x80000000 the jump ends at
> 0x7fffffff.  The problem is that the static linker doesn't know the load
> address.  We could of course have the dynamic linker fix up the
> addresses but this is plain stupid.  It would mean fixing up a lot of
> places and making the pages covered non-shareable.

Can't you do call far __SOME_CS, 0xfffff000?

								Pavel
-- 
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?


* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 22:08                                                         ` Larry McVoy
  2002-12-18 22:37                                                           ` John Bradford
@ 2002-12-19  0:08                                                           ` Alan Cox
  2002-12-19  0:53                                                             ` John Bradford
  2002-12-19 13:17                                                           ` Stephen Satchell
  2 siblings, 1 reply; 176+ messages in thread
From: Alan Cox @ 2002-12-19  0:08 UTC (permalink / raw)
  To: Larry McVoy
  Cc: John Bradford, Larry McVoy, alan, torvalds, davej, vonbrand,
	linux-kernel, akpm

> I don't understand why BK is part of the conversation.  It has nothing to
> do with it.  If every time I post to this list the assumption is that it's
> "time to beat larry up about BK" then it's time for me to get off the list.
> 
> I can understand it when we're discussing BK; other than that, it's pretty
> friggin lame.  If that's what was behind your posts, Alan, there is an
> easy procmail fix for that.

It wasn't me who brought up bitkeeper


* Re: Intel P6 vs P7 system call performance
  2002-12-18 12:57                             ` Rogier Wolff
@ 2002-12-19  0:14                               ` Pavel Machek
  0 siblings, 0 replies; 176+ messages in thread
From: Pavel Machek @ 2002-12-19  0:14 UTC (permalink / raw)
  To: Rogier Wolff
  Cc: Linus Torvalds, Ulrich Drepper, Alan Cox, Matti Aarnio,
	Hugh Dickins, Dave Jones, Ingo Molnar, Linux Kernel Mailing List,
	hpa

Hi!

> > > But this is exactly what I expect to happen.  If you want to implement
> > > gettimeofday() at user-level you need to modify the page.
> > 
> > Note that I really don't think we ever want to do the user-level
> > gettimeofday(). The complexity just argues against it, it's better to try
> > to make system calls be cheap enough that you really don't care.
> 
> I'd say that this should not be "fixed" from userspace, but from the
> kernel. Thus if the kernel knows that the "gettimeofday" can be made
> faster by doing it completely in userspace, then that system call
> should be "patched" by the kernel to do it faster for everybody.
> 
> Next, someone might find a faster (full-userspace) way to do some
> "reads"(*). Then it might pay to check for that specific
> filedescriptor in userspace, and only call into the kernel for the
> other filedescriptors. The idea is that the kernel knows best when
> optimizations are possible.
> 
> Thus that ONE magic address is IMHO not the right way to do it. By
> demultiplexing the stuff in userspace, you can do "sane" things with
> specific syscalls. 
> 
> So for example, the code at 0xffff80000 would be:
> 	mov $0x00,%eax
> 	int $0x80
> 	ret
> 
> (in the case where sysenter & friends is not available)

This could save that one register needed for 6-arg syscalls. If the code
at 0xffff8000 was mov %ebp, %eax; sysenter; ret for the P4, you could do
6-arg syscalls this way.
								Pavel
-- 
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?


* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
       [not found] <20021218161506.M7976@work.bitmover.com>
@ 2002-12-19  0:18 ` Alan Cox
  2002-12-19  0:37   ` Larry McVoy
  0 siblings, 1 reply; 176+ messages in thread
From: Alan Cox @ 2002-12-19  0:18 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Alan Cox, torvalds, davej, linux-kernel

> > > I can understand it when we're discussing BK; other than that, it's pretty
> > > friggin lame.  If that's what was behind your posts, Alan, there is an
> > > easy procmail fix for that.
> > 
> > It wasn't me who brought up bitkeeper
> 
> PLONK.  Into kernel-spam you go.  I've had it with ax grinders.

Oh dear me. Larry McVoy has flipped

I'm now being added to his spam list for *not* mentioning bitkeeper

Poor Larry, I hope he has a nice Christmas break, he clearly needs it



* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  1:09                                                             ` Alan Cox
@ 2002-12-19  0:37                                                               ` Russell King
  2002-12-19  0:58                                                                 ` Jeff Garzik
                                                                                   ` (2 more replies)
  2002-12-19  0:59                                                               ` John Bradford
  2002-12-19  1:17                                                               ` Linus Torvalds
  2 siblings, 3 replies; 176+ messages in thread
From: Russell King @ 2002-12-19  0:37 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

On Thu, Dec 19, 2002 at 01:09:17AM +0000, Alan Cox wrote:
> How the actual patches get applied really isn't relevant. I know Linus
> hated jitterbug, I'm guessing he hates bugzilla too?

I'm waiting for the kernel bugzilla to become useful - currently the
record for me has been:

3 bugs total
3 bugs for serial code for drivers I don't maintain, reassigned to mbligh.

This means I write (choose one):

1. non-buggy code (highly unlikely)
2. code no one tests
3. code people do test but report via other means (eg, email, irc)

If it's (3), which it seems to be, it means that bugzilla is failing to
do its job properly, which is most unfortunate.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  0:18 ` Alan Cox
@ 2002-12-19  0:37   ` Larry McVoy
  0 siblings, 0 replies; 176+ messages in thread
From: Larry McVoy @ 2002-12-19  0:37 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Wed, Dec 18, 2002 at 07:18:44PM -0500, Alan Cox wrote:
> > > > I can understand it when we're discussing BK; other than that, it's pretty
> > > > friggin lame.  If that's what was behind your posts, Alan, there is an
> > > > easy procmail fix for that.
> > > 
> > > It wasn't me who brought up bitkeeper
> > 
> > PLONK.  Into kernel-spam you go.  I've had it with ax grinders.
> 
> Oh dear me. Larry McVoy has flipped
> 
> I'm now being added to his spam list for *not* mentioning bitkeeper
> 
> Poor Larry, I hope he has a nice Christmas break, he clearly needs it

Look, Alan and anyone else, I'm sort of sick of the flames about BK.
It's apparent that there will always be people who are looking for
excuses to attack BK because it isn't GPLed and how dare the kernel
hackers use it.  Your mail was so senseless that that was the only sane
explanation I could find and apparently I wasn't being paranoid, that's
what John thought as well.

I have a bad habit of taking things personally and too seriously and
the result is that attacks on me/BK/whatever, imagined or real, stress
me out and waste my time.  Life's too short for me to deal with that
nonsense anymore.  I discovered procmail and I dump people into a spam
file if I feel they have a track record of yanking my chain.  It's my
fault that I'm such a wuss that I can't handle it but this works.
It's not personal, it's about having a more pleasant life and I find
things to be more pleasant without the flames.

I'll still read your mail, I do so about every 2 weeks, but that way 
whatever yankage you were (or were not) trying to do is in the past and
I'll ignore it.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  0:08                                                           ` Alan Cox
@ 2002-12-19  0:53                                                             ` John Bradford
  0 siblings, 0 replies; 176+ messages in thread
From: John Bradford @ 2002-12-19  0:53 UTC (permalink / raw)
  To: Alan Cox; +Cc: lm, torvalds, davej, vonbrand, linux-kernel, akpm

> > I don't understand why BK is part of the conversation.  It has nothing to
> > do with it.  If every time I post to this list the assumption is that it's
> > "time to beat larry up about BK" then it's time for me to get off the list.
> > 
> > I can understand it when we're discussing BK; other than that, it's pretty
> > friggin lame.  If that's what was behind your posts, Alan, there is an
> > easy procmail fix for that.
> 
> It wasn't me who brought up bitkeeper
> 

No, it's my fault - I was skimming through list traffic, and not
concentrating, (proof of this is the fact that I've had sendmail
configured incorrectly all day, and been posting from the wrong
address, and only just realised :-) ).

I saw Larry mention kernel.bkbits.net, and Alan say, "We've got one -
its called linux-kernel", (in a separate message without quoting
anything, so it's really your fault :-) :-) :-) ), and assumed that a
BK argument was imminent, and I made a joke comment that it, (an
argument), was not a 2.6 required feature.

Sorry about the wasted bandwidth, I'll stop posting as it's now past
midnight, and I obviously need sleep.

Oh, 2.4.20-pre2 compiled OK for me, I hope that proves I've done
something useful tonight.

John.


* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  0:37                                                               ` Russell King
@ 2002-12-19  0:58                                                                 ` Jeff Garzik
  2002-12-19  1:43                                                                 ` Martin J. Bligh
  2002-12-19 10:50                                                                 ` Dave Jones
  2 siblings, 0 replies; 176+ messages in thread
From: Jeff Garzik @ 2002-12-19  0:58 UTC (permalink / raw)
  To: Alan Cox, Linux Kernel Mailing List

On Thu, Dec 19, 2002 at 12:37:40AM +0000, Russell King wrote:
> This means I write (choose one):
> 3. code people do test but report via other means (eg, email, irc)

> If it's (3), which it seems to be, it means that bugzilla is failing to
> do its job properly, which is most unfortunate.

Given that it started around Halloween, I would at least give it a
chance before claiming its failure.  :)

IMO Bugzilla is gonna become even more useful as the code freeze hits,
and there are bugs we want to track until we get around to fixing
them...

	Jeff






* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  1:09                                                             ` Alan Cox
  2002-12-19  0:37                                                               ` Russell King
@ 2002-12-19  0:59                                                               ` John Bradford
  2002-12-19 10:27                                                                 ` Dave Jones
  2002-12-19  1:17                                                               ` Linus Torvalds
  2 siblings, 1 reply; 176+ messages in thread
From: John Bradford @ 2002-12-19  0:59 UTC (permalink / raw)
  To: Alan Cox; +Cc: lm, lm, torvalds, davej, vonbrand, linux-kernel, akpm

> > I was trying to point out in an amusing way that a repeat of the BK
> > flamewar we've seen on LKML was inappropriate.
> 
> I got the joke but I don't have a US postal address 8)

Eh???  US postal address?  What!?  Now I am really confused.

> More seriously we have defect tracking now -> bugzilla.kernel.org
> We have an advanced scalable groupware communication environment (email)
> 
> How the actual patches get applied really isn't relevant. I know Linus
> hated jitterbug, I'm guessing he hates bugzilla too?

I don't like bugzilla particularly, it's too clunky, and it's
difficult to check that you are not entering a duplicate bug when the
database gets too big.  Maybe that's just my opinion, though.  Maybe I
should write a better bug tracking system...

John.


* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
@ 2002-12-19  1:08 Adam J. Richter
  2002-12-19  9:13 ` Russell King
  0 siblings, 1 reply; 176+ messages in thread
From: Adam J. Richter @ 2002-12-19  1:08 UTC (permalink / raw)
  To: rmk; +Cc: linux-kernel

Russell King wrote:
>I'm waiting for the kernel bugzilla to become useful - currently the
>record for me has been:
>
>3 bugs total
>3 bugs for serial code for drivers I don't maintain, reassigned to mbligh.
>
>This means I write (choose one):
>
>1. non-buggy code (highly unlikely)
>2. code no one tests
>3. code people do test but report via other means (eg, email, irc)
>
>If it's (3), which it seems to be, it means that bugzilla is failing to
>do its job properly, which is most unfortunate.

	I don't currently use bugzilla (just due to inertia), but the
whole world doesn't have to switch to something overnight in order for
that facility to end up saving more time and resources than it has
cost.  Adoption can grow gradually, and it's probably easier to work
out bugs (in bugzilla) and improvements that way anyhow.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."


* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 22:37                                                           ` John Bradford
@ 2002-12-19  1:09                                                             ` Alan Cox
  2002-12-19  0:37                                                               ` Russell King
                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 176+ messages in thread
From: Alan Cox @ 2002-12-19  1:09 UTC (permalink / raw)
  To: John Bradford
  Cc: Larry McVoy, lm, alan, Linus Torvalds, davej, vonbrand,
	Linux Kernel Mailing List, akpm

On Wed, 2002-12-18 at 22:37, John Bradford wrote: 
> I was trying to point out in an amusing way that a repeat of the BK
> flamewar we've seen on LKML was inappropriate.

I got the joke but I don't have a US postal address 8)

More seriously we have defect tracking now -> bugzilla.kernel.org
We have an advanced scalable groupware communication environment (email)

How the actual patches get applied really isn't relevant. I know Linus
hated jitterbug, I'm guessing he hates bugzilla too?



* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  1:09                                                             ` Alan Cox
  2002-12-19  0:37                                                               ` Russell King
  2002-12-19  0:59                                                               ` John Bradford
@ 2002-12-19  1:17                                                               ` Linus Torvalds
  2 siblings, 0 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-19  1:17 UTC (permalink / raw)
  To: Alan Cox
  Cc: John Bradford, Larry McVoy, lm, alan, davej, vonbrand,
	Linux Kernel Mailing List, akpm


On 19 Dec 2002, Alan Cox wrote:
> 
> How the actual patches get applied really isn't relevant. I know Linus
> hated jitterbug, I'm guessing he hates bugzilla too?

I didn't start out hating jitterbug, I tried it for a while.

I ended up not really being able to work with anything that was so
email-hostile. You had to click on things from a browser, write passwords,
and generally just act "gooey", instead of getting things just _done_.

If I can't do my work by email from a standard keyboard interface, it's
just not worth it. Maybe bugzilla works better, but I seriously expect it
to help _others_ track bugs more than it helps me.

Which is fine. We don't all have to agree on the tools or on how to track 
stuff. The worst we can do (I think) is to _force_ people to work some 
way.

[ This is where the angel chorus behind me started singing "Why can't we 
  all live together in peace and harmony" and put up a big banner saying 
  "Larry [heart] Alan". At that point my fever-induced brain just said 
  "plop" ]

		Linus



* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  0:37                                                               ` Russell King
  2002-12-19  0:58                                                                 ` Jeff Garzik
@ 2002-12-19  1:43                                                                 ` Martin J. Bligh
  2002-12-19 10:50                                                                 ` Dave Jones
  2 siblings, 0 replies; 176+ messages in thread
From: Martin J. Bligh @ 2002-12-19  1:43 UTC (permalink / raw)
  To: Russell King, Alan Cox; +Cc: Linux Kernel Mailing List

> This means I write (choose one):
> 
> 1. non-buggy code (highly unlikely)
> 2. code no one tests
> 3. code people do test but report via other means (eg, email, irc)
> 
> If it's (3), which it seems to be, it means that bugzilla is failing to
> do its job properly, which is most unfortunate.

Not everyone will end up using it ... if people want to log bugs from
lkml into bugzilla, I think that'd help gather a critical mass.

Are you getting a lot of bug-reports for serial code on lkml? I use it
heavily, and it seems to work just fine to me .... so I pick (1). Yay! ;-)

Some of the bugs in there lie fallow, but I've seen quite a few get fixed.
The fact that some people (Dave Jones springs to mind) trawl through there
being extremely helpful fixing things is very useful ;-) Lots of things got
fixed, though I can't *prove* it was solely due to it being in Bugzilla.

As the list of bugs increases, it'll become an increasingly powerful 
search engine for information as well .... I'll draw up a list of things
that don't seem to be being worked on, and mail it out to kernel-janitors
and/or lkml and see if people are interested in fixing some of the fallow
stuff.

M.



* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 19:08                                           ` Alan Cox
  2002-12-18 19:23                                             ` Larry McVoy
@ 2002-12-19  5:34                                             ` Timothy D. Witham
  2002-12-19  6:43                                               ` Andrew Morton
  2002-12-19  7:05                                               ` Martin J. Bligh
  1 sibling, 2 replies; 176+ messages in thread
From: Timothy D. Witham @ 2002-12-19  5:34 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Dave Jones, Horst von Brand, linux-kernel,
	Andrew Morton

Related thought:

  One of the things that we are trying to do is to automate 
patch testing.

  The PLM (www.osdl.org/plm) takes every patch that it gets
and does a quick "Does it compile test".  Right now there
are only 4 kernel configuration files that we try but we are
going to be adding more.  We could expand this to hundreds
if needed as it would just be a matter of adding additional
hardware to make the compiles go faster in parallel.

  Here is the example of the output from a baseline kernel.

http://www.osdl.org/cgi-bin/plm?module=patch_info&patch_id=986

  A patch would look the same.  The PASS reports are really
short and the FAIL reports just give you the configuration 
files and the tail of the output from the kernel make.

 We've talked to a couple of system vendors about expanding
this to take the configurations that have passed and run
them on their tens of hardware platforms of interest, and we
would be very happy to expand this to a very large number of
configurations of all sorts.

Tim

On Wed, 2002-12-18 at 11:08, Alan Cox wrote:
> > And I think it could work for the kernel too, especially the stable
> > releases and for the process of getting there. I just don't really know
> > how to set it up well.
> 
> A start might be
> 
> 1.	Ack large patches you don't want with "Not for 2.6" instead
> 	of ignoring them. I'm bored of seeing the 18th resend of 
> 	this and that wildly bogus patch. 
> 
> 	Then people know the status
> 
> 2.	Apply patches only after they have been approved by the maintainer
> 	of that code area.
> 
> 	Where it is core code run it past Andrew, Al and other people
> 	with extremely good taste.
> 
> 3.	Anything which changes core stuff and needs new tools, setup
> 	etc please just say NO to for now. Modules was a mistake (hindsight
> 	I grant is a great thing), but its done. We don't want any more
> 
> 
> 4.	Violate 1-3 when appropriate as always, but preferably not to
> 	often and after consulting the good taste department 8)
> 
> Alan
-- 
Timothy D. Witham <wookie@osdl.org>
Open Source Development Lab, Inc


* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  6:43                                               ` Andrew Morton
@ 2002-12-19  5:45                                                 ` Timothy D. Witham
  0 siblings, 0 replies; 176+ messages in thread
From: Timothy D. Witham @ 2002-12-19  5:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Cox, Linus Torvalds, Dave Jones, Horst von Brand,
	linux-kernel

On Wed, 2002-12-18 at 22:43, Andrew Morton wrote:
> "Timothy D. Witham" wrote:
> > 
> > Related thought:
> > 
> >   One of the things that we are trying to do is to automate
> > patch testing.
> > 
> >   The PLM (www.osdl.org/plm) takes every patch that it gets
> > and does a quick "Does it compile test".  Right now there
> > are only 4 kernel configuration files that we try but we are
> > going to be adding more.  We could expand this to 100's
> > if needed as it would just be a matter of adding additional
> > hardware to make the compiles go faster in parallel.
> 
> It would be valuable to be able to test that things compile
> cleanly on non-ia32 machines.  And boot, too.
> 
  The way the software is configured, it is fairly easy to
add multiple servers (even with different instruction sets)
and have the compiles farmed out to them.

> That's probably a lot of ongoing work though.

  The largest portion of the work would be keeping
up with the breakages in the trees.

BTW I'm in Japan so my access times are going to be
a little strange.
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  7:05                                               ` Martin J. Bligh
@ 2002-12-19  6:08                                                 ` Timothy D. Witham
  0 siblings, 0 replies; 176+ messages in thread
From: Timothy D. Witham @ 2002-12-19  6:08 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

  Sorry, they changed it last week and my fingers still
have the old firmware. 

  www.osdl.org/cgi-bin/plm

Tim

On Wed, 2002-12-18 at 23:05, Martin J. Bligh wrote:
> > Related thought:
> >
> >   One of the things that we are trying to do is to automate
> > patch testing.
> >
> >   The PLM (www.osdl.org/plm) takes every patch that it gets
> > and does a quick "Does it compile test".  Right now there
> > are only 4 kernel configuration files that we try but we are
> > going to be adding more.  We could expand this to 100's
> > if needed as it would just be a matter of adding additional
> > hardware to make the compiles go faster in parallel.
> 
> URL doesn't seem to work. But would be cool if you had one SMP
> config, one UP with IO/APIC, and one without IO/APIC. I seem
> to break the middle one whenever I write a patch ;-(
> 
> M.
> 
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  5:34                                             ` Timothy D. Witham
@ 2002-12-19  6:43                                               ` Andrew Morton
  2002-12-19  5:45                                                 ` Timothy D. Witham
  2002-12-19  7:05                                               ` Martin J. Bligh
  1 sibling, 1 reply; 176+ messages in thread
From: Andrew Morton @ 2002-12-19  6:43 UTC (permalink / raw)
  To: Timothy D. Witham
  Cc: Alan Cox, Linus Torvalds, Dave Jones, Horst von Brand,
	linux-kernel

"Timothy D. Witham" wrote:
> 
> Related thought:
> 
>   One of the things that we are trying to do is to automate
> patch testing.
> 
>   The PLM (www.osdl.org/plm) takes every patch that it gets
> and does a quick "Does it compile test".  Right now there
> are only 4 kernel configuration files that we try but we are
> going to be adding more.  We could expand this to 100's
> if needed as it would just be a matter of adding additional
> hardware to make the compiles go faster in parallel.

It would be valuable to be able to test that things compile
cleanly on non-ia32 machines.  And boot, too.

That's probably a lot of ongoing work though.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  5:34                                             ` Timothy D. Witham
  2002-12-19  6:43                                               ` Andrew Morton
@ 2002-12-19  7:05                                               ` Martin J. Bligh
  2002-12-19  6:08                                                 ` Timothy D. Witham
  1 sibling, 1 reply; 176+ messages in thread
From: Martin J. Bligh @ 2002-12-19  7:05 UTC (permalink / raw)
  To: Timothy D. Witham; +Cc: linux-kernel

> Related thought:
>
>   One of the things that we are trying to do is to automate
> patch testing.
>
>   The PLM (www.osdl.org/plm) takes every patch that it gets
> and does a quick "Does it compile test".  Right now there
> are only 4 kernel configuration files that we try but we are
> going to be adding more.  We could expand this to 100's
> if needed as it would just be a matter of adding additional
> hardware to make the compiles go faster in parallel.

URL doesn't seem to work. But would be cool if you had one SMP
config, one UP with IO/APIC, and one without IO/APIC. I seem
to break the middle one whenever I write a patch ;-(

M.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  1:08 Adam J. Richter
@ 2002-12-19  9:13 ` Russell King
  2002-12-19 16:39   ` Eli Carter
  0 siblings, 1 reply; 176+ messages in thread
From: Russell King @ 2002-12-19  9:13 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: linux-kernel

On Wed, Dec 18, 2002 at 05:08:45PM -0800, Adam J. Richter wrote:
> 	I don't currently use bugzilla (just due to inertia), but the
> whole world doesn't have to switch to something overnight in order for
> that facility to end up saving more time and resources than it has
> cost.  Adoption can grow gradually, and it's probably easier to work
> out bugs (in bugzilla) and improvements that way anyhow.

I'm not asking the world to switch to it overnight.  Just one person
would be nice. 8)

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  0:59                                                               ` John Bradford
@ 2002-12-19 10:27                                                                 ` Dave Jones
  0 siblings, 0 replies; 176+ messages in thread
From: Dave Jones @ 2002-12-19 10:27 UTC (permalink / raw)
  To: John Bradford; +Cc: Alan Cox, lm, lm, torvalds, vonbrand, linux-kernel, akpm

On Thu, Dec 19, 2002 at 12:59:20AM +0000, John Bradford wrote:

 > I don't like bugzilla particularly, it's too clunky, and it's
 > difficult to check that you are not entering a duplicate bug when the
 > database gets too big.

File bug anyway and worry about it later. The bugzilla elves regularly
go through the database cleaning up crufty bits, marking dupes,
closing invalids, world peace etc etc. It seems to be holding
up well so far.  Of the 180 bugs filed, I think I've personally rejected
<10 dupes/invalids. Other folks haven't rejected that many either.
Here's to hoping it continues to remain high signal.

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  0:37                                                               ` Russell King
  2002-12-19  0:58                                                                 ` Jeff Garzik
  2002-12-19  1:43                                                                 ` Martin J. Bligh
@ 2002-12-19 10:50                                                                 ` Dave Jones
  2 siblings, 0 replies; 176+ messages in thread
From: Dave Jones @ 2002-12-19 10:50 UTC (permalink / raw)
  To: Alan Cox, Linux Kernel Mailing List; +Cc: Russell King

On Thu, Dec 19, 2002 at 12:37:40AM +0000, Russell King wrote:
 > On Thu, Dec 19, 2002 at 01:09:17AM +0000, Alan Cox wrote:
 > > How the actual patches get applied really isn't relevant. I know Linus
 > > hated jitterbug, I'm guessing he hates bugzilla too ?
 > 
 > I'm waiting for the kernel bugzilla to become useful - currently the
 > record for me has been:
 > 
 > 3 bugs total
 > 3 bugs for serial code for drivers I don't maintain, reassigned to mbligh.

That was unfortunate, and you got dumped with those because some thought
"Ah, serial! RMK!".  Some of the categories in bugzilla still need
broadening IMO.

 > This means I write (choose one):
 > 1. non-buggy code (highly unlikely)
 > 2. code no one tests
 > 3. code people do test but report via other means (eg, email, irc)
 > 
 > If it's (3), which it seems to be, it means that bugzilla is failing to
 > do its job properly, which is most unfortunate.

It's early days. The types of bugs being filed still fall into both
the "useful" and "not useful" categories though.  I don't think it's really
that important that we track what doesn't compile at this stage.
Those reports are either being closed within a few hours of being
opened with a "Fixed in BK", or are drivers which no-one currently
wants to fix/can fix (things like the various sti/cli breakage).

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 22:08                                                         ` Larry McVoy
  2002-12-18 22:37                                                           ` John Bradford
  2002-12-19  0:08                                                           ` Alan Cox
@ 2002-12-19 13:17                                                           ` Stephen Satchell
  2 siblings, 0 replies; 176+ messages in thread
From: Stephen Satchell @ 2002-12-19 13:17 UTC (permalink / raw)
  To: Larry McVoy; +Cc: linux-kernel

At 02:08 PM 12/18/02 -0800, Larry McVoy wrote:
>I don't understand why BK is part of the conversation.  It has nothing to
>do with it.  If every time I post to this list the assumption is that it's
>"time to beat larry up about BK" then it's time for me to get off the list.
>
>I can understand it when we're discussing BK; other than that, it's pretty
>friggin lame.  If that's what was behind your posts, Alan, there is an
>easy procmail fix for that.

Boy, talk about humor-impaired.  When was the last time you got out and had 
some fun not related to computers, Larry?

I don't read more than 95 percent of this mailing list and I got the joke.

Lighten up, and take a hint from the nearest cat: see the toy in everything.

Satch, another relative nobody.


-- 
The human mind treats a new idea the way the body treats a strange
protein:  it rejects it.  -- P. Medawar
This posting is for entertainment purposes only; it is not a legal opinion.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-17  6:18               ` Linus Torvalds
@ 2002-12-19 14:03                 ` Shuji YAMAMURA
  0 siblings, 0 replies; 176+ messages in thread
From: Shuji YAMAMURA @ 2002-12-19 14:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds

Hi,

We've measured the gettimeofday() cost on both the Xeon and the P3, too.
We also measured it on different kernels (UP and MP).

                Xeon(2GHz)     P3(1GHz)
=========================================
UP kernel       939[ns]       441[ns]
               1878[cycles]   441[cycles]
-----------------------------------------
MP kernel      1054[ns]       485[ns]
               2108[cycles]   485[cycles]
-----------------------------------------
(The kernel version is 2.4.18)

In this experiment, the Xeon is two times slower than the P3, even
though the Xeon's clock frequency is twice as high.  Moreover, the
performance difference between the UP and MP kernels is very
interesting in the Xeon case.  The difference on the Xeon (230 cycles)
is five times larger than that on the P3 (44 cycles).

We think that the instructions with a lock prefix in the MP kernel
hurt the Xeon's performance, since they serialize operations in the
execution pipeline.  On P4/Xeon systems, these lock operations should
be avoided as much as possible.

The following web page shows the details of this experiment.

http://www.labs.fujitsu.com/en/techinfo/linux/lse-0211/index.html

Regards

At Mon, 16 Dec 2002 22:18:27 -0800 (PST),
Linus wrote:
> On Mon, 16 Dec 2002, Linus Torvalds wrote:
> >
> > For gettimeofday(), the results on my P4 are:
> >
> > 	sysenter:	1280.425844 cycles
> > 	int/iret:	2415.698224 cycles
> > 			1135.272380 cycles diff
> > 	factor 1.886637
> >
> > ie sysenter makes that system call almost twice as fast.
> 
> Final comparison for the evening: a PIII looks very different, since the
> system call overhead is much smaller to begin with. On a PIII, the above
> ends up looking like
> 
>    gettimeofday() testing:
> 	sysenter:	561.697236 cycles
> 	int/iret:	686.170463 cycles
> 			124.473227 cycles diff
> 	factor 1.221602

-----
Shuji Yamamura (yamamura@flab.fujitsu.co.jp)
Grid Computing & Bioinformatics Laboratory
Information Technology Core Laboratories
Fujitsu Laboratories LTD.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-19  9:13 ` Russell King
@ 2002-12-19 16:39   ` Eli Carter
  0 siblings, 0 replies; 176+ messages in thread
From: Eli Carter @ 2002-12-19 16:39 UTC (permalink / raw)
  To: Russell King; +Cc: Adam J. Richter, linux-kernel

Russell King wrote:
> On Wed, Dec 18, 2002 at 05:08:45PM -0800, Adam J. Richter wrote:
> 
>>	I don't currently use bugzilla (just due to inertia), but the
>>whole world doesn't have to switch to something overnight in order for
>>that facility to end up saving more time and resources than it has
>>cost.  Adoption can grow gradually, and it's probably easier to work
>>out bugs (in bugzilla) and improvements that way anyhow.
> 
> 
> I'm not asking the world to switch to it overnight.  Just one person
> would be nice. 8)
> 

Ok, Russell, maybe I can lend a small hand there....

You have a bug tracking mechanism of your own on www.arm.linux.org.uk, 
along with a separate patch tracker.
Do you want ARM bug reports in bugzilla instead of your site?  If so, 
can you link to it from that bug tracker page?  (I suppose you'd want to 
  direct people to bugzilla for just 2.5.* and 2.5.*-rmk*)

I submitted a 2.4 bug to your bug tracker, got an answer to the question 
when I posted to the arm mailing lists (thanks!), and submitted a patch 
to the mailing list.  But nothing has happened on the bug status.  I 
asked if you wanted patches for bugs put in the patch tracker or the bug 
tracker, but got no reply.
I understand that you're fighting the Acorn battle of 2.5.50 -> 2.5.52, 
so I'm trying not to sound like I'm complaining.  (Failing, yes, I know, 
sorry. :/ )  Some assurance that you will acknowledge bugs in bugzilla 
would be greatly encouraging to me.  (Such as a reply to this message?)

I'll try to get 2.5 bug reports for ARM into bugzilla based on your 
comments here, but a couple of suggestions:
- post an announcement to the arm lists of where you want which bugs to go,
- link to the same in a prominent place from your bug and patch trackers
- if you can, perhaps give priority in terms of replies and such to 
those who use bugzilla... I value your replies, and if I can do 
something to increase my chances of even getting an "Ack, I'll look at 
it next week", I'll try to do that.

Comments?

Eli
--------------------. "If it ain't broke now,
Eli Carter           \                  it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18 23:55                 ` Pavel Machek
@ 2002-12-19 22:17                   ` H. Peter Anvin
  0 siblings, 0 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-19 22:17 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, Andre Hedrick, Linus Torvalds, Dave Jones, Ingo Molnar,
	Linux Kernel Mailing List

Pavel Machek wrote:
>>
>>Different thing - int 0x80 syscall not i/o port 80. I've done I/O port
>>80 (its very easy), but requires we set up some udelay constants with an
>>initial safety value right at boot (which we should do - we udelay
>>before it is initialised)
> 
> Actually that would be nice -- I have leds on 0x80 too ;-).
> 								Pavel

We have tried before, and failed.  Phoenix uses something like 0xe2, but
apparently some machines with non-Phoenix BIOSes actually use that port.

	-hpa


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-18 23:53                 ` Pavel Machek
@ 2002-12-19 22:18                   ` H. Peter Anvin
  2002-12-19 22:21                     ` Pavel Machek
  0 siblings, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-19 22:18 UTC (permalink / raw)
  To: Pavel Machek
  Cc: dean gaudet, Linus Torvalds, Dave Jones, Ingo Molnar,
	linux-kernel

Pavel Machek wrote:
>>
>>don't many of the multi-CPU problems with tsc go away because you've got a
>>per-cpu physical page for the vsyscall?
>>
>>i.e. per-cpu tsc epoch and scaling can be set on that page.
> 
> Problem is that cpu's can randomly drift +/- 100 clocks or so... Not
> nice at all.
> 

±100 clocks is what... ±50 ns these days?  You can't get that kind of
accuracy for anything outside the CPU core anyway...

	-hpa


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-19 22:18                   ` H. Peter Anvin
@ 2002-12-19 22:21                     ` Pavel Machek
  2002-12-19 22:23                       ` H. Peter Anvin
  0 siblings, 1 reply; 176+ messages in thread
From: Pavel Machek @ 2002-12-19 22:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Pavel Machek, dean gaudet, Linus Torvalds, Dave Jones,
	Ingo Molnar, linux-kernel

Hi!

> >>don't many of the multi-CPU problems with tsc go away because you've got a
> >>per-cpu physical page for the vsyscall?
> >>
> >>i.e. per-cpu tsc epoch and scaling can be set on that page.
> > 
> > Problem is that cpu's can randomly drift +/- 100 clocks or so... Not
> > nice at all.
> > 
> 
> ±100 clocks is what... ±50 ns these days?  You can't get that kind of
> accuracy for anything outside the CPU core anyway...

50ns is bad enough when it makes your time go backwards.

								Pavel
-- 
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-19 22:21                     ` Pavel Machek
@ 2002-12-19 22:23                       ` H. Peter Anvin
  2002-12-19 22:26                         ` Pavel Machek
  0 siblings, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-19 22:23 UTC (permalink / raw)
  To: Pavel Machek
  Cc: dean gaudet, Linus Torvalds, Dave Jones, Ingo Molnar,
	linux-kernel

Pavel Machek wrote:
> Hi!
> 
> 
>>>>don't many of the multi-CPU problems with tsc go away because you've got a
>>>>per-cpu physical page for the vsyscall?
>>>>
>>>>i.e. per-cpu tsc epoch and scaling can be set on that page.
>>>
>>>Problem is that cpu's can randomly drift +/- 100 clocks or so... Not
>>>nice at all.
>>>
>>
>>±100 clocks is what... ±50 ns these days?  You can't get that kind of
>>accuracy for anything outside the CPU core anyway...
> 
> 50ns is bad enough when it makes your time go backwards.
> 

Backwards??  Clock spreading should make the rate change, but it should
never decrement.

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-19 22:23                       ` H. Peter Anvin
@ 2002-12-19 22:26                         ` Pavel Machek
  2002-12-19 22:30                           ` H. Peter Anvin
  0 siblings, 1 reply; 176+ messages in thread
From: Pavel Machek @ 2002-12-19 22:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dean gaudet, Linus Torvalds, Dave Jones, Ingo Molnar,
	linux-kernel

Hi!

> >>>>don't many of the multi-CPU problems with tsc go away because you've got a
> >>>>per-cpu physical page for the vsyscall?
> >>>>
> >>>>i.e. per-cpu tsc epoch and scaling can be set on that page.
> >>>
> >>>Problem is that cpu's can randomly drift +/- 100 clocks or so... Not
> >>>nice at all.
> >>>
> >>
> >>±100 clocks is what... ±50 ns these days?  You can't get that kind of
> >>accuracy for anything outside the CPU core anyway...
> > 
> > 50ns is bad enough when it makes your time go backwards.
> > 
> 
> Backwards??  Clock spreading should make the rate change, but it should
> never decrement.

User on cpu1 reads time, communicates it to cpu2, but cpu2 is drifted
-50ns, so it reads time "before" time reported cpu1. And gets confused.

								Pavel
-- 
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-19 22:26                         ` Pavel Machek
@ 2002-12-19 22:30                           ` H. Peter Anvin
  2002-12-19 22:34                             ` Pavel Machek
  0 siblings, 1 reply; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-19 22:30 UTC (permalink / raw)
  To: Pavel Machek
  Cc: dean gaudet, Linus Torvalds, Dave Jones, Ingo Molnar,
	linux-kernel

Pavel Machek wrote:
> 
> User on cpu1 reads time, communicates it to cpu2, but cpu2 is drifted
> -50ns, so it reads time "before" time reported cpu1. And gets confused.
> 

How can you get that communication to happen in < 50 ns?

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-19 22:30                           ` H. Peter Anvin
@ 2002-12-19 22:34                             ` Pavel Machek
  2002-12-19 22:36                               ` H. Peter Anvin
  0 siblings, 1 reply; 176+ messages in thread
From: Pavel Machek @ 2002-12-19 22:34 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Pavel Machek, dean gaudet, Linus Torvalds, Dave Jones,
	Ingo Molnar, linux-kernel

Hi!

> > User on cpu1 reads time, communicates it to cpu2, but cpu2 is drifted
> > -50ns, so it reads time "before" time reported cpu1. And gets confused.
> > 
> 
> How can you get that communication to happen in < 50 ns?

I'm not sure I can do that, but I'm not sure I can't either. CPUs
snoop each other's cache, and that's supposed to be fast...

-- 
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-19 22:34                             ` Pavel Machek
@ 2002-12-19 22:36                               ` H. Peter Anvin
  0 siblings, 0 replies; 176+ messages in thread
From: H. Peter Anvin @ 2002-12-19 22:36 UTC (permalink / raw)
  To: Pavel Machek
  Cc: dean gaudet, Linus Torvalds, Dave Jones, Ingo Molnar,
	linux-kernel

Pavel Machek wrote:
> Hi!
> 
> 
>>>User on cpu1 reads time, communicates it to cpu2, but cpu2 is drifted
>>>-50ns, so it reads time "before" time reported cpu1. And gets confused.
>>>
>>
>>How can you get that communication to happen in < 50 ns?
> 
> 
> I'm not sure I can do that, but I'm not sure I can't either. CPUs
> snoop each other's cache, and that's supposed to be fast...
> 

Even over a 400 MHz FSB you have 2.5 ns cycles.  I doubt you can
transfer a cache line in 20 FSB cycles.

	-hpa



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Freezing.. (was Re: Intel P6 vs P7 system call performance)
  2002-12-18 18:09                                             ` Mike Dresser
@ 2002-12-23 12:34                                               ` Kai Henningsen
  0 siblings, 0 replies; 176+ messages in thread
From: Kai Henningsen @ 2002-12-23 12:34 UTC (permalink / raw)
  To: linux-kernel

mdresser_l@windsormachine.com (Mike Dresser)  wrote on 18.12.02 in <Pine.LNX.4.33.0212181308380.11644-100000@router.windsormachine.com>:

> On Wed, 18 Dec 2002, Jeff Garzik wrote:
>
> > Linux...  with the exception I guess that there are multiple peer Linii
>
> Perhaps this is the solution.  Would someone please obtain a DNA sample
> from Linus?

It's been in the works for quite some time now, I gather, but the process  
is expected to take maybe two decades more before the first candidate  
becomes available.

It *was* announced here, however.

MfG Kai

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-09 19:46     ` H. Peter Anvin
@ 2002-12-28 20:37       ` Ville Herva
  2002-12-29  2:05         ` Christian Leber
  2002-12-30 11:29         ` Dave Jones
  0 siblings, 2 replies; 176+ messages in thread
From: Ville Herva @ 2002-12-28 20:37 UTC (permalink / raw)
  To: linux-kernel

On Mon, Dec 09, 2002 at 11:46:47AM -0800, you [H. Peter Anvin] wrote:
> 
> SYSCALL is AMD.  SYSENTER is Intel, and is likely to be significantly

Now that Linus has killed the dragon and everybody seems happy with the
shiny new SYSENTER code, let me just add one more stupid question to this
thread: has anyone benchmarked SYSCALL/SYSENTER/INT80 on the Athlon?  Is
SYSCALL worth doing separately for the Athlon (and perhaps the Hammer in
32-bit mode)?


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-28 20:37       ` Ville Herva
@ 2002-12-29  2:05         ` Christian Leber
  2002-12-30 18:22           ` Christian Leber
  2002-12-30 11:29         ` Dave Jones
  1 sibling, 1 reply; 176+ messages in thread
From: Christian Leber @ 2002-12-29  2:05 UTC (permalink / raw)
  To: linux-kernel

On Sat, Dec 28, 2002 at 10:37:06PM +0200, Ville Herva wrote:

> Now that Linus has killed the dragon and everybody seems happy with the
> shiny new SYSENTER code, let just add one more stupid question to this
> thread: has anyone made benchmarks on SYSCALL/SYSENTER/INT80 on Athlon? Is
> SYSCALL worth doing separately for Athlon (and perhaps Hammer/32-bit mode)?

Yes, the output of the program Linus posted looks like this on a
Duron 750 with 2.5.53:

igor3:~# ./a.out
187.894946 cycles  (call 0xffffe000)
299.155075 cycles  (int 80)

(cycles per getpid() call)


Christian Leber

-- 
  "Omnis enim res, quae dando non deficit, dum habetur et non datur,
   nondum habetur, quomodo habenda est."       (Aurelius Augustinus)
  Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-28 20:37       ` Ville Herva
  2002-12-29  2:05         ` Christian Leber
@ 2002-12-30 11:29         ` Dave Jones
  1 sibling, 0 replies; 176+ messages in thread
From: Dave Jones @ 2002-12-30 11:29 UTC (permalink / raw)
  To: Ville Herva, linux-kernel

On Sat, Dec 28, 2002 at 10:37:06PM +0200, Ville Herva wrote:

 > > SYSCALL is AMD.  SYSENTER is Intel, and is likely to be significantly
 > Now that Linus has killed the dragon and everybody seems happy with the
 > shiny new SYSENTER code, let just add one more stupid question to this
 > thread: has anyone made benchmarks on SYSCALL/SYSENTER/INT80 on Athlon? Is
 > SYSCALL worth doing separately for Athlon (and perhaps Hammer/32-bit mode)?

It's something I wondered about too. Even if it isn't a win for the K7,
it's possible that the K6 family may benefit from SYSCALL support.
Maybe even the K5, if it was around that early? (too lazy to check the PDFs)

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-29  2:05         ` Christian Leber
@ 2002-12-30 18:22           ` Christian Leber
  2002-12-30 21:22             ` Linus Torvalds
  0 siblings, 1 reply; 176+ messages in thread
From: Christian Leber @ 2002-12-30 18:22 UTC (permalink / raw)
  To: linux-kernel

On Sun, Dec 29, 2002 at 03:05:10AM +0100, Christian Leber wrote:

> > Now that Linus has killed the dragon and everybody seems happy with the
> > shiny new SYSENTER code, let just add one more stupid question to this
> > thread: has anyone made benchmarks on SYSCALL/SYSENTER/INT80 on Athlon? Is
> > SYSCALL worth doing separately for Athlon (and perhaps Hammer/32-bit mode)?
> 
> Yes, the output of the programm Linus posted is on a Duron 750 with
> 2.5.53 like this:
> 
> igor3:~# ./a.out
> 187.894946 cycles  (call 0xffffe000)
> 299.155075 cycles  (int 80)
> (cycles per getpid() call)

Damn, wrong lines; those were numbers from 2.5.52-bk2+sysenter-patch.

But now the right and interesting lines:

2.5.53:
igor3:~# ./a.out
166.283549 cycles
278.461609 cycles

2.5.53-bk5:
igor3:~# ./a.out
150.895348 cycles
279.441955 cycles

The question is: are the numbers correct?
(I don't know if the TSC thing is actually right)

And why has int 80 also gotten faster?


Is this a valid test program to find out how long a system call takes?
igor3:~# cat sysc.c 
#include <stdio.h>
#include <unistd.h>

#define rdtscl(low) \
__asm__ __volatile__ ("rdtsc" : "=a" (low) : : "edx")

int getpiddd()
{
        int i=0; return i+10;
}

int main(int argc, char **argv) {
        long a,b,c,d;
        int i1,i2,i3;

        rdtscl(a);
        i1 = getpiddd(); //just to see how long a simple function takes
        rdtscl(b);
        i2 = getpid();
        rdtscl(c);
        i3 = getpid();
        rdtscl(d);
        printf("function call: %lu first: %lu second: %lu cycles\n",b-a,c-b,d-c);
        return 0;
}

I link it against a slightly modified (1 line of code) dietlibc:
igor3:~# dietlibc-0.22/bin-i386/diet gcc sysc.c
igor3:~# ./a.out 
function call: 42 first: 1821 second: 169 cycles

I heard that there are serious problems involved with TSC, so I
don't know if the numbers are correct / make sense.


Christian Leber

-- 
  "Omnis enim res, quae dando non deficit, dum habetur et non datur,
   nondum habetur, quomodo habenda est."       (Aurelius Augustinus)
  Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: Intel P6 vs P7 system call performance
  2002-12-30 18:22           ` Christian Leber
@ 2002-12-30 21:22             ` Linus Torvalds
  0 siblings, 0 replies; 176+ messages in thread
From: Linus Torvalds @ 2002-12-30 21:22 UTC (permalink / raw)
  To: linux-kernel

In article <20021230182209.GA3981@core.home>,
Christian Leber  <christian@leber.de> wrote:
>
>But now the right and interesting lines:
>
>2.5.53:
>igor3:~# ./a.out
>166.283549 cycles
>278.461609 cycles
>
>2.5.53-bk5:
>igor3:~# ./a.out
>150.895348 cycles
>279.441955 cycles
>
>The question is: are the numbers correct?

Roughly. The program I posted has some overflow errors (which you will
hit if testing expensive system calls that take >4000 cycles). They also
do an average, which is "mostly correct", but not stable if there is
some load on the machine. The right way to do timings like this is
probably to do minimums for individual calls, and then subtract out the
TSC reading overhead. See attached silly program.

>And why have int 80 also gotten faster?

Random luck. Sometimes you get cacheline alignment magic etc. Or just
because the timings aren't stable for other reasons (background
processes etc).

>Is this a valid testprogramm to find out how long a system call takes?

Not really. The results won't be stable, since you might have cache
misses, page faults, other processes, whatever.

So you'll get _somewhat_ correct numbers, but they may be randomly off.

		Linus

---
#include <sys/types.h>
#include <time.h>
#include <sys/time.h>
#include <sys/fcntl.h>
#include <asm/unistd.h>
#include <sys/stat.h>
#include <stdio.h>

#define rdtsc() ({ unsigned long a, d; asm volatile("rdtsc":"=a" (a), "=d" (d)); a; })

// for testing _just_ system call overhead.
//#define __NR_syscall __NR_stat64
#define __NR_syscall __NR_getpid

#define NR (100000)

int main()
{
	int i, ret;
	unsigned long fast = ~0UL, slow = ~0UL, overhead = ~0UL;
	struct timeval x,y;
	char *filename = "test";
	struct stat st;
	int j;

	for (i = 0; i < NR; i++) {
		unsigned long cycles = rdtsc();
		asm volatile("");
		cycles = rdtsc() - cycles;
		if (cycles < overhead)
			overhead = cycles;
	}

	printf("overhead: %6lu\n", overhead);

	for (j = 0; j < 10; j++)
	for (i = 0; i < NR; i++) {
		unsigned long cycles = rdtsc();
		asm volatile("call 0xffffe000"
			:"=a" (ret)
			:"0" (__NR_syscall),
			 "b" (filename),
			 "c" (&st));
		cycles = rdtsc() - cycles;
		if (cycles < fast)
			fast = cycles;
	}

	fast -= overhead;
	printf("sysenter: %6lu cycles\n", fast);

	for (i = 0; i < NR; i++) {
		unsigned long cycles = rdtsc();
		asm volatile("int $0x80"
			:"=a" (ret)
			:"0" (__NR_syscall),
			 "b" (filename),
			 "c" (&st));
		cycles = rdtsc() - cycles;
		if (cycles < slow)
			slow = cycles;
	}

	slow -= overhead;
	printf("int0x80:  %6lu cycles\n", slow);
	printf("          %6lu cycles difference\n", slow-fast);
	printf("factor %f\n", (double) slow / fast);
	return 0;
}



^ permalink raw reply	[flat|nested] 176+ messages in thread

end of thread, other threads:[~2002-12-30 21:14 UTC | newest]

Thread overview: 176+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-12-09  8:30 Intel P6 vs P7 system call performance Mike Hayward
2002-12-09 15:40 ` erich
2002-12-09 17:48 ` Linus Torvalds
2002-12-09 19:36   ` Dave Jones
2002-12-09 19:46     ` H. Peter Anvin
2002-12-28 20:37       ` Ville Herva
2002-12-29  2:05         ` Christian Leber
2002-12-30 18:22           ` Christian Leber
2002-12-30 21:22             ` Linus Torvalds
2002-12-30 11:29         ` Dave Jones
2002-12-17  0:47     ` Linus Torvalds
2002-12-17  1:03       ` Dave Jones
2002-12-17  2:36         ` Linus Torvalds
2002-12-17  5:55           ` Linus Torvalds
2002-12-17  6:09             ` Linus Torvalds
2002-12-17  6:18               ` Linus Torvalds
2002-12-19 14:03                 ` Shuji YAMAMURA
2002-12-17  6:19               ` GrandMasterLee
2002-12-17  6:43               ` dean gaudet
2002-12-17 16:50                 ` Linus Torvalds
2002-12-17 19:11                 ` H. Peter Anvin
2002-12-17 21:39                   ` Benjamin LaHaise
2002-12-17 21:41                     ` H. Peter Anvin
2002-12-17 21:53                       ` Benjamin LaHaise
2002-12-18 23:53                 ` Pavel Machek
2002-12-19 22:18                   ` H. Peter Anvin
2002-12-19 22:21                     ` Pavel Machek
2002-12-19 22:23                       ` H. Peter Anvin
2002-12-19 22:26                         ` Pavel Machek
2002-12-19 22:30                           ` H. Peter Anvin
2002-12-19 22:34                             ` Pavel Machek
2002-12-19 22:36                               ` H. Peter Anvin
2002-12-17 19:12               ` H. Peter Anvin
2002-12-17 19:26                 ` Martin J. Bligh
2002-12-17 20:51                   ` Alan Cox
2002-12-17 20:16                     ` H. Peter Anvin
2002-12-17 20:49                 ` Alan Cox
2002-12-17 20:12                   ` H. Peter Anvin
2002-12-17  9:45             ` Andre Hedrick
2002-12-17 12:40               ` Dave Jones
2002-12-17 23:18                 ` Andre Hedrick
2002-12-17 15:12               ` Alan Cox
2002-12-18 23:55                 ` Pavel Machek
2002-12-19 22:17                   ` H. Peter Anvin
2002-12-17 10:53             ` Ulrich Drepper
2002-12-17 11:17               ` dada1
2002-12-17 17:33                 ` Ulrich Drepper
2002-12-17 17:06               ` Linus Torvalds
2002-12-17 17:55                 ` Ulrich Drepper
2002-12-17 18:01                   ` Linus Torvalds
2002-12-17 19:23                   ` Alan Cox
2002-12-17 18:48                     ` Ulrich Drepper
2002-12-17 19:19                       ` H. Peter Anvin
2002-12-17 19:44                       ` Alan Cox
2002-12-17 19:52                         ` Richard B. Johnson
2002-12-17 19:54                           ` H. Peter Anvin
2002-12-17 19:58                           ` Linus Torvalds
2002-12-18  7:20                             ` Kai Henningsen
2002-12-17 18:49                     ` Linus Torvalds
2002-12-17 19:09                       ` Ross Biro
2002-12-17 21:34                       ` Benjamin LaHaise
2002-12-17 21:36                         ` H. Peter Anvin
2002-12-17 21:50                           ` Benjamin LaHaise
2002-12-18 23:59               ` Pavel Machek
2002-12-17 16:12             ` Hugh Dickins
2002-12-17 16:33               ` Richard B. Johnson
2002-12-17 17:47                 ` Linus Torvalds
2002-12-17 16:54               ` Hugh Dickins
2002-12-17 17:07               ` Linus Torvalds
2002-12-17 17:19                 ` Matti Aarnio
2002-12-17 17:55                   ` Linus Torvalds
2002-12-17 18:24                     ` Linus Torvalds
2002-12-17 18:33                       ` Ulrich Drepper
2002-12-17 18:30                     ` Ulrich Drepper
2002-12-17 19:04                       ` Linus Torvalds
2002-12-17 19:19                         ` Ulrich Drepper
2002-12-17 19:28                         ` Linus Torvalds
2002-12-17 19:32                           ` H. Peter Anvin
2002-12-17 19:44                             ` Linus Torvalds
2002-12-17 19:53                           ` Ulrich Drepper
2002-12-17 20:01                             ` Linus Torvalds
2002-12-17 20:17                               ` Ulrich Drepper
2002-12-18  4:15                                 ` Linus Torvalds
2002-12-18  4:15                               ` Linus Torvalds
2002-12-18  4:39                                 ` H. Peter Anvin
2002-12-18  4:49                                   ` Linus Torvalds
2002-12-18  6:38                                     ` Linus Torvalds
2002-12-18 13:17                                 ` Richard B. Johnson
2002-12-18 13:40                                 ` Horst von Brand
2002-12-18 13:47                                   ` Sean Neakums
2002-12-18 14:10                                     ` Horst von Brand
2002-12-18 14:51                                       ` dada1
2002-12-18 19:12                                       ` Mark Mielke
2002-12-18 15:52                                   ` Alan Cox
2002-12-18 16:41                                   ` Dave Jones
2002-12-18 16:49                                     ` Freezing.. (was Re: Intel P6 vs P7 system call performance) Linus Torvalds
2002-12-18 16:56                                       ` Larry McVoy
2002-12-18 16:58                                       ` Dave Jones
2002-12-18 17:41                                         ` Linus Torvalds
2002-12-18 18:03                                           ` Jeff Garzik
2002-12-18 18:09                                             ` Mike Dresser
2002-12-23 12:34                                               ` Kai Henningsen
2002-12-18 19:08                                           ` Alan Cox
2002-12-18 19:23                                             ` Larry McVoy
2002-12-18 19:30                                               ` Alan Cox
2002-12-18 19:33                                                 ` Larry McVoy
2002-12-18 19:42                                                   ` Alan Cox
2002-12-18 19:45                                                     ` Larry McVoy
2002-12-18 20:39                                                       ` John Bradford
2002-12-18 22:08                                                         ` Larry McVoy
2002-12-18 22:37                                                           ` John Bradford
2002-12-19  1:09                                                             ` Alan Cox
2002-12-19  0:37                                                               ` Russell King
2002-12-19  0:58                                                                 ` Jeff Garzik
2002-12-19  1:43                                                                 ` Martin J. Bligh
2002-12-19 10:50                                                                 ` Dave Jones
2002-12-19  0:59                                                               ` John Bradford
2002-12-19 10:27                                                                 ` Dave Jones
2002-12-19  1:17                                                               ` Linus Torvalds
2002-12-19  0:08                                                           ` Alan Cox
2002-12-19  0:53                                                             ` John Bradford
2002-12-19 13:17                                                           ` Stephen Satchell
2002-12-19  5:34                                             ` Timothy D. Witham
2002-12-19  6:43                                               ` Andrew Morton
2002-12-19  5:45                                                 ` Timothy D. Witham
2002-12-19  7:05                                               ` Martin J. Bligh
2002-12-19  6:08                                                 ` Timothy D. Witham
2002-12-18 19:50                                           ` Oliver Xymoron
2002-12-18 17:06                                       ` Eli Carter
2002-12-18 17:08                                       ` Andrew Morton
2002-12-18 18:25                                       ` John Alvord
2002-12-18 18:41                                     ` Intel P6 vs P7 system call performance Horst von Brand
2002-12-17 19:26                       ` Alan Cox
2002-12-17 18:57                         ` Ulrich Drepper
2002-12-17 19:10                           ` Linus Torvalds
2002-12-17 19:21                             ` H. Peter Anvin
2002-12-17 19:37                               ` Linus Torvalds
2002-12-17 19:43                                 ` H. Peter Anvin
2002-12-17 20:07                                   ` Matti Aarnio
2002-12-17 20:10                                     ` H. Peter Anvin
2002-12-17 19:59                                 ` Matti Aarnio
2002-12-17 20:06                                 ` Ulrich Drepper
2002-12-17 20:35                                   ` Daniel Jacobowitz
2002-12-18  0:20                                   ` Linus Torvalds
2002-12-18  0:38                                     ` Ulrich Drepper
2002-12-18  7:41                                 ` Kai Henningsen
2002-12-18 13:00                                 ` Rogier Wolff
2002-12-17 19:47                             ` Dave Jones
2002-12-18 12:57                             ` Rogier Wolff
2002-12-19  0:14                               ` Pavel Machek
2002-12-17 21:38                           ` Benjamin LaHaise
2002-12-17 21:41                             ` H. Peter Anvin
2002-12-17 18:39                     ` Jeff Dike
2002-12-17 19:05                       ` Linus Torvalds
2002-12-18  5:34                     ` Jeremy Fitzhardinge
2002-12-18  5:38                       ` H. Peter Anvin
2002-12-18 15:50                       ` Alan Cox
2002-12-18 23:51             ` Pavel Machek
2002-12-13 15:45 ` William Lee Irwin III
2002-12-13 16:49   ` Mike Hayward
2002-12-14  0:55     ` GrandMasterLee
2002-12-14  4:41       ` Mike Dresser
2002-12-14  4:53         ` Mike Dresser
2002-12-14 10:01           ` Dave Jones
2002-12-14 17:48             ` Mike Dresser
2002-12-14 18:36             ` GrandMasterLee
2002-12-15  2:03               ` J.A. Magallon
2002-12-15 21:59   ` Pavel Machek
2002-12-15 22:37     ` William Lee Irwin III
2002-12-15 22:43       ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2002-12-18 22:00 Freezing.. (was Re: Intel P6 vs P7 system call performance) Nakajima, Jun
     [not found] <20021218161506.M7976@work.bitmover.com>
2002-12-19  0:18 ` Alan Cox
2002-12-19  0:37   ` Larry McVoy
2002-12-19  1:08 Adam J. Richter
2002-12-19  9:13 ` Russell King
2002-12-19 16:39   ` Eli Carter

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).