public inbox for linux-kernel@vger.kernel.org
* Using %cr2 to reference "current"
@ 2001-11-06  7:18 H. Peter Anvin
  2001-11-06  8:01 ` Robert Love
                   ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: H. Peter Anvin @ 2001-11-06  7:18 UTC (permalink / raw)
  To: linux-kernel

2.4.13-ac8 uses %cr2 rather than (%esp & 0xffffe000) to get "current".
I've been trying to figure out the point of this... writing a control
register is microcoded on all the x86 implementations I know (and you
have to re-set it after every page fault), and reading one probably is
too on most (not Transmeta, but...)

On the other hand, %esp is a GPR and available to the core directly,
and so are usually plain immediates.

Is using %cr2 really faster than the old implementation, or is there
another reason?  It seems that the alignment constraints on the stack
still remain, since the %esp solution is still used in places...

It might also be worth considering a segment-register based
implementation instead.  The reason we're not using %fs and %gs in the
kernel anymore is because of the setup slowness, but perhaps using
them (use %fs since it's much more likely to be NULL and thus faster
to restore) would be faster than using %cr2?

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>
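
[Editor's sketch, not part of the original message.] The %esp masking
scheme discussed above can be modelled in C. This is a minimal sketch,
assuming 8 KiB kernel stacks aligned on 8 KiB boundaries (the i386 2.4
layout, as far as I know); THREAD_SIZE, task_base_from_sp and
example_task_base are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 KiB stacks, aligned to their own size. */
#define THREAD_SIZE 8192u

/* Hypothetical helper: round a stack pointer down to the base of the
 * aligned area that holds both the stack and the task structure. */
static uintptr_t task_base_from_sp(uintptr_t sp)
{
    return sp & ~((uintptr_t)THREAD_SIZE - 1);
}

/* Usage: any address inside the same 8 KiB area maps to the same base. */
static void example_task_base(void)
{
    assert(task_base_from_sp(0xc1235f00u) == 0xc1234000u);
    assert(task_base_from_sp(0xc1234000u) == 0xc1234000u);
}
```

The alignment constraint mentioned in the message follows directly from
this: the mask only finds the right base if every stack starts on a
THREAD_SIZE boundary.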

* Re: Using %cr2 to reference "current"
@ 2001-11-06 22:05 Mikael Pettersson
  0 siblings, 0 replies; 61+ messages in thread
From: Mikael Pettersson @ 2001-11-06 22:05 UTC (permalink / raw)
  To: bcrl, torvalds; +Cc: linux-kernel

On Tue, 6 Nov 2001 09:49:15 -0800 (PST), Linus Torvalds wrote:
>	/* Return "current" in %eax, trash %edx */
>	do_get_current:
>		movl $0x0003c000,%eax	// 4 bits at bit 14
>		movl $-16384,%edx	// remove low 14 bits
>		andl %esp,%eax
>		andl %esp,%edx
>		shrl $7,%eax		// color it by 128 bytes
>		addl %edx,%eax
>		ret
>...
>I would not be surprised if "mov %cr2,%reg" will break a netburst trace
>cache entity, or even cause microcode to be executed. While I _guarantee_
>that all future Intel CPU's will continue to be fast at mixtures of simple
>arithmetic operations like "add" and "and".

On my Pentium 4:
- 6.30 cycles to copy %cr2 to %eax
- 1.05 cycles to compute a non-coloured current by masking %esp
- 2.31 cycles to compute a coloured current by your code above

I did some tests on using %cr2 for get_processor_id() a while ago,
but it was clearly slower (58% on P6, 20% on K6-III, 3% on P5MMX)
than *((%esp & mask)+offset), even though the latter also does a load.

/Mikael

* Re: Using %cr2 to reference "current"
@ 2002-11-10 21:23 Igor Levicki
  0 siblings, 0 replies; 61+ messages in thread
From: Igor Levicki @ 2002-11-10 21:23 UTC (permalink / raw)
  To: torvalds@transmeta.com; +Cc: linux-kernel@vger.kernel.org


Hi,

>I could well imagine a x86-compatible chip where %cr2 isn't even
>writable.  In fact, reading the intel documentation, I see _nowhere_ a
>mention of %cr2 being writable at all - it all just says "contains the
>fault address". 

From Intel System Programmers Guide:

"The control registers can be read and loaded (or modified) using the
move-to-or-from-control-registers forms of the MOV instruction. In
protected mode, the MOV instructions allow the control registers to be
read or loaded (at privilege level 0 only). This restriction means that
application programs or operating-system procedures (running at
privilege levels 1, 2, or 3) are prevented from reading or loading the
control registers. When loading the control register, reserved bits
should always be set to the values previously read."

>(I don't know what the effect of the P4 half-cacheline
>thing is, I don't know if the CPU can have just a 64-byte block coherent,
>or what..

Cache sector size is 64 bytes on Pentium 4. When the CPU reads from
memory it reads 2 sectors x 64 bytes = one 128-byte cache line. The
hardware prefetcher fetches 2 x 128-byte cache lines = 256 bytes of
memory. On a write, the CPU always writes 64 bytes.
Now if you read 16 bytes from some address, then for example add
something to them and write them back to the same address, you will pay
a penalty when you read the next 16 bytes from that address, because
you have just trashed the 64-byte sector and have to wait for
back-propagation.
Hope this helps.

Regards,
Igor Levicki
levicki@yubc.net
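
[Editor's sketch, not part of the original message.] Taking the sizes
stated above at face value (64-byte sectors, two sectors per 128-byte
line), a few lines of C show how to tell whether two addresses share a
sector or a line. The constants are assumptions copied from the
message, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES 64u   /* assumed sector size, per the message */
#define LINE_BYTES   128u  /* assumed line size: two sectors */

/* Two addresses share a sector (or line) when they fall in the same
 * SECTOR_BYTES-sized (or LINE_BYTES-sized) aligned block. */
static int same_sector(uintptr_t a, uintptr_t b)
{
    return a / SECTOR_BYTES == b / SECTOR_BYTES;
}

static int same_line(uintptr_t a, uintptr_t b)
{
    return a / LINE_BYTES == b / LINE_BYTES;
}

/* Usage: 0x1000 and 0x1030 share a sector; 0x1040 is in the other
 * sector of the same line; 0x1080 starts the next line. */
static void example_sectors(void)
{
    assert(same_sector(0x1000u, 0x1030u));
    assert(!same_sector(0x1000u, 0x1040u));
    assert(same_line(0x1000u, 0x1040u));
    assert(!same_line(0x1000u, 0x1080u));
}
```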




Thread overview: 61+ messages
2001-11-06  7:18 Using %cr2 to reference "current" H. Peter Anvin
2001-11-06  8:01 ` Robert Love
2001-11-06 10:55   ` Alan Cox
2001-11-06 17:31     ` Michael Barabanov
2001-11-06 14:14   ` Manfred Spraul
2001-11-06 10:58 ` Alan Cox
2001-11-06 17:04   ` Linus Torvalds
2001-11-06 17:46     ` Alan Cox
2001-11-06 17:59       ` Linus Torvalds
2001-11-06 18:14         ` Alan Cox
2001-11-06 16:55           ` Marcelo Tosatti
2001-11-06 18:14           ` Linus Torvalds
2001-11-06 18:31             ` Alan Cox
2001-11-06 22:38               ` Linus Torvalds
2001-11-07  0:00           ` Martin Dalecki
2001-11-06 23:19             ` Alan Cox
2001-11-07  0:43               ` Martin Dalecki
2001-11-07  0:27                 ` Alan Cox
2001-11-07  0:35                 ` Jeff Garzik
2001-11-07 14:00               ` Martin Dalecki
2001-11-07 13:38                 ` Alan Cox
2001-11-07 14:59                   ` Martin Dalecki
2001-11-07 14:17                     ` Alan Cox
2001-11-07 14:34                       ` Dirk Moerenhout
2001-11-07 14:54                         ` Alan Cox
2001-11-07 15:32                           ` David Howells
2001-11-07 14:39                       ` Intel compiler [Re: Using %cr2 to reference "current"] Sebastian Heidl
2001-11-07 22:05                         ` lists
2001-11-07 15:36                       ` Using %cr2 to reference "current" Martin Dalecki
2001-11-08 14:08                       ` Martin Dalecki
2001-11-13 16:49                       ` Merge BUG in 2.4.15-pre4 serial.c Martin Dalecki
2001-11-13 16:21                         ` Russell King
2001-11-13 17:37                           ` Martin Dalecki
2001-11-13 16:53                             ` Russell King
2001-11-13 18:05                               ` Martin Dalecki
2001-11-13 17:11                             ` Alan Cox
2001-11-13 18:23                               ` Martin Dalecki
2001-11-07 20:04                   ` Using %cr2 to reference "current" Andrew Morton
2001-11-11 13:16                   ` Martin Dalecki
2001-11-11 13:06                     ` Keith Owens
2001-11-12 11:28                     ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki
2001-11-12 16:10                       ` Keith Owens
2001-11-12 16:25                         ` Christoph Hellwig
2001-11-12 17:56                         ` Martin Dalecki
2001-11-12 16:42                       ` Linus Torvalds
2001-11-12 18:51                         ` Martin Dalecki
2001-11-12 20:05                           ` Corsspatch patch-2.4.15-pre2 patch-2.4.15-pre3 Martin Dalecki
2001-11-12 20:13                             ` BUG BUG hunt the bugs!!! patch-2.4.15-pre5 Martin Dalecki
2001-11-06 17:02 ` Using %cr2 to reference "current" Linus Torvalds
2001-11-06 17:13   ` Benjamin LaHaise
2001-11-06 17:49     ` Linus Torvalds
2001-11-06 18:19       ` Alan Cox
2001-11-09 21:52         ` Jamie Lokier
2001-11-06 18:42       ` Benjamin LaHaise
2001-11-06 19:09         ` H. Peter Anvin
2001-11-06 19:16         ` Dave Jones
2001-11-06 20:10           ` Ricky Beam
2001-11-06 23:09           ` Alan Cox
2001-11-06 23:15             ` Dave Jones
  -- strict thread matches above, loose matches on Subject: below --
2001-11-06 22:05 Mikael Pettersson
2002-11-10 21:23 Igor Levicki
