* [RFC] [PATCH] tsc_disable_B2 with "soft" rdtsc
@ 2002-06-20 22:47 john stultz
2002-06-22 1:31 ` Kasper Dupont
0 siblings, 1 reply; 2+ messages in thread
From: john stultz @ 2002-06-20 22:47 UTC (permalink / raw)
To: lkml; +Cc: Martin J. Bligh
[-- Attachment #1: Type: text/plain, Size: 543 bytes --]
Hello all,
Here is my next rev of the tsc-disable patch. This one corrects a
Configure.in typo (caught by Gabriel Paubert) and, probably more
controversially, implements a soft rdtsc instruction via do_gettimeofday.
This avoids the earlier "box won't boot" problems with i686-optimized
glibcs that called rdtsc. The rdtsc instruction will now be caught and
faked, returning to the user program the same value as gettimeofday. Yes,
it's pretty hackish, but it works, albeit slowly.
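As a rough illustration of the mechanism described above, here is a
user-space sketch of the emulation step. It is not the patch itself:
struct fake_regs, soft_rdtsc() and tsc_value() are hypothetical names used
only for this example, and the time values are passed in rather than read
from do_gettimeofday. A real trap handler would also need to fetch the
instruction bytes from user memory safely, which this sketch does not model.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few register fields the handler touches. */
struct fake_regs {
    uint32_t eax;
    uint32_t edx;
    const unsigned char *eip;   /* points at the faulting instruction */
};

/*
 * If the faulting instruction is rdtsc (0x0f 0x31), fake its result:
 * seconds go in edx, microseconds in eax, and eip skips the 2-byte opcode.
 * Returns 1 if the instruction was emulated, 0 if it was something else.
 */
int soft_rdtsc(struct fake_regs *regs, uint32_t sec, uint32_t usec)
{
    if (regs->eip[0] != 0x0f || regs->eip[1] != 0x31)
        return 0;
    regs->eax = usec;
    regs->edx = sec;
    regs->eip += 2;
    return 1;
}

/* The 64-bit value a caller assembles from edx:eax after rdtsc. */
uint64_t tsc_value(const struct fake_regs *regs)
{
    return ((uint64_t)regs->edx << 32) | regs->eax;
}
```

A caller that treats edx:eax as one 64-bit counter therefore sees the
seconds in the high word and the microseconds in the low word.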
Anyway, as always, comments and flames are welcome.
-john
[-- Attachment #2: linux-2.4.19-pre10_tsc-disable_B2.patch --]
[-- Type: text/x-patch, Size: 6202 bytes --]
diff -Nru a/Documentation/Configure.help b/Documentation/Configure.help
--- a/Documentation/Configure.help Thu Jun 20 15:14:14 2002
+++ b/Documentation/Configure.help Thu Jun 20 15:14:14 2002
@@ -233,7 +233,23 @@
network and embedded applications. For more information see the
Axis Communication site, <http://developer.axis.com/>.
-Multiquad support for NUMA systems
+Multi-node support for NUMA systems
+CONFIG_X86_NUMA
+ This option is used for getting Linux to run on a NUMA multi-node box.
+ Because multi-node systems suffer from unsynced TSCs, as well as TSC
+ drift, which can cause gettimeofday to return non-monotonic values,
+ this option will turn off the CONFIG_X86_TSC optimization. This
+ allows you to then specify "notsc" as a boot option to force all nodes
+ to use the PIT for gettimeofday.
+
+ In order to avoid crashing applications that use rdtsc, this option
+ also enables a "soft" rdtsc implementation. With this option enabled
+ rdtsc instructions will be caught by the OS, and the value returned
+ will be the same as gettimeofday. Do note: this will cause a major
+ performance penalty when compared to native rdtsc calls. You've
+ been warned.
+
+Multiquad support for NUMAQ systems
CONFIG_MULTIQUAD
This option is used for getting Linux to run on a (IBM/Sequent) NUMA
multiquad box. This changes the way that processors are bootstrapped,
diff -Nru a/arch/i386/config.in b/arch/i386/config.in
--- a/arch/i386/config.in Thu Jun 20 15:14:14 2002
+++ b/arch/i386/config.in Thu Jun 20 15:14:14 2002
@@ -80,20 +80,20 @@
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_USE_STRING_486 y
define_bool CONFIG_X86_ALIGNMENT_16 y
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_PPRO_FENCE y
fi
if [ "$CONFIG_M586MMX" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_USE_STRING_486 y
define_bool CONFIG_X86_ALIGNMENT_16 y
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_PPRO_FENCE y
fi
if [ "$CONFIG_M686" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_PGE y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
@@ -101,14 +101,14 @@
fi
if [ "$CONFIG_MPENTIUMIII" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_PGE y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
fi
if [ "$CONFIG_MPENTIUM4" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 7
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_PGE y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
@@ -116,12 +116,12 @@
if [ "$CONFIG_MK6" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_ALIGNMENT_16 y
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
fi
if [ "$CONFIG_MK7" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 6
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_USE_3DNOW y
define_bool CONFIG_X86_PGE y
@@ -134,14 +134,14 @@
fi
if [ "$CONFIG_MCYRIXIII" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_USE_3DNOW y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
fi
if [ "$CONFIG_MCRUSOE" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
fi
if [ "$CONFIG_MWINCHIPC6" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -152,14 +152,14 @@
if [ "$CONFIG_MWINCHIP2" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_ALIGNMENT_16 y
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_OOSTORE y
fi
if [ "$CONFIG_MWINCHIP3D" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_ALIGNMENT_16 y
- define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_OOSTORE y
fi
@@ -198,8 +198,20 @@
define_bool CONFIG_X86_IO_APIC y
fi
else
- bool 'Multiquad NUMA system' CONFIG_MULTIQUAD
+ bool 'Multi-node NUMA system support (Caution: Read help first)' CONFIG_X86_NUMA
+ if [ "$CONFIG_X86_NUMA" = "y" ]; then
+ bool 'Multiquad (IBM/Sequent) NUMAQ support' CONFIG_MULTIQUAD
+ fi
fi
+
+if [ "$CONFIG_X86_HAS_TSC" = "y" ]; then
+ if [ "$CONFIG_X86_NUMA" = "y" ]; then
+ define_bool CONFIG_X86_TSC n
+ else
+ define_bool CONFIG_X86_TSC y
+ fi
+fi
+
if [ "$CONFIG_SMP" = "y" -a "$CONFIG_X86_CMPXCHG" = "y" ]; then
define_bool CONFIG_HAVE_DEC_LOCK y
diff -Nru a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
--- a/arch/i386/kernel/traps.c Thu Jun 20 15:14:14 2002
+++ b/arch/i386/kernel/traps.c Thu Jun 20 15:14:14 2002
@@ -392,6 +392,23 @@
if (!(regs->xcs & 3))
goto gp_in_kernel;
+#ifdef CONFIG_X86_NUMA
+ /* "soft" rdtsc implementation */
+ if(!cpu_has_tsc)
+ {
+ char rdtsc_inst[2] = {0x0f, 0x31}; /*rdtsc opcode*/
+ char* inst_ptr = (char*)regs->eip;
+ if((inst_ptr[0] == rdtsc_inst[0])
+ &&(inst_ptr[1] == rdtsc_inst[1])){
+ struct timeval tv;
+ do_gettimeofday(&tv);
+ regs->eax = tv.tv_usec;
+ regs->edx = tv.tv_sec;
+ regs->eip += 2; /*= size of opcode*/
+ return;
+ }
+ }
+#endif
current->thread.error_code = error_code;
current->thread.trap_no = 13;
force_sig(SIGSEGV, current);
* Re: [RFC] [PATCH] tsc_disable_B2 with "soft" rdtsc
2002-06-20 22:47 [RFC] [PATCH] tsc_disable_B2 with "soft" rdtsc john stultz
@ 2002-06-22 1:31 ` Kasper Dupont
0 siblings, 0 replies; 2+ messages in thread
From: Kasper Dupont @ 2002-06-22 1:31 UTC (permalink / raw)
To: john stultz; +Cc: lkml, Martin J. Bligh
john stultz wrote:
>
> Hello all,
>
> Here is my next rev of the tsc-disable patch. This one corrects a
> Configure.in typo (caught by Gabriel Paubert) and, probably more
> controversially, implements a soft rdtsc instruction via do_gettimeofday.
>
> This avoids the earlier "box won't boot" problems with i686-optimized
> glibcs that called rdtsc. The rdtsc instruction will now be caught and
> faked, returning to the user program the same value as gettimeofday. Yes,
> it's pretty hackish, but it works, albeit slowly.
+#ifdef CONFIG_X86_NUMA
+ /* "soft" rdtsc implementation */
+ if(!cpu_has_tsc)
+ {
+ char rdtsc_inst[2] = {0x0f, 0x31}; /*rdtsc opcode*/
+ char* inst_ptr = (char*)regs->eip;
+ if((inst_ptr[0] == rdtsc_inst[0])
+ &&(inst_ptr[1] == rdtsc_inst[1])){
Any particular reason for putting the opcode in an
array and verifying against that instead of just
verifying inst_ptr[i] against the constants?
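A minimal sketch of the direct comparison suggested here, for illustration
only. The helper name is_rdtsc() is hypothetical and does not appear in the
patch; the bytes are read as unsigned so the comparison does not depend on
whether plain char is signed.

```c
#include <stdint.h>

/* Return 1 if the two bytes at p encode rdtsc (0x0f 0x31), else 0. */
int is_rdtsc(const uint8_t *p)
{
    return p[0] == 0x0f && p[1] == 0x31;
}
```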
+ struct timeval tv;
+ do_gettimeofday(&tv);
+ regs->eax = tv.tv_usec;
+ regs->edx = tv.tv_sec;
Looks like the soft tsc is going to be jumping
rather than running. It will increase at a
constant rate most of the time, but will make
a big jump ahead every second. Couldn't the jump
easily be made a lot smaller by using:
regs->eax = tv.tv_usec<<2;
Of course this is not completely accurate either;
are we willing to pay the price for a more
accurate version?
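To put numbers on the size of that jump, here is a small sketch that
computes the step in the combined edx:eax value when the seconds roll over,
for a given shift applied to tv_usec. The helper names soft_tsc() and
rollover_jump() are hypothetical, and the arithmetic assumes tv_usec runs
from 0 to 999999.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000u

/* Value a caller would see: edx = sec, eax = usec << shift (kept to 32 bits). */
uint64_t soft_tsc(uint32_t sec, uint32_t usec, unsigned shift)
{
    uint32_t low = usec << shift;   /* unsigned shift, truncates to 32 bits */
    return ((uint64_t)sec << 32) | low;
}

/* Step from (sec = 0, usec = 999999) to (sec = 1, usec = 0). */
uint64_t rollover_jump(unsigned shift)
{
    return soft_tsc(1, 0, shift) - soft_tsc(0, USEC_PER_SEC - 1, shift);
}
```

By this arithmetic, the per-second step is 4293967297 with no shift and
4290967300 with a shift of 2, against a normal increment of 1 or 4 per
microsecond. A shift of 12 is the largest that keeps tv_usec within 32
bits, and it leaves a step of 198971392 against a normal increment of 4096
per microsecond.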
+ regs->eip += 2; /*= size of opcode*/
+ return;
+ }
+ }
+#endif
--
Kasper Dupont -- der bruger for meget tid på usenet.
For sending spam use mailto:razor-report@daimi.au.dk