public inbox for linux-kernel@vger.kernel.org
* Voyager subarchitecture for 2.5.46
@ 2002-11-05 20:45 J.E.J. Bottomley
  2002-11-06  2:31 ` john stultz
  0 siblings, 1 reply; 32+ messages in thread
From: J.E.J. Bottomley @ 2002-11-05 20:45 UTC
  To: torvalds; +Cc: James.Bottomley, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1797 bytes --]

I got it all cleaned up and moved over to the new Kconfig.  I've also removed 
the irq stack pieces that leaked from Alan's tree.

It includes the boot GDT stuff and adds configuration options for the 
kernel/timers directory, so that machines which can't maintain a TSC can turn 
it off at compile time.
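
For reference, the selector arithmetic behind the boot GDT can be sketched in plain C. This is a userspace illustration only, not kernel code: the macro values are copied from the include/asm-i386/segment.h hunk of the attached patch, and the descriptor value from the head.S hunk, while `desc_limit` and `boot_gdt_limit` are hypothetical helpers added for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the segment.h hunk of the patch. */
#define GDT_ENTRY_BOOT_CS	2
#define __BOOT_CS		(GDT_ENTRY_BOOT_CS * 8)
#define GDT_ENTRY_BOOT_DS	(GDT_ENTRY_BOOT_CS + 1)
#define __BOOT_DS		(GDT_ENTRY_BOOT_DS * 8)

/* Hypothetical helper: extract the 20-bit segment limit from an
 * 8-byte descriptor (bits 15..0 and bits 51..48). */
static inline uint32_t desc_limit(uint64_t desc)
{
	uint32_t low = (uint32_t)(desc & 0xffff);
	uint32_t high = (uint32_t)((desc >> 48) & 0xf);

	return (high << 16) | low;
}

/* Hypothetical helper: the GDT limit is the offset of the last valid
 * byte, i.e. the start of the final entry plus 7. */
static inline unsigned int boot_gdt_limit(void)
{
	/* Two empty entries, then CS, then DS: four 8-byte slots. */
	assert(__BOOT_DS + 8 == 4 * 8);
	return __BOOT_DS + 7;
}
```

With GDT_ENTRY_BOOT_CS set to 2, the selectors come out as 0x10 and 0x18 and the limit as 31, which matches the `__BOOT_DS + 7` used in the trampoline.S hunk.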

This is the diffstat over 2.5.46:

 arch/i386/Kconfig                      |   52 +++++++++++++++++++++++-------
 arch/i386/Makefile                     |    4 ++
 arch/i386/boot/compressed/head.S       |    8 ++--
 arch/i386/boot/compressed/misc.c       |    2 -
 arch/i386/boot/setup.S                 |   56 +++++++++++++++++++++++++++-----
 arch/i386/kernel/Makefile              |    3 +
 arch/i386/kernel/head.S                |   22 +++++++++---
 arch/i386/kernel/irq.c                 |    2 -
 arch/i386/kernel/timers/Makefile       |    6 +--
 arch/i386/kernel/timers/timer.c        |    4 +-
 arch/i386/kernel/timers/timer_pit.c    |    2 +
 arch/i386/kernel/trampoline.S          |    6 +--
 arch/i386/mach-voyager/voyager_basic.c |   28 +++++++++++-----
 arch/i386/mach-voyager/voyager_smp.c   |   57 +++++++++------------------------
 drivers/char/sysrq.c                   |   18 ----------
 include/asm-i386/desc.h                |    1 
 include/asm-i386/hw_irq.h              |    2 -
 include/asm-i386/segment.h             |    8 ++++
 include/asm-i386/smp.h                 |   21 ++++++++----
 include/asm-i386/voyager.h             |    1 
 20 files changed, 188 insertions(+), 115 deletions(-)

The changes to smp.h introduce a new macro to loop efficiently over a 
sparse CPU bitmap, plus a bit of rearrangement for some functions Voyager needs.
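
The idea of looping over a sparse bitmap can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the macro from the patch: `lowest_set_bit`, `for_each_set_cpu` and `list_cpus` are hypothetical names, and `__builtin_ctz` (a GCC/Clang builtin) stands in for the kernel's `__ffs()`. Because `__builtin_ctz(0)` is undefined, this variant tests the mask before computing the bit index.

```c
#include <stdint.h>

/* Hypothetical stand-in for __ffs(): index of the lowest set bit.
 * Undefined for word == 0, so callers must check for zero first. */
static inline unsigned int lowest_set_bit(uint32_t word)
{
	return (unsigned int)__builtin_ctz(word);
}

/* Visit only the set bits of `map`.  `mask` is scratch storage that
 * loses one bit per iteration; the loop runs once per set bit rather
 * than once per possible CPU number. */
#define for_each_set_cpu(cpu, mask, map) \
	for ((mask) = (map); \
	     (mask) != 0 && ((cpu) = lowest_set_bit(mask), 1); \
	     (mask) &= (mask) - 1)	/* clear the lowest set bit */

/* Example: collect the CPU numbers present in a sparse map.
 * `out` must have room for 32 entries. */
static inline unsigned int list_cpus(uint32_t map, unsigned int *out)
{
	uint32_t mask;
	unsigned int cpu, n = 0;

	for_each_set_cpu(cpu, mask, map)
		out[n++] = cpu;
	return n;
}
```

For a map with only CPUs 0, 2 and 31 present, the loop body runs three times instead of thirty-two.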

This compiles and boots correctly for me.

It's all uploaded to

http://linux-voyager.bkbits.net/voyager-2.5

James


[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 28465 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.808.1.94 -> 1.824  
#	arch/i386/kernel/process.c	1.19.1.15 -> 1.35   
#	include/asm-i386/desc.h	1.9.1.1 -> 1.11   
#	arch/i386/kernel/timers/Makefile	1.2     -> 1.3    
#	arch/i386/boot/setup.S	1.8.2.3 -> 1.16   
#	arch/i386/kernel/irq.c	1.8.1.16 -> 1.21   
#	arch/i386/kernel/head.S	1.15.1.2 -> 1.18   
#	include/asm-i386/irq.h	1.3.1.4 -> 1.8    
#	arch/i386/mach-voyager/Makefile	1.1     ->         (deleted)      
#	   arch/i386/Kconfig	1.2.2.2 -> 1.8    
#	arch/i386/kernel/traps.c	1.20.1.12 -> 1.34   
#	arch/i386/kernel/Makefile	1.4.1.22 -> 1.32   
#	arch/i386/mach-voyager/irq_vectors.h	1.1     ->         (deleted)      
#	include/asm-i386/segment.h	1.2     -> 1.3    
#	arch/i386/kernel/entry.S	1.33.1.7 -> 1.41   
#	arch/i386/mach-voyager/voyager_basic.c	1.1     ->         (deleted)      
#	arch/i386/mach-generic/Makefile	1.1.1.5 -> 1.7    
#	arch/i386/kernel/mpparse.c	1.9.1.16 -> 1.27   
#	  arch/i386/Makefile	1.4.1.24 -> 1.24   
#	arch/i386/mach-voyager/setup_arch_post.h	1.1     ->         (deleted)      
#	arch/i386/mach-voyager/voyager_smp.c	1.1     ->         (deleted)      
#	arch/i386/kernel/trampoline.S	1.4.1.1 -> 1.6    
#	arch/i386/pci/common.c	1.21.1.12 -> 1.35   
#	arch/i386/mach-voyager/setup_arch_pre.h	1.1     ->         (deleted)      
#	     fs/binfmt_elf.c	1.25.1.6 -> 1.31   
#	arch/i386/kernel/setup.c	1.40.1.23 -> 1.67   
#	arch/i386/kernel/smpboot.c	1.15.1.23 -> 1.40   
#	arch/i386/kernel/time.c	1.5.1.16 -> 1.21   
#	include/asm-i386/vic.h	1.1     ->         (deleted)      
#	include/asm-i386/smp.h	1.9.1.8 -> 1.19   
#	arch/i386/boot/compressed/misc.c	1.7.1.2 -> 1.10   
#	         MAINTAINERS	1.68.1.49 -> 1.97   
#	Documentation/voyager.txt	1.1     ->         (deleted)      
#	arch/i386/mach-voyager/voyager_thread.c	1.1     ->         (deleted)      
#	include/asm-i386/hardirq.h	1.5.1.10 -> 1.15   
#	arch/i386/kernel/mca.c	1.4.1.5 -> 1.11   
#	arch/i386/mach-voyager/do_timer.h	1.1     ->         (deleted)      
#	arch/i386/mach-voyager/setup.c	1.1     ->         (deleted)      
#	arch/i386/mach-voyager/entry_arch.h	1.1     ->         (deleted)      
#	      kernel/sched.c	1.51.1.90 -> 1.82   
#	arch/i386/mach-voyager/voyager_cat.c	1.1     ->         (deleted)      
#	arch/i386/kernel/timers/timer_pit.c	1.3.1.2 -> 1.7    
#	drivers/char/sysrq.c	1.9.1.13 -> 1.22   
#	include/asm-i386/voyager.h	1.1     ->         (deleted)      
#	arch/i386/kernel/i8259.c	1.7.1.11 -> 1.19   
#	arch/i386/boot/compressed/head.S	1.2     -> 1.3    
#	arch/i386/kernel/timers/timer.c	1.3     -> 1.4    
#	include/asm-i386/hw_irq.h	1.9.1.6 -> 1.16   
#	arch/i386/kernel/i386_ksyms.c	1.18.1.22 -> 1.37   
#	 arch/i386/pci/irq.c	1.12.1.8 -> 1.21   
#	arch/i386/kernel/apic.c	1.19.1.8 -> 1.24   
#	               (new)	        -> 1.2     arch/i386/mach-voyager/entry_arch.h
#	               (new)	        -> 1.3     include/asm-i386/voyager.h
#	               (new)	        -> 1.2     arch/i386/mach-voyager/irq_vectors.h
#	               (new)	        -> 1.4     arch/i386/mach-voyager/setup_arch_post.h
#	               (new)	        -> 1.7     arch/i386/mach-voyager/voyager_cat.c
#	               (new)	        -> 1.4     include/asm-i386/vic.h
#	               (new)	        -> 1.4     arch/i386/mach-voyager/do_timer.h
#	               (new)	        -> 1.29    arch/i386/mach-voyager/voyager_smp.c
#	               (new)	        -> 1.8     arch/i386/mach-voyager/voyager_thread.c
#	               (new)	        -> 1.9     arch/i386/mach-voyager/Makefile
#	               (new)	        -> 1.10    arch/i386/mach-voyager/voyager_basic.c
#	               (new)	        -> 1.3     arch/i386/mach-voyager/setup.c
#	               (new)	        -> 1.2     Documentation/voyager.txt
#	               (new)	        -> 1.2     arch/i386/mach-voyager/setup_arch_pre.h
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/11/05	jejb@mulgrave.(none)	1.823
# Merge mulgrave.(none):/home/jejb/BK/voyager-2.5
# into mulgrave.(none):/home/jejb/BK/voyager-new-2.5
# --------------------------------------------
# 02/11/05	jejb@mulgrave.(none)	1.824
# [VOYAGER] remove other voyager code remnants from sysrq.c
# --------------------------------------------
#
diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig
--- a/arch/i386/Kconfig	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/Kconfig	Tue Nov  5 15:35:01 2002
@@ -253,11 +253,6 @@
 	depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCYRIXIII || MELAN || MK6 || M586MMX || M586TSC || M586 || M486
 	default y
 
-config X86_TSC
-	bool
-	depends on MWINCHIP3D || MWINCHIP2 || MCRUSOE || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMIII || M686 || M586MMX || M586TSC
-	default y
-
 config X86_GOOD_APIC
 	bool
 	depends on MK7 || MPENTIUM4 || MPENTIUMIII || M686 || M586MMX
@@ -335,8 +330,21 @@
 	  Say Y here if you are building a kernel for a desktop, embedded
 	  or real-time system.  Say N if you are unsure.
 
+config VOYAGER
+	bool "NCR Voyager Architecture"
+	---help---
+	  Voyager is a MCA based 32 way capable SMP architecture proprietary
+	  to NCR Corp.  Machine classes 345x/35xx/4100/51xx are voyager based.
+	  
+	  *** WARNING ***
+	
+	  If you do not specifically know you have a Voyager based machine,
+	  say N here otherwise the kernel you build will not be bootable.
+
+
 config X86_UP_APIC
 	bool "Local APIC support on uniprocessors" if !SMP
+	depends on !VOYAGER
 	default y if SMP
 	---help---
 	  A local APIC (Advanced Programmable Interrupt Controller) is an
@@ -789,6 +797,7 @@
 
 
 menu "Power management options (ACPI, APM)"
+	depends on !VOYAGER
 
 source "drivers/acpi/Kconfig"
 
@@ -972,11 +981,12 @@
 
 config X86_LOCAL_APIC
 	bool
-	depends on !VISWS && SMP || VISWS
+	depends on ((!VISWS && SMP) || VISWS) && !VOYAGER
 	default y
 
 config PCI
 	bool "PCI support" if !VISWS
+	depends on !VOYAGER
 	default y if VISWS
 	help
 	  Find out whether you have a PCI motherboard. PCI is the name of a
@@ -991,7 +1001,7 @@
 
 config X86_IO_APIC
 	bool
-	depends on !VISWS && SMP
+	depends on !VISWS && SMP && !VOYAGER
 	default y
 
 choice
@@ -1035,6 +1045,7 @@
 
 config SCx200
 	tristate "NatSemi SCx200 support"
+	depends on !VOYAGER
 	help
 	  This provides basic support for the National Semiconductor SCx200 
 	  processor.  Right now this is just a driver for the GPIO pins.
@@ -1048,6 +1059,7 @@
 
 config ISA
 	bool "ISA support"
+	depends on !VOYAGER
 	help
 	  Find out whether you have ISA slots on your motherboard.  ISA is the
 	  name of a bus system, i.e. the way the CPU talks to the other stuff
@@ -1072,8 +1084,9 @@
 	  Otherwise, say N.
 
 config MCA
-	bool "MCA support"
+	bool "MCA support" if !VOYAGER
 	depends on !VISWS
+	default y if VOYAGER
 	help
 	  MicroChannel Architecture is found in some IBM PS/2 machines and
 	  laptops.  It is a bus system similar to PCI or ISA. See
@@ -1615,12 +1628,12 @@
 
 config X86_EXTRA_IRQS
 	bool
-	depends on X86_LOCAL_APIC
+	depends on X86_LOCAL_APIC || VOYAGER
 	default y
 
 config X86_FIND_SMP_CONFIG
 	bool
-	depends on X86_LOCAL_APIC
+	depends on X86_LOCAL_APIC || VOYAGER
 	default y
 
 config X86_MPPARSE
@@ -1636,17 +1649,32 @@
 
 source "lib/Kconfig"
 
+config X86_TSC
+	bool
+	depends on  !VOYAGER && (MWINCHIP3D || MWINCHIP2 || MCRUSOE || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMIII || M686 || M586MMX || M586TSC)
+	default y
+
+config X86_PIT
+	bool
+	depends on M386 || M486 || M586 || M586TSC || VOYAGER
+	default y
+
 config X86_SMP
 	bool
-	depends on SMP
+	depends on SMP && !VOYAGER
 	default y
 
 config X86_HT
 	bool
-	depends on SMP
+	depends on SMP && !VOYAGER
 	default y
 
 config X86_BIOS_REBOOT
 	bool
+	depends on !VOYAGER
 	default y
 
+config X86_TRAMPOLINE
+	bool
+	depends on SMP
+	default y
diff -Nru a/arch/i386/Makefile b/arch/i386/Makefile
--- a/arch/i386/Makefile	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/Makefile	Tue Nov  5 15:35:01 2002
@@ -49,7 +49,11 @@
 ifdef CONFIG_VISWS
 MACHINE	:= mach-visws
 else
+ifdef CONFIG_VOYAGER
+MACHINE:= mach-voyager
+else
 MACHINE	:= mach-generic
+endif
 endif
 
 HEAD := arch/i386/kernel/head.o arch/i386/kernel/init_task.o
diff -Nru a/arch/i386/boot/compressed/head.S b/arch/i386/boot/compressed/head.S
--- a/arch/i386/boot/compressed/head.S	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/boot/compressed/head.S	Tue Nov  5 15:35:01 2002
@@ -31,7 +31,7 @@
 startup_32:
 	cld
 	cli
-	movl $(__KERNEL_DS),%eax
+	movl $(__BOOT_DS),%eax
 	movl %eax,%ds
 	movl %eax,%es
 	movl %eax,%fs
@@ -74,7 +74,7 @@
 	popl %esi	# discard address
 	popl %esi	# real mode pointer
 	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $0x100000
+	ljmp $(__BOOT_CS), $0x100000
 
 /*
  * We come here, if we were loaded high.
@@ -101,7 +101,7 @@
 	popl %eax	# hcount
 	movl $0x100000,%edi
 	cli		# make sure we don't get interrupted
-	ljmp $(__KERNEL_CS), $0x1000 # and jump to the move routine
+	ljmp $(__BOOT_CS), $0x1000 # and jump to the move routine
 
 /*
  * Routine (template) for moving the decompressed kernel in place,
@@ -124,5 +124,5 @@
 	movsl
 	movl %ebx,%esi	# Restore setup pointer
 	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $0x100000
+	ljmp $(__BOOT_CS), $0x100000
 move_routine_end:
diff -Nru a/arch/i386/boot/compressed/misc.c b/arch/i386/boot/compressed/misc.c
--- a/arch/i386/boot/compressed/misc.c	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/boot/compressed/misc.c	Tue Nov  5 15:35:01 2002
@@ -299,7 +299,7 @@
 struct {
 	long * a;
 	short b;
-	} stack_start = { & user_stack [STACK_SIZE] , __KERNEL_DS };
+	} stack_start = { & user_stack [STACK_SIZE] , __BOOT_DS };
 
 static void setup_normal_output_buffer(void)
 {
diff -Nru a/arch/i386/boot/setup.S b/arch/i386/boot/setup.S
--- a/arch/i386/boot/setup.S	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/boot/setup.S	Tue Nov  5 15:35:01 2002
@@ -476,6 +476,24 @@
 	movsb
 	popw	%ds
 no_mca:
+#ifdef CONFIG_VOYAGER
+	movb	$0xff, 0x40	# flag on config found
+	movb	$0xc0, %al
+	mov	$0xff, %ah
+	int	$0x15		# put voyager config info at es:di
+	jc	no_voyager
+	movw	$0x40, %si	# place voyager info in apm table
+	cld
+	movw	$7, %cx
+voyager_rep:
+	movb	%es:(%di), %al
+	movb	%al,(%si)
+	incw	%di
+	incw	%si
+	decw	%cx
+	jnz	voyager_rep
+no_voyager:	
+#endif
 # Check for PS/2 pointing device
 	movw	%cs, %ax			# aka SETUPSEG
 	subw	$DELTA_INITSEG, %ax		# aka INITSEG
@@ -740,6 +758,7 @@
 A20_ENABLE_LOOPS	= 255		# Total loops to try		
 
 
+#ifndef CONFIG_VOYAGER
 a20_try_loop:
 
 	# First, see if we are on a system with no A20 gate.
@@ -758,11 +777,14 @@
 	jnz	a20_done
 
 	# Try enabling A20 through the keyboard controller
+#endif /* CONFIG_VOYAGER */
 a20_kbc:
 	call	empty_8042
 
+#ifndef CONFIG_VOYAGER
 	call	a20_test			# Just in case the BIOS worked
 	jnz	a20_done			# but had a delayed reaction.
+#endif
 
 	movb	$0xD1, %al			# command write
 	outb	%al, $0x64
@@ -772,6 +794,7 @@
 	outb	%al, $0x60
 	call	empty_8042
 
+#ifndef CONFIG_VOYAGER
 	# Wait until a20 really *is* enabled; it can take a fair amount of
 	# time on certain systems; Toshiba Tecras are known to have this
 	# problem.
@@ -819,6 +842,7 @@
 	# If we get here, all is good
 a20_done:
 
+#endif /* CONFIG_VOYAGER */
 # set up gdt and idt
 	lidt	idt_48				# load idt with 0,0
 	xorl	%eax, %eax			# Compute gdt_base
@@ -870,7 +894,7 @@
 	subw	$DELTA_INITSEG, %si
 	shll	$4, %esi			# Convert to 32-bit pointer
 # NOTE: For high loaded big kernels we need a
-#	jmpi    0x100000,__KERNEL_CS
+#	jmpi    0x100000,__BOOT_CS
 #
 #	but we yet haven't reloaded the CS register, so the default size 
 #	of the target offset still is 16 bit.
@@ -881,7 +905,7 @@
 	.byte 0x66, 0xea			# prefix + jmpi-opcode
 code32:	.long	0x1000				# will be set to 0x100000
 						# for big kernels
-	.word	__KERNEL_CS
+	.word	__BOOT_CS
 
 # Here's a bunch of information about your current kernel..
 kernel_version:	.ascii	UTS_RELEASE
@@ -985,6 +1009,7 @@
 	.string	"INT15 refuses to access high mem, giving up."
 
 
+#ifndef CONFIG_VOYAGER
 # This routine tests whether or not A20 is enabled.  If so, it
 # exits with zf = 0.
 #
@@ -1015,6 +1040,8 @@
 	popw	%cx
 	ret	
 
+#endif /* CONFIG_VOYAGER */
+
 # This routine checks that the keyboard command queue is empty
 # (after emptying the output buffers)
 #
@@ -1075,13 +1102,19 @@
 
 # Descriptor tables
 #
-# NOTE: if you think the GDT is large, you can make it smaller by just
-# defining the KERNEL_CS and KERNEL_DS entries and shifting the gdt
-# address down by GDT_ENTRY_KERNEL_CS*8. This puts bogus entries into
-# the GDT, but those wont be used so it's not a problem.
+# NOTE: The intel manual says gdt should be sixteen bytes aligned for
+# efficiency reasons.  However, there are machines which are known not
+# to boot with misaligned GDTs, so alter this at your peril!  If you alter
+# GDT_ENTRY_BOOT_CS (in asm/segment.h) remember to leave at least two
+# empty GDT entries (one for NULL and one reserved).
+#
+# NOTE:	On some CPUs, the GDT must be 8 byte aligned.  This is
+# true for the Voyager Quad CPU card which will not boot without
+# this directive.  16 byte alignment is recommended by intel.
 #
+	.align 16
 gdt:
-	.fill GDT_ENTRY_KERNEL_CS,8,0
+	.fill GDT_ENTRY_BOOT_CS,8,0
 
 	.word	0xFFFF				# 4Gb - (0x100000*0x1000 = 4Gb)
 	.word	0				# base address = 0
@@ -1094,12 +1127,17 @@
 	.word	0x9200				# data read/write
 	.word	0x00CF				# granularity = 4096, 386
 						#  (+5th nibble of limit)
+gdt_end:
+	.align	4
+	
+	.word	0				# alignment byte
 idt_48:
 	.word	0				# idt limit = 0
 	.word	0, 0				# idt base = 0L
-gdt_48:
-	.word	GDT_ENTRY_KERNEL_CS*8 + 16 - 1	# gdt limit
 
+	.word	0				# alignment byte
+gdt_48:
+	.word	gdt_end - gdt - 1		# gdt limit
 	.word	0, 0				# gdt base (filled in later)
 
 # Include video setup & detection code
diff -Nru a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
--- a/arch/i386/kernel/Makefile	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/kernel/Makefile	Tue Nov  5 15:35:01 2002
@@ -21,7 +21,8 @@
 obj-$(CONFIG_APM)		+= apm.o
 obj-$(CONFIG_ACPI)		+= acpi.o
 obj-$(CONFIG_ACPI_SLEEP)	+= acpi_wakeup.o
-obj-$(CONFIG_X86_SMP)		+= smp.o smpboot.o trampoline.o
+obj-$(CONFIG_X86_SMP)		+= smp.o smpboot.o
+obj-$(CONFIG_X86_TRAMPOLINE)	+= trampoline.o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
 obj-$(CONFIG_X86_LOCAL_APIC)	+= apic.o nmi.o
 obj-$(CONFIG_X86_IO_APIC)	+= io_apic.o
diff -Nru a/arch/i386/kernel/head.S b/arch/i386/kernel/head.S
--- a/arch/i386/kernel/head.S	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/kernel/head.S	Tue Nov  5 15:35:01 2002
@@ -15,6 +15,7 @@
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/desc.h>
+#include <asm/cache.h>
 
 #define OLD_CL_MAGIC_ADDR	0x90020
 #define OLD_CL_MAGIC		0xA33F
@@ -46,7 +47,7 @@
  * Set segments to known values
  */
 	cld
-	movl $(__KERNEL_DS),%eax
+	movl $(__BOOT_DS),%eax
 	movl %eax,%ds
 	movl %eax,%es
 	movl %eax,%fs
@@ -306,7 +307,7 @@
 
 ENTRY(stack_start)
 	.long init_thread_union+8192
-	.long __KERNEL_DS
+	.long __BOOT_DS
 
 /* This is the default interrupt "handler" :-) */
 int_msg:
@@ -349,12 +350,12 @@
 	.long idt_table
 
 # boot GDT descriptor (later on used by CPU#0):
-
+	.word 0				# 32 bit align gdt_desc.address
 cpu_gdt_descr:
 	.word GDT_ENTRIES*8-1
 	.long cpu_gdt_table
 
-	.fill NR_CPUS-1,6,0		# space for the other GDT descriptors
+	.fill NR_CPUS-1,8,0		# space for the other GDT descriptors
 
 /*
  * This is initialized to create an identity-mapping at 0-8M (for bootup
@@ -405,10 +406,21 @@
  */
 .data
 
-ALIGN
 /*
  * The Global Descriptor Table contains 28 quadwords, per-CPU.
  */
+#ifdef CONFIG_SMP
+/*
+ * The boot_gdt_table must mirror the equivalent in setup.S and is
+ * used only by the trampoline for booting other CPUs
+ */
+	.align L1_CACHE_BYTES
+ENTRY(boot_gdt_table)
+	.fill GDT_ENTRY_BOOT_CS,8,0
+	.quad 0x00cf9a000000ffff	/* kernel 4GB code at 0x00000000 */
+	.quad 0x00cf92000000ffff	/* kernel 4GB data at 0x00000000 */
+#endif
+	.align L1_CACHE_BYTES
 ENTRY(cpu_gdt_table)
 	.quad 0x0000000000000000	/* NULL descriptor */
 	.quad 0x0000000000000000	/* 0x0b reserved */
diff -Nru a/arch/i386/kernel/irq.c b/arch/i386/kernel/irq.c
--- a/arch/i386/kernel/irq.c	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/kernel/irq.c	Tue Nov  5 15:35:01 2002
@@ -167,7 +167,7 @@
 		if (cpu_online(j))
 			p += seq_printf(p, "%10u ", nmi_count(j));
 	seq_putc(p, '\n');
-#if CONFIG_X86_LOCAL_APIC
+#ifdef CONFIG_X86_LOCAL_APIC
 	seq_printf(p, "LOC: ");
 	for (j = 0; j < NR_CPUS; j++)
 		if (cpu_online(j))
diff -Nru a/arch/i386/kernel/timers/Makefile b/arch/i386/kernel/timers/Makefile
--- a/arch/i386/kernel/timers/Makefile	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/kernel/timers/Makefile	Tue Nov  5 15:35:01 2002
@@ -4,8 +4,8 @@
 
 obj-y := timer.o
 
-obj-y += timer_tsc.o
-obj-y += timer_pit.o
-obj-$(CONFIG_X86_CYCLONE)   += timer_cyclone.o
+obj-$(CONFIG_X86_TSC)		+= timer_tsc.o
+obj-$(CONFIG_X86_PIT)		+= timer_pit.o
+obj-$(CONFIG_X86_CYCLONE)	+= timer_cyclone.o
 
 include $(TOPDIR)/Rules.make
diff -Nru a/arch/i386/kernel/timers/timer.c b/arch/i386/kernel/timers/timer.c
--- a/arch/i386/kernel/timers/timer.c	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/kernel/timers/timer.c	Tue Nov  5 15:35:01 2002
@@ -7,8 +7,10 @@
 
 /* list of timers, ordered by preference, NULL terminated */
 static struct timer_opts* timers[] = {
+#ifdef CONFIG_X86_TSC
 	&timer_tsc,
-#ifndef CONFIG_X86_TSC
+#endif
+#ifdef CONFIG_X86_PIT
 	&timer_pit,
 #endif
 	NULL,
diff -Nru a/arch/i386/kernel/timers/timer_pit.c b/arch/i386/kernel/timers/timer_pit.c
--- a/arch/i386/kernel/timers/timer_pit.c	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/kernel/timers/timer_pit.c	Tue Nov  5 15:35:01 2002
@@ -9,7 +9,9 @@
 #include <linux/irq.h>
 #include <asm/mpspec.h>
 #include <asm/timer.h>
+#include <asm/smp.h>
 #include <asm/io.h>
+#include <asm/arch_hooks.h>
 
 extern spinlock_t i8259A_lock;
 extern spinlock_t i8253_lock;
diff -Nru a/arch/i386/kernel/trampoline.S b/arch/i386/kernel/trampoline.S
--- a/arch/i386/kernel/trampoline.S	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/kernel/trampoline.S	Tue Nov  5 15:35:01 2002
@@ -54,7 +54,7 @@
 	lmsw	%ax		# into protected mode
 	jmp	flush_instr
 flush_instr:
-	ljmpl	$__KERNEL_CS, $0x00100000
+	ljmpl	$__BOOT_CS, $0x00100000
 			# jump to startup_32 in arch/i386/kernel/head.S
 
 idt_48:
@@ -67,8 +67,8 @@
 #
 
 gdt_48:
-	.word	0x0800			# gdt limit = 2048, 256 GDT entries
-	.long	cpu_gdt_table-__PAGE_OFFSET	# gdt base = gdt (first SMP CPU)
+	.word	__BOOT_DS + 7			# gdt limit
+	.long	boot_gdt_table-__PAGE_OFFSET	# gdt base = gdt (first SMP CPU)
 
 .globl trampoline_end
 trampoline_end:
diff -Nru a/arch/i386/mach-voyager/voyager_basic.c b/arch/i386/mach-voyager/voyager_basic.c
--- a/arch/i386/mach-voyager/voyager_basic.c	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/mach-voyager/voyager_basic.c	Tue Nov  5 15:35:01 2002
@@ -21,6 +21,7 @@
 #include <linux/init.h>
 #include <linux/delay.h>
 #include <linux/reboot.h>
+#include <linux/sysrq.h>
 #include <asm/io.h>
 #include <asm/pgalloc.h>
 #include <asm/voyager.h>
@@ -41,6 +42,21 @@
 
 struct voyager_SUS *voyager_SUS = NULL;
 
+#ifdef CONFIG_SMP
+static void
+voyager_dump(int dummy1, struct pt_regs *dummy2, struct tty_struct *dummy3)
+{
+	/* get here via a sysrq */
+	voyager_smp_dump();
+}
+
+static struct sysrq_key_op sysrq_voyager_dump_op = {
+	.handler	= voyager_dump,
+	.help_msg	= "voyager",
+	.action_msg	= "Dump Voyager Status\n",
+};
+#endif
+
 void
 voyager_detect(struct voyager_bios_info *bios)
 {
@@ -62,6 +78,9 @@
 			printk("\n**WARNING**: Voyager HAL only supports Levels 4 and 5 Architectures at the moment\n\n");
 		/* install the power off handler */
 		pm_power_off = voyager_power_off;
+#ifdef CONFIG_SMP
+		register_sysrq_key('c', &sysrq_voyager_dump_op);
+#endif
 	} else {
 		printk("\n\n**WARNING**: No Voyager Subsystem Found\n");
 	}
@@ -141,15 +160,6 @@
 	pg0[0] = old;
 	local_flush_tlb();
 	return retval;
-}
-
-void
-voyager_dump()
-{
-	/* get here via a sysrq */
-#ifdef CONFIG_SMP
-	voyager_smp_dump();
-#endif
 }
 
 /* voyager specific handling code for timer interrupts.  Used to hand
diff -Nru a/arch/i386/mach-voyager/voyager_smp.c b/arch/i386/mach-voyager/voyager_smp.c
--- a/arch/i386/mach-voyager/voyager_smp.c	Tue Nov  5 15:35:01 2002
+++ b/arch/i386/mach-voyager/voyager_smp.c	Tue Nov  5 15:35:01 2002
@@ -50,11 +50,6 @@
  * indexed physically */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
 
-/* Per CPU interrupt stacks */
-extern union thread_union init_irq_union;
-union thread_union *irq_stacks[NR_CPUS] __cacheline_aligned =
-	{ &init_irq_union, };
-
 /* physical ID of the CPU used to boot the system */
 unsigned char boot_cpu_id;
 
@@ -450,6 +445,7 @@
 	struct cpuinfo_x86 *c=&cpu_data[id];
 
 	*c = boot_cpu_data;
+
 	identify_cpu(c);
 }
 
@@ -512,6 +508,11 @@
 	/* if we're a quad, we may need to bootstrap other CPUs */
 	do_quad_bootstrap();
 
+	/* FIXME: this is rather a poor hack to prevent the CPU
+	 * activating softirqs while it's supposed to be waiting for
+	 * permission to proceed.  Without this, the new per CPU stuff
+	 * in the softirqs will fail */
+	local_irq_disable();
 	set_bit(cpuid, &cpu_callin_map);
 
 	/* signal that we're done */
@@ -519,6 +520,7 @@
 
 	while (!test_bit(cpuid, &smp_commenced_mask))
 		rep_nop();
+	local_irq_enable();
 
 	local_flush_tlb();
 
@@ -537,28 +539,6 @@
 }
 
 
-static void __init setup_irq_stack(struct task_struct *p, int cpu)
-{
-	unsigned long stk;
-
-	stk = __get_free_pages(GFP_KERNEL, THREAD_ORDER+1);
-	if (!stk)
-		panic("I can't seem to allocate my irq stack.  Oh well, giving up.");
-
-	irq_stacks[cpu] = (void *)stk;
-	memset(irq_stacks[cpu], 0, THREAD_SIZE);
-	irq_stacks[cpu]->thread_info.cpu = cpu;
-	irq_stacks[cpu]->thread_info.preempt_count = 1;
-					/* interrupts are not preemptable */
-	p->thread_info->irq_stack = irq_stacks[cpu];
-
-	/* If we want to make the irq stack more than one unit
-	 * deep, we can chain then off of the irq_stack pointer
-	 * here.
-	 */
-}
-
-
 /* Routine to kick start the given CPU and wait for it to report ready
  * (or timeout in startup).  When this routine returns, the requested
  * CPU is either fully running and configured or known to be dead.
@@ -617,20 +597,17 @@
 	if(IS_ERR(idle))
 		panic("failed fork for CPU%d", cpu);
 
-	setup_irq_stack(idle, cpu);
-
 	init_idle(idle, cpu);
 
 	idle->thread.eip = (unsigned long) start_secondary;
 	unhash_process(idle);
-
-	/* The -4 is to correct for the fact that the stack pointer
-	 * is used to find the location of the thread_info structure
-	 * by masking off several of the LSBs.  Without the -4, esp
-	 * is pointing to the page after the one the stack is on.
-	 */
-	stack_start.esp = (void *)(THREAD_SIZE - 4 + (char *)idle->thread_info);
-
+	/* init_tasks (in sched.c) is indexed logically */
+#if 0
+	// for AC kernels
+	stack_start.esp = (THREAD_SIZE + (__u8 *)TSK_TO_KSTACK(idle));
+#else
+	stack_start.esp = (void *) (1024 + PAGE_SIZE + (char *)idle->thread_info);
+#endif
 	/* Note: Don't modify initial ss override */
 	VDEBUG(("VOYAGER SMP: Booting CPU%d at 0x%lx[%x:%x], stack %p\n", cpu, 
 		(unsigned long)hijack_source.val, hijack_source.idt.Segment,
@@ -764,6 +741,9 @@
 
 	/* enable our own CPIs */
 	vic_enable_cpi();
+
+	set_bit(boot_cpu_id, &cpu_online_map);
+	set_bit(boot_cpu_id, &cpu_callout_map);
 	
 	/* loop over all the extended VIC CPUs and boot them.  The 
 	 * Quad CPUs must be bootstrapped by their extended VIC cpu */
@@ -1312,12 +1292,9 @@
 static inline void
 wrapper_smp_local_timer_interrupt(struct pt_regs *regs)
 {
-	__u8 cpu = smp_processor_id();
-
 	irq_enter();
 	smp_local_timer_interrupt(regs);
 	irq_exit();
-	
 }
 
 /* local (per CPU) timer interrupt.  It does both profiling and
diff -Nru a/drivers/char/sysrq.c b/drivers/char/sysrq.c
--- a/drivers/char/sysrq.c	Tue Nov  5 15:35:01 2002
+++ b/drivers/char/sysrq.c	Tue Nov  5 15:35:01 2002
@@ -35,10 +35,6 @@
 
 #include <asm/ptrace.h>
 
-#ifdef CONFIG_VOYAGER
-#include <asm/voyager.h>
-#endif
-
 extern void reset_vc(unsigned int);
 extern struct list_head super_blocks;
 
@@ -323,14 +319,6 @@
 	action_msg:	"Terminate All Tasks",
 };
 
-#ifdef CONFIG_VOYAGER
-static struct sysrq_key_op sysrq_voyager_dump_op = {
-	handler:	voyager_dump,
-	help_msg:	"voyager",
-	action_msg:	"Dump Voyager Status\n",
-};
-#endif
-
 static void sysrq_handle_kill(int key, struct pt_regs *pt_regs,
 			      struct tty_struct *tty) 
 {
@@ -364,11 +352,7 @@
 		 it is handled specially on the sparc
 		 and will never arrive */
 /* b */	&sysrq_reboot_op,
-#ifdef CONFIG_VOYAGER
-/* c */ &sysrq_voyager_dump_op,
-#else
-/* c */	NULL,
-#endif
+/* c */ NULL, /* May be assigned at init time by SMP VOYAGER */
 /* d */	NULL,
 /* e */	&sysrq_term_op,
 /* f */	NULL,
diff -Nru a/include/asm-i386/desc.h b/include/asm-i386/desc.h
--- a/include/asm-i386/desc.h	Tue Nov  5 15:35:01 2002
+++ b/include/asm-i386/desc.h	Tue Nov  5 15:35:01 2002
@@ -13,6 +13,7 @@
 struct Xgt_desc_struct {
 	unsigned short size;
 	unsigned long address __attribute__((packed));
+	unsigned short pad;
 } __attribute__ ((packed));
 
 extern struct Xgt_desc_struct idt_descr, cpu_gdt_descr[NR_CPUS];
diff -Nru a/include/asm-i386/hw_irq.h b/include/asm-i386/hw_irq.h
--- a/include/asm-i386/hw_irq.h	Tue Nov  5 15:35:01 2002
+++ b/include/asm-i386/hw_irq.h	Tue Nov  5 15:35:01 2002
@@ -131,7 +131,7 @@
 
 #endif /* CONFIG_PROFILING */
  
-#ifdef CONFIG_SMP /*more of this file should probably be ifdefed SMP */
+#if defined(CONFIG_SMP) && !defined(CONFIG_VOYAGER) /*more of this file should probably be ifdefed SMP */
 static inline void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i) {
 	if (IO_APIC_IRQ(i))
 		send_IPI_self(IO_APIC_VECTOR(i));
diff -Nru a/include/asm-i386/segment.h b/include/asm-i386/segment.h
--- a/include/asm-i386/segment.h	Tue Nov  5 15:35:01 2002
+++ b/include/asm-i386/segment.h	Tue Nov  5 15:35:01 2002
@@ -69,6 +69,14 @@
 
 #define GDT_SIZE (GDT_ENTRIES * 8)
 
+/* Simple and small GDT entries for booting only */
+
+#define GDT_ENTRY_BOOT_CS		2
+#define __BOOT_CS	(GDT_ENTRY_BOOT_CS * 8)
+
+#define GDT_ENTRY_BOOT_DS		(GDT_ENTRY_BOOT_CS + 1)
+#define __BOOT_DS	(GDT_ENTRY_BOOT_DS * 8)
+
 /*
  * The interrupt descriptor table has room for 256 idt's,
  * the global descriptor table is dependent on the number
diff -Nru a/include/asm-i386/smp.h b/include/asm-i386/smp.h
--- a/include/asm-i386/smp.h	Tue Nov  5 15:35:01 2002
+++ b/include/asm-i386/smp.h	Tue Nov  5 15:35:01 2002
@@ -6,6 +6,7 @@
  */
 #ifndef __ASSEMBLY__
 #include <linux/config.h>
+#include <linux/kernel.h>
 #include <linux/threads.h>
 #endif
 
@@ -83,11 +84,22 @@
 #define cpu_possible(cpu) (cpu_callout_map & (1<<(cpu)))
 #define cpu_online(cpu) (cpu_online_map & (1<<(cpu)))
 
+#define for_each_cpu(cpu, mask) \
+	for(mask = cpu_online_map; \
+	    cpu = __ffs(mask), mask != 0; \
+	    mask &= ~(1<<cpu))
+
 extern inline unsigned int num_online_cpus(void)
 {
 	return hweight32(cpu_online_map);
 }
 
+/* We don't mark CPUs online until __cpu_up(), so we need another measure */
+static inline int num_booting_cpus(void)
+{
+	return hweight32(cpu_callout_map);
+}
+
 extern inline int any_online_cpu(unsigned int mask)
 {
 	if (mask & cpu_online_map)
@@ -95,7 +107,7 @@
 
 	return -1;
 }
-
+#ifdef CONFIG_X86_LOCAL_APIC
 static __inline int hard_smp_processor_id(void)
 {
 	/* we don't want to mark this access volatile - bad code generation */
@@ -108,12 +120,7 @@
 	return GET_APIC_LOGICAL_ID(*(unsigned long *)(APIC_BASE+APIC_LDR));
 }
 
-/* We don't mark CPUs online until __cpu_up(), so we need another measure */
-static inline int num_booting_cpus(void)
-{
-	return hweight32(cpu_callout_map);
-}
-
+#endif
 #endif /* !__ASSEMBLY__ */
 
 #define NO_PROC_ID		0xFF		/* No processor magic marker */
diff -Nru a/include/asm-i386/voyager.h b/include/asm-i386/voyager.h
--- a/include/asm-i386/voyager.h	Tue Nov  5 15:35:01 2002
+++ b/include/asm-i386/voyager.h	Tue Nov  5 15:35:01 2002
@@ -504,7 +504,6 @@
 extern int voyager_memory_detect(int region, __u32 *addr, __u32 *length);
 extern void voyager_smp_intr_init(void);
 extern __u8 voyager_extended_cmos_read(__u16 cmos_address);
-extern void voyager_dump(void);
 extern void voyager_smp_dump(void);
 extern void voyager_timer_interrupt(struct pt_regs *regs);
 extern void smp_local_timer_interrupt(struct pt_regs * regs);


* Re: Voyager subarchitecture for 2.5.46
  2002-11-05 20:45 Voyager subarchitecture for 2.5.46 J.E.J. Bottomley
@ 2002-11-06  2:31 ` john stultz
  2002-11-06 13:43   ` Alan Cox
  2002-11-06 15:03   ` J.E.J. Bottomley
  0 siblings, 2 replies; 32+ messages in thread
From: john stultz @ 2002-11-06  2:31 UTC
  To: J.E.J. Bottomley; +Cc: Linus Torvalds, lkml

On Tue, 2002-11-05 at 16:35, J.E.J. Bottomley wrote:
> It includes the boot GDT stuff and adds configuration options for the 
> kernel/timers directory, so that machines which can't maintain a TSC can turn 
> it off at compile time.

Just a few comments on the CONFIG_X86_TSC changes:

> diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig
> --- a/arch/i386/Kconfig	Tue Nov  5 15:35:01 2002
> +++ b/arch/i386/Kconfig	Tue Nov  5 15:35:01 2002
> @@ -1636,17 +1649,32 @@
>  
>  source "lib/Kconfig"
>  
> +config X86_TSC
> +	bool
> +	depends on  !VOYAGER && (MWINCHIP3D || MWINCHIP2 || MCRUSOE || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMIII || M686 || M586MMX || M586TSC)
> +	default y
> +
> +config X86_PIT
> +	bool
> +	depends on M386 || M486 || M586 || M586TSC || VOYAGER
> +	default y
> +

I'm fine w/ the X86_TSC change, but I'd drop the X86_PIT for now. 

Then make the arch/i386/kernel/timers/Makefile change to be something like:

obj-y := timer.o timer_tsc.o timer_pit.o
obj-$(CONFIG_X86_TSC)		-= timer_pit.o #does this(-=) work?
obj-$(CONFIG_X86_CYCLONE)	+= timer_cyclone.o


Then when you boot, boot w/ notsc and you should be fine. 

I do want to add some sort of TSC blacklisting so one doesn't always
have to boot w/ notsc if your machine is
detectable/compiled-exclusively for. But I've got a few other issues in
the queue first. 

thanks
-john


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06  2:31 ` john stultz
@ 2002-11-06 13:43   ` Alan Cox
  2002-11-06 21:35     ` john stultz
  2002-11-06 15:03   ` J.E.J. Bottomley
  1 sibling, 1 reply; 32+ messages in thread
From: Alan Cox @ 2002-11-06 13:43 UTC (permalink / raw)
  To: john stultz; +Cc: J.E.J. Bottomley, Linus Torvalds, lkml

On Wed, 2002-11-06 at 02:31, john stultz wrote:
> I'm fine w/ the X86_TSC change, but I'd drop the X86_PIT for now. 
> 
> Then make the arch/i386/kernel/timers/Makefile change to be something like:
> 
> obj-y := timer.o timer_tsc.o timer_pit.o
> obj-$(CONFIG_X86_TSC)		-= timer_pit.o #does this(-=) work?
> obj-$(CONFIG_X86_CYCLONE)	+= timer_cyclone.o

Not everything is going to have a PIT. Also I need to know if there is a
PIT for a few other things so I'd prefer to keep it, but I'm not
excessively bothered


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06  2:31 ` john stultz
  2002-11-06 13:43   ` Alan Cox
@ 2002-11-06 15:03   ` J.E.J. Bottomley
  2002-11-06 15:38     ` Alan Cox
                       ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: J.E.J. Bottomley @ 2002-11-06 15:03 UTC (permalink / raw)
  To: john stultz; +Cc: J.E.J. Bottomley, Linus Torvalds, lkml

johnstul@us.ibm.com said:
> I'm fine w/ the X86_TSC change, but I'd drop the X86_PIT for now.  

> Then when you boot, boot w/ notsc and you should be fine. 
> I do want to add some sort of TSC blacklisting so one doesn't always
> have to boot w/ notsc if your machine is detectable/
> compiled-exclusively for. But I've got a few other issues in the
> queue first.  

There are certain architectures (voyager is the only one currently supported, 
but I suspect the Numa machines will have this too) where the TSC cannot be 
used for cross CPU timings because the processors are driven by separate 
clocks and may even have different clock speeds.

What I need is an option simply not to compile in the TSC code and use the PIT 
instead.  What I'm trying to do with the TSC and PIT options is give three 
choices:

1. Don't use TSC (don't compile TSC code): X86_TSC=n, X86_PIT=y

2. May use TSC but check first (blacklist, notsc kernel option).  X86_TSC=y, 
X86_PIT=y

3. TSC is always OK so don't need PIT.  X86_TSC=y, X86_PIT=n

We probably need to make the notsc and dodgy tsc check contingent on X86_PIT 
(or a config option that says we have some other timer mechanism compiled in). 
 Really, the options should probably be handled in timer.c.

There's also another problem in that the timer_init is called too early in the 
boot sequence to get a message out to the user, so the panic in timer.c about 
not finding a suitable timer will never be seen (the system will just lock up 
on boot).

Do we have an option for a deferred panic that will trip just after we init 
the console and clean out the printk buffer?

> Then make the arch/i386/kernel/timers/Makefile change to be something like:
> 
> obj-y := timer.o timer_tsc.o timer_pit.o
> obj-$(CONFIG_X86_TSC)		-= timer_pit.o #does this(-=) work?
> obj-$(CONFIG_X86_CYCLONE)	+= timer_cyclone.o

Even if it works, the config option style is confusing.  It's easier just to 
have a positive option (CONFIG_X86_PIT) for this.

James




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 15:03   ` J.E.J. Bottomley
@ 2002-11-06 15:38     ` Alan Cox
  2002-11-06 16:09       ` Christer Weinigel
  2002-11-06 15:45     ` Linus Torvalds
  2002-11-06 19:30     ` john stultz
  2 siblings, 1 reply; 32+ messages in thread
From: Alan Cox @ 2002-11-06 15:38 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: john stultz, Linus Torvalds, lkml

On Wed, 2002-11-06 at 15:03, J.E.J. Bottomley wrote:
> There are certain architectures (voyager is the only one currently supported, 
> but I suspect the Numa machines will have this too) where the TSC cannot be 
> used for cross CPU timings because the processors are driven by separate 
> clocks and may even have different clock speeds.

IBM Summit is indeed another one. 

> What I need is an option simply not to compile in the TSC code and use the PIT 
> instead.  What I'm trying to do with the TSC and PIT options is give three 
> choices:
> 
> 1. Don't use TSC (don't compile TSC code): X86_TSC=n, X86_PIT=y
> 
> 2. May use TSC but check first (blacklist, notsc kernel option).  X86_TSC=y, 
> X86_PIT=y
> 
> 3. TSC is always OK so don't need PIT.  X86_TSC=y, X86_PIT=n

[Plus we need X86_CYCLONE and we may need X86_SOMETHING else for some
pending stuff]

> We probably need to make the notsc and dodgy tsc check contingent on X86_PIT 
> (or a config option that says we have some other timer mechanism compiled in). 
>  Really, the options should probably be handled in timer.c.

The dodgy_tsc check is now obsolete. The known cases are handled with
workarounds and CS5510/20 can now use the TSC

> Do we have an option for a deferred panic that will trip just after we init 
> the console and clean out the printk buffer?

Point to timer_none, check that later on in the boot
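
A minimal sketch of that idea, written as self-contained userspace C rather
than kernel code. Only the names struct timer_opts, timer_tsc, timer_pit and
timer_none come from this thread; the struct layout, the init stubs and
timer_check_deferred() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct timer_opts. */
struct timer_opts {
	const char *name;
	int (*init)(void);	/* returns 0 if this time source is usable */
};

/* Stub probes: pretend the TSC is unusable and the PIT works. */
static int init_fail(void) { return -1; }
static int init_ok(void)   { return 0; }

static struct timer_opts timer_tsc  = { "tsc",  init_fail };
static struct timer_opts timer_pit  = { "pit",  init_ok };
static struct timer_opts timer_none = { "none", init_ok };

/*
 * Walk a NULL-terminated, preference-ordered list. Instead of panicking
 * when nothing probes successfully, fall back to the timer_none
 * placeholder so that boot can continue until the console is up.
 */
static struct timer_opts *select_timer(struct timer_opts **list)
{
	assert(list != NULL);
	for (; *list != NULL; list++)
		if ((*list)->init() == 0)
			return *list;
	return &timer_none;
}

/*
 * Meant to be called later in boot, once messages can reach the user.
 * A nonzero return means "no real timer was found, panic now".
 */
static int timer_check_deferred(const struct timer_opts *chosen)
{
	return chosen == &timer_none;
}
```

With a list holding only &timer_tsc the selection ends up at timer_none and
the deferred check fires; with &timer_pit also in the list the PIT is picked
and the check stays quiet.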



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 15:03   ` J.E.J. Bottomley
  2002-11-06 15:38     ` Alan Cox
@ 2002-11-06 15:45     ` Linus Torvalds
  2002-11-06 16:19       ` Alan Cox
                         ` (2 more replies)
  2002-11-06 19:30     ` john stultz
  2 siblings, 3 replies; 32+ messages in thread
From: Linus Torvalds @ 2002-11-06 15:45 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: john stultz, lkml


On Wed, 6 Nov 2002, J.E.J. Bottomley wrote:
> 
> There are certain architectures (voyager is the only one currently supported, 
> but I suspect the Numa machines will have this too) where the TSC cannot be 
> used for cross CPU timings because the processors are driven by separate 
> clocks and may even have different clock speeds.

I disagree.

We should use the TSC everywhere (if it exists, of course), and the fact
that two CPU's don't run synchronized shouldn't matter.

The solution is to make all the TSC calibration and offsets be per-CPU.  
That should be fairly trivial, since we _already_ do the calibration
per-CPU anyway for bogomips (for no good reason except the whole process
is obviously just a funny thing to do, which is the point of bogomips).

The only even half-way "interesting" case I see is a udelay() getting
preempted, and I suspect most of those already run non-preemptable, so in
the short run we could just force that with preempt_off()/on() inside
udelay().

In the long run we probably do _not_ want to do that nonpreemptable
udelay(), but even that is debatable (anybody who is willing to be
preempted should not have been using udelay() in the first place, but
actually sleeping - and people who use udelay() for things like IO port
accesses etc almost certainly won't mind not being moved across CPU's).

Let's face it, we don't have that many tsc-related data structures. What, 
we have:

 - loops_per_jiffy, which is already a per-CPU thing, used by udelay()
 - fast_gettimeoffset_quotient - which is global right now and shouldn't 
   be.
 - delay_at_last_interrupt. See previous.
 - possibly even all of xtime and all the NTP stuff
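
As a rough illustration of what "per-CPU" could mean for the second and third
items, here is a userspace C sketch. It assumes the quotient is scaled as
2^32 / (TSC cycles per usec), which is how I understand the existing
fast_gettimeoffset_quotient to work; struct cpu_clock, the NR_CPUS value and
both function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* One clock domain per CPU instead of a single global quotient. */
struct cpu_clock {
	uint32_t quotient;	/* 2^32 / (TSC cycles per usec) on this CPU */
	uint32_t last_tsc_low;	/* low 32 TSC bits at this CPU's last tick */
};

static struct cpu_clock cpu_clock[NR_CPUS];

/* Record this CPU's own calibration and its TSC value at the tick. */
static void calibrate_cpu(int cpu, uint32_t cycles_per_usec, uint32_t tsc_now)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(cycles_per_usec > 1);	/* 2^32 / 1 would not fit in 32 bits */
	cpu_clock[cpu].quotient = (uint32_t)((1ULL << 32) / cycles_per_usec);
	cpu_clock[cpu].last_tsc_low = tsc_now;
}

/* Microseconds since the last tick, using only this CPU's data. */
static uint32_t usecs_since_tick(int cpu, uint32_t tsc_now)
{
	uint32_t delta;

	assert(cpu >= 0 && cpu < NR_CPUS);
	delta = tsc_now - cpu_clock[cpu].last_tsc_low;	/* wraps safely */
	return (uint32_t)(((uint64_t)delta * cpu_clock[cpu].quotient) >> 32);
}
```

Two CPUs calibrated at 512 and 1024 cycles per usec turn the same 51200-cycle
delta into 100 and 50 usecs respectively, which is the point: the raw TSC
values never need to agree across CPUs.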

It's clearly stupid in the long run to depend on the TSC synchronization.
We should consider different CPU's to be different clock-domains, and just
synchronize them using the primitives we already have (hey, people can use
ntp to synchronize over networks quite well, and that's without the kind
of synchronization primitives that we have within the same box).

Anybody willing to look into this? I suspect the numa people should be
more motivated than most of us.. You still want fast gettimeofday() on
NUMA too..

		Linus


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 15:38     ` Alan Cox
@ 2002-11-06 16:09       ` Christer Weinigel
  0 siblings, 0 replies; 32+ messages in thread
From: Christer Weinigel @ 2002-11-06 16:09 UTC (permalink / raw)
  To: Alan Cox; +Cc: J.E.J. Bottomley, john stultz, Linus Torvalds, lkml

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> > What I need is an option simply not to compile in the TSC code and use the PIT 
> > instead.  What I'm trying to do with the TSC and PIT options is give three 
> > choices:
> > 
> > 1. Don't use TSC (don't compile TSC code): X86_TSC=n, X86_PIT=y
> > 
> > 2. May use TSC but check first (blacklist, notsc kernel option).  X86_TSC=y, 
> > X86_PIT=y
> > 
> > 3. TSC is always OK so don't need PIT.  X86_TSC=y, X86_PIT=n
> 
> [Plus we need X86_CYCLONE and we may need X86_SOMETHING else for some
> pending stuff]

Yes, for example the NatSemi SC2200 has a 32+1 bit "High Resolution
Timer" that can be clocked either at 1MHz or 27MHz and that can
generate an interrupt whenever it wraps around.

Just using the High Resolution timer would avoid the known problems
with the TSC (stops on HLT, a bug when the low 32 bits of the TSC wrap
around) and the PIT (something somewhere, maybe SMM mode, seems to
mess up the latch values).

  /Christer

-- 
"Just how much can I get away with and still go to heaven?"

Freelance consultant specializing in device driver programming for Linux 
Christer Weinigel <christer@weinigel.se>  http://www.weinigel.se

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 16:19       ` Alan Cox
@ 2002-11-06 16:12         ` Linus Torvalds
  2002-11-06 16:45           ` Alan Cox
  2002-11-10 16:30           ` Pavel Machek
  0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2002-11-06 16:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: J.E.J. Bottomley, john stultz, lkml


On 6 Nov 2002, Alan Cox wrote:
>
> On Wed, 2002-11-06 at 15:45, Linus Torvalds wrote:
> > It's clearly stupid in the long run to depend on the TSC synchronization.
> > We should consider different CPU's to be different clock-domains, and just
> > synchronize them using the primitives we already have (hey, people can use
> > ntp to synchronize over networks quite well, and that's without the kind
> > of synchronization primitives that we have within the same box).
> 
> NTP synchronization assumes the clock runs at approximately the same
> speed and that you can 'bend' ticklength to avoid backward steps. That's
> a really cool idea for the x440 but I wonder how practical it is when we
> have CPU's that keep changing speeds and not always notifying us about
> it either.

Note that you have a _lot_ more flexibility than NTP thanks to the strong 
synchronization that we actually do have between CPU's in the end.

The synchronization just isn't strong enough to allow us to believe that 
the TSC is exactly the _same_. But it is certainly strong enough that we 
should be able to do a really good job.

Of course, if the TSC changes speed without telling us, we have problems. 

But that has nothing to do with the synchronization protocol itself: we 
have problems with that even on a single CPU on laptops right now. Does it 
mean that gettimeofday() gets confused? Sure as hell. But it doesn't get 
any _worse_ from being done separately on multiple CPU's.

(And it _does_ get slightly better. On multiple CPU's with per-CPU time
structures at least you _can_ handle the case where one CPU runs at a
different speed, so at least you could handle the case where one CPU is
slowed down explicitly much better than we can right now).

As an example of something that is simpler in the MP/NUMA world than in 
NTP: we see the processes migrating, and we can fairly trivially do things 
like

 - every gettimeofday() will always save the value we return, along with a 
   sequence number (which is mainly read-only, so it's ok to share among 
   CPU's)

 - every "settimeofday()" will increase the sequence number

 - when the next gettimeofday happens, we can check the sequence number 
   and the old gettimeofday, and verify that we get monotonic behaviour in 
   the absence of explicit date setting. This allows us to handle small 
   problems gracefully ("return the old time + 1 ns" to make it 
   monotonic even when we screw up), _and_ it will also act as a big clue
   for us that we should try to synchronize - so that we basically never 
   need to worry about "should I check the clocks" (where "basically 
   never" may be "we check the clocks every minute or so if nothing else 
   happens")

Basically, I think NTP itself would be _way_ overkill between CPU's, I 
wasn't really suggesting we use NTP as the main mechanism at that level. I 
just suspect that a lot of the data structures and info that we already 
have to have for NTP might be used as help.

		Linus


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 15:45     ` Linus Torvalds
@ 2002-11-06 16:19       ` Alan Cox
  2002-11-06 16:12         ` Linus Torvalds
  2002-11-06 20:07       ` john stultz
  2002-11-06 22:36       ` H. Peter Anvin
  2 siblings, 1 reply; 32+ messages in thread
From: Alan Cox @ 2002-11-06 16:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: J.E.J. Bottomley, john stultz, lkml

On Wed, 2002-11-06 at 15:45, Linus Torvalds wrote:
> It's clearly stupid in the long run to depend on the TSC synchronization.
> We should consider different CPU's to be different clock-domains, and just
> synchronize them using the primitives we already have (hey, people can use
> ntp to synchronize over networks quite well, and that's without the kind
> of synchronization primitives that we have within the same box).

NTP synchronization assumes the clock runs at approximately the same
speed and that you can 'bend' ticklength to avoid backward steps. That's
a really cool idea for the x440 but I wonder how practical it is when we
have CPU's that keep changing speeds and not always notifying us about
it either.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 16:12         ` Linus Torvalds
@ 2002-11-06 16:45           ` Alan Cox
  2002-11-10 16:30           ` Pavel Machek
  1 sibling, 0 replies; 32+ messages in thread
From: Alan Cox @ 2002-11-06 16:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: J.E.J. Bottomley, john stultz, lkml

On Wed, 2002-11-06 at 16:12, Linus Torvalds wrote:
> Basically, I think NTP itself would be _way_ overkill between CPU's, I 
> wasn't really suggesting we use NTP as the main mechanism at that level. I 
> just suspect that a lot of the data structures and info that we already 
> have to have for NTP might be used as help.

I don't think the NTP algorithms are overkill. We have the same problem
space - multiple nodes some of which can be rogue (eg pit misreads, tsc
weirdness), inability to directly sample the clock on another node, need
for an efficient way to bend clocks. The fundamental algorithm is
extremely simple; it's all the networking, security, UI and glue that
isn't, and that is stuff we can skip.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 15:03   ` J.E.J. Bottomley
  2002-11-06 15:38     ` Alan Cox
  2002-11-06 15:45     ` Linus Torvalds
@ 2002-11-06 19:30     ` john stultz
  2 siblings, 0 replies; 32+ messages in thread
From: john stultz @ 2002-11-06 19:30 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: Linus Torvalds, lkml

On Wed, 2002-11-06 at 07:03, J.E.J. Bottomley wrote:
> There are certain architectures (voyager is the only one currently supported, 
> but I suspect the Numa machines will have this too) where the TSC cannot be 
> used for cross CPU timings because the processors are driven by separate 
> clocks and may even have different clock speeds.

Yes, I'll confirm your suspicions for some NUMA boxes ;)  The timer_opts
structure was largely created to make it easier to remedy this
situation, allowing alternate time sources to be easily added. 
 
> What I need is an option simply not to compile in the TSC code and use the PIT 
> instead.  What I'm trying to do with the TSC and PIT options is give three 
> choices:
> 
> 1. Don't use TSC (don't compile TSC code): X86_TSC=n, X86_PIT=y
> 
> 2. May use TSC but check first (blacklist, notsc kernel option).  X86_TSC=y, 
> X86_PIT=y
> 
> 3. TSC is always OK so don't need PIT.  X86_TSC=y, X86_PIT=n

Almost all systems are going to want #3. For those that need an
alternate time source (NUMAQ, Voyager, x440, etc) do we really need the
PIT only option(#1)? It can easily be dynamically detected in #2, and
the resulting kernel will run correctly on more machines which makes for
one less special kernel distros have to create/manage.


> There's also another problem in that the timer_init is called too early in the 
> boot sequence to get a message out to the user, so the panic in timer.c about 
> not finding a suitable timer will never be seen (the system will just lock up 
> on boot).
> 
> Do we have an option for a deferred panic that will trip just after we init 
> the console and clean out the printk buffer?

Yea, I'm actually working on exactly what Alan suggested (timer_none),
to solve this. Thanks for bringing it up though, I occasionally need a
kick in the pants for motivation :) 


> > Then make the arch/i386/kernel/timers/Makefile change to be something like:
> > 
> > obj-y := timer.o timer_tsc.o timer_pit.o
> > obj-$(CONFIG_X86_TSC)		-= timer_pit.o #does this(-=) work?
> > obj-$(CONFIG_X86_CYCLONE)	+= timer_cyclone.o
> 
> Even if it works, the config option style is confusing.  It's easier just to 
> have a positive option (CONFIG_X86_PIT) for this.

I realize that the negative-option that _X86_TSC has become is a bit
confusing, but it is an optimization option, not a feature option. I've
been thinking of something similar to _X86_PIT, but I want to avoid the
PIT only case that you had in your patch, and try to come up with
something that isn't more confusing than what we started with. 

thanks
-john


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 15:45     ` Linus Torvalds
  2002-11-06 16:19       ` Alan Cox
@ 2002-11-06 20:07       ` john stultz
  2002-11-06 22:36       ` H. Peter Anvin
  2 siblings, 0 replies; 32+ messages in thread
From: john stultz @ 2002-11-06 20:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: J.E.J. Bottomley, lkml

On Wed, 2002-11-06 at 07:45, Linus Torvalds wrote:
> The solution is to make all the TSC calibration and offsets be per-CPU.  
> That should be fairly trivial, since we _already_ do the calibration
> per-CPU anyway for bogomips (for no good reason except the whole process
> is obviously just a funny thing to do, which is the point of bogomips).

This was discussed earlier, but dismissed as being a can of worms. It
still is possible to do (and can be added as just another timer_opt
structure), but uglies like the spread-spectrum feature on the x440,
which actually runs each node at slightly varying speeds, pop up and
make my head hurt. Regardless, the attempt would probably help clean
things up, as you mentioned below. We also would need to round-robin the
timer interrupt, as each cpu would need a last_tsc_low point to generate
an offset. So I'm not opposed to it, but I'm not exactly eager to
implement it. 

> Let's face it, we don't have that many tsc-related data structures. What, 
> we have:
> 
>  - loops_per_jiffy, which is already a per-CPU thing, used by udelay()
>  - fast_gettimeoffset_quotient - which is global right now and shouldn't 
>    be.

Good to see it's on your hit-list. :) I mailed out a patch for this
earlier, I'll resend later today.

>  - delay_at_last_interrupt. See previous.

I'll get to this one too, as well as a few other spots where the
timer_opts abstraction isn't clean enough (cpu_khz needs to be pulled
out of the timer_tsc code, etc)

thanks for the feedback
-john



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 13:43   ` Alan Cox
@ 2002-11-06 21:35     ` john stultz
  0 siblings, 0 replies; 32+ messages in thread
From: john stultz @ 2002-11-06 21:35 UTC (permalink / raw)
  To: Alan Cox; +Cc: J.E.J. Bottomley, Linus Torvalds, lkml

On Wed, 2002-11-06 at 05:43, Alan Cox wrote:
> On Wed, 2002-11-06 at 02:31, john stultz wrote:
> > I'm fine w/ the X86_TSC change, but I'd drop the X86_PIT for now. 
> > 
> > Then make the arch/i386/kernel/timers/Makefile change to be something like:
> > 
> > obj-y := timer.o timer_tsc.o timer_pit.o
> > obj-$(CONFIG_X86_TSC)		-= timer_pit.o #does this(-=) work?
> > obj-$(CONFIG_X86_CYCLONE)	+= timer_cyclone.o
> 
> Not everything is going to have a PIT. Also I need to know if there is a
> PIT for a few other things so I'd prefer to keep it, but I'm not
> excessively bothered

Hmmm. Ok, How about something like what is below? This is very similar
to what James is suggesting, but tries to fix some of the corner cases
as well. The only problem I see with this is that in some places we're
using _X86_PIT_TIMER as an imagined !_X86_TSC_ONLY, so one couldn't have
a kernel that compiled in the Cyclone timer, not the PIT timer, and
allowed one to disable the TSC. 

I'm still not sure this is the way to go, but at least gives James a
chance to poke at my code rather than the other way around. 

thanks
-john


diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig
--- a/arch/i386/Kconfig	Mon Nov  4 17:15:15 2002
+++ b/arch/i386/Kconfig	Mon Nov  4 17:15:15 2002
@@ -430,6 +430,11 @@
 	depends on NUMA
 	default y
 
+config X86_PIT_TIMER
+	bool
+	depends on !X86_TSC || X86_NUMAQ
+	default n
+
 config X86_MCE
 	bool "Machine Check Exception"
 	---help---
diff -Nru a/arch/i386/kernel/cpu/common.c b/arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Mon Nov  4 17:15:15 2002
+++ b/arch/i386/kernel/cpu/common.c	Mon Nov  4 17:15:15 2002
@@ -42,7 +42,7 @@
 }
 __setup("cachesize=", cachesize_setup);
 
-#ifndef CONFIG_X86_TSC
+#ifdef CONFIG_X86_PIT_TIMER
 static int tsc_disable __initdata = 0;
 
 static int __init tsc_setup(char *str)
@@ -55,7 +55,7 @@
 
 static int __init tsc_setup(char *str)
 {
-	printk("notsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC.\n");
+	printk("notsc: Kernel not compiled with CONFIG_X86_PIT_TIMER, cannot disable TSC.\n");
 	return 1;
 }
 #endif
diff -Nru a/arch/i386/kernel/timers/Makefile b/arch/i386/kernel/timers/Makefile
--- a/arch/i386/kernel/timers/Makefile	Mon Nov  4 17:15:15 2002
+++ b/arch/i386/kernel/timers/Makefile	Mon Nov  4 17:15:15 2002
@@ -5,7 +5,7 @@
 obj-y := timer.o
 
 obj-y += timer_tsc.o
-obj-y += timer_pit.o
+obj-$(CONFIG_X86_PIT_TIMER) += timer_pit.o
 obj-$(CONFIG_X86_CYCLONE)   += timer_cyclone.o
 
 include $(TOPDIR)/Rules.make
diff -Nru a/arch/i386/kernel/timers/timer.c b/arch/i386/kernel/timers/timer.c
--- a/arch/i386/kernel/timers/timer.c	Mon Nov  4 17:15:15 2002
+++ b/arch/i386/kernel/timers/timer.c	Mon Nov  4 17:15:15 2002
@@ -8,7 +8,7 @@
 /* list of timers, ordered by preference, NULL terminated */
 static struct timer_opts* timers[] = {
 	&timer_tsc,
-#ifndef CONFIG_X86_TSC
+#ifdef CONFIG_X86_PIT_TIMER
 	&timer_pit,
 #endif
 	NULL,


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 15:45     ` Linus Torvalds
  2002-11-06 16:19       ` Alan Cox
  2002-11-06 20:07       ` john stultz
@ 2002-11-06 22:36       ` H. Peter Anvin
  2 siblings, 0 replies; 32+ messages in thread
From: H. Peter Anvin @ 2002-11-06 22:36 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <Pine.LNX.4.44.0211060729210.2393-100000@home.transmeta.com>
By author:    Linus Torvalds <torvalds@transmeta.com>
In newsgroup: linux.dev.kernel
> 
> I disagree.
> 
> We should use the TSC everywhere (if it exists, of course), and the fact
> that two CPU's don't run synchronized shouldn't matter.
> 

If it exists, and works :-/

> It's clearly stupid in the long run to depend on the TSC synchronization.
> We should consider different CPU's to be different clock-domains, and just
> synchronize them using the primitives we already have (hey, people can use
> ntp to synchronize over networks quite well, and that's without the kind
> of synchronization primitives that we have within the same box).

Synchronizing them is nice, since it makes RDTSC usable in user
space (without nodelocking.)  If it ain't doable, then it ain't.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-06 16:12         ` Linus Torvalds
  2002-11-06 16:45           ` Alan Cox
@ 2002-11-10 16:30           ` Pavel Machek
  2002-11-10 18:59             ` Linus Torvalds
  1 sibling, 1 reply; 32+ messages in thread
From: Pavel Machek @ 2002-11-10 16:30 UTC (permalink / raw)
  To: Linus Torvalds, vojtech; +Cc: Alan Cox, J.E.J. Bottomley, john stultz, lkml

Hi!

>  - when the next gettimeofday happens, we can check the sequence number 
>    and the old gettimeofday, and verify that we get monotonic behaviour in 
>    the absence of explicit date setting. This allows us to handle small 
>    problems gracefully ("return the old time + 1 ns" to make it 
>    monotonic even when we screw up), _and_ it will also act as a big clue
>    for us that we should try to synchronize - so that we basically never 
>    need to worry about "should I check the clocks" (where "basically 
>    never" may be "we check the clocks every minute or so if nothing else 
>    happens")

Unfortunately, this means "bye bye vsyscalls for gettimeofday".

								Pavel
-- 
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 16:30           ` Pavel Machek
@ 2002-11-10 18:59             ` Linus Torvalds
  2002-11-10 19:18               ` Pavel Machek
  2002-11-10 19:46               ` Vojtech Pavlik
  0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2002-11-10 18:59 UTC (permalink / raw)
  To: Pavel Machek; +Cc: vojtech, Alan Cox, J.E.J. Bottomley, john stultz, lkml


On Sun, 10 Nov 2002, Pavel Machek wrote:
> 
> Unfortunately, this means "bye bye vsyscalls for gettimeofday".

Not necessarily. All of the fastpath and the checking can be done by the
vsyscall, and if the vsyscall notices that there is a backwards jump in
time it just gives up and does a real system call. The vsyscall does need
to figure out the CPU it's running on somehow, but that should be solvable
- indexing through the thread ID or something.

That said, I suspect that the real issue with vsyscalls is that they don't
really make much sense. The only system call we've ever found that matters
at all is gettimeofday(), and the vsyscall implementation there looks like
a "cool idea, but doesn't really matter (and complicates things a lot)".

The system call overhead tends to scale up very well with CPU speed (the
one exception being the P4 which just has some internal problems with "int
0x80" and slowed down compared to a PIII).

So I would just suggest not spending a lot of effort on it, considering
the problems it already has. 

		Linus


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 18:59             ` Linus Torvalds
@ 2002-11-10 19:18               ` Pavel Machek
  2002-11-10 19:31                 ` Linus Torvalds
  2002-11-10 19:46               ` Vojtech Pavlik
  1 sibling, 1 reply; 32+ messages in thread
From: Pavel Machek @ 2002-11-10 19:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: vojtech, Alan Cox, J.E.J. Bottomley, john stultz, lkml

Hi!

> > Unfortunately, this means "bye bye vsyscalls for gettimeofday".
> 
> Not necessarily. All of the fastpath and the checking can be done by the
> vsyscall, and if the vsyscall notices that there is a backwards jump
> in

I believe you need to *store* the last value given to userland. Checking
for a backwards jump can be dealt with, but to check for time going
backwards you need to *store* the result of each vsyscall. I do not
think that can be done from userland.

> That said, I suspect that the real issue with vsyscalls is that they don't
> really make much sense. The only system call we've ever found that matters
> at all is gettimeofday(), and the vsyscall implementation there looks like
> a "cool idea, but doesn't really matter (and complicates things a lot)".

I don't like vsyscalls at all...
									Pavel
-- 
When do you have heart between your knees?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 19:18               ` Pavel Machek
@ 2002-11-10 19:31                 ` Linus Torvalds
  2002-11-10 19:42                   ` Pavel Machek
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2002-11-10 19:31 UTC (permalink / raw)
  To: Pavel Machek; +Cc: vojtech, Alan Cox, J.E.J. Bottomley, john stultz, lkml


On Sun, 10 Nov 2002, Pavel Machek wrote:
> 
> I believe you need to *store* last value given to userland.

But that's trivially done: it doesn't even have to be thread-specific, so 
it can be just a global entry anywhere in the process data structures.

This is just a random sanity check thing, after all. It doesn't have to be 
system-global or even per-cpu. The only really important thing is that 
"gettimeofday()" should return monotonically increasing data - and if it 
doesn't, the vsyscall would have to ask why (sometimes it's fine, if 
somebody did a settimeofday, but usually it's a sign of trouble).

But yes, it's certainly a lot more complex than just doing it in a 
controlled system call environment. Which is why I think vsyscalls are 
eventually not worth it.

			Linus


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 19:31                 ` Linus Torvalds
@ 2002-11-10 19:42                   ` Pavel Machek
  2002-11-10 19:48                     ` Vojtech Pavlik
  2002-11-10 20:02                     ` Sean Neakums
  0 siblings, 2 replies; 32+ messages in thread
From: Pavel Machek @ 2002-11-10 19:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: vojtech, Alan Cox, J.E.J. Bottomley, john stultz, lkml

Hi!

> > I believe you need to *store* last value given to userland.
> 
> But that's trivially done: it doesn't even have to be thread-specific, so 
> it can be just a global entry anywhere in the process data
> structures.

> This is just a random sanity check thing, after all. It doesn't have to be 
> system-global or even per-cpu. The only really important thing is that 
> "gettimeofday()" should return monotonically increasing data - and if it 
> doesn't, the vsyscall would have to ask why (sometimes it's fine, if 
> somebody did a settimeofday, but usually it's a sign of trouble).

I believe you need it system-global. If task A tells task B "it's
10:30:00" and then task B does gettimeofday and gets "10:29:59", it
will be confused for sure.
								Pavel
-- 
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 18:59             ` Linus Torvalds
  2002-11-10 19:18               ` Pavel Machek
@ 2002-11-10 19:46               ` Vojtech Pavlik
  2002-11-11 20:40                 ` john stultz
  1 sibling, 1 reply; 32+ messages in thread
From: Vojtech Pavlik @ 2002-11-10 19:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, Alan Cox, J.E.J. Bottomley, john stultz, lkml

On Sun, Nov 10, 2002 at 10:59:55AM -0800, Linus Torvalds wrote:

> On Sun, 10 Nov 2002, Pavel Machek wrote:
> > 
> > Unfortunately, this means "bye bye vsyscalls for gettimeofday".
> 
> Not necessarily. All of the fastpath and the checking can be done by the
> vsyscall, and if the vsyscall notices that there is a backwards jump in
> time it just gives up and does a real system call. The vsyscall does need
> to figure out the CPU it's running on somehow, but that should be solvable
> - indexing through the thread ID or something.

I'm planning to store the CPU number in the highest bits of the TSC ...
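
As a rough illustration of what such an encoding could look like on the reading side, here is a sketch that packs a CPU number into the top bits of a 64-bit counter value and unpacks it again. The 8-bit field width and all names are assumptions made for the example; whether the hardware lets the TSC be loaded with such a value is a separate question (a later reply in this thread doubts it), and this sketch does not address it.

```c
#include <stdint.h>

#define CPU_BITS   8                          /* assumed field width */
#define CPU_SHIFT  (64 - CPU_BITS)
#define CPU_MASK   ((UINT64_C(1) << CPU_BITS) - 1)
#define COUNT_MASK ((UINT64_C(1) << CPU_SHIFT) - 1)

/* Combine a cycle count and a CPU number into one 64-bit value. */
uint64_t tag_counter(uint64_t count, unsigned int cpu)
{
	return (((uint64_t)cpu & CPU_MASK) << CPU_SHIFT) | (count & COUNT_MASK);
}

/* Recover the CPU number from the top CPU_BITS bits. */
unsigned int tagged_cpu(uint64_t value)
{
	return (unsigned int)(value >> CPU_SHIFT);
}

/* Recover the cycle count from the remaining low bits. */
uint64_t tagged_count(uint64_t value)
{
	return value & COUNT_MASK;
}
```

The appeal of this layout is that a single counter read yields both the count and the CPU it was read on, so the reader cannot be migrated between the two. The cost is that the usable counter range shrinks by CPU_BITS bits.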

> That said, I suspect that the real issue with vsyscalls is that they don't
> really make much sense. The only system call we've ever found that matters
> at all is gettimeofday(), and the vsyscall implementation there looks like
> a "cool idea, but doesn't really matter (and complicates things a lot)".

It's not complicating things overly. We'd have to go through most of the
hoops anyway if we wanted a fast gettimeofday syscall instead of a
vsyscall.

> The system call overhead tends to scale up very well with CPU speed (the
> one exception being the P4 which just has some internal problems with "int
> 0x80" and slowed down compared to a PIII).
> 
> So I would just suggest not spending a lot of effort on it, considering
> the problems it already has. 

Agreed. The only problem left I see is the need to have an interrupt on
every CPU from time to time to update the per-cpu time values, and to
synchronize those to the 'global timer interrupt' somehow.

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 19:42                   ` Pavel Machek
@ 2002-11-10 19:48                     ` Vojtech Pavlik
  2002-11-10 20:02                     ` Sean Neakums
  1 sibling, 0 replies; 32+ messages in thread
From: Vojtech Pavlik @ 2002-11-10 19:48 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Alan Cox, J.E.J. Bottomley, john stultz, lkml

On Sun, Nov 10, 2002 at 08:42:04PM +0100, Pavel Machek wrote:
> Hi!
> 
> > > I believe you need to *store* last value given to userland.
> > 
> > But that's trivially done: it doesn't even have to be thread-specific, so 
> > it can be just a global entry anywhere in the process data
> > structures.
> 
> > This is just a random sanity check thing, after all. It doesn't have to be 
> > system-global or even per-cpu. The only really important thing is that 
> > "gettimeofday()" should return monotonically increasing data - and if it 
> > doesn't, the vsyscall would have to ask why (sometimes it's fine, if 
> > somebody did a settimeofday, but usually it's a sign of trouble).
> 
> I believe you need it system-global. If task A tells task B "it's
> 10:30:00" and then task B does gettimeofday and gets "10:29:59", it
> will be confused for sure.

You just need to make sure the time difference is less than the distance
between the two tasks divided by the speed of light in the system. ;) Really,
relativity and the limited speed at which information travels kick in and
save us here.

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 19:42                   ` Pavel Machek
  2002-11-10 19:48                     ` Vojtech Pavlik
@ 2002-11-10 20:02                     ` Sean Neakums
  2002-11-10 20:16                       ` Lars Marowsky-Bree
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Neakums @ 2002-11-10 20:02 UTC (permalink / raw)
  To: linux-kernel

commence  Pavel Machek quotation:

>> This is just a random sanity check thing, after all. It doesn't have to be 
>> system-global or even per-cpu. The only really important thing is that 
>> "gettimeofday()" should return monotonically increasing data - and if it 
                                  ^^^^^^^^^^^^^^^^^^^^^^^^
>> doesn't, the vsyscall would have to ask why (sometimes it's fine, if 
>> somebody did a settimeofday, but usually it's a sign of trouble).
>
> I believe you need it system-global. If task A tells task B "it's
> 10:30:00" and then task B does gettimeofday and gets "10:29:59", it
> will be confused for sure.

Hence the requirement that it be monotonically increasing.

-- 
 /                          |
[|] Sean Neakums            |  Questions are a burden to others;
[|] <sneakums@zork.net>     |      answers a prison for oneself.
 \                          |

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 20:02                     ` Sean Neakums
@ 2002-11-10 20:16                       ` Lars Marowsky-Bree
  2002-11-10 22:11                         ` Alan Cox
  0 siblings, 1 reply; 32+ messages in thread
From: Lars Marowsky-Bree @ 2002-11-10 20:16 UTC (permalink / raw)
  To: linux-kernel

On 2002-11-10T20:02:00,
   Sean Neakums <sneakums@zork.net> said:

> > I believe you need it system-global. If task A tells task B "it's
> > 10:30:00" and then task B does gettimeofday and gets "10:29:59", it
> > will be confused for sure.
> Hence the requirement that it be monotonically increasing.

Processes expecting time to increase strictly monotonically across process
boundaries will enjoy life in cluster settings or when the admin adjusts the
time.

In short, those programs are already broken.

Of course, physically that should be true, Star Trek or not ;), but it is a
really hard promise to keep across multiple nodes (think Mosix, CC/NC-NUMA or
even real clusters which distribute processing).

At the very least, serializing all gettimeofday() calls via a single counter
is a rather bad idea.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Principal Squirrel 
SuSE Labs - Research & Development, SuSE Linux AG
  
"If anything can go wrong, it will." "Chance favors the prepared (mind)."
  -- Capt. Edward A. Murphy            -- Louis Pasteur

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 20:16                       ` Lars Marowsky-Bree
@ 2002-11-10 22:11                         ` Alan Cox
  0 siblings, 0 replies; 32+ messages in thread
From: Alan Cox @ 2002-11-10 22:11 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Linux Kernel Mailing List

On Sun, 2002-11-10 at 20:16, Lars Marowsky-Bree wrote:
> Processes expecting time to increase strictly monotonically across process
> boundaries will enjoy life in cluster settings or when the admin adjusts the
> time.

I'd fix your cluster code. OpenMosix gets this right and clusters
outside the mosix/numa/ssi world don't generally care as you are
restarting services, but even then they tend to use NTP to keep a bendy but
forward-moving time.

Alan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-10 19:46               ` Vojtech Pavlik
@ 2002-11-11 20:40                 ` john stultz
  2002-11-11 20:57                   ` J.E.J. Bottomley
  2002-11-11 22:08                   ` Vojtech Pavlik
  0 siblings, 2 replies; 32+ messages in thread
From: john stultz @ 2002-11-11 20:40 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: Linus Torvalds, Pavel Machek, Alan Cox, J.E.J. Bottomley, lkml

On Sun, 2002-11-10 at 11:46, Vojtech Pavlik wrote:
> On Sun, Nov 10, 2002 at 10:59:55AM -0800, Linus Torvalds wrote:
> > On Sun, 10 Nov 2002, Pavel Machek wrote:
> > > Unfortunately, this means "bye bye vsyscalls for gettimeofday".
> > 
> > > Not necessarily. All of the fastpath and the checking can be done by the
> > vsyscall, and if the vsyscall notices that there is a backwards jump in
> > time it just gives up and does a real system call. The vsyscall does need
> > to figure out the CPU it's running on somehow, but that should be solvable
> > - indexing through the thread ID or something.
> 
> I'm planning to store the CPU number in the highest bits of the TSC ...

I could be wrong, but we had considered this earlier, and found that
there isn't a way to set the high bits of the TSC; you can only clear
them. 

 
> > The system call overhead tends to scale up very well with CPU speed (the
> > > one exception being the P4 which just has some internal problems with "int
> > 0x80" and slowed down compared to a PIII).
> > 
> > So I would just suggest not spending a lot of effort on it, considering
> > the problems it already has. 
> 
> > Agreed. The only problem left I see is the need to have an interrupt on
> every CPU from time to time to update the per-cpu time values, and to
> synchronize those to the 'global timer interrupt' somehow.

Yes, this would be needed for per-cpu tsc. 

thanks
-john



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-11 20:40                 ` john stultz
@ 2002-11-11 20:57                   ` J.E.J. Bottomley
  2002-11-11 21:36                     ` William Lee Irwin III
                                       ` (2 more replies)
  2002-11-11 22:08                   ` Vojtech Pavlik
  1 sibling, 3 replies; 32+ messages in thread
From: J.E.J. Bottomley @ 2002-11-11 20:57 UTC (permalink / raw)
  To: john stultz
  Cc: Vojtech Pavlik, Linus Torvalds, Pavel Machek, Alan Cox,
	J.E.J. Bottomley, lkml

As a beginning, what about the attached patch?  It eliminates the compile time 
TSC options (and thus hopefully the sources of confusion).  I've exported 
tsc_disable, so it can be set by the subarchs if desired (voyager does this) 
and moved the notsc option into the timer_tsc code (which is where it looks 
like it belongs).

James

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	v2.5.47 -> 1.825  
#	arch/i386/kernel/timers/Makefile	1.2     -> 1.4    
#	arch/i386/kernel/cpu/common.c	1.13    -> 1.14   
#	include/asm-i386/processor.h	1.31    -> 1.32   
#	   arch/i386/Kconfig	1.2.1.4 -> 1.6    
#	arch/i386/kernel/timers/timer_tsc.c	1.5     -> 1.6    
#	arch/i386/mach-voyager/setup.c	1.1     -> 1.2    
#	arch/i386/kernel/timers/timer_pit.c	1.3.1.2 -> 1.7    
#	arch/i386/kernel/timers/timer.c	1.3     -> 1.5    
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/11/10	torvalds@home.transmeta.com	1.823
# Linux v2.5.47
# --------------------------------------------
# 02/11/11	jejb@mulgrave.(none)	1.824
# Merge mulgrave.(none):/home/jejb/BK/timer-2.5
# into mulgrave.(none):/home/jejb/BK/timer-new-2.5
# --------------------------------------------
# 02/11/11	jejb@mulgrave.(none)	1.825
# make TSC purely a run-time determined thing
# 
# - remove X86_TSC and X86_PIT compile time options
# - export tsc_disable for architecture setup
# - disable tsc in voyager pre_arch_setup_hook()
# - move "notsc" option into timers_tsc
# --------------------------------------------
#
diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig
--- a/arch/i386/Kconfig	Mon Nov 11 15:56:40 2002
+++ b/arch/i386/Kconfig	Mon Nov 11 15:56:40 2002
@@ -253,11 +253,6 @@
 	depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCYRIXIII || MELAN || MK6 || M586MMX || M586TSC || M586 || M486
 	default y
 
-config X86_TSC
-	bool
-	depends on MWINCHIP3D || MWINCHIP2 || MCRUSOE || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMIII || M686 || M586MMX || M586TSC
-	default y
-
 config X86_GOOD_APIC
 	bool
 	depends on MK7 || MPENTIUM4 || MPENTIUMIII || M686 || M586MMX
diff -Nru a/arch/i386/kernel/cpu/common.c b/arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/cpu/common.c	Mon Nov 11 15:56:40 2002
@@ -42,25 +42,6 @@
 }
 __setup("cachesize=", cachesize_setup);
 
-#ifndef CONFIG_X86_TSC
-static int tsc_disable __initdata = 0;
-
-static int __init tsc_setup(char *str)
-{
-	tsc_disable = 1;
-	return 1;
-}
-#else
-#define tsc_disable 0
-
-static int __init tsc_setup(char *str)
-{
-	printk("notsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC.\n");
-	return 1;
-}
-#endif
-__setup("notsc", tsc_setup);
-
 int __init get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
diff -Nru a/arch/i386/kernel/timers/Makefile b/arch/i386/kernel/timers/Makefile
--- a/arch/i386/kernel/timers/Makefile	Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/timers/Makefile	Mon Nov 11 15:56:40 2002
@@ -2,10 +2,8 @@
 # Makefile for x86 timers
 #
 
-obj-y:= timer.o
+obj-y:= timer.o timer_tsc.o timer_pit.o
 
-obj-y += timer_tsc.o
-obj-y += timer_pit.o
-obj-$(CONFIG_X86_CYCLONE)   += timer_cyclone.o
+obj-$(CONFIG_X86_CYCLONE)	+= timer_cyclone.o
 
 include $(TOPDIR)/Rules.make
diff -Nru a/arch/i386/kernel/timers/timer.c b/arch/i386/kernel/timers/timer.c
--- a/arch/i386/kernel/timers/timer.c	Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/timers/timer.c	Mon Nov 11 15:56:40 2002
@@ -8,9 +8,7 @@
 /* list of timers, ordered by preference, NULL terminated */
 static struct timer_opts* timers[] = {
 	&timer_tsc,
-#ifndef CONFIG_X86_TSC
 	&timer_pit,
-#endif
 	NULL,
 };
 
diff -Nru a/arch/i386/kernel/timers/timer_pit.c b/arch/i386/kernel/timers/timer_pit.c
--- a/arch/i386/kernel/timers/timer_pit.c	Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/timers/timer_pit.c	Mon Nov 11 15:56:40 2002
@@ -9,7 +9,9 @@
 #include <linux/irq.h>
 #include <asm/mpspec.h>
 #include <asm/timer.h>
+#include <asm/smp.h>
 #include <asm/io.h>
+#include <asm/arch_hooks.h>
 
 extern spinlock_t i8259A_lock;
 extern spinlock_t i8253_lock;
diff -Nru a/arch/i386/kernel/timers/timer_tsc.c b/arch/i386/kernel/timers/timer_tsc.c
--- a/arch/i386/kernel/timers/timer_tsc.c	Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/timers/timer_tsc.c	Mon Nov 11 15:56:40 2002
@@ -11,6 +11,10 @@
 
 #include <asm/timer.h>
 #include <asm/io.h>
+/* processor.h for tsc_disable flag */
+#include <asm/processor.h>
+
+int tsc_disable __initdata = 0;
 
 extern int x86_udelay_tsc;
 extern spinlock_t i8253_lock;
@@ -286,6 +290,18 @@
 	}
 	return -ENODEV;
 }
+
+/* disable flag for tsc.  Takes effect by clearing the TSC cpu flag
+ * in cpu/common.c */
+static int __init tsc_setup(char *str)
+{
+	tsc_disable = 1;
+	return 1;
+}
+
+__setup("notsc", tsc_setup);
+
+
 
 /************************************************************/
 
diff -Nru a/arch/i386/mach-voyager/setup.c b/arch/i386/mach-voyager/setup.c
--- a/arch/i386/mach-voyager/setup.c	Mon Nov 11 15:56:40 2002
+++ b/arch/i386/mach-voyager/setup.c	Mon Nov 11 15:56:40 2002
@@ -7,6 +7,7 @@
 #include <linux/irq.h>
 #include <linux/interrupt.h>
 #include <asm/arch_hooks.h>
+#include <asm/processor.h>
 
 void __init pre_intr_init_hook(void)
 {
@@ -29,6 +30,7 @@
 
 void __init pre_setup_arch_hook(void)
 {
+	tsc_disable = 1;
 }
 
 void __init trap_init_hook(void)
diff -Nru a/include/asm-i386/processor.h b/include/asm-i386/processor.h
--- a/include/asm-i386/processor.h	Mon Nov 11 15:56:40 2002
+++ b/include/asm-i386/processor.h	Mon Nov 11 15:56:40 2002
@@ -18,6 +18,9 @@
 #include <linux/config.h>
 #include <linux/threads.h>
 
+/* flag for disabling the tsc */
+extern int tsc_disable;
+
 struct desc_struct {
 	unsigned long a,b;
 };



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-11 20:57                   ` J.E.J. Bottomley
@ 2002-11-11 21:36                     ` William Lee Irwin III
  2002-11-11 21:58                     ` john stultz
  2002-11-12 12:16                     ` Pavel Machek
  2 siblings, 0 replies; 32+ messages in thread
From: William Lee Irwin III @ 2002-11-11 21:36 UTC (permalink / raw)
  To: J.E.J. Bottomley
  Cc: john stultz, Vojtech Pavlik, Linus Torvalds, Pavel Machek,
	Alan Cox, J.E.J. Bottomley, lkml

On Mon, Nov 11, 2002 at 03:57:28PM -0500, J.E.J. Bottomley wrote:
> As a beginning, what about the attached patch?  It eliminates the
> compile time TSC options (and thus hopefully the sources of confusion).
> I've exported tsc_disable, so it can be set by the subarchs if desired
> (voyager does this) and moved the notsc option into the timer_tsc code
> (which is where it looks like it belongs).

This will be very useful to me.


Thanks,
Bill

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-11 20:57                   ` J.E.J. Bottomley
  2002-11-11 21:36                     ` William Lee Irwin III
@ 2002-11-11 21:58                     ` john stultz
  2002-11-11 22:49                       ` J.E.J. Bottomley
  2002-11-12 12:16                     ` Pavel Machek
  2 siblings, 1 reply; 32+ messages in thread
From: john stultz @ 2002-11-11 21:58 UTC (permalink / raw)
  To: J.E.J. Bottomley
  Cc: Vojtech Pavlik, Linus Torvalds, Pavel Machek, Alan Cox,
	J.E.J. Bottomley, lkml

On Mon, 2002-11-11 at 12:57, J.E.J. Bottomley wrote:
> As a beginning, what about the attached patch?  It eliminates the compile time 
> TSC options (and thus hopefully the sources of confusion).  I've exported 
> tsc_disable, so it can be set by the subarchs if desired (voyager does this) 
> and moved the notsc option into the timer_tsc code (which is where it looks 
> like it belongs).

Looks good to me.

We'd still need to go back and yank out the #ifdef CONFIG_X86_TSC'ed 
macros in profile.h and pksched.h or replace them w/ inlines that wrap
the rdtsc calls w/ if(cpu_has_tsc && !tsc_disable) or some such line. 
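
One possible shape for such an inline wrapper is sketched below. Everything here other than the name `tsc_disable` is a stand-in invented for the example: `cpu_has_tsc_flag` plays the role of the kernel's cpu_has_tsc test, and `raw_read_tsc()` replaces the actual rdtsc instruction with a fake counter so the sketch compiles anywhere.

```c
#include <stdint.h>

/* Stand-ins for kernel state (assumptions of this sketch). */
int cpu_has_tsc_flag = 1;	/* would be the cpu_has_tsc feature test */
int tsc_disable = 0;		/* the run-time flag exported by the patch */

/* Fake cycle counter standing in for the rdtsc instruction. */
static uint64_t fake_tsc = 1000;

uint64_t raw_read_tsc(void)
{
	fake_tsc += 10;
	return fake_tsc;
}

/*
 * Read the cycle counter only when the CPU has one and it has not been
 * disabled at run time; otherwise report 0, as the compile-time
 * !CONFIG_X86_TSC variants of such macros typically did.
 */
static inline uint64_t safe_read_tsc(void)
{
	if (cpu_has_tsc_flag && !tsc_disable)
		return raw_read_tsc();
	return 0;
}
```

The trade-off is a run-time branch on every read in exchange for a single kernel image that works on both kinds of hardware.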

But yea, it's a start, assuming no one screams about not being able to
optimize out the timer_pit code.

thanks
-john


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-11 20:40                 ` john stultz
  2002-11-11 20:57                   ` J.E.J. Bottomley
@ 2002-11-11 22:08                   ` Vojtech Pavlik
  1 sibling, 0 replies; 32+ messages in thread
From: Vojtech Pavlik @ 2002-11-11 22:08 UTC (permalink / raw)
  To: john stultz
  Cc: Vojtech Pavlik, Linus Torvalds, Pavel Machek, Alan Cox,
	J.E.J. Bottomley, lkml

On Mon, Nov 11, 2002 at 12:40:49PM -0800, john stultz wrote:
> On Sun, 2002-11-10 at 11:46, Vojtech Pavlik wrote:
> > On Sun, Nov 10, 2002 at 10:59:55AM -0800, Linus Torvalds wrote:
> > > On Sun, 10 Nov 2002, Pavel Machek wrote:
> > > > Unfortunately, this means "bye bye vsyscalls for gettimeofday".
> > > 
> > > > Not necessarily. All of the fastpath and the checking can be done by the
> > > vsyscall, and if the vsyscall notices that there is a backwards jump in
> > > time it just gives up and does a real system call. The vsyscall does need
> > > to figure out the CPU it's running on somehow, but that should be solvable
> > > - indexing through the thread ID or something.
> > 
> > I'm planning to store the CPU number in the highest bits of the TSC ...
> 
> I could be wrong, but we had considered this earlier, and found that
> there isn't a way to set the high bits of the TSC, you can only clear
> them. 

I'll have to test that. Another option is per-cpu page mappings for
vsyscalls. But that's rather ugly.

> > > The system call overhead tends to scale up very well with CPU speed (the
> > > > one exception being the P4 which just has some internal problems with "int
> > > 0x80" and slowed down compared to a PIII).
> > > 
> > > So I would just suggest not spending a lot of effort on it, considering
> > > the problems it already has. 
> > 
> > > Agreed. The only problem left I see is the need to have an interrupt on
> > every CPU from time to time to update the per-cpu time values, and to
> > synchronize those to the 'global timer interrupt' somehow.
> 
> Yes, this would be needed for per-cpu tsc. 

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-11 21:58                     ` john stultz
@ 2002-11-11 22:49                       ` J.E.J. Bottomley
  2002-11-11 23:12                         ` john stultz
  0 siblings, 1 reply; 32+ messages in thread
From: J.E.J. Bottomley @ 2002-11-11 22:49 UTC (permalink / raw)
  To: john stultz
  Cc: J.E.J. Bottomley, Vojtech Pavlik, Linus Torvalds, Pavel Machek,
	Alan Cox, J.E.J. Bottomley, lkml

johnstul@us.ibm.com said:
> We'd still need to go back and yank out the #ifdef CONFIG_X86_TSC'ed
> macros in profile.h and pksched.h or replace them w/ inlines that wrap
> the rdtsc calls w/ if(cpu_has_tsc && !tsc_disable) or some such line.

Actually, the best way to do this might be to vector the rdtsc calls through a 
function pointer (i.e. they return zero always if the TSC is disabled, or the 
TSC value if it's OK).  I think this might be better than checking the 
cpu_has_tsc flag in the code (well it's more expandable anyway, it won't be 
faster...)
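
A small sketch of the function-pointer idea, with hypothetical names throughout (`read_cycles`, `select_cycles_source`, and a fake counter in place of rdtsc). It only shows the dispatch pattern: the pointer is chosen once, and callers never test the TSC flags themselves.

```c
#include <stdint.h>

typedef uint64_t (*cycles_fn)(void);

/* Fake counter standing in for the rdtsc instruction (an assumption). */
static uint64_t fake_counter;

uint64_t cycles_from_tsc(void)
{
	return ++fake_counter;
}

/* Used when the TSC is absent or disabled: always reports zero. */
uint64_t cycles_none(void)
{
	return 0;
}

/* All consumers call through this pointer; default is the safe one. */
cycles_fn read_cycles = cycles_none;

/* Pick the implementation once, e.g. during timer initialisation. */
void select_cycles_source(int has_tsc, int tsc_disabled)
{
	if (has_tsc && !tsc_disabled)
		read_cycles = cycles_from_tsc;
	else
		read_cycles = cycles_none;
}
```

An indirect call costs more than an inline rdtsc, which matches the remark that this is more expandable rather than faster; a per-cpu implementation could later be slotted in behind the same pointer.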

When the TSC code is sorted out on a per cpu basis, consumers are probably 
going to expect rdtsc to return usable values whatever CPU it is called on, so 
vectoring the calls now may help this.

James



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-11 22:49                       ` J.E.J. Bottomley
@ 2002-11-11 23:12                         ` john stultz
  0 siblings, 0 replies; 32+ messages in thread
From: john stultz @ 2002-11-11 23:12 UTC (permalink / raw)
  To: J.E.J. Bottomley
  Cc: Vojtech Pavlik, Linus Torvalds, Pavel Machek, Alan Cox,
	J.E.J. Bottomley, lkml

On Mon, 2002-11-11 at 14:49, J.E.J. Bottomley wrote:
> johnstul@us.ibm.com said:
> > We'd still need to go back and yank out the #ifdef CONFIG_X86_TSC'ed
> > macros in profile.h and pksched.h or replace them w/ inlines that wrap
> > the rdtsc calls w/ if(cpu_has_tsc && !tsc_disable) or some such line.
> 
> Actually, the best way to do this might be to vector the rdtsc calls through a 
> function pointer (i.e. they return zero always if the TSC is disabled, or the 
> TSC value if it's OK).  I think this might be better than checking the 
> cpu_has_tsc flag in the code (well it's more expandable anyway, it won't be 
> faster...)

Sounds good, I'm planning on moving get_cycles to timer_opts, so how
about using that?

> When the TSC code is sorted out on a per cpu basis, consumers are probably 
> going to expect rdtsc to return usable values whatever CPU it is called on, so 
> vectoring the calls now may help this.

Yea, this is an ugly topic. I'm really not very enthusiastic about
per-cpu tsc, because it doesn't necessarily solve the problem on the
few machines that can't use the global tsc implementation (such as the
x440). True, many of the points Linus made about the current timer_tsc
implementation are valid. It does need to be cleaned up further, and I
have some patches to do so (I'll resend tomorrow, as I'm out sick
today). We should be looking towards multi-frequency systems, and seeing
what we can do to clean things up (ie: cpu_khz is global, etc). 

If you are dead set on doing the percpu method, I'd strongly suggest
creating a new timer_per_cpu_tsc timer_opt struct and implementing it
there, rather than munging the existing code, which works well for most
systems. 
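
To make the suggestion concrete, here is a sketch of how a separate per-cpu entry could sit in front of the existing ones in a preference-ordered, NULL-terminated list like the one in timer.c. The struct layout, the `_sketch` names and the stub init functions are assumptions for illustration; they are not the kernel's actual timer_opts definition.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a timer_opts-style descriptor. */
struct timer_opts_sketch {
	const char *name;
	int (*init)(void);	/* 0 if usable, negative errno otherwise */
};

int tsc_disable;		/* mirrors the run-time flag from the patch */

/* Stub: the per-cpu variant is not implemented in this sketch. */
static int init_per_cpu_tsc(void) { return -ENODEV; }
static int init_tsc(void) { return tsc_disable ? -ENODEV : 0; }
static int init_pit(void) { return 0; }

struct timer_opts_sketch timer_per_cpu_tsc_sketch = { "per_cpu_tsc", init_per_cpu_tsc };
struct timer_opts_sketch timer_tsc_sketch = { "tsc", init_tsc };
struct timer_opts_sketch timer_pit_sketch = { "pit", init_pit };

/* Ordered by preference, NULL terminated. */
static struct timer_opts_sketch *timers_sketch[] = {
	&timer_per_cpu_tsc_sketch,
	&timer_tsc_sketch,
	&timer_pit_sketch,
	NULL,
};

/* Return the first timer whose init succeeds, or NULL if none does. */
struct timer_opts_sketch *select_timer_sketch(void)
{
	size_t i;

	for (i = 0; timers_sketch[i] != NULL; i++)
		if (timers_sketch[i]->init() == 0)
			return timers_sketch[i];
	return NULL;
}
```

Because each entry decides for itself whether it is usable, the existing global-TSC code stays untouched and machines where the per-cpu variant does not apply simply fall through to the next entry.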

thanks
-john


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Voyager subarchitecture for 2.5.46
  2002-11-11 20:57                   ` J.E.J. Bottomley
  2002-11-11 21:36                     ` William Lee Irwin III
  2002-11-11 21:58                     ` john stultz
@ 2002-11-12 12:16                     ` Pavel Machek
  2 siblings, 0 replies; 32+ messages in thread
From: Pavel Machek @ 2002-11-12 12:16 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: lkml

Hi!

> As a beginning, what about the attached patch?  It eliminates the compile time 
> TSC options (and thus hopefully the sources of confusion).  I've exported 
> tsc_disable, so it can be set by the subarchs if desired (voyager does this) 
> and moved the notsc option into the timer_tsc code (which is where it looks 
> like it belongs).

Looks good to me.
-- 
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2002-11-12 12:09 UTC | newest]

Thread overview: 32+ messages
2002-11-05 20:45 Voyager subarchitecture for 2.5.46 J.E.J. Bottomley
2002-11-06  2:31 ` john stultz
2002-11-06 13:43   ` Alan Cox
2002-11-06 21:35     ` john stultz
2002-11-06 15:03   ` J.E.J. Bottomley
2002-11-06 15:38     ` Alan Cox
2002-11-06 16:09       ` Christer Weinigel
2002-11-06 15:45     ` Linus Torvalds
2002-11-06 16:19       ` Alan Cox
2002-11-06 16:12         ` Linus Torvalds
2002-11-06 16:45           ` Alan Cox
2002-11-10 16:30           ` Pavel Machek
2002-11-10 18:59             ` Linus Torvalds
2002-11-10 19:18               ` Pavel Machek
2002-11-10 19:31                 ` Linus Torvalds
2002-11-10 19:42                   ` Pavel Machek
2002-11-10 19:48                     ` Vojtech Pavlik
2002-11-10 20:02                     ` Sean Neakums
2002-11-10 20:16                       ` Lars Marowsky-Bree
2002-11-10 22:11                         ` Alan Cox
2002-11-10 19:46               ` Vojtech Pavlik
2002-11-11 20:40                 ` john stultz
2002-11-11 20:57                   ` J.E.J. Bottomley
2002-11-11 21:36                     ` William Lee Irwin III
2002-11-11 21:58                     ` john stultz
2002-11-11 22:49                       ` J.E.J. Bottomley
2002-11-11 23:12                         ` john stultz
2002-11-12 12:16                     ` Pavel Machek
2002-11-11 22:08                   ` Vojtech Pavlik
2002-11-06 20:07       ` john stultz
2002-11-06 22:36       ` H. Peter Anvin
2002-11-06 19:30     ` john stultz
