From: Andrew Morton <akpm@digeo.com>
To: Andi Kleen <ak@suse.de>, Linus Torvalds <torvalds@transmeta.com>
Cc: davem@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)
Date: Sat, 04 Jan 2003 17:01:04 -0800
Message-ID: <3E1783D0.5A47A299@digeo.com>
In-Reply-To: p734r8qnkkp.fsf@oldwotan.suse.de

Andi Kleen wrote:
> 
> P.S.: regarding recent lmbench slow downs: I'm a bit
> worried about the two wrmsrs which are in the i386 context switch
> in load_esp0 for sysenter now. Last time I benchmarked WRMSRs on
> Athlon they were really slow and knowing the P4 it is probably
> even slower there. Imho it would be better to undo that patch
> and use Linus' original trampoline stack.
> 

Looks like you're right.  The indications are that the wrmsrs in
load_esp0 have slowed context switches by ~5% on a PIII.  The backout
patch against 2.5.54 is below.  Testing on a P4 would be useful.

2.5.54, stock:

lmbench:

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
i686-linu  Linux 2.5.54    3     16     44    18     47      20      77

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
i686-linu  Linux 2.5.54     3    16   26    65          78        231

AIM9:

     3 tcp_test            10.00       2767  276.70000        24903.00 TCP/IP Messages/second
     4 udp_test            10.00       4658  465.80000        46580.00 UDP/IP DataGrams/second
     5 fifo_test           10.01       6507  650.04995        65005.00 FIFO Messages/second
     6 dgram_pipe          10.00      11228 1122.80000       112280.00 DataGram Pipe Messages/second
     7 pipe_cpy            10.00      15463 1546.30000       154630.00 Pipe Messages/second

pollbench:
pollbench 1 100 5000
  result with handles 1 processes 100 loops 5000:time  9.609487 sec.
pollbench 2 100 2000
  result with handles 2 processes 100 loops 2000:time  4.016496 sec.
pollbench 5 100 2000
  result with handles 5 processes 100 loops 2000:time  4.917921 sec.


2.5.54, with the below backout patch:

lmbench:

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
i686-linu  Linux 2.5.54    3     14     47    18     50      20      61

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
i686-linu  Linux 2.5.54     3    15   25    64          69        231

AIM9:

     3 tcp_test            10.00       2908  290.80000        26172.00 TCP/IP Messages/second
     4 udp_test            10.00       4971  497.10000        49710.00 UDP/IP DataGrams/second
     5 fifo_test           10.01       6642  663.53646        66353.65 FIFO Messages/second
     6 dgram_pipe          10.00      11516 1151.60000       115160.00 DataGram Pipe Messages/second
     7 pipe_cpy            10.00      15930 1593.00000       159300.00 Pipe Messages/second

pollbench:
pollbench 1 100 5000
  result with handles 1 processes 100 loops 5000:time  9.106732 sec.
pollbench 2 100 2000
  result with handles 2 processes 100 loops 2000:time  3.853814 sec.
pollbench 5 100 2000
  result with handles 5 processes 100 loops 2000:time  4.533519 sec.
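
For anyone who wants the AIM9 and pollbench deltas as percentages, here
is a small user-space sketch.  It is not part of the patch, and the
function names (pct_change, report, report_all) are made up for
illustration.  The figures are copied from the two runs above: the
per-second AIM9 rates (higher is better) and the pollbench wall times
(lower is better).

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from the stock kernel to the patched kernel, in percent. */
double pct_change(double stock, double patched)
{
	assert(stock != 0.0);	/* a zero baseline has no meaningful ratio */
	return (patched - stock) / stock * 100.0;
}

/* Print one comparison row: name, stock value, patched value, change. */
void report(const char *name, double stock, double patched)
{
	printf("%-12s %12.2f -> %12.2f  %+6.2f%%\n",
	       name, stock, patched, pct_change(stock, patched));
}

/* Feed in the numbers from this message. */
void report_all(void)
{
	/* AIM9, operations per second: a positive change is an improvement */
	report("tcp_test",   24903.00,  26172.00);
	report("udp_test",   46580.00,  49710.00);
	report("fifo_test",  65005.00,  66353.65);
	report("dgram_pipe", 112280.00, 115160.00);
	report("pipe_cpy",   154630.00, 159300.00);

	/* pollbench, seconds: a negative change is an improvement */
	report("pollbench 1", 9.609487, 9.106732);
	report("pollbench 2", 4.016496, 3.853814);
	report("pollbench 5", 4.917921, 4.533519);
}
```

Calling report_all() from a main() prints one row per test.  These are
single runs, so treat the differences as indicative only.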



 arch/i386/kernel/cpu/common.c |    2 +-
 arch/i386/kernel/process.c    |    2 +-
 arch/i386/kernel/sysenter.c   |   34 ++++++++++++++++++++++++++++++----
 arch/i386/kernel/vm86.c       |    4 +---
 include/asm-i386/cpufeature.h |    3 ---
 include/asm-i386/msr.h        |    4 ----
 include/asm-i386/processor.h  |   16 ----------------
 7 files changed, 33 insertions(+), 32 deletions(-)

--- 25/arch/i386/kernel/cpu/common.c~bad3	Fri Jan  3 16:36:48 2003
+++ 25-akpm/arch/i386/kernel/cpu/common.c	Fri Jan  3 16:36:48 2003
@@ -484,7 +484,7 @@ void __init cpu_init (void)
 		BUG();
 	enter_lazy_tlb(&init_mm, current, cpu);
 
-	load_esp0(t, thread->esp0);
+	t->esp0 = thread->esp0;
 	set_tss_desc(cpu,t);
 	cpu_gdt_table[cpu][GDT_ENTRY_TSS].b &= 0xfffffdff;
 	load_TR_desc();
--- 25/arch/i386/kernel/process.c~bad3	Fri Jan  3 16:36:48 2003
+++ 25-akpm/arch/i386/kernel/process.c	Fri Jan  3 16:36:48 2003
@@ -437,7 +437,7 @@ void __switch_to(struct task_struct *pre
 	/*
 	 * Reload esp0, LDT and the page table pointer:
 	 */
-	load_esp0(tss, next->esp0);
+	tss->esp0 = next->esp0;
 
 	/*
 	 * Load the per-thread Thread-Local Storage descriptor.
--- 25/arch/i386/kernel/sysenter.c~bad3	Fri Jan  3 16:36:48 2003
+++ 25-akpm/arch/i386/kernel/sysenter.c	Fri Jan  3 16:36:48 2003
@@ -34,14 +34,40 @@ struct fake_sep_struct {
 	unsigned char stack[0];
 } __attribute__((aligned(8192)));
 	
+static struct fake_sep_struct *alloc_sep_thread(int cpu)
+{
+	struct fake_sep_struct *entry;
+
+	entry = (struct fake_sep_struct *) __get_free_pages(GFP_ATOMIC, 1);
+	if (!entry)
+		return NULL;
+
+	memset(entry, 0, PAGE_SIZE<<1);
+	entry->thread.task = &entry->task;
+	entry->task.thread_info = &entry->thread;
+	entry->thread.preempt_count = 1;
+	entry->thread.cpu = cpu;	
+
+	return entry;
+}
+
 static void __init enable_sep_cpu(void *info)
 {
 	int cpu = get_cpu();
-	struct tss_struct *tss = init_tss + cpu;
+	struct fake_sep_struct *sep = alloc_sep_thread(cpu);
+	unsigned long *esp0_ptr = &(init_tss + cpu)->esp0;
+	unsigned long rel32;
+
+	rel32 = (unsigned long) sysenter_entry - (unsigned long) (sep->trampoline+11);
+	
+	*(short *) (sep->trampoline+0) = 0x258b;		/* movl xxxxx,%esp */
+	*(long **) (sep->trampoline+2) = esp0_ptr;
+	*(char *)  (sep->trampoline+6) = 0xe9;			/* jmp rel32 */
+	*(long *)  (sep->trampoline+7) = rel32;
 
-	wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
-	wrmsr(MSR_IA32_SYSENTER_ESP, tss->esp0, 0);
-	wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long) sysenter_entry, 0);
+	wrmsr(0x174, __KERNEL_CS, 0);				/* SYSENTER_CS_MSR */
+	wrmsr(0x175, PAGE_SIZE*2 + (unsigned long) sep, 0);	/* SYSENTER_ESP_MSR */
+	wrmsr(0x176, (unsigned long) &sep->trampoline, 0);	/* SYSENTER_EIP_MSR */
 
 	printk("Enabling SEP on CPU %d\n", cpu);
 	put_cpu();	
--- 25/arch/i386/kernel/vm86.c~bad3	Fri Jan  3 16:36:48 2003
+++ 25-akpm/arch/i386/kernel/vm86.c	Fri Jan  3 16:36:48 2003
@@ -113,8 +113,7 @@ struct pt_regs * save_v86_state(struct k
 		do_exit(SIGSEGV);
 	}
 	tss = init_tss + smp_processor_id();
-	current->thread.esp0 = current->thread.saved_esp0;
-	load_esp0(tss, current->thread.esp0);
+	tss->esp0 = current->thread.esp0 = current->thread.saved_esp0;
 	current->thread.saved_esp0 = 0;
 	loadsegment(fs, current->thread.saved_fs);
 	loadsegment(gs, current->thread.saved_gs);
@@ -290,7 +289,6 @@ static void do_sys_vm86(struct kernel_vm
 
 	tss = init_tss + smp_processor_id();
 	tss->esp0 = tsk->thread.esp0 = (unsigned long) &info->VM86_TSS_ESP0;
-	disable_sysenter();
 
 	tsk->thread.screen_bitmap = info->screen_bitmap;
 	if (info->flags & VM86_SCREEN_BITMAP)
--- 25/include/asm-i386/cpufeature.h~bad3	Fri Jan  3 16:36:48 2003
+++ 25-akpm/include/asm-i386/cpufeature.h	Fri Jan  3 16:36:48 2003
@@ -7,8 +7,6 @@
 #ifndef __ASM_I386_CPUFEATURE_H
 #define __ASM_I386_CPUFEATURE_H
 
-#include <linux/bitops.h>
-
 #define NCAPINTS	4	/* Currently we have 4 32-bit words worth of info */
 
 /* Intel-defined CPU features, CPUID level 0x00000001, word 0 */
@@ -77,7 +75,6 @@
 #define cpu_has_pge		boot_cpu_has(X86_FEATURE_PGE)
 #define cpu_has_sse2		boot_cpu_has(X86_FEATURE_XMM2)
 #define cpu_has_apic		boot_cpu_has(X86_FEATURE_APIC)
-#define cpu_has_sep		boot_cpu_has(X86_FEATURE_SEP)
 #define cpu_has_mtrr		boot_cpu_has(X86_FEATURE_MTRR)
 #define cpu_has_mmx		boot_cpu_has(X86_FEATURE_MMX)
 #define cpu_has_fxsr		boot_cpu_has(X86_FEATURE_FXSR)
--- 25/include/asm-i386/msr.h~bad3	Fri Jan  3 16:36:48 2003
+++ 25-akpm/include/asm-i386/msr.h	Fri Jan  3 16:36:48 2003
@@ -53,10 +53,6 @@
 
 #define MSR_IA32_BBL_CR_CTL		0x119
 
-#define MSR_IA32_SYSENTER_CS		0x174
-#define MSR_IA32_SYSENTER_ESP		0x175
-#define MSR_IA32_SYSENTER_EIP		0x176
-
 #define MSR_IA32_MCG_CAP		0x179
 #define MSR_IA32_MCG_STATUS		0x17a
 #define MSR_IA32_MCG_CTL		0x17b
--- 25/include/asm-i386/processor.h~bad3	Fri Jan  3 16:36:48 2003
+++ 25-akpm/include/asm-i386/processor.h	Fri Jan  3 16:36:48 2003
@@ -14,7 +14,6 @@
 #include <asm/types.h>
 #include <asm/sigcontext.h>
 #include <asm/cpufeature.h>
-#include <asm/msr.h>
 #include <linux/cache.h>
 #include <linux/config.h>
 #include <linux/threads.h>
@@ -411,21 +410,6 @@ struct thread_struct {
 	.io_bitmap	= { [ 0 ... IO_BITMAP_SIZE ] = ~0 },		\
 }
 
-static inline void load_esp0(struct tss_struct *tss, unsigned long esp0)
-{
-	tss->esp0 = esp0;
-	if (cpu_has_sep) {
-		wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
-		wrmsr(MSR_IA32_SYSENTER_ESP, esp0, 0);
-	}
-}
-
-static inline void disable_sysenter(void)
-{
-	if (cpu_has_sep)  
-		wrmsr(MSR_IA32_SYSENTER_CS, 0, 0);
-}
-
 #define start_thread(regs, new_eip, new_esp) do {		\
 	__asm__("movl %0,%%fs ; movl %0,%%gs": :"r" (0));	\
 	set_fs(USER_DS);					\

_
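
The trampoline that enable_sep_cpu() writes above is 11 bytes: a 6-byte
"movl addr,%esp" (opcode 0x8b, ModRM 0x25, 32-bit address) followed by
a 5-byte "jmp rel32" (opcode 0xe9, 32-bit displacement).  The
displacement is relative to the end of the jmp, hence the "+11".  Below
is a user-space sketch of the same encoding, assuming 32-bit addresses
passed as plain integers.  build_trampoline, put_le32 and
TRAMPOLINE_LEN are hypothetical names for illustration, not kernel
symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRAMPOLINE_LEN 11	/* 6 bytes of movl + 5 bytes of jmp */

/* Store a 32-bit value in little-endian byte order, as x86 expects. */
static void put_le32(unsigned char *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/*
 * Fill buf with:  movl esp0_addr,%esp ; jmp entry_addr
 *
 * tramp_addr is the address the buffer will execute from.  It is needed
 * because the jmp displacement is relative to the instruction that
 * follows the jmp, i.e. tramp_addr + TRAMPOLINE_LEN.
 */
void build_trampoline(unsigned char *buf, uint32_t tramp_addr,
		      uint32_t esp0_addr, uint32_t entry_addr)
{
	/* unsigned arithmetic wraps, which gives the right rel32 for backward jumps */
	uint32_t rel32 = entry_addr - (tramp_addr + TRAMPOLINE_LEN);

	assert(buf != NULL);

	buf[0] = 0x8b;			/* movl disp32,%esp: opcode */
	buf[1] = 0x25;			/* ModRM: reg=%esp, rm=disp32 */
	put_le32(buf + 2, esp0_addr);	/* address holding the real esp0 */
	buf[6] = 0xe9;			/* jmp rel32 */
	put_le32(buf + 7, rel32);
}
```

With this layout the sysenter path only ever needs the fixed per-CPU
MSR values set once at boot: the trampoline reads the current esp0 out
of the TSS on entry, so the context switch itself needs no wrmsr.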