* [PATCH] x86-32: A better system call mechanism
@ 2012-04-01  0:00 H. Peter Anvin
  2012-04-01  6:50 ` Willy Tarreau
From: H. Peter Anvin @ 2012-04-01  0:00 UTC
  To: Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	Linux Kernel Mailing List
  Cc: Arjan van de Ven, H. Peter Anvin

On x86-32, we currently use int $0x80 as the primary system call
mechanism.  Although there are some recent variants available on
certain hardware (sysenter, syscall) via the vdso, the primary system
call vector is still way up the interrupt vector table, which is
inefficient.

This patch adds a very small amount of code which permits the very
first vector to be used for system calls.  That vector is #DE, divide
error, generally known as division by zero.

An example of how to use this new system call mechanism:

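	/* Build as a preprocessed .S file so that __NR_* resolve. */
#include <asm/unistd.h>

	.text
	.globl	_start
_start:
	movl $__NR_write, %eax
	movl $1, %ebx			/* fd 1: stdout */
	movl $str_1, %ecx
	movl $str_1_len, %edx
	aam $0				/* system call via #DE */

	movl $__NR_write, %eax
	movl $1, %ebx			/* fd 1: stdout */
	movl $str_2, %ecx
	movl $str_2_len, %edx
	divl %edx			/* system call via #DE */

	movl $__NR_exit, %eax
	xorl %ebx, %ebx			/* exit status 0 */
	divl %edx			/* system call via #DE */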

	.type	_start, @function
	.size	_start, . - _start

	.section ".rodata", "a"
	.balign	128
str_1:
	.ascii "This works!\n"
str_1_len	= . - str_1

str_2:
	.ascii "This works too!\n"
str_2_len	= . - str_2

We use the shortest forms of the relevant instructions only, for
simplicity.  Why these mechanisms work is left as a trivial exercise
to the reader.

Suggested-by: Arjan van de Ven <arjan@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/entry_32.S |    5 ++++-
 arch/x86/kernel/traps.c    |    3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 7b784f4..0dfe246 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -497,7 +497,10 @@ ENDPROC(ia32_sysenter_target)
  */
 	.pushsection .kprobes.text, "ax"
 	# system call handler stub
-ENTRY(system_call)
+ENTRY(system_call_divide_error)
+	addl $2,(%esp)			# Skip past the faulting instruction
+	.globl system_call		# Don't use ENTRY because of padding
+system_call:
 	RING0_INT_FRAME			# can't unwind into user space anyway
 	pushl_cfi %eax			# save orig_eax
 	SAVE_ALL
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index ff9281f1..d5bd411 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -67,6 +67,7 @@
 #include <asm/setup.h>
 
 asmlinkage int system_call(void);
+asmlinkage int system_call_divide_error(void);
 
 /* Do we ignore FPU interrupts ? */
 char ignore_fpu_irq;
@@ -679,7 +680,7 @@ void __init trap_init(void)
 	early_iounmap(p, 4);
 #endif
 
-	set_intr_gate(X86_TRAP_DE, &divide_error);
+	set_system_trap_gate(X86_TRAP_DE, &system_call_divide_error);
 	set_intr_gate_ist(X86_TRAP_NMI, &nmi, NMI_STACK);
 	/* int4 can be called from all */
 	set_system_intr_gate(X86_TRAP_OF, &overflow);
-- 
1.7.6.5


* Re: [PATCH] x86-32: A better system call mechanism
  2012-04-01  0:00 [PATCH] x86-32: A better system call mechanism H. Peter Anvin
@ 2012-04-01  6:50 ` Willy Tarreau
  2012-04-01  6:52   ` Willy Tarreau
From: Willy Tarreau @ 2012-04-01  6:50 UTC
  To: H. Peter Anvin
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	Linux Kernel Mailing List, Arjan van de Ven

Hi Peter,

On Sun, Apr 01, 2012 at 12:00:00AM +0000, H. Peter Anvin wrote:
> On x86-32, we currently use int $0x80 as the primary system call
> mechanism.  Although there are some recent variants available on
> certain hardware (sysenter, syscall) via the vdso, the primary system
> call vector is still way up the interrupt vector table, which is
> inefficient.
> 
> This patch adds a very small amount of code which permits the very
> first vector to be used for system calls.  That vector is #DE, divide
> error, generally known as division by zero.

Looks like a clever trick, but beyond the beauty, what does it really
save? Code size is the same, as aam 0 and div edx are both 2 bytes long,
just like int 0x80. Is the call less expensive? And if so, how does
it compare to the vdso?

Thanks,
Willy


* Re: [PATCH] x86-32: A better system call mechanism
  2012-04-01  6:50 ` Willy Tarreau
@ 2012-04-01  6:52   ` Willy Tarreau
From: Willy Tarreau @ 2012-04-01  6:52 UTC
  To: H. Peter Anvin
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	Linux Kernel Mailing List, Arjan van de Ven

On Sun, Apr 01, 2012 at 08:50:19AM +0200, Willy Tarreau wrote:
> Hi Peter,
> 
> On Sun, Apr 01, 2012 at 12:00:00AM +0000, H. Peter Anvin wrote:
> > On x86-32, we currently use int $0x80 as the primary system call
> > mechanism.  Although there are some recent variants available on
> > certain hardware (sysenter, syscall) via the vdso, the primary system
> > call vector is still way up the interrupt vector table, which is
> > inefficient.
> > 
> > This patch adds a very small amount of code which permits the very
> > first vector to be used for system calls.  That vector is #DE, divide
> > error, generally known as division by zero.
> 
> Looks like a clever trick, but beyond the beauty, what does it really
> save? Code size is the same, as aam 0 and div edx are both 2 bytes long,
> just like int 0x80. Is the call less expensive? And if so, how does
> it compare to the vdso?

Hmmm, I think I just found the answer in the Date header. Time to get
some coffee :-)

Willy

