* Calling syscalls from x86-64 kernel results in a crash on Opteron machines
@ 2004-09-13 14:04 Constantine Gavrilov
2004-09-13 14:38 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Constantine Gavrilov @ 2004-09-13 14:04 UTC (permalink / raw)
To: bugs, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4369 bytes --]
Hello:
We have a piece of kernel code that calls some system calls in kernel
context (from a process with mm and a daemonized kernel thread that does
not have mm). This works fine on IA64 and i386 architectures.
When I try this on x86-64 kernel on Opteron machines, it results in
immediate crash. I have tried standard _syscall() macros from
asm/unistd.h. The system panics when returning from the system call.
The disassembled code shows that gcc often has a hard time deciding
which registers (32-bit or 64-bit) to use. For example, it puts the
system call number in eax, while it should put it in rax. However, this
register choice is not the problem. I have tried my own hand-crafted gcc
inline assembly and the glibc inline syscall assembly, both of which
produce "correct" disassembled code. The result is always the same -- kernel
crash when calling a function defined by the _syscall() macros or when using
an "inline" block defined by the glibc macros.
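A side note on the eax/rax question, offered as a sketch rather than as part of the original report: on x86-64, a write to a 32-bit register clears the upper 32 bits of the matching 64-bit register, so `mov $0x5f,%eax` leaves rax equal to 0x5f. The plain-C model below illustrates that rule next to the 16-bit case, which behaves differently. The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling x86-64 register-write rules in plain C. */

/* A 32-bit write (e.g. mov $imm,%eax) zero-extends into the full register. */
static uint64_t write_reg32(uint64_t old_rax, uint32_t value)
{
    (void)old_rax;            /* the previous upper half is discarded */
    return (uint64_t)value;
}

/* A 16-bit write (e.g. mov $imm,%ax) preserves the upper 48 bits. */
static uint64_t write_reg16(uint64_t old_rax, uint16_t value)
{
    return (old_rax & ~UINT64_C(0xffff)) | value;
}

/* Show that stale upper bits do not survive a 32-bit write. */
static void demo_zero_extension(void)
{
    uint64_t stale = UINT64_C(0xdeadbeefcafef00d);

    assert(write_reg32(stale, 0x5f) == UINT64_C(0x5f));
    assert(write_reg16(stale, 0x5f) == UINT64_C(0xdeadbeefcafe005f));
}
```

Under this rule the `mov $0x5f,%eax` form in the _syscall1() dump is just a shorter encoding, which is consistent with the observation that the register choice is not what causes the crash.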
Attached please find a test module that tries to call umask() (JUST
TO DEMONSTRATE the problem) via the syscall mechanism. Both methods (the
_syscall1() macro and the glibc INLINE_SYSCALL() macro) were used.
The assembly dumps of umask() called via _syscall1() and via
INLINE_SYSCALL(), as well as the disassembly of umask() from glibc, are
provided in a separate attachment. The crash dump (captured with a
serial console) is provided along with a disassembly of the main module
function.
It seems that segmentation is changed during the syscall and not
restored properly, or some other REALLY BAD THING happens. The entry.S
for x86_64 architecture is very informative, but I am not an expert in
Opteron architecture and I do not know how the syscall instruction is
supposed to work.
Can someone explain the reason for the crash? Can you think of a
workaround? Comments and ideas are very welcome (except for suggestions that
it be implemented in user space or with the help of a user proxy
process).
Thanks. Please CC me; I am not subscribed to the lists.
Additional info:
uname -a
Linux dev83 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:32:58 EDT 2003 x86_64
x86_64 x86_64 GNU/Linux
cat /proc/cpuinfo:
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 240
stepping : 1
cpu MHz : 1394.254
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips : 2778.72
TLB size : 1088 4K pages
clflush size : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts ttp
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 240
stepping : 1
cpu MHz : 1394.254
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips : 2785.28
TLB size : 1088 4K pages
clflush size : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts ttp
cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 6137491456 84901888 6052589568 0 12132352 22863872
Swap: 1048698880 0 1048698880
MemTotal: 5993644 kB
MemFree: 5910732 kB
MemShared: 0 kB
Buffers: 11848 kB
Cached: 22328 kB
SwapCached: 0 kB
Active: 37588 kB
ActiveAnon: 7232 kB
ActiveCache: 30356 kB
Inact_dirty: 3816 kB
Inact_laundry: 0 kB
Inact_clean: 0 kB
Inact_target: 8280 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 5993644 kB
LowFree: 5910732 kB
SwapTotal: 1024120 kB
SwapFree: 1024120 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Qlusters Software Ltd
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6081977
Fax: +972-3-6081841
----------------------------------------
[-- Attachment #2: syscall_test.c --]
[-- Type: text/plain, Size: 1019 bytes --]
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>

static long errno;

MODULE_AUTHOR("Constantine Gavrilov");
MODULE_DESCRIPTION("Simple test for syscall interface");
MODULE_LICENSE("GPL");

#ifdef CONFIG_X86_64
#include "gsyscall.h"
static long wrapper_umask (mode_t mask)
{
	long res = INLINE_SYSCALL(umask, 1, mask);
	return res;
}
#endif

static _syscall1(long, umask, int, mode);

static int __init syscall_test_init(void)
{
	long res;

	printk(KERN_INFO "syscall_test: via syscall macro\n");
	res=umask(0666);
	printk(KERN_INFO "syscall_test: via syscall macro -- result is %ld\n", res);
#ifdef CONFIG_X86_64
	printk(KERN_INFO "syscall_test: via INLINE_SYSCALL\n");
	res=wrapper_umask(0666);
	printk(KERN_INFO "syscall_test: via INLINE_SYSCALL -- result is %ld\n", res);
#endif
	return 0;
}

static void __exit syscall_test_exit(void)
{
	return;
}

module_init(syscall_test_init);
module_exit(syscall_test_exit);
[-- Attachment #3: syscall_dumps --]
[-- Type: text/plain, Size: 2786 bytes --]
_syscall1() dump:
Dump of assembler code for function umask:
0x0000000000000030 <umask+0>: mov $0x5f,%eax
0x0000000000000035 <umask+5>: movslq %edi,%rdi
0x0000000000000038 <umask+8>: syscall
0x000000000000003a <umask+10>: cmp $0xffffffffffffff80,%rax
0x000000000000003e <umask+14>: jbe 0x51 <umask+33>
0x0000000000000040 <umask+16>: neg %rax
0x0000000000000043 <umask+19>: mov %rax,0(%rip) # 0x4a <umask+26>
0x000000000000004a <umask+26>: mov $0xffffffffffffffff,%rax
0x0000000000000051 <umask+33>: retq
0x0000000000000052 <umask+34>: data16
0x0000000000000053 <umask+35>: data16
0x0000000000000054 <umask+36>: data16
0x0000000000000055 <umask+37>: nop
0x0000000000000056 <umask+38>: data16
0x0000000000000057 <umask+39>: data16
0x0000000000000058 <umask+40>: data16
0x0000000000000059 <umask+41>: nop
0x000000000000005a <umask+42>: data16
0x000000000000005b <umask+43>: data16
0x000000000000005c <umask+44>: nop
0x000000000000005d <umask+45>: data16
0x000000000000005e <umask+46>: data16
0x000000000000005f <umask+47>: nop
INLINE_SYSCALL() dump:
Dump of assembler code for function wrapper_umask:
0x0000000000000000 <wrapper_umask+0>: mov %edi,%edi
0x0000000000000002 <wrapper_umask+2>: mov $0x5f,%rax
0x0000000000000009 <wrapper_umask+9>: syscall
0x000000000000000b <wrapper_umask+11>: cmp $0xfffffffffffff000,%rax
0x0000000000000011 <wrapper_umask+17>: jbe 0x24 <wrapper_umask+36>
0x0000000000000013 <wrapper_umask+19>: neg %rax
0x0000000000000016 <wrapper_umask+22>: mov %rax,0(%rip) # 0x1d <wrapper_umask+29>
0x000000000000001d <wrapper_umask+29>: mov $0xffffffffffffffff,%rax
0x0000000000000024 <wrapper_umask+36>: retq
0x0000000000000025 <wrapper_umask+37>: data16
0x0000000000000026 <wrapper_umask+38>: data16
0x0000000000000027 <wrapper_umask+39>: data16
0x0000000000000028 <wrapper_umask+40>: nop
0x0000000000000029 <wrapper_umask+41>: data16
0x000000000000002a <wrapper_umask+42>: data16
0x000000000000002b <wrapper_umask+43>: data16
0x000000000000002c <wrapper_umask+44>: nop
0x000000000000002d <wrapper_umask+45>: data16
0x000000000000002e <wrapper_umask+46>: data16
0x000000000000002f <wrapper_umask+47>: nop
Disassembly of umask() from a statically linked program:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
	umask(666);
	return 0;
}
Dump of assembler code for function umask:
0x0000000000406a20 <umask+0>: mov $0x5f,%rax
0x0000000000406a27 <umask+7>: syscall
0x0000000000406a29 <umask+9>: retq
0x0000000000406a2a <umask+10>: nop
0x0000000000406a2b <umask+11>: nop
0x0000000000406a2c <umask+12>: nop
0x0000000000406a2d <umask+13>: nop
0x0000000000406a2e <umask+14>: nop
0x0000000000406a2f <umask+15>: nop
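For comparison with the glibc dump above, here is a minimal user-space sketch of the same call issued through the generic syscall(2) entry point, which is the supported context for the `syscall` instruction. It assumes a Linux system with glibc; `raw_umask` and `demo_raw_umask` are hypothetical names chosen for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke umask through syscall(2) instead of the libc wrapper.
 * On x86-64, SYS_umask is 95 (0x5f), the value loaded in the dumps above. */
static mode_t raw_umask(mode_t mask)
{
    return (mode_t)syscall(SYS_umask, mask);
}

/* Set a mask with the raw call, then confirm the libc wrapper sees it. */
static void demo_raw_umask(void)
{
    mode_t saved = raw_umask(022);  /* returns the previous mask */

    /* umask() returns the mask set above and restores the saved one. */
    assert(umask(saved) == 022);
}
```

Both paths end up in the same kernel handler; the difference from the test module is only that the caller here really is in user mode when the instruction executes.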
[-- Attachment #4: syscall_crash --]
[-- Type: text/plain, Size: 3885 bytes --]
page_fault: wrong gs 0 expected ffffffff805fb4c0
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
printing rip:
ffffffff80110053
PML4 17becb067 PGD 17bec5067 PMD 0
Oops: 0002
CPU 0
Pid: 2218, comm: insmod Not tainted
RIP: 0010:[<ffffffff80110053>]{system_call+3}
RSP: 0018:000001017becde30 EFLAGS: 00010012
RAX: 000000000000005f RBX: ffffffff8040ed20 RCX: ffffffffa00810fa
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000001b6
RBP: ffffffffa0081000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000212 R12: 0000000000554030
R13: 00000000000000b8 R14: 000000000000000c R15: 000001017c504740
FS: 0000002a9557d4c0(0000) GS:ffffffff805fb4c0(0000) knlGS:ffffffff805fb4c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000101000 CR4: 00000000000006e0
Call Trace: [<ffffffffa008113c>]{:syscall_test:syscall_test_init+28}
[<ffffffff801256b6>]{sys_init_module+1686} [<ffffffffa00810b8>]
[<ffffffff801100c7>]{system_call+119}
Process insmod (pid: 2218, stackpage=1017becd000)
Stack: 000001017becde30 0000000000000018 ffffffffa008113c 0000000000554030
ffffffff801256b6 000001017be78000 000001017e893680 000001017e893640
00000000005542c6 000001017be78000 000001017be7a000 ffffff00000e5000
0000000000000246 00000000000000b8 ffffffffa007c000 ffffffffa00810b8
00000000000006a8 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000002a958aa6c0 0000000000000000 00000000005539d0 00000000005514e0
Call Trace: [<ffffffffa008113c>]{:syscall_test:syscall_test_init+28}
[<ffffffff801256b6>]{sys_init_module+1686} [<ffffffffa00810b8>]
[<ffffffff801100c7>]{system_call+119}
Code: 65 48 89 24 25 08 00 00 00 65 48 8b 24 25 00 00 00 00 fb 48
Kernel panic: Fatal exception
=================================
syscall_test_init() dump:
0x0000000000000060 <syscall_test_init+0>: sub $0x8,%rsp
0x0000000000000064 <syscall_test_init+4>: mov $0x0,%rdi
0x000000000000006b <syscall_test_init+11>: xor %eax,%eax
0x000000000000006d <syscall_test_init+13>: callq 0x72 <syscall_test_init+18>
0x0000000000000072 <syscall_test_init+18>: mov $0x1b6,%edi
0x0000000000000077 <syscall_test_init+23>: callq 0x30 <umask>
0x000000000000007c <syscall_test_init+28>: mov $0x0,%rdi
0x0000000000000083 <syscall_test_init+35>: mov %rax,%rsi
0x0000000000000086 <syscall_test_init+38>: xor %eax,%eax
0x0000000000000088 <syscall_test_init+40>: callq 0x8d <syscall_test_init+45>
0x000000000000008d <syscall_test_init+45>: mov $0x0,%rdi
0x0000000000000094 <syscall_test_init+52>: xor %eax,%eax
0x0000000000000096 <syscall_test_init+54>: callq 0x9b <syscall_test_init+59>
0x000000000000009b <syscall_test_init+59>: mov $0x1b6,%edi
0x00000000000000a0 <syscall_test_init+64>: callq 0x0 <wrapper_umask>
0x00000000000000a5 <syscall_test_init+69>: mov $0x0,%rdi
0x00000000000000ac <syscall_test_init+76>: mov %rax,%rsi
0x00000000000000af <syscall_test_init+79>: xor %eax,%eax
0x00000000000000b1 <syscall_test_init+81>: callq 0xb6 <syscall_test_init+86>
0x00000000000000b6 <syscall_test_init+86>: xor %eax,%eax
0x00000000000000b8 <syscall_test_init+88>: add $0x8,%rsp
0x00000000000000bc <syscall_test_init+92>: retq
0x00000000000000bd <syscall_test_init+93>: data16
0x00000000000000be <syscall_test_init+94>: data16
0x00000000000000bf <syscall_test_init+95>: nop
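The "Oops: 0002" line in the dump above is the x86 page-fault error code. The sketch below decodes its low three bits using the architectural meanings (present, write, user); the macro and function names are hypothetical. Read this way, 0002 would mean a write to a not-present page while in kernel mode, which fits the reported fault address of 0x8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Low three bits of the x86 page-fault error code. */
#define PF_PRESENT 0x1UL /* 0: page not present, 1: protection violation */
#define PF_WRITE   0x2UL /* 0: read access,      1: write access         */
#define PF_USER    0x4UL /* 0: kernel mode,      1: user mode            */

/* Format a short description of the error code into buf and return buf. */
static const char *describe_fault(unsigned long code, char *buf, size_t len)
{
    snprintf(buf, len, "%s, %s, %s mode",
             (code & PF_PRESENT) ? "protection violation" : "page not present",
             (code & PF_WRITE) ? "write" : "read",
             (code & PF_USER) ? "user" : "kernel");
    return buf;
}

/* Decode the value printed in the oops above. */
static void check_oops_0002(void)
{
    char buf[64];

    assert(strcmp(describe_fault(0x0002, buf, sizeof buf),
                  "page not present, write, kernel mode") == 0);
}
```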
[-- Attachment #5: inline_crash --]
[-- Type: text/plain, Size: 3926 bytes --]
The block that tests the call to umask via syscall
was commented out in this case and the module was
recompiled.
==================================================
page_fault: wrong gs 0 expected ffffffff805fb540
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
printing rip:
ffffffff80110053
PML4 17bb83067 PGD 17bb7d067 PMD 0
Oops: 0002
CPU 1
Pid: 2218, comm: insmod Not tainted
RIP: 0010:[<ffffffff80110053>]{system_call+3}
RSP: 0018:000001017bb85e30 EFLAGS: 00010012
RAX: 000000000000005f RBX: ffffffff8040ed20 RCX: ffffffffa00810cb
RDX: 0000000001000000 RSI: 0000000000000000 RDI: 00000000000001b6
RBP: ffffffffa0081000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000212 R12: 0000000000553f30
R13: 00000000000000b8 R14: 000000000000000c R15: 000001017e95b3c0
FS: 0000002a9557d4c0(0000) GS:ffffffff805fb540(0000) knlGS:ffffffff805fb540
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000018216000 CR4: 00000000000006e0
Call Trace: [<ffffffffa008113c>]{:syscall_test:syscall_test_init+28}
[<ffffffff801256b6>]{sys_init_module+1686} [<ffffffffa00810b8>]
[<ffffffff801100c7>]{system_call+119}
Process insmod (pid: 2218, stackpage=1017bb85000)
Stack: 000001017bb85e30 0000000000000018 ffffffffa008113c 0000000000553f30
ffffffff801256b6 000001017bb30000 000001017e99b440 000001017e99b400
0000000000554126 000001017bb30000 000001017bb32000 ffffff00000e5000
0000000000000246 00000000000000b8 ffffffffa007c000 ffffffffa00810b8
0000000000000608 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000002a958aa6c0 0000000000000000 00000000005538d0 00000000005514e0
Call Trace: [<ffffffffa008113c>]{:syscall_test:syscall_test_init+28}
[<ffffffff801256b6>]{sys_init_module+1686} [<ffffffffa00810b8>]
[<ffffffff801100c7>]{system_call+119}
Code: 65 48 89 24 25 08 00 00 00 65 48 8b 24 25 00 00 00 00 fb 48
Kernel panic: Fatal exception
=========================================
syscall_test_init() dump:
0x0000000000000060 <syscall_test_init+0>: sub $0x8,%rsp
0x0000000000000064 <syscall_test_init+4>: mov $0x0,%rdi
0x000000000000006b <syscall_test_init+11>: xor %eax,%eax
0x000000000000006d <syscall_test_init+13>: callq 0x72 <syscall_test_init+18>
0x0000000000000072 <syscall_test_init+18>: mov $0x1b6,%edi
0x0000000000000077 <syscall_test_init+23>: callq 0x0 <wrapper_umask>
0x000000000000007c <syscall_test_init+28>: mov $0x0,%rdi
0x0000000000000083 <syscall_test_init+35>: mov %rax,%rsi
0x0000000000000086 <syscall_test_init+38>: xor %eax,%eax
0x0000000000000088 <syscall_test_init+40>: callq 0x8d <syscall_test_init+45>
0x000000000000008d <syscall_test_init+45>: xor %eax,%eax
0x000000000000008f <syscall_test_init+47>: add $0x8,%rsp
0x0000000000000093 <syscall_test_init+51>: retq
0x0000000000000094 <syscall_test_init+52>: data16
0x0000000000000095 <syscall_test_init+53>: data16
0x0000000000000096 <syscall_test_init+54>: data16
0x0000000000000097 <syscall_test_init+55>: nop
0x0000000000000098 <syscall_test_init+56>: data16
0x0000000000000099 <syscall_test_init+57>: data16
0x000000000000009a <syscall_test_init+58>: data16
0x000000000000009b <syscall_test_init+59>: nop
0x000000000000009c <syscall_test_init+60>: data16
0x000000000000009d <syscall_test_init+61>: data16
0x000000000000009e <syscall_test_init+62>: data16
0x000000000000009f <syscall_test_init+63>: nop
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 14:04 Calling syscalls from x86-64 kernel results in a crash on Opteron machines Constantine Gavrilov
@ 2004-09-13 14:38 ` Christoph Hellwig
2004-09-13 15:05 ` Constantine Gavrilov
2004-09-13 14:44 ` Arnd Bergmann
2004-09-13 15:00 ` Brian Gerst
2 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2004-09-13 14:38 UTC (permalink / raw)
To: Constantine Gavrilov; +Cc: bugs, linux-kernel
On Mon, Sep 13, 2004 at 05:04:17PM +0300, Constantine Gavrilov wrote:
> Hello:
>
> We have a piece of kernel code that calls some system calls in kernel
> context (
Which you shouldn't do in the first place.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 14:38 ` Christoph Hellwig
@ 2004-09-13 15:05 ` Constantine Gavrilov
2004-09-13 16:17 ` Andrea Arcangeli
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Constantine Gavrilov @ 2004-09-13 15:05 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: bugs, linux-kernel
Christoph Hellwig wrote:
>On Mon, Sep 13, 2004 at 05:04:17PM +0300, Constantine Gavrilov wrote:
>
>
>>Hello:
>>
>>We have a piece of kernel code that calls some system calls in kernel
>>context (
>>
>>
>
>Which you shouldn't do in the first place.
>
>
The kernel_thread() function on i386 is implemented by putting the args in
the appropriate registers and issuing int 0x80, which results in a
clone() system call.
I have also found the "syscall" instruction in x86-64 kernel specific
code (it does not call _syscall() macros directly, though). So,
"shouldn't do" is a bit too strong.
What I am writing is an application, not an interface. As such, its
requirements are not much different from those of a user-space application.
If a user-space application may call system calls, why can't a kernel-space
application?
And BTW, kernel-space applications have their own place even if the
concept seems foreign to you.
--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Qlusters Software Ltd
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6081977
Fax: +972-3-6081841
----------------------------------------
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 15:05 ` Constantine Gavrilov
@ 2004-09-13 16:17 ` Andrea Arcangeli
2004-09-13 16:41 ` Stephen Hemminger
2004-09-13 16:42 ` Greg KH
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: Andrea Arcangeli @ 2004-09-13 16:17 UTC (permalink / raw)
To: Constantine Gavrilov; +Cc: Christoph Hellwig, bugs, linux-kernel
Hi Constantine,
On Mon, Sep 13, 2004 at 06:05:52PM +0300, Constantine Gavrilov wrote:
> And BTW, kernel-space applications have their own place even if the
> concept seems foreign to you.
On x86-64 I avoided doing what i386 does, which is to inefficiently call
int 0x80 when you can call sys_read/sys_write etc. by hand.
A syscall is only meaningful if you're not in kernel space. Once
you're in kernel space, if you ever try to invoke a syscall again (either
via int 0x80, syscall, sysenter, call gate, whatever) then you're just
going slower than you should for no good reason.
The only point of calling int 0x80 and friends is to change mode from
user space to kernel space, and you're in kernel space already, so you
should just call sys_read/sys_write etc. by hand, which will not waste
precious cycles and will be a lot simpler too.
Note also that int 0x80 will bring you into the 32-bit emulation layer;
the 64-bit entry point is reachable only via syscall.
Hope this helps.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 16:17 ` Andrea Arcangeli
@ 2004-09-13 16:41 ` Stephen Hemminger
2004-09-13 20:08 ` Andrea Arcangeli
0 siblings, 1 reply; 16+ messages in thread
From: Stephen Hemminger @ 2004-09-13 16:41 UTC (permalink / raw)
To: linux-kernel
On Mon, 13 Sep 2004 18:17:36 +0200
Andrea Arcangeli <andrea@novell.com> wrote:
> Hi Constantine,
>
> On Mon, Sep 13, 2004 at 06:05:52PM +0300, Constantine Gavrilov wrote:
> > And BTW, kernel-space applications have their own place even if the
> > concept seems foreign to you.
>
> On x86-64 I avoided doing what i386 does, which is to inefficiently call
> int 0x80 when you can call sys_read/sys_write etc. by hand.
>
> A syscall is only meaningful if you're not in kernel space. Once
> you're in kernel space, if you ever try to invoke a syscall again (either
> via int 0x80, syscall, sysenter, call gate, whatever) then you're just
> going slower than you should for no good reason.
>
> The only point of calling int 0x80 and friends is to change mode from
> user space to kernel space, and you're in kernel space already, so you
> should just call sys_read/sys_write etc. by hand, which will not waste
> precious cycles and will be a lot simpler too.
>
> Note also that int 0x80 will bring you into the 32-bit emulation layer;
> the 64-bit entry point is reachable only via syscall.
>
> Hope this helps.
Actually, the fact that system calls work in kernel space I would consider
a BUG. The int 0x80 handler should oops or at least kill the offending
thread for security and robustness reasons.
--
Stephen Hemminger mailto:shemminger@osdl.org
Open Source Development Lab http://developer.osdl.org/shemminger
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 16:41 ` Stephen Hemminger
@ 2004-09-13 20:08 ` Andrea Arcangeli
0 siblings, 0 replies; 16+ messages in thread
From: Andrea Arcangeli @ 2004-09-13 20:08 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: linux-kernel
On Mon, Sep 13, 2004 at 09:41:48AM -0700, Stephen Hemminger wrote:
> Actually, the fact that system calls work in kernel space I would consider
> a BUG. The int 0x80 handler should oops or at least kill the offending
> thread for security and robustness reasons.
kernel_thread is using int 0x80 in x86, and yes, that would be better
implemented without it (like we did in x86-64).
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 15:05 ` Constantine Gavrilov
2004-09-13 16:17 ` Andrea Arcangeli
@ 2004-09-13 16:42 ` Greg KH
2004-09-13 17:21 ` Brian Gerst
2004-09-14 2:04 ` William Lee Irwin III
3 siblings, 0 replies; 16+ messages in thread
From: Greg KH @ 2004-09-13 16:42 UTC (permalink / raw)
To: Constantine Gavrilov; +Cc: Christoph Hellwig, bugs, linux-kernel
On Mon, Sep 13, 2004 at 06:05:52PM +0300, Constantine Gavrilov wrote:
> What I am writing is an application, not an interface. As such, its
> requirements are not much different from those of a user-space application.
> If a user-space application may call system calls, why can't a kernel-space
> application?
>
> And BTW, kernel-space applications have their own place even if the
> concept seems foreign to you.
What kind of application is this?
And do you have a link to your source code available?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 15:05 ` Constantine Gavrilov
2004-09-13 16:17 ` Andrea Arcangeli
2004-09-13 16:42 ` Greg KH
@ 2004-09-13 17:21 ` Brian Gerst
2004-09-14 2:04 ` William Lee Irwin III
3 siblings, 0 replies; 16+ messages in thread
From: Brian Gerst @ 2004-09-13 17:21 UTC (permalink / raw)
To: Constantine Gavrilov; +Cc: Christoph Hellwig, bugs, linux-kernel
Constantine Gavrilov wrote:
> Christoph Hellwig wrote:
>
>> On Mon, Sep 13, 2004 at 05:04:17PM +0300, Constantine Gavrilov wrote:
>>
>>
>>> Hello:
>>>
>>> We have a piece of kernel code that calls some system calls in kernel
>>> context (
>>>
>>
>>
>> Which you shouldn't do in the first place.
>>
>>
>
> The kernel_thread() function on i386 is implemented by putting the args in
> the appropriate registers and issuing int 0x80, which results in a
> clone() system call.
It's gone in 2.6, in favor of calling do_fork() directly.
> I have also found the "syscall" instruction in x86-64 kernel specific
> code (it does not call _syscall() macros directly, though). So,
> "shouldn't do" is a bit too strong.
>
> What I am writing is an application, not an interface. As such, its
> requirements are not much different from those of a user-space application.
> If a user-space application may call system calls, why can't a kernel-space
> application?
>
> And BTW, kernel-space applications have their own place even if the
> concept seems foreign to you.
What are you trying to do that can't be done in user space? The only
possible reason for a kernel space app is for performance (like knfsd),
at the cost of risking system stability and security.
--
Brian Gerst
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 15:05 ` Constantine Gavrilov
` (2 preceding siblings ...)
2004-09-13 17:21 ` Brian Gerst
@ 2004-09-14 2:04 ` William Lee Irwin III
3 siblings, 0 replies; 16+ messages in thread
From: William Lee Irwin III @ 2004-09-14 2:04 UTC (permalink / raw)
To: Constantine Gavrilov; +Cc: Christoph Hellwig, bugs, linux-kernel
Christoph Hellwig wrote:
>> Which you shouldn't do in the first place.
On Mon, Sep 13, 2004 at 06:05:52PM +0300, Constantine Gavrilov wrote:
> The kernel_thread() function on i386 is implemented by putting the args in
> the appropriate registers and issuing int 0x80, which results in a
> clone() system call.
> I have also found the "syscall" instruction in x86-64 kernel specific
> code (it does not call _syscall() macros directly, though). So,
> "shouldn't do" is a bit too strong.
> What I am writing is an application, not an interface. As such, its
> requirements are not much different from those of a user-space application.
> If a user-space application may call system calls, why can't a kernel-space
> application?
> And BTW, kernel-space applications have their own place even if the
> concept seems foreign to you.
This is not something we particularly endorse, but when making syscalls
from the kernel, plain function calls to sys_foo() suffice. Also, ia32
does not use syscall traps for kernel_thread() in current 2.6.x.
-- wli
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 14:04 Calling syscalls from x86-64 kernel results in a crash on Opteron machines Constantine Gavrilov
2004-09-13 14:38 ` Christoph Hellwig
@ 2004-09-13 14:44 ` Arnd Bergmann
2004-09-13 15:18 ` Constantine Gavrilov
2004-09-13 15:00 ` Brian Gerst
2 siblings, 1 reply; 16+ messages in thread
From: Arnd Bergmann @ 2004-09-13 14:44 UTC (permalink / raw)
To: Constantine Gavrilov; +Cc: bugs, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 881 bytes --]
On Monday, 13 September 2004 16:04, Constantine Gavrilov wrote:
> We have a piece of kernel code that calls some system calls in kernel
> context (from a process with mm and a daemonized kernel thread that does
> not have mm). This works fine on IA64 and i386 architectures.
You can find the list of system calls that are supposed to work
from kernel space in asm/unistd.h inside #ifdef __KERNEL__SYSCALLS__.
On current kernels, that list only contains execve(), which should
be avoided as well in favor of call_usermodehelper. Other calls
might work on some architectures but that is not a supported
interface any more.
You could call the sys_* functions directly if they are exported,
but it is unlikely that such code gets integrated in the mainline
kernel.
The real answer for your problem highly depends on which syscalls
you want to use.
Arnd <><
[-- Attachment #2: signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 14:44 ` Arnd Bergmann
@ 2004-09-13 15:18 ` Constantine Gavrilov
2004-09-13 19:39 ` H. Peter Anvin
0 siblings, 1 reply; 16+ messages in thread
From: Constantine Gavrilov @ 2004-09-13 15:18 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: bugs, linux-kernel
Arnd Bergmann wrote:
>On Monday, 13 September 2004 16:04, Constantine Gavrilov wrote:
>
>
>>We have a piece of kernel code that calls some system calls in kernel
>>context (from a process with mm and a daemonized kernel thread that does
>>not have mm). This works fine on IA64 and i386 architectures.
>>
>>
>
>You can find the list of system calls that are supposed to work
>from kernel space in asm/unistd.h inside #ifdef __KERNEL__SYSCALLS__.
>On current kernels, that list only contains execve(), which should
>be avoided as well in favor of call_usermodehelper. Other calls
>might work on some architectures but that is not a supported
>interface any more.
>
>You could call the sys_* functions directly if they are exported,
>but it is unlikely that such code gets integrated in the mainline
>kernel.
>
>The real answer for your problem highly depends on which syscalls
>you want to use.
>
> Arnd <><
>
>
I can implement what I want differently, though it will be somewhat
kludgy and kernel-dependent (it depends on the version and distribution). I
wanted to avoid that. Since what I am writing is really an application and
not an interface, it felt very "native" to use the application syscall approach.
My real problem is not how to implement it. I want to understand this
specific x86_64 problem.
--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Qlusters Software Ltd
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6081977
Fax: +972-3-6081841
----------------------------------------
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 15:18 ` Constantine Gavrilov
@ 2004-09-13 19:39 ` H. Peter Anvin
0 siblings, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2004-09-13 19:39 UTC (permalink / raw)
To: linux-kernel
Followup to: <4145BA28.5020702@qlusters.com>
By author: Constantine Gavrilov <constg@qlusters.com>
In newsgroup: linux.dev.kernel
>
> I can implement what I want differently, though it will be somewhat
> kludgy and kernel-dependent (it depends on the version and distribution). I
> wanted to avoid that. Since what I am writing is really an application and
> not an interface, it felt very "native" to use the application syscall approach.
>
> My real problem is not how to implement it. I want to understand this
> specific x86_64 problem.
>
Put it in userspace. Really.
-hpa
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 14:04 Calling syscalls from x86-64 kernel results in a crash on Opteron machines Constantine Gavrilov
2004-09-13 14:38 ` Christoph Hellwig
2004-09-13 14:44 ` Arnd Bergmann
@ 2004-09-13 15:00 ` Brian Gerst
2004-09-13 15:26 ` Constantine Gavrilov
2 siblings, 1 reply; 16+ messages in thread
From: Brian Gerst @ 2004-09-13 15:00 UTC (permalink / raw)
To: Constantine Gavrilov; +Cc: linux-kernel
Constantine Gavrilov wrote:
> Hello:
>
> We have a piece of kernel code that calls some system calls in kernel
> context (from a process with mm and a daemonized kernel thread that does
> not have mm). This works fine on IA64 and i386 architectures.
>
> When I try this on x86-64 kernel on Opteron machines, it results in
> immediate crash. I have tried standard _syscall() macros from
> asm/unistd.h. The system panics when returning from the system call.
> The disassembled code shows that gcc often has a hard time deciding
> which registers (32-bit or 64-bit) to use. For example, it puts the
> system call number in eax, while it should put it in rax. However, this
> register choice is not the problem. I have tried my own hand-crafted gcc
> inline assembly and the glibc inline syscall assembly, both of which
> produce "correct" disassembled code. The result is always the same -- kernel
> crash when calling a function defined by the _syscall() macros or when using
> an "inline" block defined by the glibc macros.
>
> Attached please find a test module that tries to call umask() (JUST
> TO DEMONSTRATE the problem) via the syscall mechanism. Both methods (the
> _syscall1() macro and the glibc INLINE_SYSCALL() macro) were used.
>
> The assembly dumps of umask() called via _syscall1() and via
> INLINE_SYSCALL(), as well as the disassembly of umask() from glibc, are
> provided in a separate attachment. The crash dump (captured with a
> serial console) is provided along with a disassembly of the main module
> function.
>
> It seems that segmentation is changed during the syscall and not
> restored properly, or some other REALLY BAD THING happens. The entry.S
> for x86_64 architecture is very informative, but I am not an expert in
> Opteron architecture and I do not know how the syscall instruction is
> supposed to work.
>
> Can someone explain the reason for the crash? Can you think of a
> workaround? Comments and ideas are very welcome (except of the kind that
> it can be implemented in the user space or with a help of a user proxy
> process).
You should never use the unistd.h macros from kernel space. Call
sys_foo() directly. This may mean you have to export it. The reason it
crashes is that the "syscall" opcode used by the x86-64 macros (unlike
the "int $0x80" for i386) causes a fault when already running in kernel
space.
--
Brian Gerst
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 15:00 ` Brian Gerst
@ 2004-09-13 15:26 ` Constantine Gavrilov
0 siblings, 0 replies; 16+ messages in thread
From: Constantine Gavrilov @ 2004-09-13 15:26 UTC (permalink / raw)
To: Brian Gerst; +Cc: linux-kernel
Brian Gerst wrote:
>
> You should never use the unistd.h macros from kernel space. Call
> sys_foo() directly. This may mean you have to export it. The reason
> it crashes is that the "syscall" opcode used by the x86-64 macros
> (unlike the "int $0x80" for i386) causes a fault when already running
> in kernel space.
>
> --
> Brian Gerst
I can see from the crash report that the fault happens. I want to
understand why.
I can use workarounds. (Calling sys_foo() directly from a module can be a
problem -- I would have to know the "versioned" function name or the
address of the function within the kernel space. Calling an entry from
the syscall table is much easier.)
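
The two options weighed above (calling a handler by name versus
indexing a table by syscall number) can be modelled in user space.
This is only a sketch: every mock_* name and MOCK_NR_* constant is
hypothetical, and nothing here touches the real kernel sys_call_table
or any real sys_* symbol.

```c
#include <errno.h>   /* ENOSYS */
#include <stddef.h>  /* NULL */

/* Hypothetical handler type: one long argument, long result. */
typedef long (*mock_syscall_fn)(long);

/* Hypothetical syscall numbers for this mock only. */
enum { MOCK_NR_UMASK = 0, MOCK_NR_GETMASK = 1, MOCK_NR_MAX };

static long mock_current_umask = 022;

/* Mimics umask(): store the new mask, return the previous one. */
static long mock_sys_umask(long mask)
{
    long old = mock_current_umask;
    mock_current_umask = mask & 0777;
    return old;
}

/* Returns the current mask; the argument is ignored. */
static long mock_sys_getmask(long unused)
{
    (void)unused;
    return mock_current_umask;
}

/* The table maps a number to a handler, so callers need no symbol name. */
static const mock_syscall_fn mock_syscall_table[MOCK_NR_MAX] = {
    [MOCK_NR_UMASK]   = mock_sys_umask,
    [MOCK_NR_GETMASK] = mock_sys_getmask,
};

/* Dispatch by number; out-of-range or empty slots yield -ENOSYS. */
static long mock_call_by_number(unsigned int nr, long arg)
{
    if (nr >= MOCK_NR_MAX || mock_syscall_table[nr] == NULL)
        return -ENOSYS;
    return mock_syscall_table[nr](arg);
}
```

Calling mock_sys_umask(077) and mock_call_by_number(MOCK_NR_UMASK, 077)
reach the same handler. The first needs the symbol to be visible to
the caller; the second needs only the table and the number.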
--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Qlusters Software Ltd
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6081977
Fax: +972-3-6081841
----------------------------------------
^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <2DZQy-7TB-7@gated-at.bofh.it>]
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
[not found] <2DZQy-7TB-7@gated-at.bofh.it>
@ 2004-09-13 14:31 ` Andi Kleen
2004-09-13 15:28 ` Constantine Gavrilov
0 siblings, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2004-09-13 14:31 UTC (permalink / raw)
To: Constantine Gavrilov; +Cc: linux-kernel
Constantine Gavrilov <constg@qlusters.com> writes:
> Can someone explain the reason for the crash? Can you think of a
syscall/sysret don't support recursive calls. That's the price for
being fast.
> workaround? Comments and ideas are very welcome (except of the kind
Just call the appropriate sys_* function directly instead.
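
For context, a user-space sketch of what the "syscall" instruction
does on x86-64 Linux. The instruction keeps the return address in rcx
and the saved flags in r11 rather than pushing them on a stack, and
"sysret" always returns to user privilege. That is one plausible
reading of "don't support recursive calls". The helper names
raw_syscall1() and umask_roundtrip() are made up for this example,
and the code is meant to run from an ordinary process, not from
kernel context.

```c
#include <sys/syscall.h>   /* SYS_umask */

/*
 * Hypothetical helper: issue a one-argument system call with the
 * SYSCALL instruction. rax carries the call number in and the result
 * out, rdi carries the first argument. rcx and r11 are listed as
 * clobbers because the CPU overwrites them with the return address
 * and the saved flags.
 */
static long raw_syscall1(long nr, long arg1)
{
    long ret;

    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "0" (nr), "D" (arg1)
                      : "rcx", "r11", "memory");
    return ret;
}

/*
 * Set the umask to 022, then restore the previous value.
 * The second call returns the mask installed by the first one.
 */
static long umask_roundtrip(void)
{
    long old = raw_syscall1(SYS_umask, 022);

    return raw_syscall1(SYS_umask, old);
}
```

Called from main() in a normal program, umask_roundtrip() should
return 022 and leave the process umask unchanged.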
-Andi
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Calling syscalls from x86-64 kernel results in a crash on Opteron machines
2004-09-13 14:31 ` Andi Kleen
@ 2004-09-13 15:28 ` Constantine Gavrilov
0 siblings, 0 replies; 16+ messages in thread
From: Constantine Gavrilov @ 2004-09-13 15:28 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
Andi Kleen wrote:
>Constantine Gavrilov <constg@qlusters.com> writes:
>
>
>
>>Can someone explain the reason for the crash? Can you think of a
>>
>>
>
>syscall/sysret don't support recursive calls. That's the price for
>being fast.
>
I do not think recursive calls are used here. Am I missing something?
--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Qlusters Software Ltd
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6081977
Fax: +972-3-6081841
----------------------------------------
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2004-09-14 2:07 UTC | newest]
Thread overview: 16+ messages
2004-09-13 14:04 Calling syscalls from x86-64 kernel results in a crash on Opteron machines Constantine Gavrilov
2004-09-13 14:38 ` Christoph Hellwig
2004-09-13 15:05 ` Constantine Gavrilov
2004-09-13 16:17 ` Andrea Arcangeli
2004-09-13 16:41 ` Stephen Hemminger
2004-09-13 20:08 ` Andrea Arcangeli
2004-09-13 16:42 ` Greg KH
2004-09-13 17:21 ` Brian Gerst
2004-09-14 2:04 ` William Lee Irwin III
2004-09-13 14:44 ` Arnd Bergmann
2004-09-13 15:18 ` Constantine Gavrilov
2004-09-13 19:39 ` H. Peter Anvin
2004-09-13 15:00 ` Brian Gerst
2004-09-13 15:26 ` Constantine Gavrilov
[not found] <2DZQy-7TB-7@gated-at.bofh.it>
2004-09-13 14:31 ` Andi Kleen
2004-09-13 15:28 ` Constantine Gavrilov