* Various x86 syscall mechanisms
From: Jeremy Fitzhardinge @ 2008-06-20 22:00 UTC
To: Roland McGrath; +Cc: Linux Kernel Mailing List
Hi Roland,
As far as I can work out, an x86_32 kernel will use "int 0x80" and
"sysenter" for system calls. 64-bit kernel will use just "syscall" for
64-bit processes (though you can use "int 0x80" to access the 32-bit
syscall interface from a 64-bit process), but will allow "sysenter",
"syscall" or "int 0x80" for 32-on-64 processes.
Why does 32-on-64 implement 32-bit syscall when native 32-bit doesn't
seem to? Or am I overlooking something here? Does 32-bit also support
syscall?
Thanks,
J
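For readers who want to see the 64-bit entry path concretely, here is a minimal sketch that is not taken from the thread. It issues getpid with the "syscall" instruction when built for x86_64 Linux and falls back to libc's syscall() wrapper on anything else. The helper name raw_getpid is made up for illustration, and the register usage is the usual x86_64 Linux convention as far as I know it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* exposes syscall() under strict -std= modes */
#endif
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: return the caller's PID without going through
 * the getpid() wrapper. */
static long raw_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* x86_64 Linux convention: syscall number in rax, result in rax.
     * The "syscall" instruction itself clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Not x86_64 Linux: let libc pick the entry instruction. */
    return syscall(SYS_getpid);
#endif
}
```

Calling raw_getpid() should give the same value as getpid(). A 32-bit build takes the fallback branch, which is in line with the advice later in the thread to leave the choice of instruction to libc and the vDSO.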
* Re: Various x86 syscall mechanisms
From: Roland McGrath @ 2008-06-20 23:39 UTC
To: Jeremy Fitzhardinge; +Cc: Linux Kernel Mailing List

> As far as I can work out, an x86_32 kernel will use "int 0x80" and
> "sysenter" for system calls. 64-bit kernel will use just "syscall" for
> 64-bit processes (though you can use "int 0x80" to access the 32-bit
> syscall interface from a 64-bit process), but will allow "sysenter",
> "syscall" or "int 0x80" for 32-on-64 processes.

That is correct, with the caveats below.

> Why does 32-on-64 implement 32-bit syscall when native 32-bit doesn't
> seem to? Or am I overlooking something here? Does 32-bit also support
> syscall?

I think it is clearest to talk separately about the "intended ABI", the
"what actually works today", and the "why". (Also note I was not the
decision-maker in this, just picking up what I can see.)

First and simplest, the 64-bit ABI. AFAIK the intended ABI has always been
the "syscall" instruction for 64-bit syscalls and "int $0x80" for 32-bit
syscalls made from 64-bit tasks on CONFIG_IA32_EMULATION kernels (intended
for valgrind). For 64-bit processes, that's all there is meant to be and
that's all there is to do.

For the 32-bit ABI, what I believe was always the intent for what could be
considered the proper ABI is "int 0x80" or "use the vDSO entry point". If
someone asked me what you could ever have expected to rely on for the
future, I would say exactly that. The use of the vDSO is explicitly
intended to take the details of sysenter/syscall or other such new
instructions out of the 32-bit ABI picture for what any proper application
will expect from the kernel.

As to what works, "int 0x80" of course works the same everywhere.

In 32-bit kernels, the vDSO uses "sysenter" when the hardware supports it.
By the nature of "sysenter", it really cannot "allow sysenter" in a generic
sense--it enables entry via "sysenter" when the hardware supports it, but
it always returns to the specific PC address where it mapped the vDSO.

32-bit kernels never support using "syscall".

In 64-bit kernels, the 32-bit vDSO uses "sysenter" when the hardware vendor
is Intel or Centaur, and "syscall" otherwise (never "int 0x80", though that
still works outside the vDSO). All 64-bit kernels enable support for both
32-bit "sysenter" and 32-bit "syscall" via their respective MSRs. (The
vDSO selection is based on what we think the hardware actually supports.)

As to why, here is what I've pieced together.

The intent of the choices in the kernel's selection of the vDSO has always
been "whatever is fastest on this hardware". I have never myself been
involved in any measuring or comparison of the various methods, so I can't
speak to the actual choices made or how much attention was really paid.

The "syscall"/"sysret" instruction interface (AMD's invention) is superior
to "sysenter"/"sysexit" (Intel's invention). It was always part of the
x86_64 interface, since AMD got there first. So all processors support
64-bit user tasks using "syscall". It's good and even if the privileged
CPU details changed, keeping "syscall" as the user instruction will be fine.

AMD's were the first x86_64 CPUs, and those always supported "syscall"
from 32-bit tasks to 64-bit kernels. (I don't know whether AMD CPUs now
support "sysenter" from 32-bit tasks to 64-bit kernels, and if so which
past AMD64 CPUs may not have supported that. On today's kernel you could
easily test it by hacking use_sysenter=1 into syscall32_cpu_init and
trying that kernel on an AMD64 CPU. I wouldn't be surprised if it does
work on all cpu_has(X86_FEATURE_SEP) CPUs from AMD too.)

Intel CPUs do not support "syscall" from 32-bit tasks at all (as per their
documentation), but do support "sysenter" from 32-bit tasks to 64-bit kernels.
I'm not aware of there having been any Intel x86_64 CPU that did not support
"sysenter" this way.

Using "syscall" when it works kind of looks preferable across the board
because the interface is better. I assume that if AMD's x86_64 CPUs do
support 32->64 "sysenter" too, that "syscall" performs at least as well.
I assume that if Intel or other vendors added 32->64 "syscall" support,
they would not add it unless they were making it the optimal path.

For 32-bit kernels, we assume that whenever "sysenter" is available, it's
at least preferable to "int 0x80". I don't know the order of AMD's
introduction of "syscall" on 32-bit CPUs and Intel's introduction of
"sysenter", but Linux only ever got a vsyscall using "sysenter".

It was long on my back-burner list to toss in the "syscall" version of the
32-bit vDSO for 32-bit kernels on hardware that supports "syscall". But,
several recent generations of AMD CPUs do support "sysenter" for 32-bit
kernels, and I haven't myself had on hand for easy kernel hacking one of
the AMD CPUs that supported "syscall" but not "sysenter". Nowadays, more
and more people can (and should) run a 64-bit kernel anyway. So it hasn't
seemed worth the trouble. (If AMD is today making CPUs where for 32-bit
kernels "sysenter" performs much worse than "syscall", then perhaps it is
worth the effort if using 32-bit kernels is the fastest thing for someone.)

Thanks,
Roland
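The selection rules Roland describes for the 32-bit vDSO can be written down as a small decision function. This is only a model of the prose above, not kernel code: the enum names and vdso32_entry are invented here, and the "int 0x80" fallback for 32-bit kernels without "sysenter" is my assumption, based on the statement that "int 0x80" works everywhere.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this illustration. */
enum entry_insn { ENTRY_INT80, ENTRY_SYSENTER, ENTRY_SYSCALL };
enum cpu_vendor { VENDOR_INTEL, VENDOR_CENTAUR, VENDOR_AMD, VENDOR_OTHER };

/* Which instruction the 32-bit vDSO would use, per the rules in the
 * message above. */
static enum entry_insn vdso32_entry(bool kernel_is_64bit,
                                    enum cpu_vendor vendor,
                                    bool has_sep)
{
    if (kernel_is_64bit) {
        /* 64-bit kernel: sysenter on Intel/Centaur, syscall otherwise. */
        if (vendor == VENDOR_INTEL || vendor == VENDOR_CENTAUR)
            return ENTRY_SYSENTER;
        return ENTRY_SYSCALL;
    }
    /* 32-bit kernel: sysenter when the hardware has it, never syscall.
     * Falling back to int 0x80 is assumed, not stated in the thread. */
    return has_sep ? ENTRY_SYSENTER : ENTRY_INT80;
}
```

For example, vdso32_entry(true, VENDOR_AMD, true) yields ENTRY_SYSCALL, while the same CPU under a 32-bit kernel yields ENTRY_SYSENTER. That difference is what the original question was about.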
* Re: Various x86 syscall mechanisms
From: Jeremy Fitzhardinge @ 2008-06-27 21:45 UTC
To: Roland McGrath; +Cc: Linux Kernel Mailing List

Roland McGrath wrote:
> I think it is clearest to talk separately about the "intended ABI", the
> "what actually works today", and the "why". (Also note I was not the
> decision-maker in this, just picking up what I can see.)
>
> First and simplest, the 64-bit ABI. AFAIK the intended ABI has always been
> the "syscall" instruction for 64-bit syscalls and "int $0x80" for 32-bit
> syscalls made from 64-bit tasks on CONFIG_IA32_EMULATION kernels (intended
> for valgrind).

Hm, I think that's a post-facto rationalization. At one point I noticed
that int 0x80 always does 32-bit syscalls and considered doing a
32-on-64 Valgrind variant. But I never did that, and I don't believe
anyone else has (but it's been a while since I've been closely involved
in Valgrind development, so I could be wrong).

> For 64-bit processes, that's all there is meant to be and
> that's all there is to do.
>
> For the 32-bit ABI, what I believe was always the intent for what could be
> considered the proper ABI is "int 0x80" or "use the vDSO entry point". If
> someone asked me what you could ever have expected to rely on for the
> future, I would say exactly that. The use of the vDSO is explicitly
> intended to take the details of sysenter/syscall or other such new
> instructions out of the 32-bit ABI picture for what any proper application
> will expect from the kernel.
>
> As to what works, "int 0x80" of course works the same everywhere.
>
> In 32-bit kernels, the vDSO uses "sysenter" when the hardware supports it.
> By the nature of "sysenter", it really cannot "allow sysenter" in a generic
> sense--it enables entry via "sysenter" when the hardware supports it, but
> it always returns to the specific PC address where it mapped the vDSO.
>
> 32-bit kernels never support using "syscall".
>
> In 64-bit kernels, the 32-bit vDSO uses "sysenter" when the hardware vendor
> is Intel or Centaur, and "syscall" otherwise (never "int 0x80", though that
> still works outside the vDSO). All 64-bit kernels enable support for both
> 32-bit "sysenter" and 32-bit "syscall" via their respective MSRs. (The
> vDSO selection is based on what we think the hardware actually supports.)
>

Yes. And it seems that there are no cpuid feature bits relating to
32-bit compat variants of these instructions (X86_FEATURE_SEP relates to
whether the 64-bit mode supports SEP, and X86_FEATURE_SYSCALL is only
set on AMD processors and implicit on 64-bit Intel processors).

> As to why, here is what I've pieced together.
>
> The intent of the choices in the kernel's selection of the vDSO has always
> been "whatever is fastest on this hardware". I have never myself been
> involved in any measuring or comparison of the various methods, so I can't
> speak to the actual choices made or how much attention was really paid.
>
> The "syscall"/"sysret" instruction interface (AMD's invention) is superior
> to "sysenter"/"sysexit" (Intel's invention). It was always part of the
> x86_64 interface, since AMD got there first. So all processors support
> 64-bit user tasks using "syscall". It's good and even if the privileged
> CPU details changed, keeping "syscall" as the user instruction will be fine.
>
> AMD's were the first x86_64 CPUs, and those always supported "syscall"
> from 32-bit tasks to 64-bit kernels. (I don't know whether AMD CPUs now
> support "sysenter" from 32-bit tasks to 64-bit kernels, and if so which
> past AMD64 CPUs may not have supported that. On today's kernel you could
> easily test it by hacking use_sysenter=1 into syscall32_cpu_init and
> trying that kernel on an AMD64 CPU. I wouldn't be surprised if it does
> work on all cpu_has(X86_FEATURE_SEP) CPUs from AMD too.)
>
> Intel CPUs do not support "syscall" from 32-bit tasks at all (as per their
> documentation), but do support "sysenter" from 32-bit tasks to 64-bit kernels.
> I'm not aware of there having been any Intel x86_64 CPU that did not support
> "sysenter" this way.
>

The documentation has no caveats or exceptions.

> Using "syscall" when it works kind of looks preferable across the board
> because the interface is better. I assume that if AMD's x86_64 CPUs do
> support 32->64 "sysenter" too, that "syscall" performs at least as well.
> I assume that if Intel or other vendors added 32->64 "syscall" support,
> they would not add it unless they were making it the optimal path.
>
> For 32-bit kernels, we assume that whenever "sysenter" is available, it's
> at least preferable to "int 0x80". I don't know the order of AMD's
> introduction of "syscall" on 32-bit CPUs and Intel's introduction of
> "sysenter", but Linux only ever got a vsyscall using "sysenter".
>

K6 and Pentium II, respectively, I think (PPro claims to have it, but
doesn't). I seem to remember the first sysenter work going in around
1998/9, so it was after the PII.

> It was long on my back-burner list to toss in the "syscall" version of the
> 32-bit vDSO for 32-bit kernels on hardware that supports "syscall". But,
> several recent generations of AMD CPUs do support "sysenter" for 32-bit
> kernels, and I haven't myself had on hand for easy kernel hacking one of
> the AMD CPUs that supported "syscall" but not "sysenter". Nowadays, more
> and more people can (and should) run a 64-bit kernel anyway. So it hasn't
> seemed worth the trouble. (If AMD is today making CPUs where for 32-bit
> kernels "sysenter" performs much worse than "syscall", then perhaps it is
> worth the effort if using 32-bit kernels is the fastest thing for someone.)

The AMD documentation says that syscall/sysret is higher performance
than sysenter/sysexit, but I don't know if that's true, and if so, to
what degree. Intel makes no distinction.

HPA and Andi point out that the only AMD CPU which doesn't support
sysenter is the K6, and its version of syscall is different from the K7
and later anyway.

Thanks for the clarifying overview. I've been piecing my understanding
together as I've been getting 64-bit pvops Xen working, and trying to
fit what it has to do to implement all these instructions and mode
combinations, and it looks like I'm about right.

J
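The two feature bits mentioned above can be read from user space with CPUID. The sketch below is an illustration under assumptions rather than anything from the thread: the function names are invented, and it relies on the `<cpuid.h>` helper `__get_cpuid()` shipped with GCC and Clang. As Jeremy notes, neither bit says whether the 32-bit compat variant of the instruction works under a 64-bit kernel, so treat the result as a hint only.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang __get_cpuid() */
#endif

/* CPUID leaf 1, EDX bit 11: SEP (sysenter/sysexit). */
static bool edx_has_sep(uint32_t leaf1_edx)
{
    return (leaf1_edx >> 11) & 1u;
}

/* CPUID leaf 0x80000001, EDX bit 11: SYSCALL/SYSRET. */
static bool edx_has_syscall(uint32_t ext1_edx)
{
    return (ext1_edx >> 11) & 1u;
}

/* Hypothetical helper: query the running CPU. Reports false for both
 * flags when built for a non-x86 target. */
static void probe_entry_features(bool *sep, bool *sysc)
{
    *sep = false;
    *sysc = false;
#if defined(__i386__) || defined(__x86_64__)
    unsigned int a, b, c, d;
    if (__get_cpuid(1, &a, &b, &c, &d))
        *sep = edx_has_sep(d);
    if (__get_cpuid(0x80000001u, &a, &b, &c, &d))
        *sysc = edx_has_syscall(d);
#endif
}
```

The bit tests are kept separate from the CPUID query so they can be checked against known register values without depending on the machine the code runs on.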
* Re: Various x86 syscall mechanisms
From: Roland McGrath @ 2008-06-27 21:52 UTC
To: Jeremy Fitzhardinge; +Cc: Linux Kernel Mailing List

> > [...] "int $0x80" for 32-bit
> > syscalls made from 64-bit tasks on CONFIG_IA32_EMULATION kernels (intended
> > for valgrind).
>
> Hm, I think that's a post-facto rationalization.

It was my recollection of something Andi had said about why it was there.
My recollections are not a reliable source of accurate information.

> Thanks for the clarifying overview. I've been piecing my understanding
> together as I've been getting 64-bit pvops Xen working, and trying to
> fit what it has to do to implement all these instructions and mode
> combinations, and it looks like I'm about right.

I'm always glad to help.

Thanks,
Roland
* Re: Various x86 syscall mechanisms
From: Andi Kleen @ 2008-06-28 5:00 UTC
To: Roland McGrath; +Cc: Jeremy Fitzhardinge, Linux Kernel Mailing List

Roland McGrath <roland@redhat.com> writes:
>
> I think it is clearest to talk separately about the "intended ABI", the
> "what actually works today", and the "why". (Also note I was not the
> decision-maker in this, just picking up what I can see.)

You are correct.

> For the 32-bit ABI, what I believe was always the intent for what could be
> considered the proper ABI is "int 0x80" or "use the vDSO entry point". If
> someone asked me what you could ever have expected to rely on for the
> future, I would say exactly that. The use of the vDSO is explicitly
> intended to take the details of sysenter/syscall or other such new
> instructions out of the 32-bit ABI picture for what any proper application
> will expect from the kernel.

For SYSENTER the vDSO is even needed because it relies on a hardcoded
return address.

> AMD's were the first x86_64 CPUs, and those always supported "syscall"
> from 32-bit tasks to 64-bit kernels. (I don't know whether AMD CPUs now
> support "sysenter" from 32-bit tasks to 64-bit kernels, and if so which
> past AMD64 CPUs may not have supported that. On today's kernel you could

K8 at least.

> It was long on my back-burner list to toss in the "syscall" version of the
> 32-bit vDSO for 32-bit kernels on hardware that supports "syscall". But,

That would only make a difference on K6 (K7 supports SYSENTER), and
also K6/K7 SYSCALL was slightly different from the K8 version.

-Andi
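Since the vDSO is the supported way to reach "sysenter", it can help to see how a process finds it. The following is a minimal sketch, not something posted in the thread. It assumes Linux with glibc 2.16 or later, where getauxval() is available, and the helper names are made up. On 32-bit x86 the auxiliary vector also carries AT_SYSINFO, which as far as I know points at the vDSO's system call entry. That entry is absent on x86_64, so the sketch only looks at AT_SYSINFO_EHDR.

```c
#include <sys/auxv.h>   /* getauxval(), AT_SYSINFO_EHDR */

/* Base address of the vDSO ELF image mapped into this process,
 * or 0 if the kernel did not provide one. */
static unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Sanity check: does the mapping start with the ELF magic bytes?
 * Returns 0 when there is no vDSO at all. */
static int vdso_has_elf_magic(void)
{
    const unsigned char *p = (const unsigned char *)vdso_base();
    if (!p)
        return 0;
    return p[0] == 0x7f && p[1] == 'E' && p[2] == 'L' && p[3] == 'F';
}
```

Normal programs never need to do this themselves: libc reads the same auxiliary vector entries at startup and routes 32-bit system calls through the vDSO entry point.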
* Re: Various x86 syscall mechanisms
From: Bill Davidsen @ 2008-06-30 0:07 UTC
To: Roland McGrath; +Cc: Jeremy Fitzhardinge, Linux Kernel Mailing List

Roland McGrath wrote:
>> As far as I can work out, an x86_32 kernel will use "int 0x80" and
>> "sysenter" for system calls. 64-bit kernel will use just "syscall" for
>> 64-bit processes (though you can use "int 0x80" to access the 32-bit
>> syscall interface from a 64-bit process), but will allow "sysenter",
>> "syscall" or "int 0x80" for 32-on-64 processes.
>
> That is correct, with the caveats below.
>

Thanks for setting this out clearly, I've seen most of it (from Andi, I
think) in bits, and one of the scheduler folk had a comment relevant to
scheduling which I can't find now, but this is both technical and
historical, and thus a nice thing to hand to someone with a related
question. Well done.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
* Re: Various x86 syscall mechanisms
From: H. Peter Anvin @ 2008-06-21 0:27 UTC
To: Jeremy Fitzhardinge; +Cc: Roland McGrath, Linux Kernel Mailing List

Jeremy Fitzhardinge wrote:
> Hi Roland,
>
> As far as I can work out, an x86_32 kernel will use "int 0x80" and
> "sysenter" for system calls. 64-bit kernel will use just "syscall" for
> 64-bit processes (though you can use "int 0x80" to access the 32-bit
> syscall interface from a 64-bit process), but will allow "sysenter",
> "syscall" or "int 0x80" for 32-on-64 processes.
>
> Why does 32-on-64 implement 32-bit syscall when native 32-bit doesn't
> seem to? Or am I overlooking something here? Does 32-bit also support
> syscall?

The reason is that not all 64-bit processors (i.e. K8) support a 32-bit
sysenter in long mode (i.e. with a 64-bit kernel.)

sysenter is *always* entered from the vdso, since the return address is
lost and this is also where a 64-bit kernel can put a syscall.

There is no reason we couldn't do syscall for 32-bit native, but the
only processor that would benefit would be K7, and that's far enough in
the past that I don't think anyone cares enough.

Note that long mode syscall is different from protected mode syscall,
even in 32-bit compatibility mode. The long mode variant is a lot saner.

-hpa
* Re: Various x86 syscall mechanisms
From: Jeremy Fitzhardinge @ 2008-06-21 2:00 UTC
To: H. Peter Anvin; +Cc: Roland McGrath, Linux Kernel Mailing List

H. Peter Anvin wrote:
> The reason is that not all 64-bit processors (i.e. K8) support a
> 32-bit sysenter in long mode (i.e. with a 64-bit kernel.)

OK, so compat 32-bit processes would use syscall in that case, even if
they wouldn't on a 32-bit kernel?

> sysenter is *always* entered from the vdso, since the return address
> is lost and this is also where a 64-bit kernel can put a syscall.
>
> There is no reason we couldn't do syscall for 32-bit native, but the
> only processor that would benefit would be K7, and that's far enough
> in the past that I don't think anyone cares enough.

OK, good.

> Note that long mode syscall is different from protected mode syscall,
> even in 32-bit compatibility mode. The long mode variant is a lot saner.

You mean that syscall arriving in long mode ring0 is saner than syscall
arriving in protected mode ring0?

J
* Re: Various x86 syscall mechanisms
From: Andi Kleen @ 2008-06-21 14:02 UTC
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, Roland McGrath, Linux Kernel Mailing List

"H. Peter Anvin" <hpa@zytor.com> writes:

> There is no reason we couldn't do syscall for 32-bit native, but the
> only processor that would benefit would be K7, and that's far enough

K6 actually. K7 has sysenter AFAIK. The K6 syscall was also slightly
different and would need special case code.

-Andi
* Re: Various x86 syscall mechanisms
From: H. Peter Anvin @ 2008-06-21 16:51 UTC
To: Andi Kleen; +Cc: Jeremy Fitzhardinge, Roland McGrath, Linux Kernel Mailing List

Andi Kleen wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
>> There is no reason we couldn't do syscall for 32-bit native, but the
>> only processor that would benefit would be K7, and that's far enough
>
> K6 actually. K7 has sysenter AFAIK. The K6 syscall was also slightly different
> and would need special case code.

Sorry, yes, K6. Even more historic.

-hpa
* Re: Various x86 syscall mechanisms
From: Jan Engelhardt @ 2008-07-01 12:06 UTC
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, Roland McGrath, Linux Kernel Mailing List

On Saturday 2008-06-21 02:27, H. Peter Anvin wrote:
>
> There is no reason we couldn't do syscall for 32-bit native, but the only
> processor that would benefit would be K7, and that's far enough in the past
> that I don't think anyone cares enough.

Well if it gives a speed improvement, I'd certainly care. I do not see
a reason why I should throw away the K7 if it still works, even after
5 years lifetime.