* 64-syscall args on 32-bit vs syscall()
From: Benjamin Herrenschmidt @ 2010-03-15 4:48 UTC
To: linux-arch
Cc: linux-kernel@vger.kernel.org, Mark Lord, Ulrich Drepper,
Linus Torvalds, Steven Munroe
Hoy there !
This may have been discussed earlier (I have some vague memories...) but
I just hit a problem with that again (Mark: hint, it's in hdparm's
fallocate) so I'd like a bit of a refresh here on what is the "right
thing" to do...
So some syscalls want a 64-bit argument. Let's take fallocate() as our
example. So we already know that we have to be extra careful since some
32-bit arch will pass this into 2 registers (or stack slots) which need
to be aligned, and so we tend to already take care of making sure that
the said 64-bit argument is either defined as 2x32-bit arguments, or
defined as 1x64 bit argument aligned to 2x32-bit in the argument list.
So far so good...
The problem is when user space tries to use the same trick for calling
those functions using the glibc-provided syscall() function. In this
example, hdparm does:
err = syscall(SYS_fallocate, fd, mode, offset, len);
With "offset" being a 64-bit argument.
This will break because the first argument to syscall now shifts
everything by one register, which breaks the register pair alignment
(and I suppose archs with stack based calling convention can have
similar alignment issues even if x86 doesn't).
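As a rough illustration of that shift, here is a minimal C sketch of the register assignment. It assumes a simplified model of the 32-bit powerpc calling convention (arguments in r3..r10, a 64-bit argument starting on an odd-numbered register). The helper names `place_arg`, `place_args`, `kernel_offset_reg` and `syscall_offset_reg` are made up for this example and are not glibc or kernel code.

```c
#include <assert.h>

/* Simplified model (an assumption for illustration): arguments are passed
 * in r3..r10, and a 64-bit argument must start on an odd register
 * (r3, r5, r7, r9) so that it fills an aligned register pair. */
enum { FIRST_REG = 3, LAST_REG = 10 };

/* Pick the first register for an argument made of `words` 32-bit words,
 * starting at *next_reg, and advance *next_reg past it.
 * Returns -1 if the argument no longer fits in registers. */
static int place_arg(int *next_reg, int words)
{
    int reg = *next_reg;

    if (words == 2 && (reg % 2) == 0)
        reg++;                      /* skip one register to align the pair */
    if (reg + words - 1 > LAST_REG)
        return -1;
    *next_reg = reg + words;
    return reg;
}

/* Place n arguments; words[i] is 1 or 2, out[i] gets the first register. */
static void place_args(const int *words, int n, int *out)
{
    int next = FIRST_REG;

    for (int i = 0; i < n; i++)
        out[i] = place_arg(&next, words[i]);
}

/* Where the kernel expects `offset` for fallocate(fd, mode, offset, len). */
static int kernel_offset_reg(void)
{
    const int words[] = { 1, 1, 2, 2 };
    int regs[4];

    place_args(words, 4, regs);
    return regs[2];
}

/* Where `offset` lands after syscall(SYS_fallocate, fd, mode, offset, len)
 * moves every argument register down by one (r4 -> r3, r5 -> r4, ...). */
static int syscall_offset_reg(void)
{
    const int words[] = { 1, 1, 1, 2, 2 };  /* sysno comes first */
    int regs[5];

    place_args(words, 5, regs);
    return regs[3] - 1;
}
```

In this model the kernel looks for `offset` starting at r5, while the value passed through syscall() ends up starting at r6, so the two no longer agree.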
Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
its first argument to correct that ? Either that or making it some kind
of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
As it is, any 32-bit app using syscall() on any of the syscalls that
takes 64-bit arguments will be broken, unless the app itself breaks up
the argument, but the order of the hi and lo part is different
between BE and LE architectures ;-)
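A minimal sketch of what "the app itself breaks up the argument" could look like in C. The names `split64`, `split_u64` and `host_is_big_endian` are hypothetical, and the word order is left as a parameter because it depends on the architecture's ABI, which this sketch does not try to determine on its own.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit halves of a 64-bit value, in the order they would be
 * passed as consecutive syscall() arguments. */
struct split64 {
    uint32_t first;
    uint32_t second;
};

/* Split `value` into two 32-bit words. If hi_first is nonzero the high
 * word is passed first, otherwise the low word is passed first. */
static struct split64 split_u64(uint64_t value, int hi_first)
{
    uint32_t hi = (uint32_t)(value >> 32);
    uint32_t lo = (uint32_t)value;
    struct split64 s;

    if (hi_first) {
        s.first = hi;
        s.second = lo;
    } else {
        s.first = lo;
        s.second = hi;
    }
    return s;
}

/* Runtime byte-order probe: returns 1 on a big-endian host, 0 otherwise.
 * Byte order is only one input to the choice of word order. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;

    return *(const unsigned char *)&probe == 0;
}
```

A caller would then pass `s.first, s.second` in place of the single 64-bit argument, choosing `hi_first` per architecture.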
So is there a more "correct" solution than another here ? Should powerpc
glibc be fixed at least so that syscall() keeps the alignment ?
Cheers,
Ben.

* Re: 64-syscall args on 32-bit vs syscall()
From: David Miller @ 2010-03-15 5:06 UTC
To: benh
Cc: linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Mon, 15 Mar 2010 15:48:13 +1100

> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the order of the hi and lo part is different
> between BE and LE architectures ;-)

I think it is even different on the same endian architectures,
f.e. mips I think.

There is no way to do this without some arch specific code
to handle things properly, really.

* Re: 64-syscall args on 32-bit vs syscall()
From: Benjamin Herrenschmidt @ 2010-03-15 5:18 UTC
To: David Miller
Cc: linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj

On Sun, 2010-03-14 at 22:06 -0700, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Mon, 15 Mar 2010 15:48:13 +1100
>
> > As it is, any 32-bit app using syscall() on any of the syscalls that
> > takes 64-bit arguments will be broken, unless the app itself breaks up
> > the argument, but the order of the hi and lo part is different
> > between BE and LE architectures ;-)
>
> I think it is even different on the same endian architectures,
> f.e. mips I think.
>
> There is no way to do this without some arch specific code
> to handle things properly, really.

Right, but to what extent ? IE. do we always need the callers using
syscall() directly to know it all, or can we to some extent handle some
of it inside glibc ?

For example, if powerpc glibc is fixed so that syscall() takes a 64-bit
first argument (or calls via some macro to add a dummy 32-bit argument),
the register alignment will be preserved, and things will work just
fine.

IE. It may not fix all problems with all archs, but in this case, it
will fix the common cases for powerpc at least :-) And any other arch
that has the exact same alignment problem.

Or is there any good reason -not- to do that in glibc ?

Cheers,
Ben.

* Re: 64-syscall args on 32-bit vs syscall()
From: David Miller @ 2010-03-15 5:54 UTC
To: benh
Cc: linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Mon, 15 Mar 2010 16:18:33 +1100

> Or is there any good reason -not- to do that in glibc ?

The whole point of syscall() is to handle cases where the C library
doesn't know about the system call yet.

I think it's therefore very much "buyer beware".

On sparc it'll never work to use the workaround you're proposing since
we pass everything in via registers.

So arch knowledge will always need to be present in these situations.

* Re: 64-syscall args on 32-bit vs syscall()
From: Benjamin Herrenschmidt @ 2010-03-15 20:22 UTC
To: David Miller
Cc: linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj

On Sun, 2010-03-14 at 22:54 -0700, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Mon, 15 Mar 2010 16:18:33 +1100
>
> > Or is there any good reason -not- to do that in glibc ?
>
> The whole point of syscall() is to handle cases where the C library
> doesn't know about the system call yet.
>
> I think it's therefore very much "buyer beware".
>
> On sparc it'll never work to use the workaround you're proposing since
> we pass everything in via registers.
>
> So arch knowledge will always need to be present in these situations.

I'm not sure I follow. We also pass via register on powerpc, but the
offset introduced by the sysno argument breaks register pair alignment
which cannot be fixed up inside syscall().

However, if I change glibc's syscall to be something like

#define syscall(sysno, args...) __syscall(0 /* dummy */, sysno, args)

And make __syscall then do something like:

    mr r0, r4
    mr r3, r5
    mr r4, r6
    mr r5, r7
    mr r6, r8
    .../...
    sc
    blr

Then at least all that class of syscalls will be fixed.

Of course this has to be in glibc arch code. I was merely asking if that
was something our glibc folks would consider and whether somebody could
think of a better solution :-)

Cheers,
Ben.

* Re: 64-syscall args on 32-bit vs syscall()
From: Ralf Baechle @ 2010-03-15 13:44 UTC
To: Benjamin Herrenschmidt
Cc: David Miller, linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj

On Mon, Mar 15, 2010 at 04:18:33PM +1100, Benjamin Herrenschmidt wrote:

> On Sun, 2010-03-14 at 22:06 -0700, David Miller wrote:
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Date: Mon, 15 Mar 2010 15:48:13 +1100
> >
> > > As it is, any 32-bit app using syscall() on any of the syscalls that
> > > takes 64-bit arguments will be broken, unless the app itself breaks up
> > > the argument, but the order of the hi and lo part is different
> > > between BE and LE architectures ;-)
> >
> > I think it is even different on the same endian architectures,
> > f.e. mips I think.

MIPS passes arguments in the endian order, that is low/high for little
endian and high/low for big endian.

> > There is no way to do this without some arch specific code
> > to handle things properly, really.
>
> Right, but to what extent ? IE. do we always need the callers using
> syscall() directly to know it all, or can we to some extent handle some
> of it inside glibc ?
>
> For example, if powerpc glibc is fixed so that syscall() takes a 64-bit
> first argument (or calls via some macro to add a dummy 32-bit argument),
> the register alignment will be preserved, and things will work just
> fine.
>
> IE. It may not fix all problems with all archs, but in this case, it
> will fix the common cases for powerpc at least :-) And any other arch
> that has the exact same alignment problem.
>
> Or is there any good reason -not- to do that in glibc ?

Syscall is most often used for new syscalls that have no syscall stub in
glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
later stage syscall() is changed to have this sort of knowledge we break
the API. This is something only the kernel can get right.

Ralf

* Re: 64-syscall args on 32-bit vs syscall()
From: H. Peter Anvin @ 2010-03-15 15:13 UTC
To: Ralf Baechle
Cc: Benjamin Herrenschmidt, David Miller, linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj

On 03/15/2010 06:44 AM, Ralf Baechle wrote:
>
> Syscall is most often used for new syscalls that have no syscall stub in
> glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
> later stage syscall() is changed to have this sort of knowledge we break
> the API. This is something only the kernel can get right.
>

One option would be to do a libkernel.so, with auto-generated stubs out
of the kernel build tree.

As already discussed in #kernel this morning, there are a number of
sticky points with types and namespaces for this, but those aren't
any worse than the equivalent problems for syscall(3).

    -hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

* Re: 64-syscall args on 32-bit vs syscall()
From: Ulrich Drepper @ 2010-03-15 16:00 UTC
To: H. Peter Anvin
Cc: Ralf Baechle, Benjamin Herrenschmidt, David Miller, linux-arch, linux-kernel, kernel, torvalds, munroesj

On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
> One option would be to do a libkernel.so,

No need. Put it in the vdso. And name it something other than syscall.
The syscall() API is fixed, you cannot change it.

All this only if it makes sense for ALL archs. If it cannot work for
just one arch then it's not worth it at all.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

* Re: 64-syscall args on 32-bit vs syscall()
From: David Miller @ 2010-03-15 19:00 UTC
To: drepper
Cc: hpa, ralf, benh, linux-arch, linux-kernel, kernel, torvalds, munroesj

From: Ulrich Drepper <drepper@redhat.com>
Date: Mon, 15 Mar 2010 09:00:55 -0700

> On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
>> One option would be to do a libkernel.so,
>
> No need. Put it in the vdso. And name it something other than syscall.
> The syscall() API is fixed, you cannot change it.
>
> All this only if it makes sense for ALL archs. If it cannot work for
> just one arch then it's not worth it at all.

There are many archs that still lack VDSO.

* Re: 64-syscall args on 32-bit vs syscall()
From: H. Peter Anvin @ 2010-03-15 19:41 UTC
To: David Miller
Cc: drepper, ralf, benh, linux-arch, linux-kernel, kernel, torvalds, munroesj

On 03/15/2010 12:00 PM, David Miller wrote:
> From: Ulrich Drepper <drepper@redhat.com>
> Date: Mon, 15 Mar 2010 09:00:55 -0700
>
>> On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
>>> One option would be to do a libkernel.so,
>>
>> No need. Put it in the vdso. And name it something other than syscall.
>> The syscall() API is fixed, you cannot change it.
>>
>> All this only if it makes sense for ALL archs. If it cannot work for
>> just one arch then it's not worth it at all.
>
> There are many archs that still lack VDSO.

Putting it into the vdso is also rather annoyingly heavyweight for what
is nothing other than an ordinary shared library. Just making it an
ordinary shared library seems a lot saner.

I don't see why syscall() can't change the type for its first argument
-- it seems to be exactly what symbol versioning is for.

Doesn't change the fact that it is fundamentally broken, of course.

    -hpa

* Re: 64-syscall args on 32-bit vs syscall()
From: Benjamin Herrenschmidt @ 2010-03-15 20:35 UTC
To: H. Peter Anvin
Cc: David Miller, drepper, ralf, linux-arch, linux-kernel, kernel, torvalds, munroesj

On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
> I don't see why syscall() can't change the type for its first argument
> -- it seems to be exactly what symbol versioning is for.
>
> Doesn't change the fact that it is fundamentally broken, of course.

No need to change the type of the first arg and go for symbol
versioning if you do something like I proposed earlier, there will be
no conflict between syscall() and __syscall() and both variants can
exist.

Cheers,
Ben.

* Re: 64-syscall args on 32-bit vs syscall()
From: H. Peter Anvin @ 2010-03-15 20:41 UTC
To: Benjamin Herrenschmidt
Cc: David Miller, drepper, ralf, linux-arch, linux-kernel, kernel, torvalds, munroesj

On 03/15/2010 01:35 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
>> I don't see why syscall() can't change the type for its first argument
>> -- it seems to be exactly what symbol versioning is for.
>>
>> Doesn't change the fact that it is fundamentally broken, of course.
>
> No need to change the type of the first arg and go for symbol
> versioning if you do something like I proposed earlier, there will be
> no conflict between syscall() and __syscall() and both variants can
> exist.
>

Basically symbol versioning done "by hand", actually using symbol
versioning is better, IMNSHO.

    -hpa

* Re: 64-syscall args on 32-bit vs syscall()
From: Steven Munroe @ 2010-03-16 21:56 UTC
To: Benjamin Herrenschmidt
Cc: H. Peter Anvin, David Miller, drepper, ralf, linux-arch, linux-kernel, kernel, torvalds

On Tue, 2010-03-16 at 07:35 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
> > I don't see why syscall() can't change the type for its first argument
> > -- it seems to be exactly what symbol versioning is for.
> >
> > Doesn't change the fact that it is fundamentally broken, of course.
>
> No need to change the type of the first arg and go for symbol
> versioning if you do something like I proposed earlier, there will be
> no conflict between syscall() and __syscall() and both variants can
> exist.
>

One concern is the new syscall and the kernel have to match and mixing
will not work. Your proposal seems to impact all syscalls not just the
one called via syscall API. These syscalls get generated inline which
makes static linking very dangerous ...

So I think you do need both symbol versioning and kernel feature stubs
(like xstat). That gets to be a lot of work.

> Cheers,
> Ben.

* Re: 64-syscall args on 32-bit vs syscall()
From: Benjamin Herrenschmidt @ 2010-03-17 0:31 UTC
To: munroesj
Cc: H. Peter Anvin, David Miller, drepper, ralf, linux-arch, linux-kernel, kernel, torvalds

On Tue, 2010-03-16 at 16:56 -0500, Steven Munroe wrote:
> On Tue, 2010-03-16 at 07:35 +1100, Benjamin Herrenschmidt wrote:
> > On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
> > > I don't see why syscall() can't change the type for its first argument
> > > -- it seems to be exactly what symbol versioning is for.
> > >
> > > Doesn't change the fact that it is fundamentally broken, of course.
> >
> > No need to change the type of the first arg and go for symbol
> > versioning if you do something like I proposed earlier, there will be
> > no conflict between syscall() and __syscall() and both variants can
> > exist.
> >
> One concern is the new syscall and the kernel have to match and mixing
> will not work. Your proposal seems to impact all syscalls not just the
> one called via syscall API. These syscalls get generated inline which
> makes static linking very dangerous ...
>
> So I think you do need both symbol versioning and kernel feature stubs
> (like xstat). That gets to be a lot of work.

What do you mean ?

My proposal is purely a change to the syscall()
function, nothing else. No kernel change, no ABI change, no change to
the way glibc normally calls syscalls internally, etc... just the
exported syscall() function to shift its arguments in order to avoid
losing register pair alignment.

And the change would still be compatible with existing userland code
that manually splits the 64-bit arguments to avoid the problem on power.

IE. Unless I've missed something, this would be a 100% backward
compatible change that simply makes a whole class of syscall() uses work
that didn't before on power (but did on x86), such as the one I hit in
hdparm for example.

Cheers,
Ben.

* Re: 64-syscall args on 32-bit vs syscall()
From: Ulrich Drepper @ 2010-03-17 5:52 UTC
To: Benjamin Herrenschmidt
Cc: munroesj, H. Peter Anvin, David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

On 03/16/2010 05:31 PM, Benjamin Herrenschmidt wrote:
> My proposal is purely a change to the syscall()
> function, nothing else. No kernel change, no ABI change, no change to
> the way glibc normally calls syscalls internally, etc...

How can this be? People are today actively working around the problem
of 64-bit arguments. You have to break something since you cannot
recognize these situations. And since it became meanwhile clear that
there is no way to "fix" all archs magically I really don't want to
introduce anything. There are mechanisms in place to abstract out some
of the issues. And for the rest, well, if you're using syscalls
directly you already have to encode lowlevel knowledge. One more bit
doesn't hurt. It's not as if this happens every day.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

* Re: 64-syscall args on 32-bit vs syscall()
From: Benjamin Herrenschmidt @ 2010-03-17 8:56 UTC
To: Ulrich Drepper
Cc: munroesj, H. Peter Anvin, David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

On Tue, 2010-03-16 at 22:52 -0700, Ulrich Drepper wrote:
> On 03/16/2010 05:31 PM, Benjamin Herrenschmidt wrote:
> > My proposal is purely a change to the syscall()
> > function, nothing else. No kernel change, no ABI change, no change to
> > the way glibc normally calls syscalls internally, etc...
>
> How can this be? People are today actively working around the problem
> of 64-bit arguments. You have to break something since you cannot
> recognize these situations.

Ok, so I -may- be missing something, but I believe this won't break
anything:

 - You keep the existing syscall() exported by glibc for binary
   compatibility

 - You add a new __syscall() (or whatever you want to name it) that adds
   a dummy argument at the beginning, and whose implementation shifts
   everything by 2 instead of 1 argument before calling into the kernel

 - You define in unistd.h or whatever is relevant, a macro that does:

#define syscall(__sysno, __args...) __syscall(0, __sysno, __args)

I believe that should cover it, at least for powerpc, possibly for other
archs too though as I said, I may have missed something there.

IE. Whether your app writes:

    syscall(SYS_foo, my_64bit_arg);

Or

    syscall(SYS_foo, (u32)(my_64bit_arg >> 32), (u32)(my_64bit_arg));

Both should still work with the new approach and end up doing the right
thing.

Hence, apps that use the first form today because it works on x86 would
end up working at least on powerpc where they would have been otherwise
broken unless they used some arch specific #ifdef to do the second form.

> And since it became meanwhile clear that
> there is no way to "fix" all archs magically I really don't want to
> introduce anything. There are mechanisms in place to abstract out some
> of the issues. And for the rest, well, if you're using syscalls
> directly you already have to encode lowlevel knowledge. One more bit
> doesn't hurt. It's not as if this happens every day.

It doesn't happen every day. However, if my proposal ends up fixing a
bunch of cases where it does without breaking anything, then I suppose
it's worth considering, though as I said, it's possible that I miss some
subtlety here in which case I'd be glad to stand corrected :-)

Cheers,
Ben.

* Re: 64-syscall args on 32-bit vs syscall()
From: Ulrich Drepper @ 2010-03-17 9:14 UTC
To: Benjamin Herrenschmidt
Cc: munroesj, H. Peter Anvin, David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

On 03/17/2010 01:56 AM, Benjamin Herrenschmidt wrote:
>> - You keep the existing syscall() exported by glibc for binary
>> compatibility
>
>> - You add a new __syscall() (or whatever you want to name it) that adds
>> a dummy argument at the beginning, and whose implementation shifts
>> everything by 2 instead of 1 argument before calling into the kernel
>
>> - You define in unistd.h or whatever is relevant, a macro that does:
>
>> #define syscall(__sysno, __args...) __syscall(0, __sysno, __args)
>
>> I believe that should cover it, at least for powerpc, possibly for other
>> archs too though as I said, I may have missed something there.

How can this possibly be the case? This will screw people who currently
work around the ppc limitations of the existing syscall.

Just leave it alone.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

* Re: 64-syscall args on 32-bit vs syscall()
From: Benjamin Herrenschmidt @ 2010-03-17 10:13 UTC
To: Ulrich Drepper
Cc: munroesj, H. Peter Anvin, David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

On Wed, 2010-03-17 at 02:14 -0700, Ulrich Drepper wrote:
> >> I believe that should cover it, at least for powerpc, possibly for other
> >> archs too though as I said, I may have missed something there.
>
> How can this possibly be the case? This will screw people who currently
> work around the ppc limitations of the existing syscall.

No it won't. As I said, it will work for both cases. The problem is a
register pair alignment problem. If the alignment is corrected with the
trick I proposed, 64-bit values will end up in the right pair, but
manually worked-around cases where the value is already broken up will
-also- end up in the right pair.

The problem with syscall() as it is is that it skews the arguments by 1
register, which causes the compiler to skip a register when generating
the call for a 64-bit value. By doing the trick I propose, that skew
will be gone, both 32 and 64 bit arguments will end up where expected.

Cheers,
Ben.

* Re: 64-syscall args on 32-bit vs syscall()
From: Jamie Lokier @ 2010-03-17 9:18 UTC
To: Benjamin Herrenschmidt
Cc: Ulrich Drepper, munroesj, H. Peter Anvin, David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

Benjamin Herrenschmidt wrote:
> Hence, apps that use the first form today because it works on x86 would
> end up working at least on powerpc where they would have been otherwise
> broken unless they used some arch specific #ifdef to do the second form.

I think what Ulrich is getting at is your change will break existing
code which already does:

    #ifdef __powerpc__
    syscall(SYS_foo, 0, my_64bit_arg);
    #else
    syscall(SYS_foo, my_64bit_arg);
    #endif

I don't know of any such code, but it might be out there.

-- Jamie

* Re: 64-syscall args on 32-bit vs syscall()
From: Benjamin Herrenschmidt @ 2010-03-17 10:18 UTC
To: Jamie Lokier
Cc: Ulrich Drepper, munroesj, H. Peter Anvin, David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

On Wed, 2010-03-17 at 09:18 +0000, Jamie Lokier wrote:
> Benjamin Herrenschmidt wrote:
> > Hence, apps that use the first form today because it works on x86 would
> > end up working at least on powerpc where they would have been otherwise
> > broken unless they used some arch specific #ifdef to do the second form.
>
> I think what Ulrich is getting at is your change will break existing
> code which already does:
>
> #ifdef __powerpc__
> syscall(SYS_foo, 0, my_64bit_arg);
> #else
> syscall(SYS_foo, my_64bit_arg);
> #endif
>
> I don't know of any such code, but it might be out there.

No, the above "workaround" doesn't work.

With the existing syscall() definition, there is no difference between
your two examples. In the first case, you force a proper 64-bit
alignment, but you are already off by one register pair from the kernel
expectation. In the second case, gcc will imply one, which means that
both your examples above will result in my_64bit_arg in the -same-
place, which is off by a register pair from what the kernel expects.

IE. In the first case gcc will put SYS_foo in r3, 0 in r4, and
my_64bit_arg in r5 and r6. In the second case, gcc will put SYS_foo in
r3, won't care about r4, and will put the 64-bit arg in r5 and r6.

Then, glibc syscall() will shift r3 to r0, r4 to r3 etc... causing
my_64bit_arg to land in r4 and r5. But the kernel expects it in r3 and
r4.

The workaround that apps should use today is:

    #if defined(__powerpc__) && __WORDSIZE == 32
    syscall(SYS_foo, (u32)(my_64bit_arg >> 32), (u32)my_64bit_arg);
    #else
    syscall(SYS_foo, my_64bit_arg);
    #endif

And with my proposed change, both of the above will work. IE. gcc will
put the argument always in r5,r6 and the syscall() implementation will
always shift r5 to r3 and r6 to r4.

Cheers,
Ben.

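The register walkthrough above can be checked with a small C model. This is only a sketch under a simplified assumption about the 32-bit powerpc convention (arguments from r3 upward, a 64-bit argument starting on an odd register); `place` and `arg_reg_in_kernel` are hypothetical helpers written for this illustration, not glibc code.

```c
#include <assert.h>

/* Pick the first register for an argument of `words` 32-bit words
 * (1 or 2), aligning 64-bit arguments to an odd register, and advance
 * *next past it. Simplified model: r3..r10 only, no stack spill. */
static int place(int *next, int words)
{
    int reg = *next;

    if (words == 2 && (reg % 2) == 0)
        reg++;                      /* skip a register to align the pair */
    *next = reg + words;
    return reg;
}

/* Register in which the kernel sees the first word of a 64-bit argument.
 *   lead_words: 32-bit words passed ahead of it by the caller
 *               (1 for syscall(sysno, ...), 2 for the proposed
 *               __syscall(dummy, sysno, ...) or for Jamie's extra 0)
 *   split:      nonzero if the caller passes two 32-bit halves instead
 *   shift:      how many registers the wrapper moves arguments down by */
static int arg_reg_in_kernel(int lead_words, int split, int shift)
{
    int next = 3;                   /* first argument register is r3 */
    int reg;

    for (int i = 0; i < lead_words; i++)
        place(&next, 1);
    reg = split ? place(&next, 1) : place(&next, 2);
    return reg - shift;
}
```

In this model the kernel expects the argument to start at r3. Today's syscall() (one leading word, shift of one) gives r4 for the plain 64-bit form and r3 only for the split form, while two leading words with a shift of two give r3 for both forms. A 64-bit sysno occupying r3/r4 behaves like two leading words here.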
* Re: 64-syscall args on 32-bit vs syscall()
From: H. Peter Anvin @ 2010-03-17 18:30 UTC
To: Benjamin Herrenschmidt
Cc: Ulrich Drepper, munroesj, David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

On 03/17/2010 01:56 AM, Benjamin Herrenschmidt wrote:
>
> Ok, so I -may- be missing something, but I believe this won't break
> anything:
>
> - You keep the existing syscall() exported by glibc for binary
> compatibility
>
> - You add a new __syscall() (or whatever you want to name it) that adds
> a dummy argument at the beginning, and whose implementation shifts
> everything by 2 instead of 1 argument before calling into the kernel
>
> - You define in unistd.h or whatever is relevant, a macro that does:
>
> #define syscall(__sysno, __args...) __syscall(0, __sysno, __args)
>

Again, this is *exactly* symbol versioning done by hand... we have
proper symbol versioning, let's use it.

    -hpa

* Re: 64-syscall args on 32-bit vs syscall()
From: Benjamin Herrenschmidt @ 2010-03-17 20:35 UTC
To: H. Peter Anvin
Cc: Ulrich Drepper, munroesj, David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

On Wed, 2010-03-17 at 11:30 -0700, H. Peter Anvin wrote:
> Again, this is *exactly* symbol versioning done by hand... we have
> proper symbol versioning, let's use it.

Yeah, whatever, I don't mind what technique you use for the versioning,
ultimately, if the approach works, we can look at those details :-) We
-do- need the macro to strip the dummy argument though, unless we use
a slightly different technique which is to make the __sysno argument
itself 64-bit, which works as well I believe.

Cheers,
Ben.

* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 20:35 ` Benjamin Herrenschmidt
@ 2010-03-17 20:53 ` H. Peter Anvin
2010-03-17 22:58 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2010-03-17 20:53 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Ulrich Drepper, munroesj, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds

On 03/17/2010 01:35 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2010-03-17 at 11:30 -0700, H. Peter Anvin wrote:
>> Again, this is *exactly* symbol versioning done by hand... we have
>> proper symbol versioning, let's use it.
>
> Yeah, whatever, I don't mind what technique you use for the versioning,
> ultimately, if the approach works, we can look at those details :-) We
> -do- need the macro to strip the dummy argument though, unless we use
> a slightly different technique which is to make the __sysno argument
> itself 64-bit, which works as well I believe.
>

It seems cleaner to do it that way (with a 64-bit sysno arg.)

-hpa

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 20:53 ` H. Peter Anvin
@ 2010-03-17 22:58 ` Benjamin Herrenschmidt
2010-03-18 16:08 ` Steven Munroe
0 siblings, 1 reply; 34+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-17 22:58 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ulrich Drepper, munroesj, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds

On Wed, 2010-03-17 at 13:53 -0700, H. Peter Anvin wrote:
> > Yeah, whatever, I don't mind what technique you use for the versioning,
> > ultimately, if the approach works, we can look at those details :-) We
> > -do- need the macro to strip the dummy argument though, unless we use
> > a slightly different technique which is to make the __sysno argument
> > itself 64-bit, which works as well I believe.
> >
>
> It seems cleaner to do it that way (with a 64-bit sysno arg.)

Right. Now if we can get Ulrich to actually put 2 and 2 together and
admit that it actually works without breaking anything existing (at
least for my arch but I wouldn't be surprised if that was the case for
others), I would be even happier :-)

Steve, any chance you can cook up a glibc patch to test with ? Maybe
making it powerpc specific for now ?

Cheers,
Ben.

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 22:58 ` Benjamin Herrenschmidt
@ 2010-03-18 16:08 ` Steven Munroe
2010-03-18 16:21 ` Andreas Schwab
0 siblings, 1 reply; 34+ messages in thread
From: Steven Munroe @ 2010-03-18 16:08 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: H. Peter Anvin, Ulrich Drepper, munroesj, David Miller, ralf,
linux-arch, linux-kernel, kernel, torvalds

On Thu, 2010-03-18 at 09:58 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2010-03-17 at 13:53 -0700, H. Peter Anvin wrote:
> > > Yeah, whatever, I don't mind what technique you use for the versioning,
> > > ultimately, if the approach works, we can look at those details :-) We
> > > -do- need the macro to strip the dummy argument though, unless we use
> > > a slightly different technique which is to make the __sysno argument
> > > itself 64-bit, which works as well I believe.
> > >
> >
> > It seems cleaner to do it that way (with a 64-bit sysno arg.)
>
> Right. Now if we can get Ulrich to actually put 2 and 2 together and
> admit that it actually works without breaking anything existing (at
> least for my arch but I wouldn't be surprised if that was the case for
> others), I would be even happier :-)
>
> Steve, any chance you can cook up a glibc patch to test with ? Maybe
> making it powerpc specific for now ?
>

Do what, declare __sysno as long long?

The current prototype is in unistd.h:

#ifdef __USE_MISC
/* Invoke `system call' number SYSNO, passing it the remaining arguments.
   This is completely system-dependent, and not often useful.

   In Unix, `syscall' sets `errno' for all errors and most calls return -1
   for errors; in many systems you cannot pass arguments or get return
   values for all system calls (`pipe', `fork', and `getppid' typically
   among them).

   In Mach, all system calls take normal arguments and always return an
   error code (zero for success). */
extern long int syscall (long int __sysno, ...) __THROW;

#endif /* Use misc. */

Changing this would be an ABI change and would have to be versioned. It
would affect anyone using syscall, not just SYS_fallocate.

The question is: do programmers in practice include unistd.h when they
use syscall?

If the changed prototype is not in scope then the 1st parm (__sysno)
defaults to int and is passed in r3, which gets moved to r0.

If the changed syscall prototype is in scope then __sysno would be
passed in r3/r4 (r3 would be 0 and would be passed to r0, and the actual
syscall number would be in r4 and passed to the kernel in r3).

Which behavior do you want? Which incorrect behavior compiled into
existing code do you want to support?

Do you want syscall.S for PPC32 to change to match the changed
prototype? It will have to be versioned, and the new prototype will only
be available in future releases of GLIBC. Existing applications will
bind to the old ABI and get the old behavior.

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-18 16:08 ` Steven Munroe
@ 2010-03-18 16:21 ` Andreas Schwab
2010-03-18 17:03 ` Steven Munroe
0 siblings, 1 reply; 34+ messages in thread
From: Andreas Schwab @ 2010-03-18 16:21 UTC (permalink / raw)
To: munroesj
Cc: Benjamin Herrenschmidt, H. Peter Anvin, Ulrich Drepper,
David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

Steven Munroe <munroesj@linux.vnet.ibm.com> writes:

> extern long int syscall (long int __sysno, ...) __THROW;
>
> #endif /* Use misc. */
>
> Changing this would be an ABI change and would have to be versioned. It
> would affect anyone using syscall, not just SYS_fallocate.
>
> The question is: do programmers in practice include unistd.h when they
> use syscall?
>
> If the changed prototype is not in scope then the 1st parm (__sysno)
> defaults to int and is passed in r3, which gets moved to r0.

int is incompatible with long, so you already get undefined behaviour
anyway.

Andreas.

--
Andreas Schwab, schwab@redhat.com
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E
"And now for something completely different."

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-18 16:21 ` Andreas Schwab
@ 2010-03-18 17:03 ` Steven Munroe
2010-03-18 21:18 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 34+ messages in thread
From: Steven Munroe @ 2010-03-18 17:03 UTC (permalink / raw)
To: Andreas Schwab
Cc: munroesj, Benjamin Herrenschmidt, H. Peter Anvin, Ulrich Drepper,
David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

On Thu, 2010-03-18 at 17:21 +0100, Andreas Schwab wrote:
> Steven Munroe <munroesj@linux.vnet.ibm.com> writes:
>
> > extern long int syscall (long int __sysno, ...) __THROW;
> >
> > #endif /* Use misc. */
> >
> > Changing this would be an ABI change and would have to be versioned. It
> > would affect anyone using syscall, not just SYS_fallocate.
> >
> > The question is: do programmers in practice include unistd.h when they
> > use syscall?
> >
> > If the changed prototype is not in scope then the 1st parm (__sysno)
> > defaults to int and is passed in r3, which gets moved to r0.
>
> int is incompatible with long, so you already get undefined behaviour
> anyway.
>

Sorry, int and long are compatible in 32-bit, but not long long.

int and long are not compatible in 64-bit.

It is hard to keep all the modes and arguments straight.

But the concerns about changing the prototype, and about whether people
are actually using the prototype, are still valid.

> Andreas.
>

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-18 17:03 ` Steven Munroe
@ 2010-03-18 21:18 ` Benjamin Herrenschmidt
2010-03-19 1:22 ` Jamie Lokier
0 siblings, 1 reply; 34+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-18 21:18 UTC (permalink / raw)
To: munroesj
Cc: Andreas Schwab, H. Peter Anvin, Ulrich Drepper, David Miller, ralf,
linux-arch, linux-kernel, kernel, torvalds

On Thu, 2010-03-18 at 12:03 -0500, Steven Munroe wrote:
> Sorry, int and long are compatible in 32-bit, but not long long.
>
> int and long are not compatible in 64-bit.
>
> It is hard to keep all the modes and arguments straight.
>
> But the concerns about changing the prototype, and about whether people
> are actually using the prototype, are still valid.

Well, using the macro trick instead would fix that problem, code
wouldn't build if it doesn't include unistd.h :-)

Cheers,
Ben.

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-18 21:18 ` Benjamin Herrenschmidt
@ 2010-03-19 1:22 ` Jamie Lokier
0 siblings, 0 replies; 34+ messages in thread
From: Jamie Lokier @ 2010-03-19 1:22 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: munroesj, Andreas Schwab, H. Peter Anvin, Ulrich Drepper,
David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds

Benjamin Herrenschmidt wrote:
> On Thu, 2010-03-18 at 12:03 -0500, Steven Munroe wrote:
> > Sorry, int and long are compatible in 32-bit, but not long long.
> >
> > int and long are not compatible in 64-bit.
> >
> > It is hard to keep all the modes and arguments straight.
> >
> > But the concerns about changing the prototype, and about whether people
> > are actually using the prototype, are still valid.
>
> Well, using the macro trick instead would fix that problem, code
> wouldn't build if it doesn't include unistd.h :-)

Or it will build, but call the old ABI version - no change to those
programs.

-- Jamie

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 13:44 ` Ralf Baechle
2010-03-15 15:13 ` H. Peter Anvin
@ 2010-03-15 20:27 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 34+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 20:27 UTC (permalink / raw)
To: Ralf Baechle
Cc: David Miller, linux-arch, linux-kernel, kernel, drepper, torvalds,
munroesj

On Mon, 2010-03-15 at 14:44 +0100, Ralf Baechle wrote:
> Syscall is most often used for new syscalls that have no syscall stub in
> glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
> later stage syscall() is changed to have this sort of knowledge we break
> the API. This is something only the kernel can get right.

Well, no. The change I propose would not break the ABI on powerpc and
would auto-magically fix those cases :-)

But again, you don't have to do the same thing on MIPS or sparc, it's
definitely arch specific.

IE. What you are saying is that a syscall defined in the kernel as:

sys_foo(u64 arg);

To be called from userspace would require something like:

u64 arg = 0x0123456789abcdefULL;

#if defined(__powerpc__) && WORDSIZE == 32
syscall(SYS_foo, (u32)(arg >> 32), (u32)arg);
#else
syscall(SYS_foo, arg);
#endif

While with the trick of making syscall a macro wrapping an underlying
__syscall that has an added dummy argument, the register alignment is
"corrected" and thus -both- forms above suddenly work for me.

That might actually work for you too.

Cheers,
Ben.

^ permalink raw reply [flat|nested] 34+ messages in thread
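To make the hand-split form above concrete, here is a small sketch of breaking a 64-bit value into two 32-bit words. The helper names (split_u64_be, split_u64_le, join_u64) are made up for illustration. The word order is an assumption taken from the thread's remark that hi/lo order differs between BE and LE: high word first for big-endian 32-bit ABIs such as ppc32, low word first for little-endian ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: words in the order a big-endian 32-bit caller
   would pass them by hand (high word first), as in the ppc32 example. */
static void split_u64_be(uint64_t v, uint32_t out[2])
{
    assert(out != 0);
    out[0] = (uint32_t)(v >> 32);  /* high 32 bits */
    out[1] = (uint32_t)v;          /* low 32 bits */
}

/* Same split, assuming a little-endian ABI that wants the low word first. */
static void split_u64_le(uint64_t v, uint32_t out[2])
{
    assert(out != 0);
    out[0] = (uint32_t)v;          /* low 32 bits */
    out[1] = (uint32_t)(v >> 32);  /* high 32 bits */
}

/* Rebuild the 64-bit value from its halves, as the callee would. */
static uint64_t join_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```

The point of the sketch is only that an application doing the split itself has to pick one of the two orders per architecture, which is the portability burden the macro proposal tries to remove.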
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 4:48 64-syscall args on 32-bit vs syscall() Benjamin Herrenschmidt
2010-03-15 5:06 ` David Miller
@ 2010-03-15 15:03 ` Steven Munroe
2010-03-15 20:32 ` Benjamin Herrenschmidt
2010-03-15 15:04 ` Jamie Lokier
2 siblings, 1 reply; 34+ messages in thread
From: Steven Munroe @ 2010-03-15 15:03 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Ryan S. Arnold
Cc: linux-arch, linux-kernel@vger.kernel.org, Mark Lord, Ulrich Drepper,
Linus Torvalds

On Mon, 2010-03-15 at 15:48 +1100, Benjamin Herrenschmidt wrote:
> Hoy there !
>
> This may have been discussed earlier (I have some vague memories...) but
> I just hit a problem with that again (Mark: hint, it's in hdparm's
> fallocate) so I'd like a bit of a refresh here on what is the "right
> thing" to do...
>
> So some syscalls want a 64-bit argument. Let's take fallocate() as our
> example. So we already know that we have to be extra careful since some
> 32-bit arch will pass this into 2 registers (or stack slots) which need
> to be aligned, and so we tend to already take care of making sure that
> the said 64-bit argument is either defined as 2x32-bit arguments, or
> defined as 1x64 bit argument aligned to 2x32-bit in the argument list.
>
> So far so good...
>
> The problem is when user space tries to use the same trick for calling
> those functions using glibc-provided syscall() function. In this
> example, hdparm does:
>
> err = syscall(SYS_fallocate, fd, mode, offset, len);
>
> With "offset" being a 64-bit argument.
>

The powerpc implementation of syscall is:

ENTRY (syscall)
	mr r0,r3
	mr r3,r4
	mr r4,r5
	mr r5,r6
	mr r6,r7
	mr r7,r8
	mr r8,r9
	sc
	PSEUDO_RET
PSEUDO_END (syscall)

The ABI says:

"Long long arguments are considered to have 8-byte size and alignment.
The same 8-byte arguments that must go in aligned pairs or registers are
8-byte aligned on the stack."

This implies that the SYS_fallocate call will skip a register to get the
required alignment in the parameter save area.

for ppc32 on entry

r3 == SYS_fallocate
r4 == fd
r5 == mode
r6 == not used
r7, r8 == offset
r9 == len

This gets shifted to:

r0 == SYS_fallocate
r3 == fd
r4 == mode
r5 == not used
r6, r7 == offset
r8 == len

For syscall the vararg parms will be mirrored to the parameter save area
but will not be used. The ABI does not address LE for this case.

Ryan, does the new ABI doc cover this?

> This will break because the first argument to syscall now shifts
> everything by one register, which breaks the register pair alignment
> (and I suppose archs with stack based calling convention can have
> similar alignment issues even if x86 doesn't).
>
> Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> it's first argument to correct that ? Either that or making it some kind
> of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
>
> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the the order of the hi and lo part is different
> between BE and LE architectures ;-)
>
> So is there a more "correct" solution than another here ? Should powerpc
> glibc be fixed at least so that syscall() keeps the alignment ?
>
> Cheers,
> Ben.
>
>

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 15:03 ` Steven Munroe
@ 2010-03-15 20:32 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 34+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 20:32 UTC (permalink / raw)
To: munroesj
Cc: Ryan S. Arnold, linux-arch, linux-kernel@vger.kernel.org, Mark Lord,
Ulrich Drepper, Linus Torvalds

> The powerpc implementation of syscall is:
>
>
> ENTRY (syscall)
> mr r0,r3
> mr r3,r4
> mr r4,r5
> mr r5,r6
> mr r6,r7
> mr r7,r8
> mr r8,r9
> sc
> PSEUDO_RET
> PSEUDO_END (syscall)

And my proposal is to make it instead:

#define syscall(__sysno, __args...) __syscall(0,__sysno,__args)

ENTRY (__syscall)
	mr r0,r4
	mr r3,r5
	mr r4,r6
	mr r5,r7
	mr r6,r8
	mr r7,r9
	mr r8,r10
	sc
	PSEUDO_RET
PSEUDO_END (__syscall)

> The ABI says:
>
> "Long long arguments are considered to have 8-byte size and alignment.
> The same 8-byte arguments that must go in aligned pairs or registers are
> 8-byte aligned on the stack."

Right, that's what I'm explaining too.

> This implies that the SYS_fallocate call will skip a register to get the
> required alignment in the parameter save area.
>
> for ppc32 on entry
>
> r3 == SYS_fallocate
> r4 == fd
> r5 == mode
> r6 == not used
> r7, r8 == offset
> r9 == len

len is 64-bit too afaik but let's ignore that for now

> This gets shifted to:
>
> r0 == SYS_fallocate
> r3 == fd
> r4 == mode
> r5 == not used
> r6, r7 == offset
> r8 == len

Which is not correct, as the kernel expects:

r0 == SYS_fallocate
r3 == fd
r4 == mode
r5, r6 == offset
r7, r8 == len

> For syscall the vararg parms will be mirrored to the parameter save area
> but will not be used. The ABI does not talk to LE for this case.

Right, but the fact that we shift all args by -1- register means that we
break the 64-bit register pair alignment compared to the real syscall
which uses r0 instead for the syscall number.

Hence my proposal to add a dummy argument to restore that alignment.

As it is there is userspace code that does:

syscall(SYS_fallocate, fd, mode, offset, len);

Which works on x86 but is broken on ppc32 unless we do that change.

Cheers,
Ben.

> Ryan does the new ABI doc cover this?
>
> > This will break because the first argument to syscall now shifts
> > everything by one register, which breaks the register pair alignment
> > (and I suppose archs with stack based calling convention can have
> > similar alignment issues even if x86 doesn't).
> >
> > Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> > it's first argument to correct that ? Either that or making it some kind
> > of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
> >
> > As it is, any 32-bit app using syscall() on any of the syscalls that
> > takes 64-bit arguments will be broken, unless the app itself breaks up
> > the argument, but the the order of the hi and lo part is different
> > between BE and LE architectures ;-)
> >
> > So is there a more "correct" solution than another here ? Should powerpc
> > glibc be fixed at least so that syscall() keeps the alignment ?
> >
> > Cheers,
> > Ben.
> >
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply [flat|nested] 34+ messages in thread
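The register tables in this exchange can be checked mechanically. The sketch below models only the rule quoted from the ABI, under the assumption that a 64-bit argument must start on an odd-numbered GPR (r3, r5, r7, r9) on ppc32; assign_regs() is a hypothetical helper, not anything from glibc or the kernel, and it ignores spilling to the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Give each argument a starting GPR number, beginning at first_reg.
   words[i] is 1 for a 32-bit argument and 2 for a 64-bit one.
   Assumption: a 64-bit argument must begin on an odd register, so one
   register is skipped whenever the next free register is even. */
static void assign_regs(int first_reg, const int *words, size_t n, int *start)
{
    int reg = first_reg;

    for (size_t i = 0; i < n; i++) {
        assert(words[i] == 1 || words[i] == 2);
        if (words[i] == 2 && (reg % 2) == 0)
            reg++;               /* skip a register to align the pair */
        start[i] = reg;
        reg += words[i];
    }
}
```

With (sysno, fd, mode, offset, len) starting at r3, offset lands in r7/r8; shifting down by one register gives r6/r7, while the kernel layout (fd, mode, offset, len from r3) wants r5/r6. Adding a leading dummy, or making sysno itself 64-bit, leaves offset in r7/r8 and len in r9/r10, so a shift by two yields r5/r6 and r7/r8, which matches the layout Ben says the kernel expects.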
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 4:48 64-syscall args on 32-bit vs syscall() Benjamin Herrenschmidt
2010-03-15 5:06 ` David Miller
2010-03-15 15:03 ` Steven Munroe
@ 2010-03-15 15:04 ` Jamie Lokier
2010-03-15 20:33 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 34+ messages in thread
From: Jamie Lokier @ 2010-03-15 15:04 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-arch, linux-kernel@vger.kernel.org, Mark Lord, Ulrich Drepper,
Linus Torvalds, Steven Munroe

Benjamin Herrenschmidt wrote:
> err = syscall(SYS_fallocate, fd, mode, offset, len);
>
> With "offset" being a 64-bit argument.
>
> This will break because the first argument to syscall now shifts
> everything by one register, which breaks the register pair alignment
> (and I suppose archs with stack based calling convention can have
> similar alignment issues even if x86 doesn't).
>
> Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> it's first argument to correct that ? Either that or making it some kind
> of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
>
> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the the order of the hi and lo part is different
> between BE and LE architectures ;-)
>
> So is there a more "correct" solution than another here ? Should powerpc
> glibc be fixed at least so that syscall() keeps the alignment ?

There are several problems with syscall(), not just this - because a
number of system calls in section 2 of the manual don't map directly
to kernel syscalls with the same function prototype.

Even fork() has become something complicated in Glibc that doesn't use
the fork syscall :-(

So anything using syscall() has to be careful on Linux already.
Changing the 64-bit alignment won't fix the other differences.

-- Jamie

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 15:04 ` Jamie Lokier
@ 2010-03-15 20:33 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 34+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 20:33 UTC (permalink / raw)
To: Jamie Lokier
Cc: linux-arch, linux-kernel@vger.kernel.org, Mark Lord, Ulrich Drepper,
Linus Torvalds, Steven Munroe

On Mon, 2010-03-15 at 15:04 +0000, Jamie Lokier wrote:
> There are several problems with syscall(), not just this - because a
> number of system calls in section 2 of the manual don't map directly
> to kernel syscalls with the same function prototype.
>
> Even fork() has become something complicated in Glibc that doesn't use
> the fork syscall :-(
>
> So anything using syscall() has to be careful on Linux already.
> Changing the 64-bit alignment won't fix the other differences.

It won't fix -all- the problems with syscall(), but it will fix a wagon
of them without breaking existing code that already does the arch
specific breakup on the call site...

Cheers,
Ben.

^ permalink raw reply [flat|nested] 34+ messages in thread