* 64-syscall args on 32-bit vs syscall()
2010-03-15 4:48 64-syscall args on 32-bit vs syscall() Benjamin Herrenschmidt
@ 2010-03-15 4:48 ` Benjamin Herrenschmidt
2010-03-15 5:06 ` David Miller
` (2 subsequent siblings)
3 siblings, 0 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 4:48 UTC (permalink / raw)
To: linux-arch
Cc: linux-kernel@vger.kernel.org, Mark Lord, Ulrich Drepper,
Linus Torvalds, Steven Munroe
Hoy there !
This may have been discussed earlier (I have some vague memories...) but
I just hit a problem with that again (Mark: hint, it's in hdparm's
fallocate) so I'd like a bit of a refresh here on what is the "right
thing" to do...
So some syscalls want a 64-bit argument. Let's take fallocate() as our
example. So we already know that we have to be extra careful since some
32-bit arch will pass this into 2 registers (or stack slots) which need
to be aligned, and so we tend to already take care of making sure that
the said 64-bit argument is either defined as 2x32-bit arguments, or
defined as 1x64 bit argument aligned to 2x32-bit in the argument list.
So far so good...
The problem is when user space tries to use the same trick for calling
those functions using glibc-provided syscall() function. In this
example, hdparm does:
err = syscall(SYS_fallocate, fd, mode, offset, len);
With "offset" being a 64-bit argument.
This will break because the first argument to syscall now shifts
everything by one register, which breaks the register pair alignment
(and I suppose archs with stack based calling convention can have
similar alignment issues even if x86 doesn't).
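The skew can be illustrated with a small model. This is only a sketch, assuming the 32-bit PowerPC SysV convention in which integer arguments go in r3 to r10 and a 64-bit argument must start on an odd-numbered register. The names ppc32_assign_regs and after_syscall_shift are made up for the illustration; they are not glibc or kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Argument registers assumed for the 32-bit PowerPC SysV convention. */
enum { FIRST_ARG_REG = 3, LAST_ARG_REG = 10 };

/*
 * Assign registers to a list of integer arguments.
 * sizes[i] is 4 or 8 (bytes); first_reg[i] receives the first register
 * used by argument i. An 8-byte argument must start on an odd register
 * (r3, r5, r7, r9), so one register is skipped when the next free one
 * is even. Returns 0 on success, -1 if the arguments spill past r10.
 */
static int ppc32_assign_regs(const int *sizes, size_t n, int *first_reg)
{
    int next = FIRST_ARG_REG;

    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        int words = sizes[i] / 4;

        if (words == 2 && (next % 2) == 0)
            next++;                 /* keep the register pair aligned */
        if (next + words - 1 > LAST_ARG_REG)
            return -1;
        first_reg[i] = next;
        next += words;
    }
    return 0;
}

/*
 * Register in which the kernel sees a value after syscall() has moved
 * every register down by one (r3 -> r0, r4 -> r3, r5 -> r4, ...).
 */
static int after_syscall_shift(int reg)
{
    return reg - 1;
}
```

For fallocate(fd, mode, offset, len) called directly, the model places offset in r5/r6 and len in r7/r8. For syscall(SYS_fallocate, fd, mode, offset, len) it places offset in r7/r8, so after the shift by one the kernel would see it starting at r6 instead of r5.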
Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
its first argument to correct that ? Either that or making it some kind
of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
As it is, any 32-bit app using syscall() on any of the syscalls that
takes 64-bit arguments will be broken, unless the app itself breaks up
the argument, but the order of the hi and lo part is different
between BE and LE architectures ;-)
So is there a more "correct" solution than another here ? Should powerpc
glibc be fixed at least so that syscall() keeps the alignment ?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 4:48 64-syscall args on 32-bit vs syscall() Benjamin Herrenschmidt
2010-03-15 4:48 ` Benjamin Herrenschmidt
@ 2010-03-15 5:06 ` David Miller
2010-03-15 5:18 ` Benjamin Herrenschmidt
2010-03-15 15:03 ` Steven Munroe
2010-03-15 15:04 ` Jamie Lokier
3 siblings, 1 reply; 38+ messages in thread
From: David Miller @ 2010-03-15 5:06 UTC (permalink / raw)
To: benh; +Cc: linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Mon, 15 Mar 2010 15:48:13 +1100
> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the order of the hi and lo part is different
> between BE and LE architectures ;-)
I think it is even different on the same endian architectures,
f.e. mips I think.
There is no way to do this without some arch specific code
to handle things properly, really.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 5:06 ` David Miller
@ 2010-03-15 5:18 ` Benjamin Herrenschmidt
2010-03-15 5:54 ` David Miller
2010-03-15 13:44 ` Ralf Baechle
0 siblings, 2 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 5:18 UTC (permalink / raw)
To: David Miller
Cc: linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj
On Sun, 2010-03-14 at 22:06 -0700, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Mon, 15 Mar 2010 15:48:13 +1100
>
> > As it is, any 32-bit app using syscall() on any of the syscalls that
> > takes 64-bit arguments will be broken, unless the app itself breaks up
> > the argument, but the order of the hi and lo part is different
> > between BE and LE architectures ;-)
>
> I think it is even different on the same endian architectures,
> f.e. mips I think.
>
> There is no way to do this without some arch specific code
> to handle things properly, really.
Right, but to what extent ? IE. do we always need the callers using
syscall() directly to know it all, or can we to some extent handle some
of it inside glibc ?
For example, if powerpc glibc is fixed so that syscall() takes a 64-bit
first argument (or calls via some macro to add a dummy 32-bit argument),
the register alignment will be preserved, and things will work just
fine.
IE. It may not fix all problems with all archs, but in this case, it
will fix the common cases for powerpc at least :-) And any other arch
that has the exact same alignment problem.
Or is there any good reason -not- to do that in glibc ?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 5:18 ` Benjamin Herrenschmidt
@ 2010-03-15 5:54 ` David Miller
2010-03-15 20:22 ` Benjamin Herrenschmidt
2010-03-15 13:44 ` Ralf Baechle
1 sibling, 1 reply; 38+ messages in thread
From: David Miller @ 2010-03-15 5:54 UTC (permalink / raw)
To: benh; +Cc: linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Mon, 15 Mar 2010 16:18:33 +1100
> Or is there any good reason -not- to do that in glibc ?
The whole point of syscall() is to handle cases where the C library
doesn't know about the system call yet.
I think it's therefore very much "buyer beware".
On sparc it'll never work to use the workaround you're proposing since
we pass everything in via registers.
So arch knowledge will always need to be present in these situations.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 5:54 ` David Miller
@ 2010-03-15 20:22 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 20:22 UTC (permalink / raw)
To: David Miller
Cc: linux-arch, linux-kernel, kernel, drepper, torvalds, munroesj
On Sun, 2010-03-14 at 22:54 -0700, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Mon, 15 Mar 2010 16:18:33 +1100
>
> > Or is there any good reason -not- to do that in glibc ?
>
> The whole point of syscall() is to handle cases where the C library
> doesn't know about the system call yet.
>
> I think it's therefore very much "buyer beware".
>
> On sparc it'll never work to use the workaround you're proposing since
> we pass everything in via registers.
>
> So arch knowledge will always need to be present in these situations.
I'm not sure I follow. We also pass via register on powerpc, but the
offset introduced by the sysno argument breaks register pair alignment
which cannot be fixed up inside syscall().
However, if I change glibc's syscall to be something like
#define syscall(sysno, args...) __syscall(0 /* dummy */, sysno, args)
And make __syscall then do something like:
mr r0, r4
mr r3, r5
mr r4, r6
mr r5, r7
mr r6, r8
.../...
sc
blr
Then at least all that class of syscalls will be fixed. Of course this
has to be in glibc arch code. I was merely asking if that was something
our glibc folks would consider and whether somebody could think of a
better solution :-)
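The effect of that register shuffle can be modelled in plain C. This is a sketch only: model_syscall_stub is a hypothetical name, the array stands in for r0..r10, and it assumes the powerpc Linux convention of the syscall number in r0 and up to six arguments in r3..r8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NUM_REGS = 11 };   /* models r0..r10 */

/*
 * Model of the register moves done by a syscall() entry stub.
 * skew is the number of leading C arguments that are not kernel
 * arguments: 1 for syscall(sysno, ...), 2 for the proposed
 * __syscall(dummy, sysno, ...). The syscall number ends up in r0
 * and the six kernel arguments in r3..r8.
 */
static void model_syscall_stub(uint32_t r[NUM_REGS], int skew)
{
    uint32_t in[NUM_REGS];

    assert(skew == 1 || skew == 2);
    memcpy(in, r, sizeof in);

    r[0] = in[3 + skew - 1];            /* sysno: r3 (skew 1) or r4 (skew 2) */
    for (int k = 0; k < 6; k++)
        r[3 + k] = in[3 + skew + k];    /* slide arguments down by skew */
}
```

With skew 2 and a caller layout of dummy in r3, sysno in r4, fd in r5, mode in r6, offset in r7/r8 and len in r9/r10, the model leaves offset in r5/r6 and len in r7/r8, which is where the kernel looks for them.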
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 5:18 ` Benjamin Herrenschmidt
2010-03-15 5:54 ` David Miller
@ 2010-03-15 13:44 ` Ralf Baechle
2010-03-15 15:13 ` H. Peter Anvin
2010-03-15 20:27 ` Benjamin Herrenschmidt
1 sibling, 2 replies; 38+ messages in thread
From: Ralf Baechle @ 2010-03-15 13:44 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: David Miller, linux-arch, linux-kernel, kernel, drepper, torvalds,
munroesj
On Mon, Mar 15, 2010 at 04:18:33PM +1100, Benjamin Herrenschmidt wrote:
> On Sun, 2010-03-14 at 22:06 -0700, David Miller wrote:
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Date: Mon, 15 Mar 2010 15:48:13 +1100
> >
> > > As it is, any 32-bit app using syscall() on any of the syscalls that
> > > takes 64-bit arguments will be broken, unless the app itself breaks up
> > > > the argument, but the order of the hi and lo part is different
> > > between BE and LE architectures ;-)
> >
> > I think it is even different on the same endian architectures,
> > f.e. mips I think.
MIPS passes arguments in endian order, that is low/high for little
endian and high/low for big endian.
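A caller that splits a 64-bit value by hand therefore has to pick the word order to match. A minimal sketch follows; split_u64 and enum word_order are hypothetical names for the illustration, and choosing the right order for a given architecture and byte order is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Order in which the two 32-bit halves are passed as arguments. */
enum word_order { HIGH_WORD_FIRST, LOW_WORD_FIRST };

/*
 * Split a 64-bit value into the two 32-bit words that would be passed
 * as consecutive syscall arguments. out[0] is passed first.
 */
static void split_u64(uint64_t value, enum word_order order, uint32_t out[2])
{
    uint32_t hi = (uint32_t)(value >> 32);
    uint32_t lo = (uint32_t)value;

    assert(order == HIGH_WORD_FIRST || order == LOW_WORD_FIRST);
    out[0] = (order == HIGH_WORD_FIRST) ? hi : lo;
    out[1] = (order == HIGH_WORD_FIRST) ? lo : hi;
}
```

Going by the description above, a big-endian build would use HIGH_WORD_FIRST and a little-endian build LOW_WORD_FIRST.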
> > There is no way to do this without some arch specific code
> > to handle things properly, really.
>
> Right, but to what extent ? IE. do we always need the callers using
> syscall() directly to know it all, or can we to some extent handle some
> of it inside glibc ?
>
> For example, if powerpc glibc is fixed so that syscall() takes a 64-bit
> first argument (or calls via some macro to add a dummy 32-bit argument),
> the register alignment will be preserved, and things will work just
> fine.
>
> IE. It may not fix all problems with all archs, but in this case, it
> will fix the common cases for powerpc at least :-) And any other arch
> that has the exact same alignment problem.
>
> Or is there any good reason -not- to do that in glibc ?
Syscall is most often used for new syscalls that have no syscall stub in
glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
later stage syscall() is changed to have this sort of knowledge we break
the API. This is something only the kernel can get right.
Ralf
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 13:44 ` Ralf Baechle
@ 2010-03-15 15:13 ` H. Peter Anvin
2010-03-15 16:00 ` Ulrich Drepper
2010-03-15 20:27 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2010-03-15 15:13 UTC (permalink / raw)
To: Ralf Baechle
Cc: Benjamin Herrenschmidt, David Miller, linux-arch, linux-kernel,
kernel, drepper, torvalds, munroesj
On 03/15/2010 06:44 AM, Ralf Baechle wrote:
>
> Syscall is most often used for new syscalls that have no syscall stub in
> glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
> later stage syscall() is changed to have this sort of knowledge we break
> the API. This is something only the kernel can get right.
>
One option would be to do a libkernel.so, with auto-generated stubs out
of the kernel build tree. As already discussed in #kernel this morning,
there are a number of sticky points with types and namespaces for
this, but those aren't any worse than the equivalent problems for
syscall(3).
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 15:13 ` H. Peter Anvin
@ 2010-03-15 16:00 ` Ulrich Drepper
2010-03-15 19:00 ` David Miller
0 siblings, 1 reply; 38+ messages in thread
From: Ulrich Drepper @ 2010-03-15 16:00 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ralf Baechle, Benjamin Herrenschmidt, David Miller, linux-arch,
linux-kernel, kernel, torvalds, munroesj
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
> One option would be to do a libkernel.so,
No need. Put it in the vdso. And name it something other than syscall.
The syscall() API is fixed, you cannot change it.
All this only if it makes sense for ALL archs. If it cannot work for
just one arch then it's not worth it at all.
- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
iEYEARECAAYFAkueWbcACgkQ2ijCOnn/RHRtBQCeP88S/0xei7CAt65AGboqsrC8
N7wAoK7Qbi+OZuQrgHTCgTA27TgY+gQU
=4tJ6
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 16:00 ` Ulrich Drepper
@ 2010-03-15 19:00 ` David Miller
2010-03-15 19:41 ` H. Peter Anvin
0 siblings, 1 reply; 38+ messages in thread
From: David Miller @ 2010-03-15 19:00 UTC (permalink / raw)
To: drepper
Cc: hpa, ralf, benh, linux-arch, linux-kernel, kernel, torvalds,
munroesj
From: Ulrich Drepper <drepper@redhat.com>
Date: Mon, 15 Mar 2010 09:00:55 -0700
> On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
>> One option would be to do a libkernel.so,
>
> No need. Put it in the vdso. And name it something other than syscall.
> The syscall() API is fixed, you cannot change it.
>
> All this only if it makes sense for ALL archs. If it cannot work for
> just one arch then it's not worth it at all.
There are many archs that still lack VDSO.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 19:00 ` David Miller
@ 2010-03-15 19:41 ` H. Peter Anvin
2010-03-15 20:35 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2010-03-15 19:41 UTC (permalink / raw)
To: David Miller
Cc: drepper, ralf, benh, linux-arch, linux-kernel, kernel, torvalds,
munroesj
On 03/15/2010 12:00 PM, David Miller wrote:
> From: Ulrich Drepper <drepper@redhat.com>
> Date: Mon, 15 Mar 2010 09:00:55 -0700
>
>> On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
>>> One option would be to do a libkernel.so,
>>
>> No need. Put it in the vdso. And name it something other than syscall.
>> The syscall() API is fixed, you cannot change it.
>>
>> All this only if it makes sense for ALL archs. If it cannot work for
>> just one arch then it's not worth it at all.
>
> There are many archs that still lack VDSO.
Putting it into the vdso is also rather annoyingly heavyweight for what
is nothing other than an ordinary shared library. Just making it an
ordinary shared library seems a lot saner.
I don't see why syscall() can't change the type for its first argument
-- it seems to be exactly what symbol versioning is for.
Doesn't change the fact that it is fundamentally broken, of course.
-hpa
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 19:41 ` H. Peter Anvin
@ 2010-03-15 20:35 ` Benjamin Herrenschmidt
2010-03-15 20:41 ` H. Peter Anvin
2010-03-16 21:56 ` Steven Munroe
0 siblings, 2 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 20:35 UTC (permalink / raw)
To: H. Peter Anvin
Cc: David Miller, drepper, ralf, linux-arch, linux-kernel, kernel,
torvalds, munroesj
On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
> I don't see why syscall() can't change the type for its first argument
> -- it seems to be exactly what symbol versioning is for.
>
> Doesn't change the fact that it is fundamentally broken, of course.
No need to change the type of the first arg and go for symbol
versioning if you do something like I proposed earlier, there will be
no conflict between syscall() and __syscall() and both variants can
exist.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 20:35 ` Benjamin Herrenschmidt
@ 2010-03-15 20:41 ` H. Peter Anvin
2010-03-16 21:56 ` Steven Munroe
1 sibling, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2010-03-15 20:41 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: David Miller, drepper, ralf, linux-arch, linux-kernel, kernel,
torvalds, munroesj
On 03/15/2010 01:35 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
>> I don't see why syscall() can't change the type for its first argument
>> -- it seems to be exactly what symbol versioning is for.
>>
>> Doesn't change the fact that it is fundamentally broken, of course.
>
> No need to change the type of the first arg and go for symbol
> versioning if you do something like I proposed earlier, there will be
> no conflict between syscall() and __syscall() and both variants can
> exist.
>
Basically symbol versioning done "by hand", actually using symbol
versioning is better, IMNSHO.
-hpa
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 20:35 ` Benjamin Herrenschmidt
2010-03-15 20:41 ` H. Peter Anvin
@ 2010-03-16 21:56 ` Steven Munroe
2010-03-17 0:31 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 38+ messages in thread
From: Steven Munroe @ 2010-03-16 21:56 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: H. Peter Anvin, David Miller, drepper, ralf, linux-arch,
linux-kernel, kernel, torvalds
On Tue, 2010-03-16 at 07:35 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
> > I don't see why syscall() can't change the type for its first argument
> > -- it seems to be exactly what symbol versioning is for.
> >
> > Doesn't change the fact that it is fundamentally broken, of course.
>
> No need to change the type of the first arg and go for symbol
> versioning if you do something like I proposed earlier, there will be
> no conflict between syscall() and __syscall() and both variants can
> exist.
>
One concern is that the new syscall() and the kernel have to match, and
mixing will not work. Your proposal seems to impact all syscalls, not just
the ones called via the syscall() API. These syscalls get generated inline,
which makes static linking very dangerous ...
So I think you do need both symbol versioning and kernel feature stubs
(like xstat). That gets to be a lot of work.
> Cheers,
> Ben.
>
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-16 21:56 ` Steven Munroe
@ 2010-03-17 0:31 ` Benjamin Herrenschmidt
2010-03-17 0:31 ` Benjamin Herrenschmidt
2010-03-17 5:52 ` Ulrich Drepper
0 siblings, 2 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-17 0:31 UTC (permalink / raw)
To: munroesj
Cc: H. Peter Anvin, David Miller, drepper, ralf, linux-arch,
linux-kernel, kernel, torvalds
On Tue, 2010-03-16 at 16:56 -0500, Steven Munroe wrote:
> On Tue, 2010-03-16 at 07:35 +1100, Benjamin Herrenschmidt wrote:
> > On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
> > > I don't see why syscall() can't change the type for its first argument
> > > -- it seems to be exactly what symbol versioning is for.
> > >
> > > Doesn't change the fact that it is fundamentally broken, of course.
> >
> > No need to change the type of the first arg and go for symbol
> > > versioning if you do something like I proposed earlier, there will be
> > no conflict between syscall() and __syscall() and both variants can
> > exist.
> >
> One concern is the new syscall and the kernel have to match and mixing
> will not work. your proposal seems to impact all syscalls not just the
> one called via syscall API. These syscalls get generated inline which
> makes static linking very dangerous ...
>
> So I think you do need both symbol versioning and kernel feature stubs
> (like xstat). That gets to be a lot of work
What do you mean ? My proposal is purely a change to the syscall()
function, nothing else. No kernel change, no ABI change, no change to
the way glibc normally calls syscalls internally, etc... just the
exported syscall() function to shift its arguments in order to avoid
losing register pair alignment.
And the change would still be compatible with existing userland code that
manually splits the 64-bit arguments to avoid the problem on power.
IE. Unless I've missed something, this would be a 100% backward
compatible change that simply makes a whole class of syscall() use work
that didn't before on power (but did on x86), such as the one I hit in
hdparm for example.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 0:31 ` Benjamin Herrenschmidt
2010-03-17 0:31 ` Benjamin Herrenschmidt
@ 2010-03-17 5:52 ` Ulrich Drepper
2010-03-17 8:56 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 38+ messages in thread
From: Ulrich Drepper @ 2010-03-17 5:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: munroesj, H. Peter Anvin, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 03/16/2010 05:31 PM, Benjamin Herrenschmidt wrote:
> My proposal is purely a change to the syscall()
> function, nothing else. No kernel change, no ABI change, no change to
> the way glibc normally calls syscalls internally, etc...
How can this be? People are today actively working around the problem
of 64-bit arguments. You have to break something since you cannot
recognize these situations. And since it has meanwhile become clear that
there is no way to "fix" all archs magically I really don't want to
introduce anything. There are mechanisms in place to abstract out some
of the issues. And for the rest, well, if you're using syscalls
directly you already have to encode low-level knowledge. One more bit
doesn't hurt. It's not as if this happens every day.
- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
iEYEARECAAYFAkugbhsACgkQ2ijCOnn/RHQzlACeMp0UK2jZuZOgXhJjB8Z9p4kh
rCoAn0zaJqFYV9tQ0Ct49Mprfa0O5iKh
=71la
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 5:52 ` Ulrich Drepper
@ 2010-03-17 8:56 ` Benjamin Herrenschmidt
2010-03-17 9:14 ` Ulrich Drepper
` (2 more replies)
0 siblings, 3 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-17 8:56 UTC (permalink / raw)
To: Ulrich Drepper
Cc: munroesj, H. Peter Anvin, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds
On Tue, 2010-03-16 at 22:52 -0700, Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 03/16/2010 05:31 PM, Benjamin Herrenschmidt wrote:
> > My proposal is purely a change to the syscall()
> > function, nothing else. No kernel change, no ABI change, no change to
> > the way glibc normally calls syscalls internally, etc...
>
> How can this be? People are today actively working around the problem
> of 64-bit arguments. You have to break something since you cannot
> recognize these situations.
Ok, so I -may- be missing something, but I believe this won't break
anything:
- You keep the existing syscall() exported by glibc for binary
compatibility
- You add a new __syscall() (or whatever you want to name it) that adds
a dummy argument at the beginning, and whose implementation shifts
everything by 2 instead of 1 argument before calling into the kernel
- You define in unistd.h or whatever is relevant, a macro that does:
#define syscall(__sysno, __args...) __syscall(0, __sysno, __args)
I believe that should cover it, at least for powerpc, possibly for other
archs too though as I said, I may have missed something there.
IE. Whether your app writes:
syscall(SYS_foo, my_64bit_arg);
Or
syscall(SYS_foo, (u32)(my_64bit_arg >> 32), (u32)(my_64bit_arg));
Both should still work with the new approach and end up doing the right
thing.
Hence, apps that use the first form today because it works on x86 would
end up working at least on powerpc where they would have been otherwise
broken unless they used some arch specific #ifdef to do the second form.
> And since it became meanwhile clear that
> there is no way to "fix" all archs magically I really don't want to
> introduce anything. There are mechanisms in place to abstract out some
> of the issues. And for the rest, well, if you're using syscalls
> directly you already have to encoded lowlevel knowledge. One more bit
> doesn't hurt. It's not as if this happens every day.
It doesn't happen every day. However, if my proposal ends up fixing a
bunch of cases where it does without breaking anything, then I suppose
it's worth considering, though as I said, it's possible that I am missing
some subtlety here in which case I'd be glad to stand corrected :-)
Cheers,
Ben.
> - --
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAkugbhsACgkQ2ijCOnn/RHQzlACeMp0UK2jZuZOgXhJjB8Z9p4kh
> rCoAn0zaJqFYV9tQ0Ct49Mprfa0O5iKh
> =71la
> -----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 8:56 ` Benjamin Herrenschmidt
@ 2010-03-17 9:14 ` Ulrich Drepper
2010-03-17 10:13 ` Benjamin Herrenschmidt
2010-03-17 9:18 ` Jamie Lokier
2010-03-17 18:30 ` H. Peter Anvin
2 siblings, 1 reply; 38+ messages in thread
From: Ulrich Drepper @ 2010-03-17 9:14 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: munroesj, H. Peter Anvin, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 03/17/2010 01:56 AM, Benjamin Herrenschmidt wrote:
>> - You keep the existing syscall() exported by glibc for binary
>> compatibility
>
>> - You add a new __syscall() (or whatever you want to name it) that adds
>> a dummy argument at the beginning, and whose implementation shifts
>> everything by 2 instead of 1 argument before calling into the kernel
>
>> - You define in unistd.h or whatever is relevant, a macro that does:
>
>> #define syscall(__sysno, __args...) __syscall(0, __sysno, __args)
>
>> I believe that should cover it, at least for powerpc, possibly for other
>> archs too though as I said, I may have missed something there.
How can this possibly be the case? This will screw people who currently
work around the ppc limitations of the existing syscall.
Just leave it alone.
- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
iEYEARECAAYFAkugnV0ACgkQ2ijCOnn/RHRL4gCeIY0SLDCgLqtVvuMw+pvCzkwE
3MIAoJQRK5Mc+WtC/Wz9tPFPy4X+EALe
=lexw
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 9:14 ` Ulrich Drepper
@ 2010-03-17 10:13 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-17 10:13 UTC (permalink / raw)
To: Ulrich Drepper
Cc: munroesj, H. Peter Anvin, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds
On Wed, 2010-03-17 at 02:14 -0700, Ulrich Drepper wrote:
> >> I believe that should cover it, at least for powerpc, possibly for
> other
> >> archs too though as I said, I may have missed something there.
>
> How can this possibly be the case? This will screw people who
> currently
> work around the ppc limitations of the existing syscall.
No it won't. As I said, it will work for both cases. The problem is a
register pair alignment problem. If the alignment is corrected with the
trick I proposed, 64-bit values will end up in the right pair, but
manually worked-around cases where the value is already broken up will
-also- end up in the right pair.
The problem with syscall() as it is is that it skews the arguments by 1
register, which causes the compiler to skip a register when generating
the call for a 64-bit value. By doing the trick I propose, that skew
will be gone, both 32 and 64 bit arguments will end up where expected.
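That claim can be checked with a little arithmetic. The following is a sketch under stated assumptions: arguments start at r3, a 64-bit argument must start on an odd register, and words_before counts only 32-bit arguments that precede the first 64-bit one. The function names are hypothetical.

```c
#include <assert.h>

/*
 * Register in which the caller places an argument.
 * leading_words: C arguments ahead of the kernel arguments
 *   (0 for a direct call, 1 for syscall(sysno, ...),
 *    2 for __syscall(dummy, sysno, ...)).
 * words_before: 32-bit kernel arguments preceding this one.
 * arg_bytes: 4 or 8.
 */
static int caller_reg(int leading_words, int words_before, int arg_bytes)
{
    assert(arg_bytes == 4 || arg_bytes == 8);
    int reg = 3 + leading_words + words_before;

    if (arg_bytes == 8 && reg % 2 == 0)
        reg++;                      /* compiler skips to an aligned pair */
    return reg;
}

/* Register the kernel sees after the stub shifts by leading_words. */
static int kernel_sees_reg(int leading_words, int words_before, int arg_bytes)
{
    return caller_reg(leading_words, words_before, arg_bytes) - leading_words;
}

/* Register the kernel expects, i.e. the layout of a direct call. */
static int kernel_expects_reg(int words_before, int arg_bytes)
{
    return caller_reg(0, words_before, arg_bytes);
}
```

In this model the shifted layout matches the kernel's expectation whenever leading_words is even, for both a single 64-bit argument and a hand-split pair of 32-bit arguments. With leading_words equal to 1, only the hand-split form matches.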
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 8:56 ` Benjamin Herrenschmidt
2010-03-17 9:14 ` Ulrich Drepper
@ 2010-03-17 9:18 ` Jamie Lokier
2010-03-17 10:18 ` Benjamin Herrenschmidt
2010-03-17 18:30 ` H. Peter Anvin
2 siblings, 1 reply; 38+ messages in thread
From: Jamie Lokier @ 2010-03-17 9:18 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Ulrich Drepper, munroesj, H. Peter Anvin, David Miller, ralf,
linux-arch, linux-kernel, kernel, torvalds
Benjamin Herrenschmidt wrote:
> Hence, apps that use the first form today because it works on x86 would
> end up working at least on powerpc where they would have been otherwise
> broken unless they used some arch specific #ifdef to do the second form.
I think what Ulrich is getting at is your change will break existing
code which already does:
#ifdef __powerpc__
syscall(SYS_foo, 0, my_64bit_arg);
#else
syscall(SYS_foo, my_64bit_arg);
#endif
I don't know of any such code, but it might be out there.
-- Jamie
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 9:18 ` Jamie Lokier
@ 2010-03-17 10:18 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-17 10:18 UTC (permalink / raw)
To: Jamie Lokier
Cc: Ulrich Drepper, munroesj, H. Peter Anvin, David Miller, ralf,
linux-arch, linux-kernel, kernel, torvalds
On Wed, 2010-03-17 at 09:18 +0000, Jamie Lokier wrote:
> Benjamin Herrenschmidt wrote:
> > Hence, apps that use the first form today because it works on x86 would
> > end up working at least on powerpc where they would have been otherwise
> > broken unless they used some arch specific #ifdef to do the second form.
>
> I think what Ulrich is getting at is your change will break existing
> code which already does:
>
> #ifdef __powerpc__
> syscall(SYS_foo, 0, my_64bit_arg);
> #else
> syscall(SYS_foo, my_64bit_arg);
> #endif
>
> I don't know of any such code, but it might be out there.
No, the above "workaround" doesn't work. With the existing syscall()
definition, there is no difference between your two examples. In the
first case, you force a proper 64-bit alignment, but you are already off
by one register pair from the kernel expectation. In the second case,
gcc will imply one, which means that both your examples above will
result in my_64bit_arg in the -same- place, which is off by a register
pair from what the kernel expects.
IE. In the first case gcc will put SYS_foo in r3, 0 in r4, and
my_64bit_arg in r5 and r6. In the second case, gcc will put SYS_foo in
r3, won't care about r4, and will put the 64-bit arg in r5 and r6. Then,
glibc syscall() will shift r3 to r0, r4 to r3 etc... causing
my_64bit_arg to land in r4 and r5. But the kernel expects it in r3 and
r4.
The workaround that apps should use today is:
#if defined(__powerpc__) && __WORDSIZE == 32
syscall(SYS_foo, (u32)(my_64bit_arg >> 32), (u32)my_64bit_arg);
#else
syscall(SYS_foo, my_64bit_arg);
#endif
And with my proposed change, both of the above will work. I.e. gcc will
always put the argument in r5,r6 and the syscall() implementation will
always shift r5 to r3 and r6 to r4.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 8:56 ` Benjamin Herrenschmidt
2010-03-17 9:14 ` Ulrich Drepper
2010-03-17 9:18 ` Jamie Lokier
@ 2010-03-17 18:30 ` H. Peter Anvin
2010-03-17 20:35 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2010-03-17 18:30 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Ulrich Drepper, munroesj, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds
On 03/17/2010 01:56 AM, Benjamin Herrenschmidt wrote:
>
> Ok, so I -may- be missing something, but I believe this won't break
> anything:
>
> - You keep the existing syscall() exported by glibc for binary
> compatibility
>
> - You add a new __syscall() (or whatever you want to name it) that adds
> a dummy argument at the beginning, and whose implementation shifts
> everything by 2 instead of 1 argument before calling into the kernel
>
> - You define in unistd.h or whatever is relevant, a macro that does:
>
> #define syscall(__sysno, __args...) __syscall(0, __sysno, __args)
>
Again, this is *exactly* symbol versioning done by hand... we have
proper symbol versioning, let's use it.
-hpa
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 18:30 ` H. Peter Anvin
@ 2010-03-17 20:35 ` Benjamin Herrenschmidt
2010-03-17 20:53 ` H. Peter Anvin
0 siblings, 1 reply; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-17 20:35 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ulrich Drepper, munroesj, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds
On Wed, 2010-03-17 at 11:30 -0700, H. Peter Anvin wrote:
> Again, this is *exactly* symbol versioning done by hand... we have
> proper symbol versioning, let's use it.
Yeah, whatever, I don't mind what technique you use for the versioning,
ultimately, if the approach works, we can look at those details :-) We
-do- need the macro to strip the dummy argument though, unless we use
a slightly different technique which is to make the __sysno argument
itself 64-bit, which works as well I believe.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 20:35 ` Benjamin Herrenschmidt
@ 2010-03-17 20:53 ` H. Peter Anvin
2010-03-17 22:58 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2010-03-17 20:53 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Ulrich Drepper, munroesj, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds
On 03/17/2010 01:35 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2010-03-17 at 11:30 -0700, H. Peter Anvin wrote:
>> Again, this is *exactly* symbol versioning done by hand... we have
>> proper symbol versioning, let's use it.
>
> Yeah, whatever, I don't mind what technique you use for the versioning,
> ultimately, if the approach works, we can look at those details :-) We
> -do- need the macro to strip the dummy argument though, unless we use
> a slightly different technique which is to make the __sysno argument
> itself 64-bit, which works as well I believe.
>
It seems cleaner to do it that way (with a 64-bit sysno arg.)
-hpa
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 20:53 ` H. Peter Anvin
@ 2010-03-17 22:58 ` Benjamin Herrenschmidt
2010-03-17 22:58 ` Benjamin Herrenschmidt
2010-03-18 16:08 ` Steven Munroe
0 siblings, 2 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-17 22:58 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ulrich Drepper, munroesj, David Miller, ralf, linux-arch,
linux-kernel, kernel, torvalds
On Wed, 2010-03-17 at 13:53 -0700, H. Peter Anvin wrote:
> > Yeah, whatever, I don't mind what technique you use for the versioning,
> > ultimately, if the approach works, we can look at those details :-) We
> > -do- need the macro to strip the dummy argument though, unless we use
> > a slightly different technique which is to make the __sysno argument
> > itself 64-bit, which works as well I believe.
> >
>
> It seems cleaner to do it that way (with a 64-bit sysno arg.)
Right. Now if we can get Ulrich to actually put 2 and 2 together and
admit that it actually works without breaking anything existing (at
least for my arch but I wouldn't be surprised if that was the case for
others), I would be even happier :-)
Steve, any chance you can cook up a glibc patch to test with ? Maybe
making it powerpc specific for now ?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-17 22:58 ` Benjamin Herrenschmidt
2010-03-17 22:58 ` Benjamin Herrenschmidt
@ 2010-03-18 16:08 ` Steven Munroe
2010-03-18 16:21 ` Andreas Schwab
1 sibling, 1 reply; 38+ messages in thread
From: Steven Munroe @ 2010-03-18 16:08 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: H. Peter Anvin, Ulrich Drepper, munroesj, David Miller, ralf,
linux-arch, linux-kernel, kernel, torvalds
On Thu, 2010-03-18 at 09:58 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2010-03-17 at 13:53 -0700, H. Peter Anvin wrote:
> > > Yeah, whatever, I don't mind what technique you use for the versioning,
> > > ultimately, if the approach works, we can look at those details :-) We
> > > -do- need the macro to strip the dummy argument though, unless we use
> > > a slightly different technique which is to make the __sysno argument
> > > itself 64-bit, which works as well I believe.
> > >
> >
> > It seems cleaner to do it that way (with a 64-bit sysno arg.)
>
> Right. Now if we can get Ulrich to actually put 2 and 2 together and
> admit that it actually works without breaking anything existing (at
> least for my arch but I wouldn't be surprised if that was the case for
> others), I would be even happier :-)
>
> Steve, any chance you can cook up a glibc patch to test with ? Maybe
> making it powerpc specific for now ?
>
Do what, declare __sysno as long long? The current prototype in
unistd.h is:
#ifdef __USE_MISC
/* Invoke `system call' number SYSNO, passing it the remaining arguments.
   This is completely system-dependent, and not often useful.
   In Unix, `syscall' sets `errno' for all errors and most calls return -1
   for errors; in many systems you cannot pass arguments or get return
   values for all system calls (`pipe', `fork', and `getppid' typically
   among them).
   In Mach, all system calls take normal arguments and always return an
   error code (zero for success). */
extern long int syscall (long int __sysno, ...) __THROW;
#endif /* Use misc. */
Changing this would be an ABI change and would have to be versioned. It
would affect anyone using syscall, not just SYS_fallocate.
The question is whether programmers in practice include unistd.h when
they use syscall.
If the changed prototype is not in scope then the 1st parm (__sysno)
defaults to int and is passed in r3, which gets moved to r0.
If the changed syscall prototype is in scope then __sysno would be
passed in r3/r4 (r3 would be 0 and would be passed to r0, and the actual
system call number would be in r4 and passed to the kernel in r3).
Which behavior do you want? Which (incorrect) behavior compiled into
existing code do you want to support?
Do you want syscall.S for PPC32 to change to match the changed
prototype? It will have to be versioned and the new prototype will
only be available in future releases of GLIBC. Existing applications
will bind to the old ABI and get the old behavior.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-18 16:08 ` Steven Munroe
@ 2010-03-18 16:21 ` Andreas Schwab
2010-03-18 16:21 ` Andreas Schwab
2010-03-18 17:03 ` Steven Munroe
0 siblings, 2 replies; 38+ messages in thread
From: Andreas Schwab @ 2010-03-18 16:21 UTC (permalink / raw)
To: munroesj
Cc: Benjamin Herrenschmidt, H. Peter Anvin, Ulrich Drepper,
David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds
Steven Munroe <munroesj@linux.vnet.ibm.com> writes:
> extern long int syscall (long int __sysno, ...) __THROW;
>
> #endif /* Use misc. */
>
> Changing this would be an ABI change and would have to be versioned. It
> would affect anyone using syscall, not just SYS_fallocate.
>
> The question is whether programmers in practice include unistd.h when
> they use syscall.
>
> If the changed prototype is not in scope then the 1st parm (__sysno)
> defaults to int and is passed in r3, which gets moved to r0.
int is incompatible with long, so you already get undefined behaviour
anyway.
Andreas.
--
Andreas Schwab, schwab@redhat.com
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E
"And now for something completely different."
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-18 16:21 ` Andreas Schwab
2010-03-18 16:21 ` Andreas Schwab
@ 2010-03-18 17:03 ` Steven Munroe
2010-03-18 21:18 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 38+ messages in thread
From: Steven Munroe @ 2010-03-18 17:03 UTC (permalink / raw)
To: Andreas Schwab
Cc: munroesj, Benjamin Herrenschmidt, H. Peter Anvin, Ulrich Drepper,
David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds
On Thu, 2010-03-18 at 17:21 +0100, Andreas Schwab wrote:
> Steven Munroe <munroesj@linux.vnet.ibm.com> writes:
>
> > extern long int syscall (long int __sysno, ...) __THROW;
> >
> > #endif /* Use misc. */
> >
> > Changing this would be an ABI change and would have to be versioned. It
> > would affect anyone using syscall, not just SYS_fallocate.
> >
> > The question is whether programmers in practice include unistd.h when
> > they use syscall.
> >
> > If the changed prototype is not in scope then the 1st parm (__sysno)
> > defaults to int and is passed in r3, which gets moved to r0.
>
> int is incompatible with long, so you already get undefined behaviour
> anyway.
>
Sorry, int and long are compatible in 32-bit, but long long is not.
int and long are not compatible in 64-bit.
It is hard to keep all the nodes and arguments straight.
But the concerns about changing the prototype, and about whether people
are actually using the prototype, are still valid.
> Andreas.
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-18 17:03 ` Steven Munroe
@ 2010-03-18 21:18 ` Benjamin Herrenschmidt
2010-03-19 1:22 ` Jamie Lokier
0 siblings, 1 reply; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-18 21:18 UTC (permalink / raw)
To: munroesj
Cc: Andreas Schwab, H. Peter Anvin, Ulrich Drepper, David Miller,
ralf, linux-arch, linux-kernel, kernel, torvalds
On Thu, 2010-03-18 at 12:03 -0500, Steven Munroe wrote:
> Sorry, int and long are compatible in 32-bit, but long long is not.
>
> int and long are not compatible in 64-bit.
>
> It is hard to keep all the nodes and arguments straight.
>
> But the concerns about changing the prototype, and about whether people
> are actually using the prototype, are still valid.
Well, using the macro trick instead would fix that problem, code
wouldn't build if it doesn't include unistd.h :-)
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-18 21:18 ` Benjamin Herrenschmidt
@ 2010-03-19 1:22 ` Jamie Lokier
0 siblings, 0 replies; 38+ messages in thread
From: Jamie Lokier @ 2010-03-19 1:22 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: munroesj, Andreas Schwab, H. Peter Anvin, Ulrich Drepper,
David Miller, ralf, linux-arch, linux-kernel, kernel, torvalds
Benjamin Herrenschmidt wrote:
> On Thu, 2010-03-18 at 12:03 -0500, Steven Munroe wrote:
> > Sorry, int and long are compatible in 32-bit, but long long is not.
> >
> > int and long are not compatible in 64-bit.
> >
> > It is hard to keep all the nodes and arguments straight.
> >
> > But the concerns about changing the prototype, and about whether people
> > are actually using the prototype, are still valid.
>
> Well, using the macro trick instead would fix that problem, code
> wouldn't build if it doesn't include unistd.h :-)
Or it will build, but call the old ABI version - no change to those programs.
-- Jamie
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 13:44 ` Ralf Baechle
2010-03-15 15:13 ` H. Peter Anvin
@ 2010-03-15 20:27 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 20:27 UTC (permalink / raw)
To: Ralf Baechle
Cc: David Miller, linux-arch, linux-kernel, kernel, drepper, torvalds,
munroesj
On Mon, 2010-03-15 at 14:44 +0100, Ralf Baechle wrote:
> Syscall is most often used for new syscalls that have no syscall stub in
> glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
> later stage syscall() is changed to have this sort of knowledge we break
> the API. This is something only the kernel can get right.
Well, no. The change I propose would not break the ABI on powerpc and
would auto-magically fix those cases :-) But again, you don't have to
do the same thing on MIPS or sparc, it's definitely arch specific.
I.e. what you are saying is that a syscall defined in the kernel as:
sys_foo(u64 arg);
To be called from userspace would require something like:
u64 arg = 0x123456789abcdef0;
#if defined(__powerpc__) && __WORDSIZE == 32
syscall(SYS_foo, (u32)(arg >> 32), (u32)arg);
#else
syscall(SYS_foo, arg);
#endif
While with the trick of making syscall a macro wrapping an underlying
__syscall that has an added dummy argument, the register alignment is
"corrected" and thus -both- forms above suddenly work for me. That might
actually work for you too.
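As an illustration of the manual split used in the workaround above, here is a minimal C sketch. The names u64_words and split_u64 are hypothetical and are not part of glibc or the kernel. The sketch assumes that a big-endian 32-bit ABI expects the high word in the first slot of the pair and a little-endian one expects the low word first, which is the BE/LE difference mentioned at the start of the thread.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit halves of a 64-bit value, in the order they would be
   passed as separate syscall() arguments. Hypothetical helper type. */
struct u64_words {
    uint32_t first;
    uint32_t second;
};

/* Split `value` into two 32-bit arguments. big_endian must be 1 for an
   ABI that wants the high word first (e.g. 32-bit powerpc) and 0 for one
   that wants the low word first. */
static struct u64_words split_u64(uint64_t value, int big_endian)
{
    uint32_t hi = (uint32_t)(value >> 32);
    uint32_t lo = (uint32_t)value;
    struct u64_words w;

    assert(big_endian == 0 || big_endian == 1);
    w.first = big_endian ? hi : lo;
    w.second = big_endian ? lo : hi;
    return w;
}
```

On 32-bit powerpc a caller would then pass w.first and w.second as two consecutive arguments instead of the single 64-bit value. This only covers the split itself; it does nothing about the register alignment issue discussed in the thread.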
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 4:48 64-syscall args on 32-bit vs syscall() Benjamin Herrenschmidt
2010-03-15 4:48 ` Benjamin Herrenschmidt
2010-03-15 5:06 ` David Miller
@ 2010-03-15 15:03 ` Steven Munroe
2010-03-15 20:32 ` Benjamin Herrenschmidt
2010-03-15 15:04 ` Jamie Lokier
3 siblings, 1 reply; 38+ messages in thread
From: Steven Munroe @ 2010-03-15 15:03 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Ryan S. Arnold
Cc: linux-arch, linux-kernel@vger.kernel.org, Mark Lord,
Ulrich Drepper, Linus Torvalds
On Mon, 2010-03-15 at 15:48 +1100, Benjamin Herrenschmidt wrote:
> Hoy there !
>
> This may have been discussed earlier (I have some vague memories...) but
> I just hit a problem with that again (Mark: hint, it's in hdparm's
> fallocate) so I'd like a bit of a refresh here on what is the "right
> thing" to do...
>
> So some syscalls want a 64-bit argument. Let's take fallocate() as our
> example. So we already know that we have to be extra careful since some
> 32-bit arch will pass this into 2 registers (or stack slots) which need
> to be aligned, and so we tend to already take care of making sure that
> the said 64-bit argument is either defined as 2x32-bit arguments, or
> defined as 1x64 bit argument aligned to 2x32-bit in the argument list.
>
> So far so good...
>
> The problem is when user space tries to use the same trick for calling
> those functions using glibc-provided syscall() function. In this
> example, hdparm does:
>
> err = syscall(SYS_fallocate, fd, mode, offset, len);
>
> With "offset" being a 64-bit argument.
>
The powerpc implementation of syscall is:
ENTRY (syscall)
mr r0,r3
mr r3,r4
mr r4,r5
mr r5,r6
mr r6,r7
mr r7,r8
mr r8,r9
sc
PSEUDO_RET
PSEUDO_END (syscall)
The ABI says:
"Long long arguments are considered to have 8-byte size and alignment.
The same 8-byte arguments that must go in aligned pairs of registers are
8-byte aligned on the stack."
This implies that the SYS_fallocate call will skip a register to get the
required alignment in the parameter save area.
for ppc32 on entry
r3 == SYS_fallocate
r4 == fd
r5 == mode
r6 == not used
r7, r8 == offset
r9 == len
This gets shifted to:
r0 == SYS_fallocate
r3 == fd
r4 == mode
r5 == not used
r6, r7 == offset
r8 == len
For syscall the vararg parms will be mirrored to the parameter save area
but will not be used. The ABI does not talk to LE for this case.
Ryan, does the new ABI doc cover this?
> This will break because the first argument to syscall now shifts
> everything by one register, which breaks the register pair alignment
> (and I suppose archs with stack based calling convention can have
> similar alignment issues even if x86 doesn't).
>
> Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> it's first argument to correct that ? Either that or making it some kind
> of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
>
> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the the order of the hi and lo part is different
> between BE and LE architectures ;-)
>
> So is there a more "correct" solution than another here ? Should powerpc
> glibc be fixed at least so that syscall() keeps the alignment ?
>
> Cheers,
> Ben.
>
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 15:03 ` Steven Munroe
@ 2010-03-15 20:32 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 20:32 UTC (permalink / raw)
To: munroesj
Cc: Ryan S. Arnold, linux-arch, linux-kernel@vger.kernel.org,
Mark Lord, Ulrich Drepper, Linus Torvalds
> The powerpc implementation of syscall is:
>
>
> ENTRY (syscall)
> mr r0,r3
> mr r3,r4
> mr r4,r5
> mr r5,r6
> mr r6,r7
> mr r7,r8
> mr r8,r9
> sc
> PSEUDO_RET
> PSEUDO_END (syscall)
And my proposal is to make it instead:
#define syscall(__sysno, __args...) __syscall(0,__sysno,__args)
ENTRY (__syscall)
mr r0,r4
mr r3,r5
mr r4,r6
mr r5,r7
mr r6,r8
mr r7,r9
mr r8,r10
sc
PSEUDO_RET
PSEUDO_END (__syscall)
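The macro half of this proposal can be tried out in plain C without touching glibc. The sketch below is only an illustration: demo_syscall and demo_syscall_impl are hypothetical stand-ins for syscall and __syscall, the function makes no system call and merely records what it received, and it uses the C99 __VA_ARGS__ form rather than the GNU named variadic parameter shown above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/* What the stand-in function saw. Hypothetical type for this demo. */
struct demo_call {
    long sysno;
    int64_t arg;
};

/* Stand-in for the proposed __syscall(): the leading dummy exists only
   to occupy one argument slot, so that the arguments after sysno keep
   the pair alignment the kernel expects on 32-bit powerpc. */
static struct demo_call demo_syscall_impl(long dummy, long sysno, ...)
{
    va_list ap;
    struct demo_call seen;

    assert(dummy == 0);             /* the macro below always passes 0 */
    (void)dummy;
    va_start(ap, sysno);
    seen.sysno = sysno;
    seen.arg = va_arg(ap, int64_t); /* this demo always passes one 64-bit arg */
    va_end(ap);
    return seen;
}

/* Callers keep writing demo_syscall(nr, args...); the macro inserts the
   dummy so call sites do not change. */
#define demo_syscall(...) demo_syscall_impl(0, __VA_ARGS__)
```

A call such as demo_syscall(42L, (int64_t)value) expands to demo_syscall_impl(0, 42L, (int64_t)value), so the call site looks the same as before while the callee sees one extra leading slot.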
> The ABI says:
>
> "Long long arguments are considered to have 8-byte size and alignment.
> The same 8-byte arguments that must go in aligned pairs of registers are
> 8-byte aligned on the stack."
Right, that's what I'm explaining too.
> This implies that the SYS_fallocate call will skip a register to get the
> required alignment in the parameter save area.
>
> for ppc32 on entry
>
> r3 == SYS_fallocate
> r4 == fd
> r5 == mode
> r6 == not used
> r7, r8 == offset
> r9 == len
len is 64-bit too afaik but let's ignore that for now
> This gets shifted to:
>
> r0 == SYS_fallocate
> r3 == fd
> r4 == mode
> r5 == not used
> r6, r7 == offset
> r8 == len
Which is not correct, as the kernel expects:
r0 == SYS_fallocate
r3 == fd
r4 == mode
r5, r6 == offset
r7, r8 == len
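The two register layouts above can be reproduced with a small model. This is a sketch under stated assumptions, not glibc or kernel code: assign_gprs and kernel_gpr are hypothetical names, the model covers only integer arguments in r3..r10, and it pretends the wrapper moves every argument register down by a fixed amount (the real syscall.S quoted earlier moves only a fixed set of registers).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only; these names are not glibc or kernel APIs. */
enum { FIRST_ARG_GPR = 3, LAST_ARG_GPR = 10, ON_STACK = -1 };

/* Model how a 32-bit powerpc compiler assigns integer arguments to GPRs:
   a 4-byte argument takes the next register; an 8-byte argument takes
   the next pair that starts on an odd register (r3, r5, r7 or r9).
   sizes[i] is 4 or 8. gpr[i] receives the first register of argument i,
   or ON_STACK once r3..r10 are exhausted. */
static void assign_gprs(const int *sizes, size_t n, int *gpr)
{
    int next = FIRST_ARG_GPR;

    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8 && next % 2 == 0)
            next++;                  /* skip a register to align the pair */
        int last = next + sizes[i] / 4 - 1;
        if (last > LAST_ARG_GPR) {
            gpr[i] = ON_STACK;
            next = LAST_ARG_GPR + 1; /* everything after this spills too */
        } else {
            gpr[i] = next;
            next = last + 1;
        }
    }
}

/* Register in which argument `idx` reaches the kernel after the wrapper
   moves each argument register down by `shift`: 1 for today's syscall(),
   2 for the proposed variant with a dummy first argument. */
static int kernel_gpr(const int *sizes, size_t n, size_t idx, int shift)
{
    int gpr[16];

    assert(n <= 16 && idx < n);
    assign_gprs(sizes, n, gpr);
    return gpr[idx] == ON_STACK ? ON_STACK : gpr[idx] - shift;
}
```

With sizes {4, 4, 4, 8, 8} for (sysno, fd, mode, offset, len) and a shift of 1, the model puts offset in r6 and len in r8, as in the broken layout. With a leading 4-byte dummy and a shift of 2 it puts offset in r5 and len in r7, which is what the kernel expects.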
> For syscall the vararg parms will be mirrored to the parameter save area
> but will not be used. The ABI does not talk to LE for this case.
Right, but the fact that we shift all args by -1- register means that we
break the 64-bit register pair alignment compared to the real syscall
which uses r0 instead for the syscall number. Hence my proposal to add
a dummy argument to restore that alignment.
As it is there is userspace code that does:
syscall(SYS_fallocate, fd, mode, offset, len);
Which works on x86 but is broken on ppc32 unless we do that change.
Cheers,
Ben.
> Ryan does the new ABI doc cover this?
>
> > This will break because the first argument to syscall now shifts
> > everything by one register, which breaks the register pair alignment
> > (and I suppose archs with stack based calling convention can have
> > similar alignment issues even if x86 doesn't).
> >
> > Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> > it's first argument to correct that ? Either that or making it some kind
> > of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
> >
> > As it is, any 32-bit app using syscall() on any of the syscalls that
> > takes 64-bit arguments will be broken, unless the app itself breaks up
> > the argument, but the the order of the hi and lo part is different
> > between BE and LE architectures ;-)
> >
> > So is there a more "correct" solution than another here ? Should powerpc
> > glibc be fixed at least so that syscall() keeps the alignment ?
> >
> > Cheers,
> > Ben.
> >
> >
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 4:48 64-syscall args on 32-bit vs syscall() Benjamin Herrenschmidt
` (2 preceding siblings ...)
2010-03-15 15:03 ` Steven Munroe
@ 2010-03-15 15:04 ` Jamie Lokier
2010-03-15 20:33 ` Benjamin Herrenschmidt
3 siblings, 1 reply; 38+ messages in thread
From: Jamie Lokier @ 2010-03-15 15:04 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-arch, linux-kernel@vger.kernel.org, Mark Lord,
Ulrich Drepper, Linus Torvalds, Steven Munroe
Benjamin Herrenschmidt wrote:
> err = syscall(SYS_fallocate, fd, mode, offset, len);
>
> With "offset" being a 64-bit argument.
>
> This will break because the first argument to syscall now shifts
> everything by one register, which breaks the register pair alignment
> (and I suppose archs with stack based calling convention can have
> similar alignment issues even if x86 doesn't).
>
> Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> it's first argument to correct that ? Either that or making it some kind
> of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
>
> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the the order of the hi and lo part is different
> between BE and LE architectures ;-)
>
> So is there a more "correct" solution than another here ? Should powerpc
> glibc be fixed at least so that syscall() keeps the alignment ?
There are several problems with syscall(), not just this - because a
number of system calls in section 2 of the manual don't map directly
to kernel syscalls with the same function prototype.
Even fork() has become something complicated in Glibc that doesn't use
the fork syscall :-(
So anything using syscall() has to be careful on Linux already.
Changing the 64-bit alignment won't fix the other differences.
-- Jamie
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 64-syscall args on 32-bit vs syscall()
2010-03-15 15:04 ` Jamie Lokier
@ 2010-03-15 20:33 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 38+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-15 20:33 UTC (permalink / raw)
To: Jamie Lokier
Cc: linux-arch, linux-kernel@vger.kernel.org, Mark Lord,
Ulrich Drepper, Linus Torvalds, Steven Munroe
On Mon, 2010-03-15 at 15:04 +0000, Jamie Lokier wrote:
> There are several problems with syscall(), not just this - because a
> number of system calls in section 2 of the manual don't map directly
> to kernel syscalls with the same function prototype.
>
> Even fork() has become something complicated in Glibc that doesn't use
> the fork syscall :-(
>
> So anything using syscall() has to be careful on Linux already.
> Changing the 64-bit alignment won't fix the other differences.
It won't fix -all- the problems with syscall(), but it will fix a wagon
of them without breaking existing code that already does the arch
specific breakup on the call site...
Cheers,
Ben.
^ permalink raw reply [flat|nested] 38+ messages in thread