* Inefficient ia64 system call implementation in glibc
@ 2003-09-19 16:32 H. J. Lu
2003-09-19 17:29 ` Grant Grundler
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: H. J. Lu @ 2003-09-19 16:32 UTC (permalink / raw)
To: linux-ia64
The inline ia64 system call assumes all values passed to kernel are
signed 64bit. It does sign extension if the incoming arg is not signed
64bit. In case of fxstat.c:
int
__fxstat (int vers, int fd, struct stat *buf)
{
return INLINE_SYSCALL (fstat, 2, fd, CHECK_1 (buf));
}
it leads to
0000000000000000 <__fxstat>:
0: 00 20 39 0c 80 05 [MII] alloc r36=ar.pfs,14,6,0
6: f0 e0 01 12 48 a0 mov r15=1212
c: 04 08 00 84 mov r37=r1
10: 01 38 01 44 00 21 [MII] mov r39=r34
16: 60 02 84 2c 00 60 sxt4 r38=r33
^^^^^^^^^^^^^
1c: 04 00 c4 00 mov r35=b0;;
20: 0a 00 00 00 00 02 [MMI] break.m 0x100000;;
26: 10 02 20 00 42 e0 mov r33=r8
"sxt4 r38=r33" is not necessary at all since kernel will never use
the upper 4 bytes with
asmlinkage long sys_newfstat(unsigned int fd, struct stat * statbuf)
The basic problem is glibc doesn't store information about what
the kernel interface is so that it can't efficiently set up parameters
for system calls. Is there a way to improve the situation?
H.J.
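To make the point about the upper 4 bytes concrete, here is a minimal C sketch (not glibc code). It assumes the stub widens each argument to a signed 64-bit value, as described above, and that the kernel looks only at the low 32 bits of a parameter declared as unsigned int. The helper names widen_signed, widen_unsigned, and kernel_view are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a stub that casts every argument to a signed
   64-bit value: converting from int sign-extends (sxt4 on ia64).  */
static int64_t
widen_signed (int fd)
{
  return (int64_t) fd;
}

/* Hypothetical alternative: zero-extend instead of sign-extend.  */
static int64_t
widen_unsigned (int fd)
{
  return (int64_t) (uint32_t) fd;
}

/* Model of a kernel parameter declared as unsigned int: only the low
   32 bits of the 64-bit register are looked at.  */
static uint32_t
kernel_view (int64_t reg)
{
  return (uint32_t) reg;
}

/* The two widenings differ in the upper 32 bits, yet look identical
   from the kernel's side, even for a negative (invalid) descriptor.  */
static void
self_check (void)
{
  assert (widen_signed (-1) != widen_unsigned (-1));
  assert (kernel_view (widen_signed (-1)) == kernel_view (widen_unsigned (-1)));
  assert (kernel_view (widen_signed (3)) == 3u);
}
```

Under these assumptions the extension only changes bits that the callee never reads, which is why it can be dropped for this particular prototype.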
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Inefficient ia64 system call implementation in glibc
2003-09-19 16:32 Inefficient ia64 system call implementation in glibc H. J. Lu
@ 2003-09-19 17:29 ` Grant Grundler
2003-09-19 21:46 ` John Worley
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Grant Grundler @ 2003-09-19 17:29 UTC (permalink / raw)
To: linux-ia64
On Fri, Sep 19, 2003 at 09:32:18AM -0700, H. J. Lu wrote:
> The inline ia64 system call assumes all values passed to kernel are
> signed 64bit. It does sign extension if the incoming arg is not signed
> 64bit.
AFAIK, all compilers do this. The HPUX performance team was on a rampage
to replace signed variables with "unsigned" wherever possible just for
this reason.
See example 2 in section "4.5.1 Data Types" (page 16 of 17):
http://devresource.hp.com/STK/partner/PA_PerfGuide_vs2.pdf
BTW, don't dismiss this just because it talks about parisc.
I'd guess +90% of this paper applies to ia64 as well.
grant
* Re: Inefficient ia64 system call implementation in glibc
2003-09-19 16:32 Inefficient ia64 system call implementation in glibc H. J. Lu
2003-09-19 17:29 ` Grant Grundler
@ 2003-09-19 21:46 ` John Worley
2003-09-19 23:32 ` Jim Hull
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: John Worley @ 2003-09-19 21:46 UTC (permalink / raw)
To: linux-ia64
H.J. Lu <hjl@lucon.org> writes:
> The inline ia64 system call assumes all values passed to kernel are
> signed 64bit. It does sign extension if the incoming arg is not signed
> 64bit. In case of fxstat.c:
>
> int
> __fxstat (int vers, int fd, struct stat *buf)
> {
> return INLINE_SYSCALL (fstat, 2, fd, CHECK_1 (buf));
> }
>
> it leads to
>
> 0000000000000000 <__fxstat>:
> 0: 00 20 39 0c 80 05 [MII] alloc r36=ar.pfs,14,6,0
> 6: f0 e0 01 12 48 a0 mov r15=1212
> c: 04 08 00 84 mov r37=r1
> 10: 01 38 01 44 00 21 [MII] mov r39=r34
> 16: 60 02 84 2c 00 60 sxt4 r38=r33
> ^^^^^^^^^^^^^
> 1c: 04 00 c4 00 mov r35=b0;;
> 20: 0a 00 00 00 00 02 [MMI] break.m 0x100000;;
> 26: 10 02 20 00 42 e0 mov r33=r8
The real inefficiency here is the compiler output. Given the
realities of the Itanium 2 implementation, the first two bundles
will require 3 cycles to execute. A better coding would be:
{ .mmi
alloc r36=ar.pfs,14,6,0
mov r15=1212
mov r35=b0
}
{ .mmi
mov r37=r1
mov r39=r34
sxt4 r38=r33
} ;;
which will execute in one cycle. The sign extension, although
"unnecessary" doesn't cost any cycles. Admittedly you could use the
mi;;i bundle to pack the break instruction in the second bundle if
you didn't have to sign-extend, but I'd rather see the 3 v. 1 cycle
problem addressed first.
Regards,
John "I worry about this stuff way too much" Worley
john.worley@hp.com
* RE: Inefficient ia64 system call implementation in glibc
2003-09-19 16:32 Inefficient ia64 system call implementation in glibc H. J. Lu
2003-09-19 17:29 ` Grant Grundler
2003-09-19 21:46 ` John Worley
@ 2003-09-19 23:32 ` Jim Hull
2003-09-20 13:01 ` Andreas Schwab
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Jim Hull @ 2003-09-19 23:32 UTC (permalink / raw)
To: linux-ia64
John Worley wrote:
> The real inefficiency here is the compiler output. Given the
> realities of the Itanium 2 implementation, the first two bundles
> will require 3 cycles to execute. A better coding would be:
>
> { .mmi
> alloc r36=ar.pfs,14,6,0
> mov r15=1212
> mov r35=b0
> }
> { .mmi
> mov r37=r1
> mov r39=r34
> sxt4 r38=r33
> } ;;
>
> which will execute in one cycle. The sign extension, although
> "unnecessary" doesn't cost any cycles. Admittedly you could use the
> mi;;i bundle to pack the break instruction in the second bundle if
> you didn't have to sign-extend, but I'd rather see the 3 v. 1 cycle
> problem addressed first.
Hi John!
Your scheduling of this code is definitely better than the original.
One minor point: Even if you could get the tools to understand that the
sign-extension is unnecessary, you still need to copy the input argument
to the output region (from r33 to r38 in this example), so you can't
eliminate any instructions and get better bundle packing. The only
advantage the copy would have vs. the sxt is that it is an A-type
instruction, instead of I-type, which might save a cycle or so in some
other syscall stub that is sign-extending several int arguments (as
opposed to the single one here).
But my real issue with the performance of this code is not with
sign-extend or the scheduling of these instructions, it's with the break
instruction. I may be mistaken, but hasn't it been many months since
David Mosberger implemented all the kernel infrastructure needed to
support syscalls using the epc instruction? When will glibc be changed
to take advantage of this? I would think that only after this has
happened should we worry about squeezing out the last cycle or two of
overhead in the syscall stubs.
-- Jim
HP PA-RISC/Itanium Processor Architect
* Re: Inefficient ia64 system call implementation in glibc
2003-09-19 16:32 Inefficient ia64 system call implementation in glibc H. J. Lu
` (2 preceding siblings ...)
2003-09-19 23:32 ` Jim Hull
@ 2003-09-20 13:01 ` Andreas Schwab
2003-09-21 21:04 ` Richard Henderson
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Andreas Schwab @ 2003-09-20 13:01 UTC (permalink / raw)
To: linux-ia64
"Jim Hull" <jim.hull@hp.com> writes:
> But my real issue with the performance of this code is not with
> sign-extend or the scheduling of these instructions, it's with the break
> instruction. I may be mistaken, but hasn't it been many months since
> David Mosberger implemented all the kernel infrastructure needed to
> support syscalls using the epc instruction?
It's only implemented in 2.6 so far.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Inefficient ia64 system call implementation in glibc
2003-09-19 16:32 Inefficient ia64 system call implementation in glibc H. J. Lu
` (3 preceding siblings ...)
2003-09-20 13:01 ` Andreas Schwab
@ 2003-09-21 21:04 ` Richard Henderson
2003-09-22 19:39 ` H. J. Lu
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2003-09-21 21:04 UTC (permalink / raw)
To: linux-ia64
On Fri, Sep 19, 2003 at 09:32:18AM -0700, H. J. Lu wrote:
> The basic problem is glibc doesn't store information about what
> the kernel interface is so that it can't efficiently set up parameters
> for system calls. Is there a way to improve the situation?
Use __typeof instead of hard-coding long in the LOAD_ARGS macros.
That's where the extension comes from.
r~
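As a rough illustration of this suggestion (simplified stand-ins, not the actual glibc macros): a temporary declared as long forces an int argument to be widened, while one declared with __typeof keeps the argument's own type, so no widening conversion is requested at that point. LOAD_ARG_LONG and LOAD_ARG_TYPEOF are hypothetical names; __typeof is the GNU C extension mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the LOAD_ARGS macros.  The
   real macros also bind the value to an output register.  */
#define LOAD_ARG_LONG(var, x)    long var = (long) (x)
#define LOAD_ARG_TYPEOF(var, x)  __typeof (x) var = (x)

/* Size of the temporary when the macro hard-codes long.  */
static size_t
temp_size_long (int fd)
{
  LOAD_ARG_LONG (arg, fd);      /* int converted to long: sign extension */
  return sizeof arg;
}

/* Size of the temporary when the macro follows the argument type.  */
static size_t
temp_size_typeof (int fd)
{
  LOAD_ARG_TYPEOF (arg, fd);    /* stays int: no conversion requested */
  return sizeof arg;
}

/* With long hard-coded the temporary is always a long; with __typeof
   it has exactly the type of the caller's argument.  */
static void
self_check (void)
{
  assert (temp_size_long (0) == sizeof (long));
  assert (temp_size_typeof (0) == sizeof (int));
}
```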
* Re: Inefficient ia64 system call implementation in glibc
2003-09-19 16:32 Inefficient ia64 system call implementation in glibc H. J. Lu
` (4 preceding siblings ...)
2003-09-21 21:04 ` Richard Henderson
@ 2003-09-22 19:39 ` H. J. Lu
2003-09-22 21:25 ` David Mosberger
2003-09-22 23:21 ` Richard Henderson
7 siblings, 0 replies; 9+ messages in thread
From: H. J. Lu @ 2003-09-22 19:39 UTC (permalink / raw)
To: linux-ia64
On Sun, Sep 21, 2003 at 02:04:34PM -0700, Richard Henderson wrote:
> On Fri, Sep 19, 2003 at 09:32:18AM -0700, H. J. Lu wrote:
> > The basic problem is glibc doesn't store information about what
> > the kernel interface is so that it can't efficiently set up parameters
> > for system calls. Is there a way to improve the situation?
>
> Use __typeof instead of hard-coding long in the LOAD_ARGS macros.
> That's where the extension comes from.
How can I make __typeof work with
char buf [300];
INLINE_SYSCALL (read, 3, fd, buf, sizeof buf);
Can I get char * from char [300]?
H.J.
* Re: Inefficient ia64 system call implementation in glibc
2003-09-19 16:32 Inefficient ia64 system call implementation in glibc H. J. Lu
` (5 preceding siblings ...)
2003-09-22 19:39 ` H. J. Lu
@ 2003-09-22 21:25 ` David Mosberger
2003-09-22 23:21 ` Richard Henderson
7 siblings, 0 replies; 9+ messages in thread
From: David Mosberger @ 2003-09-22 21:25 UTC (permalink / raw)
To: linux-ia64
>>>>> On Sat, 20 Sep 2003 15:01:26 +0200, Andreas Schwab <schwab@suse.de> said:
Andreas> "Jim Hull" <jim.hull@hp.com> writes:
>> But my real issue with the performance of this code is not with
>> sign-extend or the scheduling of these instructions, it's with the
>> break instruction. I may be mistaken, but hasn't it been many
>> months since David Mosberger implemented all the kernel
>> infrastructure needed to support syscalls using the epc
>> instruction?
Andreas> It's only implemented in 2.6 so far.
No, the glibc support is completely orthogonal to the kernel support.
If glibc uses the new stubs on a kernel which doesn't support
kernel-entry via EPC, it will transparently fall back to using BREAK.
--david
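The message does not describe how the fallback works. One way a stub could fall back transparently, shown purely as an illustration with hypothetical names (syscall_entry_t, entry_break, entry_epc, select_entry), is to call through an entry pointer that defaults to a BREAK-based path unless the kernel advertises an EPC gate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for a kernel-entry routine.  */
typedef long (*syscall_entry_t) (long nr);

/* Tags returned by the stand-in entry routines so the chosen path
   can be observed; real entry code would trap into the kernel.  */
enum { VIA_BREAK = 1, VIA_EPC = 2 };

static long
entry_break (long nr)
{
  (void) nr;
  return VIA_BREAK;
}

static long
entry_epc (long nr)
{
  (void) nr;
  return VIA_EPC;
}

/* Use the gate the kernel advertised, if any; otherwise fall back to
   the BREAK-based entry, which every kernel is assumed to support.  */
static syscall_entry_t
select_entry (syscall_entry_t kernel_gate)
{
  return kernel_gate != NULL ? kernel_gate : entry_break;
}

static void
self_check (void)
{
  assert (select_entry (NULL) (1212) == VIA_BREAK);
  assert (select_entry (entry_epc) (1212) == VIA_EPC);
}
```

In a scheme like this the stubs themselves do not change between kernels; only the pointer they call through does.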
* Re: Inefficient ia64 system call implementation in glibc
2003-09-19 16:32 Inefficient ia64 system call implementation in glibc H. J. Lu
` (6 preceding siblings ...)
2003-09-22 21:25 ` David Mosberger
@ 2003-09-22 23:21 ` Richard Henderson
7 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2003-09-22 23:21 UTC (permalink / raw)
To: linux-ia64
On Mon, Sep 22, 2003 at 12:39:18PM -0700, H. J. Lu wrote:
> Can I get char * from char [300]?
x+0 would work in this case; I'd guess it'd work for most of the
cases that syscalls need to handle.
r~
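A small check of the x+0 trick, assuming GNU C's __typeof: applied to the array itself it yields char [300], which cannot be initialized from another array, while applied to buf + 0 it yields char *, because the array decays to a pointer in the addition. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* __typeof (buf) is char [300]; "__typeof (buf) arg = buf;" would not
   compile, since an array cannot be initialized from an array.  */
static size_t
size_of_array_type (void)
{
  char buf[300];
  return sizeof (__typeof (buf));
}

/* In buf + 0 the array decays to a pointer, so the type is char *.  */
static size_t
size_of_decayed_type (void)
{
  char buf[300];
  __typeof (buf + 0) arg = buf + 0;   /* valid: arg is a char * */
  (void) arg;
  return sizeof (__typeof (buf + 0));
}

static void
self_check (void)
{
  assert (size_of_array_type () == 300);
  assert (size_of_decayed_type () == sizeof (char *));
}
```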