* i386 inline-asm string functions - some questions
@ 2003-12-25 0:20 Denis Zaitsev
2003-12-25 0:38 ` Richard Henderson
2003-12-25 0:39 ` Roland McGrath
0 siblings, 2 replies; 33+ messages in thread
From: Denis Zaitsev @ 2003-12-25 0:20 UTC
To: Andreas Jaeger; +Cc: Richard Henderson, libc-alpha, linux-gcc, gcc
For some time now, input operands like the following have been used here
and there in sysdeps/i386/i486/bits/string.h:
"m" ( *(struct { char __x[0xfffffff]; } *)__s)
When I looked for the reasons behind this, I found some discussions
about it on the libc-alpha and gcc mailing lists. As I understand from
them, there are two options: use the "m" operand(s) shown above, or
just put "memory" in the clobber list. So the question is: why was
the "m" way chosen? I'm asking because I've found that the "m" way
leads GCC to produce rather suboptimal assembly, while the "memory"
code is fine.
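As a minimal sketch of the two options (this is not the glibc code; the helper names len_m and len_clobber are hypothetical, and a plain C fallback is used when the compiler or target is not GNU C on x86), the same scan can describe the memory it reads either with a dummy "m" input or with a "memory" clobber:

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_GNU_ASM 1
#endif

/* Hypothetical helper: the bytes read through s are described by a dummy
   "m" input operand; the oversized struct stands for "the whole string". */
static inline size_t len_m(const char *s)
{
#ifdef HAVE_X86_GNU_ASM
    size_t n = (size_t)-1;              /* no length limit for the scan */
    const char *p = s;
    __asm__ __volatile__(
        "cld\n\t"
        "repne; scasb"                  /* advance p past the NUL byte */
        : "+c"(n), "+D"(p)
        : "a"(0),
          "m"(*(const struct { char x[0xfffffff]; } *)s)
        : "cc");
    return ~n - 1;                      /* n dropped by one per byte scanned */
#else
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
#endif
}

/* Hypothetical helper: same scan, but the compiler is only told that
   memory in general is touched, via the "memory" clobber. */
static inline size_t len_clobber(const char *s)
{
#ifdef HAVE_X86_GNU_ASM
    size_t n = (size_t)-1;
    const char *p = s;
    __asm__ __volatile__(
        "cld\n\t"
        "repne; scasb"
        : "+c"(n), "+D"(p)
        : "a"(0)
        : "cc", "memory");
    return ~n - 1;
#else
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
#endif
}
```

Both variants should return the same result; they differ only in how much the compiler is told about the memory access, which is what the rest of this message is about.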
Let me describe. Here is a typical inline-asm string function:
extern inline
_s2(const char *a, const char *b)
{
    asm volatile (
        "/*%0%1%2%3*/"
        :"+&r"(a), "+&r"(b)
        :"m"(*(struct{__extension__ char __x[0xfffffff];}*)a),
         "m"(*(struct{__extension__ char __x[0xfffffff];}*)b)
        :"cc"
    );
}
This is, of course, just the essence of a typical string function; all
the real elements that aren't important for the demonstration are
omitted. References to the asm operands are inserted into the comment;
they will be useful. So, compile the following:
s2(const char *a, const char *b){return _s2(a,b);}
.globl s2
.type s2, @function
s2:
pushl %esi
pushl %ebx
movl 12(%esp), %edx
movl 16(%esp), %eax
movl %edx, %ebx
movl %eax, %esi
#APP
/*%ebx%esi(%edx)(%eax)*/
#NO_APP
popl %ebx
movl %ecx, %eax
popl %esi
ret
Obviously, the following instructions are garbage:
pushl %esi
pushl %ebx
movl %edx, %ebx
movl %eax, %esi
popl %ebx
popl %esi
And this is the "memory" variant:
extern inline
_s2(const char *a, const char *b)
{
    asm volatile (
        "/*%0%1*/"
        :"+&r"(a), "+&r"(b):
        :"cc", "memory"
    );
}
.globl s2
.type s2, @function
s2:
movl 4(%esp), %edx
movl 8(%esp), %eax
#APP
/*%edx%eax*/
#NO_APP
movl %ecx, %eax
ret
So we have no garbage at all, only very good assembly.
Then the next question is: do I understand correctly that the problem
lies in the combination of the "earlyclobber" modifier on the asm
operands and the "m" operands with the corresponding args in the input
list? That is, for some reason GCC decides that "m" is tied to the arg
itself rather than to the memory the arg points to, and so a separate
copy of the arg is needed, because the corresponding output operand is
early-clobbered? The comment in the "m" version shows (%edx)(%eax),
but it seems that GCC thinks of %edx%eax instead. (But I may well be
wrong; I don't know these GCC internals.)
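For reference, a small sketch of what the earlyclobber modifier "&" asks of the compiler in general (add_twice is a hypothetical helper with a plain C fallback off x86; whether this rule is what triggers the extra moves in the "m" listing is exactly the open question here):

```c
/* Hypothetical helper: returns a + 2*b. */
static inline unsigned add_twice(unsigned a, unsigned b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned out;
    __asm__(
        "movl %1, %0\n\t"   /* out is written here ...                 */
        "addl %2, %0\n\t"   /* ... while input b (%2) is still needed  */
        "addl %2, %0"
        : "=&r"(out)        /* "&": out must not share a register with
                               any input operand                       */
        : "r"(a), "r"(b)
        : "cc");
    return out;
#else
    return a + 2 * b;
#endif
}
```

Without the "&", the compiler would be free to put out and b in the same register, and the first instruction would destroy b before it is read.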
Well, this is a very simple example, but my investigation shows that
the situation is the same for any C code, whether simple or complex.
Some extra registers are always used, some extra loads are emitted, etc.
So, if both variants are correct, it would be better to use the
"memory" one (as I understand it, there was a time when it really was
used in sysdeps/i386/i486/bits/string.h?). For example, this is the
output of 'size libc.so' for GLIBC-2.3.2 compiled with
-D__USE_STRING_INLINES:
   text    data     bss     dec     hex filename
1108363   11296   10820 1130479  113fef libc.so
and this is the same, but with just one function, __strcmp_gg, redone
the "memory" way:
   text    data     bss     dec     hex filename
1107779   11296   10820 1129895  113da7 libc.so
The difference in text size is 584 bytes, a little over 0.5k. And there
are tens of such functions. So the third question is about redoing all
the inline-asm string functions that way (provided, of course, that
there are no drawbacks).
* Re: i386 inline-asm string functions - some questions
From: Richard Henderson @ 2003-12-25 0:38 UTC
To: Andreas Jaeger, libc-alpha, linux-gcc, gcc
On Thu, Dec 25, 2003 at 05:20:46AM +0500, Denis Zaitsev wrote:
> >From some moment in the past, the next input parameters are used here
> and there in sysdeps/i386/i486/bits/string.h:
>
> "m" ( *(struct { char __x[0xfffffff]; } *)__s)
>
> When I was seeking for the reasons to do so, I've found some
> discussions about this in libc-alpha and gcc mailing lists. As I
> understand from there, there are an options - to use the "m" arg(s)
> shown above or just to use "memory" in the list of a clobbered
> registers. So, the question is: why the "m"-way had been choosen?
Someone wanted to describe that memory is read, but not written.
There's no real good way to do that.
You could use the "X" constraint, which is supposed to mean "anything"
and by implication "unused", but it's normally only with scratch
registers, not memories, and the address reloads don't get deleted.
You could file an enhancement pr against "X" if you want.
r~
* Re: i386 inline-asm string functions - some questions
From: Denis Zaitsev @ 2003-12-25 1:15 UTC
To: Andreas Jaeger; +Cc: libc-alpha, linux-gcc, gcc
On Wed, Dec 24, 2003 at 04:38:19PM -0800, Richard Henderson wrote:
> On Thu, Dec 25, 2003 at 05:20:46AM +0500, Denis Zaitsev wrote:
> > >From some moment in the past, the next input parameters are used here
> > and there in sysdeps/i386/i486/bits/string.h:
> >
> > "m" ( *(struct { char __x[0xfffffff]; } *)__s)
> >
> > When I was seeking for the reasons to do so, I've found some
> > discussions about this in libc-alpha and gcc mailing lists. As I
> > understand from there, there are an options - to use the "m" arg(s)
> > shown above or just to use "memory" in the list of a clobbered
> > registers. So, the question is: why the "m"-way had been choosen?
>
> Someone wanted to describe that memory is read, but not written.
> There's no real good way to do that.
>
> You could use the "X" constraint, which is supposed to mean "anything"
> and by implication "unused", but it's normally only with scratch
> registers, not memories, and the address reloads don't get deleted.
Yes, I've tried "X": there is no difference from "m", exactly the
same unneeded extra code.
> You could file an enhancement pr against "X" if you want.
Do you mean a kind of complaint that "X" doesn't work as it should?
* Re: i386 inline-asm string functions - some questions
From: Zack Weinberg @ 2003-12-25 1:21 UTC
To: Andreas Jaeger; +Cc: libc-alpha, linux-gcc, gcc
Denis Zaitsev <zzz@anda.ru> writes:
>> You could use the "X" constraint, which is supposed to mean "anything"
>> and by implication "unused", but it's normally only with scratch
>> registers, not memories, and the address reloads don't get deleted.
>
> Yes, I've tried the "X" - there is no difference from the "m" - all
> the same unneded extra code (exactly).
I think the most constructive thing for you to do is find out _why_
all this unneeded extra code is being generated for "m" constraints
and then submit a patch to fix it.
zw
* Re: i386 inline-asm string functions - some questions
From: Denis Zaitsev @ 2003-12-25 1:45 UTC
To: Zack Weinberg
Cc: Andreas Jaeger, Richard Henderson, libc-alpha, linux-gcc, gcc
On Wed, Dec 24, 2003 at 05:21:16PM -0800, Zack Weinberg wrote:
> Denis Zaitsev <zzz@anda.ru> writes:
>
> >> You could use the "X" constraint, which is supposed to mean "anything"
> >> and by implication "unused", but it's normally only with scratch
> >> registers, not memories, and the address reloads don't get deleted.
> >
> > Yes, I've tried the "X" - there is no difference from the "m" - all
> > the same unneded extra code (exactly).
>
> I think the most constructive thing for you to do is find out _why_
> all this unneeded extra code is being generated for "m" constraints
> and then submit a patch to fix it.
So, does this mean that we are indeed talking about a problem in GCC?
And I agree, that's probably the best way... So, would you please
give me some pointers to speed up my start? For now, there is only one
part of GCC that is not really new to me :)
* Re: i386 inline-asm string functions - some questions
From: Zack Weinberg @ 2003-12-26 3:40 UTC
To: Andreas Jaeger; +Cc: Richard Henderson, libc-alpha, linux-gcc, gcc
Denis Zaitsev <zzz@anda.ru> writes:
> So, does it mean that we are indeed speaking about the problem in
> GCC?
I think you've demonstrated that there isn't an ideal way to write
this construct right now. ("memory" clobbers having their own
problems).
The next stage is to figure out (a) what the right notation is, and
(b) what needs to be done in GCC to make it work. I cannot tell
whether the semantics of "m" should change, or whether new notation
should be introduced. For a starter, try changing "m" and see how far
you get.
> So, whould you please to show me any points to speed up my start?
> For now, the only one part of GCC is not really new for me :)
Sorry, I do not know where this is happening.
zw
* Re: i386 inline-asm string functions - some questions
From: Richard Henderson @ 2003-12-27 4:58 UTC
To: Zack Weinberg; +Cc: Andreas Jaeger, libc-alpha, linux-gcc, gcc
On Thu, Dec 25, 2003 at 07:40:42PM -0800, Zack Weinberg wrote:
> For a starter, try changing "m" and see how far you get.
That would definitely be wrong when the operand is actually used.
r~
* Re: i386 inline-asm string functions - some questions
From: Zack Weinberg @ 2003-12-27 10:24 UTC
To: Richard Henderson; +Cc: Andreas Jaeger, libc-alpha, linux-gcc, gcc
Richard Henderson <rth@redhat.com> writes:
> On Thu, Dec 25, 2003 at 07:40:42PM -0800, Zack Weinberg wrote:
>> For a starter, try changing "m" and see how far you get.
>
> That would definitely be wrong when the operand is actually used.
I suspected that might be the case.
Denis' original example doesn't quote actual code but I think it's
talking about stuff like this (from libc cvs,
sysdeps/i386/i486/bits/string.h) -
__STRING_INLINE void *
__memcpy_g (void *__dest, __const void *__src, size_t __n)
{
  register unsigned long int __d0, __d1, __d2;
  register void *__tmp = __dest;
  __asm__ __volatile__
    ("cld\n\t"
     "shrl $1,%%ecx\n\t"
     "jnc 1f\n\t"
     "movsb\n"
     "1:\n\t"
     "shrl $1,%%ecx\n\t"
     "jnc 2f\n\t"
     "movsw\n"
     "2:\n\t"
     "rep; movsl"
     : "=&c" (__d0), "=&D" (__d1), "=&S" (__d2),
       "=m" ( *(struct { __extension__ char __x[__n]; } *)__dest)
     : "0" (__n), "1" (__tmp), "2" (__src),
       "m" ( *(struct { __extension__ char __x[__n]; } *)__src)
     : "cc");
  return __dest;
}
so, first off, I don't think this kind of optimization is libc's
business; we have the tools to do a better job over here in the
compiler. And furthermore I think it's buggy - if the block to be
copied is large and not aligned, it will overwrite memory past the end
of the destination.
But let's suppose /arguendo/ that there is a legitimate use for a
construct like this: the notation is frankly appalling. Let me try to
make up some better notation, using C99 variably-modified arrays and
GNU forward parameter declarations (we have the blasted things, we
might as well get some use out of them...) Note I am not attempting
to fix the bugs in the assembly.
__STRING_INLINE void *
__memcpy_g (size_t __n; char __dest[restrict static __n],
            const char __src[restrict static __n], size_t __n)
{
  void *savedest = __dest;
  __asm__ __volatile__
    ("cld\n\t"
     "shrl $1,%%ecx\n\t"
     "jnc 1f\n\t"
     "movsb\n"
     "1:\n\t"
     "shrl $1,%%ecx\n\t"
     "jnc 2f\n\t"
     "movsw\n"
     "2:\n\t"
     "rep; movsl"
     : "+c" (__n), "+@S" (__src), "+@D" (__dest));
  return savedest;
}
@ is a character not otherwise used in constraints; it means 'the
value here is a pointer and the memory pointed to will be accessed'.
Exactly how much memory, and the nature of the access, are determined
by the type of the pointer. Here, both pointers are restrict-
qualified and point to memory blocks of known size (that's what
"static __n" in the brackets means). Furthermore, __src points to
constant memory, so that block is only read, whereas __dest is not
constant so the compiler shall assume it's written. I had to change
the types from void to char so the size expressions would be
meaningful; if you were actually to use this to implement memcpy,
you'd wrap it in another inline function that casted the arguments.
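The declarator part of this proposal can already be written in standard C99, as sketched below. The "@" constraint itself does not exist, so this hypothetical copy_block uses a plain loop instead of asm, and it puts n first in the parameter list so that the GNU forward parameter declaration is not needed.

```c
#include <stddef.h>

/* Hypothetical function: "static n" promises that each pointer refers to
   at least n elements, and "restrict" promises that the two blocks do not
   overlap; "const" marks the source block as read-only. */
static void copy_block(size_t n,
                       char dest[restrict static n],
                       const char src[restrict static n])
{
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i];
}
```

A caller would write, for example, copy_block(6, out, "hello") with out declared as char out[6].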
I think that's all that should be needed. Thoughts?
zw
* Re: i386 inline-asm string functions - some questions
From: Denis Zaitsev @ 2003-12-27 11:35 UTC
To: Zack Weinberg
Cc: Richard Henderson, Andreas Jaeger, libc-alpha, linux-gcc, gcc
On Sat, Dec 27, 2003 at 02:24:59AM -0800, Zack Weinberg wrote:
> Denis' original example doesn't quote actual code but I think it's
> talking about stuff like this (from libc cvs,
> sysdeps/i386/i486/bits/string.h) -
Exactly. But I've mentioned that...
> __STRING_INLINE void *
> __memcpy_g (void *__dest, __const void *__src, size_t __n)
> {
> register unsigned long int __d0, __d1, __d2;
> register void *__tmp = __dest;
> __asm__ __volatile__
> ("cld\n\t"
> "shrl $1,%%ecx\n\t"
> "jnc 1f\n\t"
> "movsb\n"
> "1:\n\t"
> "shrl $1,%%ecx\n\t"
> "jnc 2f\n\t"
> "movsw\n"
> "2:\n\t"
> "rep; movsl"
> : "=&c" (__d0), "=&D" (__d1), "=&S" (__d2),
> "=m" ( *(struct { __extension__ char __x[__n]; } *)__dest)
> : "0" (__n), "1" (__tmp), "2" (__src),
> "m" ( *(struct { __extension__ char __x[__n]; } *)__src)
> : "cc");
> return __dest;
> }
>
> so, first off, I don't think this kind of optimization is libc's
> business; we have the tools to do a better job over here in the
> compiler.
Should the compiler implement all the string functions? Very probably
not. But in any case, these problems will then be inside the compiler
(again). And in any case, there should be a right way to write such
inline solutions...
> And furthermore I think it's buggy - if the block to be copied is
> large and not aligned, it will overwrite memory past the end of the
> destination.
Why do you think so? The code looks ok. I don't think it's the
fastest one, but it's correct.
> But let's suppose /arguendo/ that there is a legitimate use for a
> construct like this: the notation is frankly appalling. Let me try
> to make up some better notation, using C99 variably-modified arrays
> and GNU forward parameter declarations (we have the blasted things,
> we might as well get some use out of them...) Note I am not
> attempting to fix the bugs in the assembly.
>
> __STRING_INLINE void *
> __memcpy_g (size_t __n; char __dest[restrict static __n],
> const char __src[restrict static __n], size_t __n)
> {
> void *savedest = __dest;
> __asm__ __volatile__
> ("cld\n\t"
> "shrl $1,%%ecx\n\t"
> "jnc 1f\n\t"
> "movsb\n"
> "1:\n\t"
> "shrl $1,%%ecx\n\t"
> "jnc 2f\n\t"
> "movsw\n"
> "2:\n\t"
> "rep; movsl"
> : "+c" (__n), "+@S" (__src), "+@D" (__dest));
> return savedest;
> }
>
> @ is a character not otherwise used in constraints; it means 'the
> value here is a pointer and the memory pointed to will be accessed'.
Why isn't it documented? Is it a kind of "new" one?
(My only remark: it should be "+@&S" etc., since these are
earlyclobber operands.)
> Exactly how much memory, and the nature of the access, are
> determined by the type of the pointer. Here, both pointers are
> restrict- qualified and point to memory blocks of known size (that's
> what "static __n" in the brackets means). Furthermore, __src points
> to constant memory, so that block is only read, whereas __dest is
> not constant so the compiler shall assume it's written.
Does "@" behave as well with unrestricted pointers, like a plain
(char *s)?
> I had to change the types from void to char so the size expressions
> would be meaningful; if you were actually to use this to implement
> memcpy, you'd wrap it in another inline function that casted the
> arguments.
>
> I think that's all that should be needed. Thoughts?
I've just tried this (on my own examples, with unrestricted pointers).
Things seem to be fine: not ideal (reloading suffers sometimes, but
that is not an @-specific problem), but completely free of the
problems introduced by "m".
* Re: i386 inline-asm string functions - some questions
From: Zack Weinberg @ 2003-12-27 18:38 UTC
To: Richard Henderson; +Cc: Andreas Jaeger, libc-alpha, linux-gcc, gcc
Denis Zaitsev <zzz@anda.ru> writes:
>> so, first off, I don't think this kind of optimization is libc's
>> business; we have the tools to do a better job over here in the
>> compiler.
>
> Should the compiler implement all the string functions?
That is the trend. The compiler can make a better decision about
whether memcpy (for example) should be inlined at all, if it knows the
properties. If it does decide to inline a general memcpy algorithm,
it doesn't have to treat it as a giant opaque block of assembly
language, not to be modified. It can schedule other things
simultaneously, if that's a good move; it can prove that some of the
insns are unnecessary and eliminate them; etc. etc.
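A small illustration of that point, as a sketch only: when the size is a compile-time constant, optimizing compilers such as GCC typically expand a memcpy call inline (often to a single load here) with no asm block involved. The name load_u32 is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from a possibly unaligned
   address. The compiler sees the constant size and is free to replace
   the call with whatever instruction sequence it judges best. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```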
> Very probably not. But anyway, then these problem will be inside
> the compiler (again).
No; we have more flexible ways of expressing this sort of thing inside
the compiler.
>> And furthermore I think it's buggy - if the block to be copied is
>> large and not aligned, it will overwrite memory past the end of the
>> destination.
>
> Why do you think so? The code looks ok. I don't think it's the
> fastest one, but it's correct.
I misunderstood the consequences of doing rep movsl with unaligned
pointers. It just does lots of slow misaligned memory accesses; it
doesn't overwrite memory outside the destination block.
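To make the byte count explicit, here is a portable C sketch that mirrors the structure of the __memcpy_g assembly quoted earlier in the thread (memcpy_g_in_c is a hypothetical name, and this is a model of the algorithm, not the glibc code). It moves (n & 1) + (n & 2) + 4 * (n >> 2) = n bytes, never more.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the asm: one byte if n is odd, one 16-bit word
   if bit 1 of n is set, then n / 4 32-bit words. */
static void *memcpy_g_in_c(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (n & 1) {                        /* shrl $1,%ecx; jnc 1f; movsb */
        *d++ = *s++;
    }
    if (n & 2) {                        /* shrl $1,%ecx; jnc 2f; movsw */
        memcpy(d, s, 2);
        d += 2;
        s += 2;
    }
    for (size_t words = n >> 2; words != 0; words--) {  /* rep; movsl */
        memcpy(d, s, 4);
        d += 4;
        s += 4;
    }
    return dest;
}
```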
>> @ is a character not otherwise used in constraints; it means 'the
>> value here is a pointer and the memory pointed to will be accessed'.
>
> Why isn't it documented? Is it a kind of "new" one?
I just made it up. It is not implemented at present, nor will it
necessarily _be_ implemented. I was making a suggestion for a better
way to write this stuff.
> (The only remark is - it must be "+@&S" etc., there are the
> earlyclobbered operands.)
There are now only three operands and they have non-overlapping
register classes, so & is not necessary.
> Does this "@" behaves as well with unrestricted pointers like just
> (char *s)?
The less information the compiler has about the pointer, the more
memory it would have to assume is modified. At worst, "@" should
be equivalent to clobbering "memory".
> I've just tried this (on mine examples, with unrestricted pointers).
> The things seem to be fine. Not ideal (reloading suffers sometimes,
> but this is not the @-specific problem), but completely free of the
> problems introduced by "m".
Please remember that "m" (extension struct blah blah __dest) was
written in the original for a reason. You're not going to see it in
simple test cases, but the compiler has to be told that the asm
statement modifies memory, or it *will* mis-optimize around it. My
example code, with no meaning implemented for "@", is like that.
The point of the original construct was to tell the compiler exactly
what blocks of memory were modified. This turns out to have
undesirable side effects, which we're trying to get around here, but
let's not forget what the original point was. If there weren't cases
where clobbering "memory" caused poor optimization, no one would have
bothered with the "m" mess in the first place.
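As a sketch of "telling the compiler exactly what block is modified", under the simplifying assumption of a fixed 16-byte block (struct block16 and clear16 are hypothetical names, and the original code uses a variable-size struct instead), an "=m" output can name just the bytes the asm writes, so no "memory" clobber is needed:

```c
#include <stddef.h>

struct block16 { unsigned char bytes[16]; };    /* hypothetical type */

/* Hypothetical helper: zero exactly 16 bytes at buf. */
static inline void clear16(unsigned char *buf)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned char *p = buf;
    size_t n = sizeof(struct block16);
    __asm__ __volatile__(
        "cld\n\t"
        "rep; stosb"
        : "+D"(p), "+c"(n),
          "=m"(*(struct block16 *)buf)  /* only these 16 bytes are written */
        : "a"(0)
        : "cc");
#else
    for (size_t i = 0; i < sizeof(struct block16); i++)
        buf[i] = 0;
#endif
}
```

If the "=m" operand were dropped and nothing else described the store, the compiler would be entitled to assume that buf still holds its old contents after the asm.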
zw
* Re: i386 inline-asm string functions - some questions
From: Denis Zaitsev @ 2003-12-28 20:58 UTC
To: Zack Weinberg
Cc: Richard Henderson, Andreas Jaeger, libc-alpha, linux-gcc, gcc
On Sat, Dec 27, 2003 at 10:38:49AM -0800, Zack Weinberg wrote:
> Denis Zaitsev <zzz@anda.ru> writes:
>
> >> so, first off, I don't think this kind of optimization is libc's
> >> business; we have the tools to do a better job over here in the
> >> compiler.
> >
> > Should the compiler implement all the string functions?
>
> That is the trend. The compiler can make a better decision about
> whether memcpy (for example) should be inlined at all, if it knows
> the properties.
Yes, but even if it can, it is rather a political question: should
the compiler do so, or should this be decided by the programmer? I
personally like the latter approach, but who knows...
> If it does decide to inline a general memcpy algorithm, it doesn't
> have to treat it as a giant opaque block of assembly language, not
> to be modified.
I definitely agree. But the same is necessary for inline asm as well.
There should be a way to describe the properties of the asm block to
the compiler, which would allow the compiler to work well with it.
For now, as I understand it, there are two such possibilities: omitting
the "volatile" keyword, and (manually) splitting the asm block into
several "volatile" ones. These don't seem bad, but the compiler lacks
(IMHO) some other abilities needed to work really well with these two.
> It can schedule other things simultaneously, if that's a good move;
> it can prove that some of the insns are unnecessary and eliminate
> them; etc. etc.
Ok, agreed again. But in real life, external inline asm doesn't seem
to suffer so badly from this point of view. A real inline-asm function
usually contains some prologue and epilogue written in ordinary C,
and those are the places where the compiler can do its optimisation
job. It's just like working with invariants moved outside a loop.
> > Very probably not. But anyway, then these problem will be inside
> > the compiler (again).
>
> No; we have more flexible ways of expressing this sort of thing
> inside the compiler.
But it seems politically wrong not to keep the library functions _in_
the library, doesn't it? For some _very basic_ primitives it's ok, but
not for whole library functions, even basic ones.
> I just made it up. It is not implemented at present, nor will it
> necessarily _be_ implemented. I was making a suggestion for a better
> way to write this stuff.
Heh... :) And I was trying to play with them...
> > (The only remark is - it must be "+@&S" etc., there are the
> > earlyclobbered operands.)
>
> There are now only three operands and they have non-overlapping
> register classes, so & is not necessary.
Ok, I'm sorry. I don't use "S" etc., so I just overlooked that...
> Please remember that "m" (extension struct blah blah __dest) was
> written in the original for a reason. You're not going to see it in
> simple test cases, but the compiler has to be told that the asm
> statement modifies memory, or it *will* mis-optimize around it. My
> example code, with no meaning implemented for "@", is like that.
I understand this, and I'm not arguing with it. I just have a
(grounded?) feeling that this benefit is met quite rarely in real
life. Other people have the same feeling: look at
http://gcc.gnu.org/ml/gcc-patches/2002-03/msg00521.html and the thread.
> The point of the original construct was to tell the compiler exactly
> what blocks of memory were modified. This turns out to have
> undesirable side effects, which we're trying to get around here, but
> let's not forget what the original point was. If there weren't
> cases where clobbering "memory" caused poor optimization, no one
> would have bothered with the "m" mess in the first place.
So, about these "undesirable side effects": they should be left in
peace until some good time when GCC stops producing them. Ok, that
sounds nearly reasonable. But please look through
sysdeps/i386/i486/bits/string.h: it has definitely been written with
some other approach in mind. It's full of miscellaneous workarounds
for GCC, and it looks like a way to get good machine code while the
world around it, including the compiler, is not ideal. So I'm just
wondering why one such workaround has been chosen over another, when
the first one obviously has cons and its pros seem to be just
ephemeral... If they are real, then the question in this form
vanishes. But what I've found so far is suspicion rather than hard
evidence... :)
* Re: i386 inline-asm string functions - some questions
From: Zack Weinberg @ 2003-12-29 2:22 UTC
To: Richard Henderson; +Cc: Andreas Jaeger, libc-alpha, linux-gcc, gcc
Denis Zaitsev <zzz@anda.ru> writes:
> On Sat, Dec 27, 2003 at 10:38:49AM -0800, Zack Weinberg wrote:
>> Denis Zaitsev <zzz@anda.ru> writes:
>>
>> >> so, first off, I don't think this kind of optimization is libc's
>> >> business; we have the tools to do a better job over here in the
>> >> compiler.
>> >
>> > Should the compiler implement all the string functions?
>>
>> That is the trend. The compiler can make a better decision about
>> whether memcpy (for example) should be inlined at all, if it knows
>> the properties.
>
> Yes, but even if it can, it is rather a kinda political question -
> should it do so, or this must be defined by the programmer. I
> personally like the latter approach, but who knows...
Meh. I personally am convinced that the compiler can do a *much*
better job, and that trying to improve bits/string.h and
bits/string2.h is a waste of time; in fact, I've felt that they have
*always* caused the generated code to get worse, from the day they
were introduced. I once tried to get Uli to take them out again, with
hard numbers to back me up, but he ignored me.
So I have very little interest in pursuing any of your suggestions.
If you want to keep at them, though, and come up with patches, feel
free.
zw
* Re: i386 inline-asm string functions - some questions
From: Denis Zaitsev @ 2003-12-29 2:44 UTC
To: Zack Weinberg
Cc: Richard Henderson, Andreas Jaeger, libc-alpha, linux-gcc, gcc
On Sun, Dec 28, 2003 at 06:22:08PM -0800, Zack Weinberg wrote:
>
> Meh. I personally am convinced that the compiler can do a *much*
> better job, and that trying to improve bits/string.h and
> bits/string2.h is a waste of time; in fact, I've felt that they have
> *always* caused the generated code to get worse, from the day they
> were introduced. I once tried to get Uli to take them out again,
> with hard numbers to back me up, but he ignored me.
Who is Uli - Ulrich Drepper?
> So I have very little interest in pursuing any of your suggestions.
But I'm not even trying to get you to do so! I'm just trying to
understand what is/was happening. I really don't like the content of
bits/string[2].h either. And I didn't want to offend you. I'm very
sorry if I did.
> If you want to keep at them, though, and come up with patches, feel
> free.
Ok.
* Re: i386 inline-asm string functions - some questions
From: Zack Weinberg @ 2003-12-29 2:46 UTC
To: Richard Henderson; +Cc: Andreas Jaeger, libc-alpha, linux-gcc, gcc
Denis Zaitsev <zzz@anda.ru> writes:
> On Sun, Dec 28, 2003 at 06:22:08PM -0800, Zack Weinberg wrote:
>>
>> Meh. I personally am convinced that the compiler can do a *much*
>> better job, and that trying to improve bits/string.h and
>> bits/string2.h is a waste of time; in fact, I've felt that they have
>> *always* caused the generated code to get worse, from the day they
>> were introduced. I once tried to get Uli to take them out again,
>> with hard numbers to back me up, but he ignored me.
>
> Who is Uli - Ulrich Drepper?
Yes.
>> So I have very little interest in pursuing any of your suggestions.
>
> But I don't even try to have you to do so! I'm just trying to
> understand what is/was happening. I very don't like the content of
> bits/string[2].h too. And I don't want to offend you. I'm very
> sorry, if so.
No offense was taken, and I'm sorry I was so short.
I meant to indicate that I lack the time even to consider the rest of
your message at any length, so I cannot answer the questions you
raise. Again I apologize.
zw
* Re: i386 inline-asm string functions - some questions
From: Denis Zaitsev @ 2003-12-29 2:53 UTC
To: Zack Weinberg
Cc: Richard Henderson, Andreas Jaeger, libc-alpha, linux-gcc, gcc
On Sun, Dec 28, 2003 at 06:46:33PM -0800, Zack Weinberg wrote:
> > But I don't even try to have you to do so! I'm just trying to
> > understand what is/was happening. I very don't like the content
> > of bits/string[2].h too. And I don't want to offend you. I'm
> > very sorry, if so.
>
> No offense was taken, and I'm sorry I was so short.
>
> I meant to indicate that I lack the time even to consider the rest
> of your message at any length, so I cannot answer the questions you
> raise. Again I apologize.
Ok, I understand.
* Re: i386 inline-asm string functions - some questions
From: Ulrich Drepper @ 2003-12-29 3:35 UTC
To: Zack Weinberg
Cc: Richard Henderson, Andreas Jaeger, libc-alpha, linux-gcc, gcc
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Zack Weinberg wrote:
> I once tried to get Uli to take them out again, with
> hard numbers to back me up, but he ignored me.
I have absolutely no problem taking out the inlines once gcc is able to
perform the same optimizations. Problem is that nobody spent the time
so far to complete the task in gcc. As far as I know each function we
still have has an advantage over the gcc code.
Just look at the inlines to determine what is optimized, do it in gcc,
and let me know. Then I'll remove the inline.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: i386 inline-asm string functions - some questions
2003-12-29 3:35 ` Ulrich Drepper
@ 2003-12-29 3:54 ` Andrew Pinski
2003-12-29 6:57 ` Jakub Jelinek
2003-12-29 3:56 ` Zack Weinberg
1 sibling, 1 reply; 33+ messages in thread
From: Andrew Pinski @ 2003-12-29 3:54 UTC (permalink / raw)
To: Ulrich Drepper
Cc: linux-gcc, Andreas Jaeger, gcc, Richard Henderson, Zack Weinberg,
Andrew Pinski, libc-alpha
On Dec 28, 2003, at 22:35, Ulrich Drepper wrote:
> Zack Weinberg wrote:
>> I once tried to get Uli to take them out again, with
>> hard numbers to back me up, but he ignored me.
>
> I have absolutely no problem taking out the inlines once gcc is able to
> perform the same optimizations. Problem is that nobody spent the time
> so far to complete the task in gcc. As far as I know each function we
> still have has an advantage over the gcc code.
>
> Just look at the inlines to determine what is optimized, do it in gcc,
> and let me know. Then I'll remove the inline.
We already do more when it comes to removing sqrt and other math
functions, and we also optimize some string functions without needing
the string instructions. In fact, GCC already does more optimizations
on string functions than glibc does.
The functions that glibc optimizes and GCC does not are the following:
memrchr
strncat (we do sometimes)
strncmp
strchr (with a FIXME in GCC)
strchrnul
strcspn
strspn
strpbrk
strstr (we do a better job for "a", but we do not do it for the general case)
We do greatly optimize the common cases of strcpy, memcpy, etc., and
the math library functions as well.
Thanks,
Andrew Pinski
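[Editorial sketch, not part of the original message.] To make the kind of
compiler-side string optimization discussed above more concrete, the
fragment below shows three calls that GCC can typically expand or fold by
itself when optimizing. The function names are invented for this example,
and whether a given call is really folded depends on the GCC version and
flags; treat that as an assumption, not as something measured in this
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* strlen of a string literal: an optimizing GCC can usually fold this
   to the constant 5 (assumption: builtins are enabled). */
static size_t literal_length(void)
{
    return strlen("hello");
}

/* Small fixed-size memcpy: typically expanded inline as a plain load
   instead of a library call. */
static unsigned char first_byte(const char *p)
{
    unsigned char c;
    memcpy(&c, p, sizeof c);
    return c;
}

/* strstr with a one-character needle, the "a" case from the list above. */
static const char *find_a(const char *s)
{
    return strstr(s, "a");
}

/* The results must be the same whether or not anything was folded. */
static void check_builtins(void)
{
    const char *word = "banana";
    assert(literal_length() == 5);
    assert(first_byte(word) == 'b');
    assert(find_a(word) == word + 1);
    assert(find_a("xyz") == NULL);
}
```

Comparing the output of gcc -O2 -S with and without -fno-builtin on such a
file is one way to see which calls the compiler handled itself.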
* Re: i386 inline-asm string functions - some questions
2003-12-29 3:54 ` Andrew Pinski
@ 2003-12-29 6:57 ` Jakub Jelinek
0 siblings, 0 replies; 33+ messages in thread
From: Jakub Jelinek @ 2003-12-29 6:57 UTC (permalink / raw)
To: Andrew Pinski
Cc: Ulrich Drepper, linux-gcc, Andreas Jaeger, gcc, Richard Henderson,
Zack Weinberg, libc-alpha
On Sun, Dec 28, 2003 at 10:54:19PM -0500, Andrew Pinski wrote:
> >Zack Weinberg wrote:
> >>I once tried to get Uli to take them out again, with
> >>hard numbers to back me up, but he ignored me.
> >
> >I have absolutely no problem taking out the inlines once gcc is able to
> >perform the same optimizations. Problem is that nobody spent the time
> >so far to complete the task in gcc. As far as I know each function we
> >still have has an advantage over the gcc code.
> >
> >Just look at the inlines to determine what is optimized, do it in gcc,
> >and let me know. Then I'll remove the inline.
>
> We already do more when it comes to removing sqrt and other math
> functions and also
> some string functions we optimize without the need for the string
> instructions.
> In fact GCC does more optimizations on string functions than glibc does
> already.
I think all inlines/macros which are to be removed from bits/string{,2}.h
should be benchmarked first with various constant and variable arguments on
various architectures.
I did it for some routines 2 years ago:
http://sources.redhat.com/ml/libc-hacker/2001-11/msg00035.html
http://sources.redhat.com/ml/libc-hacker/2002-01/msg00091.html
> The functions that GCC does not optimize that glibc does are the
> following:
> memrchr
> strncat (we do sometimes)
> strncmp
> strchr (with a FIXME in GCC)
> strchrnul
> strcspn
> strspn
> strpbrk
> strstr (we do a better job for "a", but we do not do it for the general
> case)
>
>
> The common cases of strcpy, memcpy, etc. we do optimize greatly and as
> the math
> library we do too.
Jakub
* Re: i386 inline-asm string functions - some questions
2003-12-29 3:35 ` Ulrich Drepper
2003-12-29 3:54 ` Andrew Pinski
@ 2003-12-29 3:56 ` Zack Weinberg
2003-12-29 5:31 ` Daniel Jacobowitz
2003-12-29 18:51 ` Denis Zaitsev
1 sibling, 2 replies; 33+ messages in thread
From: Zack Weinberg @ 2003-12-29 3:56 UTC (permalink / raw)
To: Ulrich Drepper
Cc: Richard Henderson, Andreas Jaeger, libc-alpha, linux-gcc, gcc
Ulrich Drepper <drepper@redhat.com> writes:
> Zack Weinberg wrote:
>> I once tried to get Uli to take them out again, with
>> hard numbers to back me up, but he ignored me.
>
> I have absolutely no problem taking out the inlines once gcc is able to
> perform the same optimizations. Problem is that nobody spent the time
> so far to complete the task in gcc.
This is true - I believe Joseph Myers put a list of yet-to-be-done
optimizations on the GCC projects page ...
> As far as I know each function we still have has an advantage over
> the gcc code.
... however, that advantage is only theoretical. Experiments such as
Peter Zaitsev's just now, and mine several years ago, demonstrate that
the bits/string.h and bits/string2.h inlines make code worse, not better.
Therefore they should be removed.
zw
* Re: i386 inline-asm string functions - some questions
2003-12-29 3:56 ` Zack Weinberg
@ 2003-12-29 5:31 ` Daniel Jacobowitz
2003-12-29 5:55 ` Zack Weinberg
2003-12-29 18:37 ` Denis Zaitsev
2003-12-29 18:51 ` Denis Zaitsev
1 sibling, 2 replies; 33+ messages in thread
From: Daniel Jacobowitz @ 2003-12-29 5:31 UTC (permalink / raw)
To: Zack Weinberg
Cc: Ulrich Drepper, Richard Henderson, Andreas Jaeger, libc-alpha,
linux-gcc, gcc
On Sun, Dec 28, 2003 at 07:56:54PM -0800, Zack Weinberg wrote:
> Ulrich Drepper <drepper@redhat.com> writes:
>
> > Zack Weinberg wrote:
> >> I once tried to get Uli to take them out again, with
> >> hard numbers to back me up, but he ignored me.
> >
> > I have absolutely no problem taking out the inlines once gcc is able to
> > perform the same optimizations. Problem is that nobody spent the time
> > so far to complete the task in gcc.
>
> This is true - I believe Joseph Myers put a list of yet-to-be-done
> optimizations on the GCC projects page ...
>
> > As far as I know each function we still have has an advantage over
> > the gcc code.
>
> ... however, that advantage is only theoretical. Experiments such as
> Peter Zaitsev's just now, and mine several years ago, demonstrate that
> the bits/string.h and bits/string2.h inlines make code worse, not better.
> Therefore they should be removed.
Funny, I conducted this experiment last week and found quite the
opposite. Compiling the demangler and a smallish yacc parser
with -D__NO_STRING_INLINES cost about 20% in runtime.
I'm not convinced.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
* Re: i386 inline-asm string functions - some questions
2003-12-29 5:31 ` Daniel Jacobowitz
@ 2003-12-29 5:55 ` Zack Weinberg
2003-12-29 18:37 ` Denis Zaitsev
1 sibling, 0 replies; 33+ messages in thread
From: Zack Weinberg @ 2003-12-29 5:55 UTC (permalink / raw)
To: Ulrich Drepper
Cc: Richard Henderson, Andreas Jaeger, libc-alpha, linux-gcc, gcc
> > ... however, that advantage is only theoretical. Experiments such as
> > Peter Zaitsev's just now, and mine several years ago, demonstrate that
> > the bits/string.h and bits/string2.h inlines make code worse, not better.
> > Therefore they should be removed.
>
> Funny, I conducted this experiment last week and found quite the
> opposite. Compiling the demangler and a smallish yacc parser
> with -D__NO_STRING_INLINES cost about 20% in runtime.
That's interesting. My testing was with much larger programs where
str* / mem* aren't the bottleneck anyway. I wonder if you would be
willing to take a look at the differences in the assembly language
and see where that 20% is coming from.
zw
* Re: i386 inline-asm string functions - some questions
2003-12-29 5:31 ` Daniel Jacobowitz
2003-12-29 5:55 ` Zack Weinberg
@ 2003-12-29 18:37 ` Denis Zaitsev
2003-12-29 19:09 ` Zack Weinberg
1 sibling, 1 reply; 33+ messages in thread
From: Denis Zaitsev @ 2003-12-29 18:37 UTC (permalink / raw)
To: Daniel Jacobowitz
Cc: Zack Weinberg, Ulrich Drepper, Richard Henderson, Andreas Jaeger,
libc-alpha, linux-gcc, gcc
On Mon, Dec 29, 2003 at 12:31:52AM -0500, Daniel Jacobowitz wrote:
> Funny, I conducted this experiment last week and found quite the
> opposite. Compiling the demangler and a smallish yacc parser
> with -D__NO_STRING_INLINES cost about 20% in runtime.
-D__NO_STRING_INLINES just turns the inlining off. But nobody here is
talking about comparing inline with non-inline. Of course, inlining is
better for speed. The comparison being made is between different
versions of the inlines.
> I'm not convinced.
If possible, please try your experiment with the "memory" vs. "m"
versions of the inlines. I will send you the patch if needed.
* Re: i386 inline-asm string functions - some questions
2003-12-29 18:37 ` Denis Zaitsev
@ 2003-12-29 19:09 ` Zack Weinberg
2003-12-29 19:31 ` Denis Zaitsev
0 siblings, 1 reply; 33+ messages in thread
From: Zack Weinberg @ 2003-12-29 19:09 UTC (permalink / raw)
To: Daniel Jacobowitz
Cc: Ulrich Drepper, Richard Henderson, Andreas Jaeger, libc-alpha,
linux-gcc, gcc
Denis Zaitsev <zzz@anda.ru> writes:
> On Mon, Dec 29, 2003 at 12:31:52AM -0500, Daniel Jacobowitz wrote:
>> Funny, I conducted this experiment last week and found quite the
>> opposite. Compiling the demangler and a smallish yacc parser
>> with -D__NO_STRING_INLINES cost about 20% in runtime.
>
> -D__NO_STRING_INLINES just turns the inlining off. But nobody here is
> talking about comparing inline with non-inline. Of course, inlining is
> better for speed. The comparison being made is between different
> versions of the inlines.
No. There is no "of course" here. If your inlined functions blow out
the instruction cache, it may wind up being a net lose. Same if the
out-of-line memcpy takes several more instructions to set up but makes
damn sure to do aligned memory accesses (full-bus-width loads,
nontemporal store, prefetches, etc etc etc), whereas the inline one
doesn't.
zw
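[Editorial sketch, not part of the original message.] One way to test a
claim like this rather than argue it is to time both forms on the string
sizes a program actually uses. The harness below is a minimal sketch
under assumptions: copy_inline, copy_call and the timing helpers are
hypothetical names, the byte loop merely stands in for an inlined copy,
and clock() is far too coarse for anything but a first impression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Byte loop standing in for a small inlined copy (hypothetical). */
static inline void copy_inline(char *dst, const char *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/* Kept out of line so that every use is a real call (GCC/Clang attribute). */
__attribute__((noinline))
static void copy_call(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* CPU seconds for `reps` inline copies. The empty asm takes dst as an
   input and clobbers "memory", so the copies cannot be optimized away. */
static double time_inline(char *dst, const char *src, size_t n, long reps)
{
    clock_t start = clock();
    for (long i = 0; i < reps; i++) {
        copy_inline(dst, src, n);
        __asm__ volatile ("" : : "r" (dst) : "memory");
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same measurement for the out-of-line call. */
static double time_call(char *dst, const char *src, size_t n, long reps)
{
    clock_t start = clock();
    for (long i = 0; i < reps; i++) {
        copy_call(dst, src, n);
        __asm__ volatile ("" : : "r" (dst) : "memory");
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Both variants must produce the same bytes before timing means anything. */
static void check_copies(void)
{
    const char src[16] = "inline-asm test";
    char a[16], b[16];
    copy_inline(a, src, sizeof src);
    copy_call(b, src, sizeof src);
    assert(memcmp(a, b, sizeof src) == 0);
}
```

Running time_inline and time_call over a range of lengths and repetition
counts gives a rough picture; instruction-cache effects of the kind
described above only show up inside a larger program, not in a loop like
this.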
* Re: i386 inline-asm string functions - some questions
2003-12-29 19:09 ` Zack Weinberg
@ 2003-12-29 19:31 ` Denis Zaitsev
2003-12-29 19:37 ` Denis Zaitsev
0 siblings, 1 reply; 33+ messages in thread
From: Denis Zaitsev @ 2003-12-29 19:31 UTC (permalink / raw)
To: Zack Weinberg
Cc: Daniel Jacobowitz, Ulrich Drepper, Richard Henderson,
Andreas Jaeger, libc-alpha, linux-gcc, gcc
On Mon, Dec 29, 2003 at 11:09:14AM -0800, Zack Weinberg wrote:
> Denis Zaitsev <zzz@anda.ru> writes:
>
> > On Mon, Dec 29, 2003 at 12:31:52AM -0500, Daniel Jacobowitz wrote:
> >> Funny, I conducted this experiment last week and found quite the
> >> opposite. Compiling the demangler and a smallish yacc parser
> >> with -D__NO_STRING_INLINES cost about 20% in runtime.
> >
> > -D__NO_STRING_INLINES just turns the inlining off. But nobody here is
> > talking about comparing inline with non-inline. Of course, inlining is
> > better for speed. The comparison being made is between different
> > versions of the inlines.
>
> No. There is no "of course" here.
Strictly speaking, you are right. But in general, in real life, on
average, etc., when the strings are too short for the effects you
describe below to appear, my "of course" is near enough to the truth.
> If your inlined functions blow out the instruction cache, it may
> wind up being a net lose.
Definitely. This is why I don't like it when the inline functions
grow, which is what the "m" operands cause...
> Same if the out-of-line memcpy takes several more instructions to
> set up but makes damn sure to do aligned memory accesses
> (full-bus-width loads, nontemporal store, prefetches, etc etc etc),
> whereas the inline one doesn't.
First, they do try to do so here and there (though, OK, not well).
Second, I experimented with unaligned access some time ago. I don't
remember exactly, but it seems that modern x86 processors handle it
about as well as aligned access. But as that was long ago, I can't
recollect the details, so I won't insist. Better that I repeat the
measurements...
* Re: i386 inline-asm string functions - some questions
2003-12-29 19:31 ` Denis Zaitsev
@ 2003-12-29 19:37 ` Denis Zaitsev
0 siblings, 0 replies; 33+ messages in thread
From: Denis Zaitsev @ 2003-12-29 19:37 UTC (permalink / raw)
To: Zack Weinberg, Daniel Jacobowitz, Ulrich Drepper,
Richard Henderson, Andreas Jaeger, libc-alpha, linux-gcc, gcc
On Tue, Dec 30, 2003 at 12:31:47AM +0500, Denis Zaitsev wrote:
> On Mon, Dec 29, 2003 at 11:09:14AM -0800, Zack Weinberg wrote:
> > Same if the out-of-line memcpy takes several more instructions to
> > set up but makes damn sure to do aligned memory accesses
> > (full-bus-width loads, nontemporal store, prefetches, etc etc etc),
> > whereas the inline one doesn't.
Sorry, by "they" below I mean the inline functions! Of course!
> First, they are trying to do so here and there (but ok, not fine).
>
> Second, I have experimented with this nonaligned access some time
> ago. I don't remember exactly, but it seems that the modern x86
> processors do the job as fine as when the memory access is aligned.
> But as it was too far ago, I can't recollect the details and so I
> won't insist. Better I will repeat the measurements...
* Re: i386 inline-asm string functions - some questions
2003-12-29 3:56 ` Zack Weinberg
2003-12-29 5:31 ` Daniel Jacobowitz
@ 2003-12-29 18:51 ` Denis Zaitsev
2003-12-29 19:15 ` Zack Weinberg
1 sibling, 1 reply; 33+ messages in thread
From: Denis Zaitsev @ 2003-12-29 18:51 UTC (permalink / raw)
To: Zack Weinberg
Cc: Ulrich Drepper, Richard Henderson, Andreas Jaeger, libc-alpha,
linux-gcc, gcc
On Sun, Dec 28, 2003 at 07:56:54PM -0800, Zack Weinberg wrote:
> ... however, that advantage is only theoretical. Experiments such as
> Peter Zaitsev's just now, and mine several years ago, demonstrate that
  ^^^^^^^^^^^^^
Do you mean me? Pete Zaitcev is a real person too (AFAIK), but he
is a different person... :)
* Re: i386 inline-asm string functions - some questions
2003-12-29 18:51 ` Denis Zaitsev
@ 2003-12-29 19:15 ` Zack Weinberg
0 siblings, 0 replies; 33+ messages in thread
From: Zack Weinberg @ 2003-12-29 19:15 UTC (permalink / raw)
To: Ulrich Drepper
Cc: Richard Henderson, Andreas Jaeger, libc-alpha, linux-gcc, gcc
Denis Zaitsev <zzz@anda.ru> writes:
> On Sun, Dec 28, 2003 at 07:56:54PM -0800, Zack Weinberg wrote:
>> ... however, that advantage is only theoretical. Experiments such as
>> Peter Zaitsev's just now, and mine several years ago, demonstrate that
>  ^^^^^^^^^^^^^
> Do you mean me? Pete Zaitcev is a real person too (AFAIK), but he
> is a different person... :)
I'm sorry, I got you mixed up.
zw
* Re: i386 inline-asm string functions - some questions
2003-12-26 3:40 ` Zack Weinberg
2003-12-27 4:58 ` Richard Henderson
@ 2003-12-27 10:52 ` Denis Zaitsev
1 sibling, 0 replies; 33+ messages in thread
From: Denis Zaitsev @ 2003-12-27 10:52 UTC (permalink / raw)
To: Zack Weinberg
Cc: Andreas Jaeger, Richard Henderson, libc-alpha, linux-gcc, gcc
On Thu, Dec 25, 2003 at 07:40:42PM -0800, Zack Weinberg wrote:
> Denis Zaitsev <zzz@anda.ru> writes:
>
> > So, does it mean that we are indeed speaking about the problem in
> > GCC?
>
> I think you've demonstrated that there isn't an ideal way to write
> this construct right now. ("memory" clobbers having their own
> problems).
Really, I did mean that "m" is worse than "memory" (say, "in
general"), yet it was the one chosen, and why is a mystery to me.
There was a discussion in the past about the advantages of "m" over
just "memory". As I understand it, these advantages amount to
practically nothing, while the dead code they add to, say,
glibc-2.3.2 comes to 6Kb.
> The next stage is to figure out (a) what the right notation is, and
> (b) what needs to be done in GCC to make it work. I cannot tell
> whether the semantics of "m" should change, or whether new notation
Semantics? Or maybe the implementation? It seems that all is OK with
the semantics...
[parent not found: <20031225060850.C7419@zzz.ward.six>]
* Re: i386 inline-asm string functions - some questions
2003-12-25 0:20 i386 inline-asm string functions - some questions Denis Zaitsev
2003-12-25 0:38 ` Richard Henderson
@ 2003-12-25 0:39 ` Roland McGrath
2003-12-25 1:13 ` Denis Zaitsev
1 sibling, 1 reply; 33+ messages in thread
From: Roland McGrath @ 2003-12-25 0:39 UTC (permalink / raw)
To: Denis Zaitsev
Cc: Andreas Jaeger, Richard Henderson, libc-alpha, linux-gcc, gcc
> From some moment in the past, the next input parameters are used here
> and there in sysdeps/i386/i486/bits/string.h:
>
> "m" ( *(struct { char __x[0xfffffff]; } *)__s)
>
> When I was seeking for the reasons to do so, I've found some
> discussions about this in libc-alpha and gcc mailing lists. As I
> understand from there, there are an options - to use the "m" arg(s)
> shown above or just to use "memory" in the list of a clobbered
> registers. So, the question is: why was the "m"-way chosen?
The reason we use this kind of "m" constraint is that it indicates what we
want to say: memory __s points to might be used. That means that if the C
aliasing rules allow the compiler to assume that a given other expression
cannot point to the same memory as __s does, it is free to do so and
optimize out stores through unrelated pointers that cannot affect __s.
Conversely, a "memory" clobber tells the compiler that it must assume that
all memory any pointer points to might be read by this asm.
> I'm asking, because I've found that this "m"-way leads GCC to produce
> an unoptimal enough assembler, while "memory" code is ok.
That is an issue for GCC. It is correct for glibc (and other code) to use
the asm constraints that express the true precise set of constraints and
tell the compiler it is free to do as much as is in fact safe. As to
whether your "+&r" constraints on the pointer values are correct, I don't know.
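[Editorial sketch, not part of the original message.] The two notations
being compared can be put side by side. In the fragment below the asm
templates are deliberately empty so that it builds on any GCC-compatible
target; asm_reads_object and asm_reads_anything are hypothetical names,
and the oversized array type simply follows the idiom quoted above. Only
the constraints matter here, not the (absent) instructions.

```c
#include <assert.h>

/* Precise form: a dummy "m" input names the object behind s, so the
   compiler only has to assume that this memory is used by the asm. */
static void asm_reads_object(const char *s)
{
    __asm__ volatile (""
                      :
                      : "m" (*(const struct { char x[0xfffffff]; } *)s));
}

/* Blunt form: the "memory" clobber makes the compiler assume that the
   asm may access memory reachable through any pointer. */
static void asm_reads_anything(const char *s)
{
    __asm__ volatile ("" : : "r" (s) : "memory");
}

/* In both cases the stores to text must be done before the asm runs. */
static void check_constraints(void)
{
    char text[] = "abc";
    text[0] = 'x';
    asm_reads_object(text);
    text[1] = 'y';
    asm_reads_anything(text);
    assert(text[0] == 'x' && text[1] == 'y');
}
```

The difference shows up only in what the optimizer may do with unrelated
stores around the asm, which is why the two forms have to be compared by
reading the generated assembly.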
* Re: i386 inline-asm string functions - some questions
2003-12-25 0:39 ` Roland McGrath
@ 2003-12-25 1:13 ` Denis Zaitsev
0 siblings, 0 replies; 33+ messages in thread
From: Denis Zaitsev @ 2003-12-25 1:13 UTC (permalink / raw)
To: Andreas Jaeger; +Cc: Richard Henderson, libc-alpha, linux-gcc, gcc
On Wed, Dec 24, 2003 at 04:39:37PM -0800, Roland McGrath wrote:
> > From some moment in the past, the next input parameters are used here
> > and there in sysdeps/i386/i486/bits/string.h:
> >
> > "m" ( *(struct { char __x[0xfffffff]; } *)__s)
> >
> > When I was seeking for the reasons to do so, I've found some
> > discussions about this in libc-alpha and gcc mailing lists. As I
> > understand from there, there are an options - to use the "m" arg(s)
> > shown above or just to use "memory" in the list of a clobbered
> > registers. So, the question is: why was the "m"-way chosen?
>
> The reason we use this kind of "m" constraint is that it indicates what we
> want to say: memory __s points to might be used. That means that if the C
> aliasing rules allow the compiler to assume that a given other expression
> cannot point to the same memory as __s does, it is free to do so and
> optimize out stores through unrelated pointers that cannot affect __s.
> Conversely, a "memory" clobber tells the compiler that it must assume that
> all memory any pointer points to might be read by this asm.
Yes, I understand all this. But that past discussion already touched
on a similar question: the "precise" "m"-way doesn't bring any
real-life benefit.
> > I'm asking, because I've found that this "m"-way leads GCC to produce
> > an unoptimal enough assembler, while "memory" code is ok.
>
> That is an issue for GCC.
Yes, I have no choice but to agree...
> It is correct for glibc (and other code) to use the asm constraints
> that express the true precise set of constraints and tell the
> compiler it is free to do as much as is in fact safe. As to whether
> your "+&r" constraints on the pointer values are correct, I don't
> know.
"+&r" is just a shortcut for :"=&r":"0": (an "=&r" output plus a
matching "0" input). It's correct and doesn't affect the "m" issue.
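[Editorial sketch, not part of the original message.] The equivalence
stated here can be written out as two functions. The templates are empty,
so the register simply carries the value through unchanged; pass_plus and
pass_matched are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Read-write earlyclobber operand written in one piece. */
static const char *pass_plus(const char *a)
{
    __asm__ volatile ("" : "+&r" (a));
    return a;
}

/* The same thing spelled out: an "=&r" output plus an input tied to it
   with the matching constraint "0". */
static const char *pass_matched(const char *a)
{
    __asm__ volatile ("" : "=&r" (a) : "0" (a));
    return a;
}

/* With an empty template both forms hand the pointer back untouched. */
static void check_shorthand(void)
{
    const char *s = "abc";
    assert(pass_plus(s) == s);
    assert(pass_matched(s) == s);
}
```

Note that the two forms number their operands differently: the spelled-out
version has an extra input operand, which matters once the template refers
to operands by %-number.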