* [Xenomai-core] llimd.
From: Gilles Chanteperdrix @ 2008-10-28 19:54 UTC
  To: Jan Kiszka; +Cc: Xenomai core


Hi Jan,

I see that the implementation of rthal_llmulshft seems to account for
the sign of its first argument. Does it work? Namely, in the generic
implementation, will __rthal_u96shift propagate the sign bit?

If yes, do you see a way llimd could be made to work the same way? This
way we would avoid inlining ullimd twice in the llimd code.

Regards.

-- 
					    Gilles.


* Re: [Xenomai-core] llimd.
From: Jan Kiszka @ 2008-10-28 21:00 UTC
  To: Gilles Chanteperdrix; +Cc: Xenomai core

Gilles Chanteperdrix wrote:
> Hi Jan,
> 
> I see that the implementation of rthal_llmulshft seems to account for
> the first argument sign. Does it work ? Namely, in the generic
> implementation will __rthal_u96shift propagate the sign bit ?

Yes, this works (given there is no overflow, of course). If you consider
a high word of 0xfffffff0 and a (right) shift of 8, we effectively cut
off all the leading 1s: high << (32-8) = 0xf0000000. But this only works
because we replace a right shift with a left shift (plus some OR'ing
later on). If we had to do a real right shift, we would also have to
take signed vs. unsigned into account (i.e. shift in zeros or the sign
bit from the left?).
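
To make the example concrete, here is a minimal sketch of a 96-bit right
shift built only from left shifts and ORs. The helper name
u96_shift_right and its argument layout are assumptions made for
illustration, not the actual __rthal_u96shift; it assumes 0 < s < 32 and
that the shifted value fits in 64 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the Xenomai macro): shift the 96-bit value
 * h:m:l right by s bits and return the low 64 bits of the result.
 * Requires 0 < s < 32. The bits of h above position s are dropped by
 * the left shift, which is why leading sign bits do no harm as long as
 * the true result fits in 64 bits.
 */
static inline uint64_t u96_shift_right(uint32_t h, uint32_t m, uint32_t l,
				       unsigned s)
{
	uint32_t lo = (l >> s) | (m << (32 - s));
	uint32_t hi = (m >> s) | (h << (32 - s));

	return ((uint64_t)hi << 32) | lo;
}
```

With h = 0xfffffff0, m = l = 0 and s = 8, the 96-bit input is -2^68 and
the function returns 0xf000000000000000, which is -2^60 when read as a
signed 64-bit number.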

> 
> If yes, do you see a way llimd could be made to work the same way ? This
> way we would avoid inline ullimd twice in llimd code.

As the basic building block here is a multiplication, we cannot get
around telling apart signed from unsigned (or converting signed into
unsigned): the underlying multiplication logic is different.
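
As a small illustration of why the two cases cannot share one multiply,
the sketch below widens the same 32-bit pattern once as signed and once
as unsigned. The function names are hypothetical, and the cast from
uint32_t to int32_t assumes the usual two's complement wrap-around (as
gcc implements it).

```c
#include <stdint.h>

/* Interpret 'bits' as a signed 32-bit value before the widening multiply. */
static inline int64_t mul_as_signed(uint32_t bits, uint32_t m)
{
	return (int64_t)(int32_t)bits * (int64_t)m;
}

/* Interpret 'bits' as an unsigned 32-bit value before the widening multiply. */
static inline uint64_t mul_as_unsigned(uint32_t bits, uint32_t m)
{
	return (uint64_t)bits * m;
}
```

For bits = 0xffffffff and m = 2, the signed variant yields -2 while the
unsigned one yields 0x1fffffffe: the low 32 bits agree, the high word
does not.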

But what about this approach:

static inline __attribute__((__const__)) long long
__rthal_generic_llimd (long long op, unsigned m, unsigned d)
{
	int negative = 0;	/* 'signed' is a C keyword, so use another name */
	long long ret;

	/* Work on the magnitude; assumes op != LLONG_MIN (no overflow). */
	if (op < 0LL) {
		op = -op;
		negative = 1;
	}
	ret = __rthal_generic_ullimd(op, m, d);
	return negative ? -ret : ret;
}

However, I guess writing this in assembly for archs that suffer should
be more efficient.
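
For experimenting outside the Xenomai tree, the same idea can be sketched
in a self-contained form. Here ullimd_standin is a hypothetical
replacement for __rthal_generic_ullimd that simply computes op * m / d
and assumes the product fits in 64 bits (the real helper keeps a wider
intermediate); llimd_sketch is likewise an illustrative name.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for __rthal_generic_ullimd: op * m / d.
 * Only valid while op * m fits in 64 bits.
 */
static inline uint64_t ullimd_standin(uint64_t op, uint32_t m, uint32_t d)
{
	return op * m / d;
}

/* Strip the sign, scale the magnitude, then restore the sign. */
static inline int64_t llimd_sketch(int64_t op, uint32_t m, uint32_t d)
{
	int negative = op < 0;
	/* Negate in unsigned arithmetic to avoid signed overflow. */
	uint64_t mag = negative ? 0 - (uint64_t)op : (uint64_t)op;
	uint64_t ret = ullimd_standin(mag, m, d);

	return (int64_t)(negative ? 0 - ret : ret);
}
```

Because the division is done on the magnitude, negative results are
truncated toward zero, matching what C's own signed division does.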

Jan


* Re: [Xenomai-core] llimd.
From: Gilles Chanteperdrix @ 2008-10-30 10:02 UTC
  To: Jan Kiszka; +Cc: Xenomai core

Jan Kiszka wrote:
> However, I guess writing this in assembly for archs that suffer should
> be more efficient.

Hi Jan,

You may have noticed that we played a bit with arithmetic operations
(namely, we use an llimd without division to implement the reverse of
llmulshft), and it pays off on slow machines such as ARM, where
division is done in software.

On this occasion, I looked at the code generated by this solution, and I
am not sure that it is better: on ARM, and I suspect this is true on
other architectures, the operations needed to negate a long long clobber
the condition codes, which means we cannot make these operations
conditional without a conditional jump. So hand-coded assembler is not
better than what the compiler does: this solution uses two conditional
jumps whereas the original solution uses only one. Of course we could set
sign to -1 or 1 and multiply by sign at the end, but the multiplication
is probably even heavier than a conditional jump.
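
For reference, the multiply-by-sign variant mentioned above could look
roughly like the sketch below. The names llimd_sign_mul and
ullimd_standin are hypothetical; the stand-in computes op * m / d and
assumes the product fits in 64 bits, and the wrapper assumes op is not
the most negative 64-bit value.

```c
#include <stdint.h>

/* Hypothetical stand-in for __rthal_generic_ullimd (no wide intermediate). */
static inline uint64_t ullimd_standin(uint64_t op, uint32_t m, uint32_t d)
{
	return op * m / d;
}

/*
 * One conditional at the start, then a 64-bit multiply by +1 or -1 at
 * the end instead of a second conditional.
 */
static inline int64_t llimd_sign_mul(int64_t op, uint32_t m, uint32_t d)
{
	int64_t sign = 1;

	if (op < 0) {
		op = -op;	/* assumes op != INT64_MIN */
		sign = -1;
	}
	return sign * (int64_t)ullimd_standin((uint64_t)op, m, d);
}
```

On a 32-bit target the final 64-bit multiply expands to several
instructions, which is the cost being weighed against the extra jump.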

So, would you have any idea for a better solution?

-- 
                                                 Gilles.


* Re: [Xenomai-core] llimd.
From: Jan Kiszka @ 2008-10-31  8:18 UTC
  To: Gilles Chanteperdrix; +Cc: Xenomai core

Gilles Chanteperdrix wrote:
> On this occasion, I looked at the code generated by this solution, and I
> am not sure that it is better: on ARM, and I suspect this is true on
> other architectures, the operations needed to negate a long long clobber
> the condition codes, which means we cannot make these operations
> conditional without a conditional jump. So hand-coded assembler is not
> better than what the compiler does: this solution uses two conditional
> jumps whereas the original solution uses only one. Of course we could set
> sign to -1 or 1 and multiply by sign at the end, but the multiplication
> is probably even heavier than a conditional jump.

Yes, on the archs that matter here (32-bit).

> 
> So, would you have any idea of a better solution ?

In an assembly version, one could save 'sign' in the form of a jump target
that should be taken after __rthal_generic_ullimd (i.e. jump to the
negation, or jump over it). Especially when that address is kept in a
register, I think smart branch prediction units will be able to make the
right forecast.

Jan


* Re: [Xenomai-core] llimd.
From: Gilles Chanteperdrix @ 2008-10-31 10:05 UTC
  To: Jan Kiszka; +Cc: Xenomai core

Jan Kiszka wrote:
> In an assembly version, one could save 'sign' in the form of a jump target
> that should be taken after __rthal_generic_ullimd (i.e. jump to the
> negation, or jump over it). Especially when that address is kept in a
> register, I think smart branch prediction units will be able to make the
> right forecast.

Good idea. There is even a gcc extension (labels as values) which allows
doing this in the generic section:

static inline __attribute__((__const__)) long long
__rthal_generic_llimd (long long op, unsigned m, unsigned d)
{
	void *epilogue;	/* address of the label to jump to after ullimd */
	long long ret;

	/* &&label and goto *ptr are the gcc "labels as values" extension. */
	if (op < 0LL) {
		op = -op;
		epilogue = &&ret_neg;
	} else
		epilogue = &&ret_unchanged;
	ret = __rthal_generic_ullimd(op, m, d);
	goto *epilogue;
ret_unchanged:
	return ret;
ret_neg:
	return -ret;
}


-- 
                                                 Gilles.


* Re: [Xenomai-core] llimd.
From: Gilles Chanteperdrix @ 2008-10-31 10:29 UTC
  To: Jan Kiszka; +Cc: Xenomai core

Gilles Chanteperdrix wrote:
> Good idea. There is even a gcc extension (labels as values) which allows
> doing this in the generic section:
> 
> static inline __attribute__((__const__)) long long
> __rthal_generic_llimd (long long op, unsigned m, unsigned d)
> {
>  	void *epilogue;
>  	long long ret;
> 
>  	if (op < 0LL) {
>  		op = -op;
>  		epilogue = &&ret_neg;
>  	} else
> 		epilogue = &&ret_unchanged;
>  	ret = __rthal_generic_ullimd(op, m, d);
> 	goto *epilogue;
> ret_unchanged:
> 	return ret;
> ret_neg:
>  	return -ret;
> }

This works as expected on ARM. However, gcc 4.0 on x86 generates two
calls to __rthal_generic_ullimd, with the indirect jump after each one.
It seems to have stopped half-way when "optimizing"...

-- 
                                                 Gilles.


* Re: [Xenomai-core] llimd.
From: Gilles Chanteperdrix @ 2008-10-31 10:45 UTC
  To: Jan Kiszka; +Cc: Xenomai core

Gilles Chanteperdrix wrote:
> This works as expected on ARM. However, gcc 4.0 on x86 generates two
> calls to __rthal_generic_ullimd, with the indirect jump after each one.
> It seems to have stopped half-way when "optimizing"...

Actually, gcc does the right thing if the implementation of
__rthal_generic_ullimd is not trivial.

-- 
                                                 Gilles.


* Re: [Xenomai-core] llimd.
From: Jan Kiszka @ 2008-10-31 11:26 UTC

Gilles Chanteperdrix wrote:
> Actually, gcc does the right thing if the implementation of
> __rthal_generic_ullimd is not trivial.

I think in that case un-inlining __rthal_generic_ullimd, keeping only
the two different call paths inlined, should be better anyway.
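
One possible reading of this suggestion, sketched with hypothetical
names: the unsigned helper is kept out of line with gcc's noinline
attribute, and only the small sign-dispatching wrapper is inlined. The
stand-in body (op * m / d, assuming the product fits in 64 bits) is an
assumption, not the real __rthal_generic_ullimd.

```c
#include <stdint.h>

/* Hypothetical out-of-line stand-in for __rthal_generic_ullimd. */
__attribute__((noinline))
static uint64_t ullimd_standin(uint64_t op, uint32_t m, uint32_t d)
{
	return op * m / d;
}

/*
 * Inlined wrapper with two separate call paths: each path calls the
 * shared helper once, so only a single conditional remains.
 */
static inline int64_t llimd_two_calls(int64_t op, uint32_t m, uint32_t d)
{
	if (op < 0)
		/* Negate in unsigned arithmetic to avoid signed overflow. */
		return (int64_t)(0 - ullimd_standin(0 - (uint64_t)op, m, d));
	return (int64_t)ullimd_standin((uint64_t)op, m, d);
}
```

The trade-off is a real function call on each path in exchange for not
duplicating the body of the unsigned helper at every call site.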

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

