* Re: [Xenomai] [Xenomai-git] Jan Kiszka : switchtest: Add SSE and AVX check
[not found] <E1U2RE9-0007Zc-Em@xenomai.org>
@ 2013-04-13 16:11 ` Gilles Chanteperdrix
2013-04-15 22:02 ` Jan Kiszka
0 siblings, 1 reply; 7+ messages in thread
From: Gilles Chanteperdrix @ 2013-04-13 16:11 UTC (permalink / raw)
To: xenomai
On 02/04/2013 07:57 PM, GIT version control wrote:
> Module: xenomai-2.6
> Branch: master
> Commit: 192597326a0becd1980cb6c5cc9395af18a19c60
> URL: http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=192597326a0becd1980cb6c5cc9395af18a19c60
>
> Author: Jan Kiszka <jan.kiszka@siemens.com>
> Date: Tue Jan 29 18:46:13 2013 +0100
>
> switchtest: Add SSE and AVX check
>
> Add a test for switching the lower SSE registers xmm0..7 or AVX
> registers ymm0..7, provided the CPU supports the corresponding
> feature. As xmm and ymm share their storage, we only need to check
> one of the features.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>
> ---
> static inline unsigned fp_regs_check(unsigned val)
> {
> unsigned i, result = val;
> + uint64_t vec[8][4];
> unsigned e[8];
>
> for (i = 0; i < 8; i++)
> __asm__ __volatile__("fistpl %0":"=m"(e[7 - i]));
> + if (fp_features & FP_FEATURE_AVX) {
> + __asm__ __volatile__(
> + "vmovupd %%ymm0,%0;"
> + "vmovupd %%ymm1,%1;"
> + "vmovupd %%ymm2,%2;"
> + "vmovupd %%ymm3,%3;"
> + "vmovupd %%ymm4,%4;"
> + "vmovupd %%ymm5,%5;"
> + "vmovupd %%ymm6,%6;"
> + "vmovupd %%ymm7,%7;"
> + :
> + : "m" (vec[0][0]), "m" (vec[1][0]),
> + "m" (vec[2][0]), "m" (vec[3][0]),
> + "m" (vec[4][0]), "m" (vec[5][0]),
> + "m" (vec[6][0]), "m" (vec[7][0]));
> + } else if (fp_features & FP_FEATURE_SSE) {
> + __asm__ __volatile__(
> + "movupd %%xmm0,%0;"
> + "movupd %%xmm1,%1;"
> + "movupd %%xmm2,%2;"
> + "movupd %%xmm3,%3;"
> + "movupd %%xmm4,%4;"
> + "movupd %%xmm5,%5;"
> + "movupd %%xmm6,%6;"
> + "movupd %%xmm7,%7;"
> + :
> + : "m" (vec[0][0]), "m" (vec[1][0]),
> + "m" (vec[2][0]), "m" (vec[3][0]),
> + "m" (vec[4][0]), "m" (vec[5][0]),
> + "m" (vec[6][0]), "m" (vec[7][0]));
> + }
>
> for (i = 0; i < 8; i++)
> if (e[i] != val) {
> @@ -65,8 +148,33 @@ static inline unsigned fp_regs_check(unsigned val)
> result = e[i];
> }
>
> + if (fp_features & FP_FEATURE_AVX) {
> + for (i = 0; i < 8; i++) {
> + int error = 0;
> + if (vec[i][0] != val) {
> + result = vec[i][0];
> + error = 1;
> + }
> + if (vec[i][2] != val) {
> + result = vec[i][2];
> + error = 1;
> + }
> + if (error)
> + printk("ymm%d: %llu/%llu != %u/%u\n",
> + i, (unsigned long long)vec[i][0],
> + (unsigned long long)vec[i][2],
> + val, val);
> + }
> + } else if (fp_features & FP_FEATURE_SSE) {
> + for (i = 0; i < 8; i++)
> + if (vec[i][0] != val) {
> + printk("xmm%d: %llu != %u\n",
> + i, (unsigned long long)vec[i][0], val);
> + result = vec[i][0];
> + }
> + }
> +
> return result;
> }
This routine causes a warning from gcc and indeed looks wrong: if the
"vec" variable is written by the inline assembly, it should be listed in
the output section of the inline assembly, not the input section.
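
To illustrate the point, here is a minimal sketch of the constraint
placement, assuming gcc extended asm on x86-64. It is not taken from the
Xenomai tree: struct vec128, xmm0_roundtrip() and xmm0_roundtrip_check()
are hypothetical names, and the memcpy() fallback only exists so that the
sketch builds on other architectures. The buffer that the asm writes is an
"=m" output, the buffer it only reads is an "m" input, and xmm0 is listed
as clobbered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte buffer; the patch itself uses uint64_t vec[8][4]. */
struct vec128 {
	uint64_t q[2];
};

/* Load *in into xmm0, then store xmm0 to *out. */
static inline void xmm0_roundtrip(const struct vec128 *in, struct vec128 *out)
{
#if defined(__x86_64__)
	__asm__ __volatile__(
		"movupd %1,%%xmm0;"
		"movupd %%xmm0,%0;"
		: "=m" (*out)	/* written by the asm: output section */
		: "m" (*in)	/* only read by the asm: input section */
		: "xmm0");	/* register overwritten by the asm */
#else
	/* Assumption: plain copy so the sketch also builds on non-x86 hosts. */
	memcpy(out, in, sizeof(*out));
#endif
}

/* Returns 0 if both 64-bit halves read back equal val, -1 otherwise. */
static inline int xmm0_roundtrip_check(uint64_t val)
{
	struct vec128 in = { { val, val } };
	struct vec128 out = { { 0, 0 } };

	/* movupd moves 16 bytes, so the operand type must cover 16 bytes. */
	assert(sizeof(struct vec128) == 16);

	xmm0_roundtrip(&in, &out);
	return (out.q[0] == val && out.q[1] == val) ? 0 : -1;
}
```

Declaring the whole 16-byte object as the operand, rather than only its
first element, also tells the compiler how much memory the store touches.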
--
Gilles.
* Re: [Xenomai] [Xenomai-git] Jan Kiszka : switchtest: Add SSE and AVX check
2013-04-13 16:11 ` [Xenomai] [Xenomai-git] Jan Kiszka : switchtest: Add SSE and AVX check Gilles Chanteperdrix
@ 2013-04-15 22:02 ` Jan Kiszka
2013-04-16 7:24 ` Jan Kiszka
0 siblings, 1 reply; 7+ messages in thread
From: Jan Kiszka @ 2013-04-15 22:02 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
On 2013-04-13 18:11, Gilles Chanteperdrix wrote:
> This routine causes a warning from gcc and looks indeed wrong: if the
> "vec" variable is used as an output variable of the inline assembly, it
> should be in the output section of the inline assembly, not the input
> section.
>
Yes, seems wrong. I will try to look into it in the next few days.
Jan
* Re: [Xenomai] [Xenomai-git] Jan Kiszka : switchtest: Add SSE and AVX check
2013-04-15 22:02 ` Jan Kiszka
@ 2013-04-16 7:24 ` Jan Kiszka
2013-04-16 7:32 ` Gilles Chanteperdrix
0 siblings, 1 reply; 7+ messages in thread
From: Jan Kiszka @ 2013-04-16 7:24 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
On 2013-04-16 00:02, Jan Kiszka wrote:
> On 2013-04-13 18:11, Gilles Chanteperdrix wrote:
>> This routine causes a warning from gcc and looks indeed wrong: if the
>> "vec" variable is used as an output variable of the inline assembly, it
>> should be in the output section of the inline assembly, not the input
>> section.
>>
>
> Yes, seems wrong. Will try to look into it the next days.
Done, you can find the obvious fix in my for-upstream queue.
Jan
* Re: [Xenomai] [Xenomai-git] Jan Kiszka : switchtest: Add SSE and AVX check
2013-04-16 7:24 ` Jan Kiszka
@ 2013-04-16 7:32 ` Gilles Chanteperdrix
2013-04-16 7:34 ` Jan Kiszka
0 siblings, 1 reply; 7+ messages in thread
From: Gilles Chanteperdrix @ 2013-04-16 7:32 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai
On 04/16/2013 09:24 AM, Jan Kiszka wrote:
> On 2013-04-16 00:02, Jan Kiszka wrote:
>> On 2013-04-13 18:11, Gilles Chanteperdrix wrote:
>>> This routine causes a warning from gcc and looks indeed wrong: if the
>>> "vec" variable is used as an output variable of the inline assembly, it
>>> should be in the output section of the inline assembly, not the input
>>> section.
>>>
>>
>> Yes, seems wrong. Will try to look into it the next days.
>
> Done, you can find the obvious fix in my for-upstream queue.
Hi Jan,
yes, the fix is obvious. However, the fact that machines with AVX support
still pass xeno-regression-test is not. Have you checked that?
Regards.
--
Gilles.
* Re: [Xenomai] [Xenomai-git] Jan Kiszka : switchtest: Add SSE and AVX check
2013-04-16 7:32 ` Gilles Chanteperdrix
@ 2013-04-16 7:34 ` Jan Kiszka
2013-04-16 7:53 ` Gilles Chanteperdrix
0 siblings, 1 reply; 7+ messages in thread
From: Jan Kiszka @ 2013-04-16 7:34 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
On 2013-04-16 09:32, Gilles Chanteperdrix wrote:
> On 04/16/2013 09:24 AM, Jan Kiszka wrote:
>
>> On 2013-04-16 00:02, Jan Kiszka wrote:
>>> On 2013-04-13 18:11, Gilles Chanteperdrix wrote:
>>>> This routine causes a warning from gcc and looks indeed wrong: if the
>>>> "vec" variable is used as an output variable of the inline assembly, it
>>>> should be in the output section of the inline assembly, not the input
>>>> section.
>>>>
>>>
>>> Yes, seems wrong. Will try to look into it the next days.
>>
>> Done, you can find the obvious fix in my for-upstream queue.
>
>
> Hi Jan,
>
> yes the fix is obvious, however the fact that machines with AVX support
> still pass xeno-regression-test is not, have you checked that?
I've checked that the binary output is unaffected, at least for my
compiler here. That makes sense, as there is no other use of the registers
around, and the assembler just blindly puts the input variables into the
destination slot of the instructions.
Jan
* Re: [Xenomai] [Xenomai-git] Jan Kiszka : switchtest: Add SSE and AVX check
2013-04-16 7:34 ` Jan Kiszka
@ 2013-04-16 7:53 ` Gilles Chanteperdrix
2013-04-16 8:27 ` Jan Kiszka
0 siblings, 1 reply; 7+ messages in thread
From: Gilles Chanteperdrix @ 2013-04-16 7:53 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai
On 04/16/2013 09:34 AM, Jan Kiszka wrote:
> On 2013-04-16 09:32, Gilles Chanteperdrix wrote:
>> On 04/16/2013 09:24 AM, Jan Kiszka wrote:
>>
>>> On 2013-04-16 00:02, Jan Kiszka wrote:
>>>> On 2013-04-13 18:11, Gilles Chanteperdrix wrote:
>>>>> This routine causes a warning from gcc and looks indeed wrong: if the
>>>>> "vec" variable is used as an output variable of the inline assembly, it
>>>>> should be in the output section of the inline assembly, not the input
>>>>> section.
>>>>>
>>>>
>>>> Yes, seems wrong. Will try to look into it the next days.
>>>
>>> Done, you can find the obvious fix in my for-upstream queue.
>>
>>
>> Hi Jan,
>>
>> yes the fix is obvious, however the fact that machines with AVX support
>> still pass xeno-regression-test is not, have you checked that?
>
> I've checked that the binary output is unaffected, at least for my
> compiler here. Makes sense as there is no other use for the registers
> around, and the assembler just blindly put the input variables into the
> destination slot of the instructions.
No, it does not really make sense: a piece of assembly without any side
effects or outputs is basically useless, so the compiler could very
well decide to optimize it out. But if you have checked, OK.
--
Gilles.
* Re: [Xenomai] [Xenomai-git] Jan Kiszka : switchtest: Add SSE and AVX check
2013-04-16 7:53 ` Gilles Chanteperdrix
@ 2013-04-16 8:27 ` Jan Kiszka
0 siblings, 0 replies; 7+ messages in thread
From: Jan Kiszka @ 2013-04-16 8:27 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
On 2013-04-16 09:53, Gilles Chanteperdrix wrote:
> On 04/16/2013 09:34 AM, Jan Kiszka wrote:
>
>> On 2013-04-16 09:32, Gilles Chanteperdrix wrote:
>>> On 04/16/2013 09:24 AM, Jan Kiszka wrote:
>>>
>>>> On 2013-04-16 00:02, Jan Kiszka wrote:
>>>>> On 2013-04-13 18:11, Gilles Chanteperdrix wrote:
>>>>>> This routine causes a warning from gcc and looks indeed wrong: if the
>>>>>> "vec" variable is used as an output variable of the inline assembly, it
>>>>>> should be in the output section of the inline assembly, not the input
>>>>>> section.
>>>>>>
>>>>>
>>>>> Yes, seems wrong. Will try to look into it the next days.
>>>>
>>>> Done, you can find the obvious fix in my for-upstream queue.
>>>
>>>
>>> Hi Jan,
>>>
>>> yes the fix is obvious, however the fact that machines with AVX support
>>> still pass xeno-regression-test is not, have you checked that?
>>
>> I've checked that the binary output is unaffected, at least for my
>> compiler here. Makes sense as there is no other use for the registers
>> around, and the assembler just blindly put the input variables into the
>> destination slot of the instructions.
>
>
> No, it does not really make sense: a piece of assembly without any side
> effects and any outputs is basically useless, so the compiler could very
> well decide to optimize it out. But if you have checked, OK.
Maybe it's the volatile in the inline assembly that prevented it.
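
For reference, the gcc documentation states that an asm statement without
output operands is implicitly volatile, so it is not removed even if the
keyword is left out. What volatile does not do is tell the compiler which
memory the asm modifies. The sketch below shows one way to declare that
for an asm that has no outputs at all, assuming gcc on x86-64:
store_marker() and store_marker_demo() are hypothetical names, not code
from switchtest, and the plain C store is only a fallback for other
architectures.

```c
#include <assert.h>
#include <stdint.h>

/* Store the constant 42 through p from inside an asm without outputs. */
static inline void store_marker(uint64_t *p)
{
#if defined(__x86_64__)
	__asm__ __volatile__(
		"movq $42,(%0)"
		:		/* no outputs: volatile, explicitly or implicitly */
		: "r" (p)	/* the pointer itself is only read */
		: "memory");	/* the asm writes memory the compiler cannot see */
#else
	/* Assumption: plain store so the sketch also builds on non-x86 hosts. */
	*p = 42;
#endif
}

/* Returns the value found in the slot after the asm has run. */
static inline uint64_t store_marker_demo(void)
{
	uint64_t slot = 0;

	store_marker(&slot);
	/* The "memory" clobber forces slot to be re-read here. */
	return slot;
}
```

Listing the destination as an "=m" output is usually the more precise
choice, because a "memory" clobber makes the compiler drop everything it
has cached from memory, not just the one buffer.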
Jan
end of thread, other threads:[~2013-04-16 8:27 UTC | newest]
Thread overview: 7+ messages
[not found] <E1U2RE9-0007Zc-Em@xenomai.org>
2013-04-13 16:11 ` [Xenomai] [Xenomai-git] Jan Kiszka : switchtest: Add SSE and AVX check Gilles Chanteperdrix
2013-04-15 22:02 ` Jan Kiszka
2013-04-16 7:24 ` Jan Kiszka
2013-04-16 7:32 ` Gilles Chanteperdrix
2013-04-16 7:34 ` Jan Kiszka
2013-04-16 7:53 ` Gilles Chanteperdrix
2013-04-16 8:27 ` Jan Kiszka