* [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
@ 2006-11-01 15:46 Jeff Webb
2006-11-01 19:42 ` Jan Kiszka
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Webb @ 2006-11-01 15:46 UTC (permalink / raw)
To: Xenomai help
Jeff Webb wrote:
> If I run the attached program, I get the following result:
>
> [root]# ./mqtest2
> CPU time limit exceeded
>
> The kernel log contains:
>
> Oct 25 14:13:03 kernel: invalid use of FPU in Xenomai context at
> 0x80492f6
I have another piece of information on this problem. I was able to test the program on two other machines: an Athlon XP, and a PIII laptop. The program works on the PIII, but fails on the Athlon XP. I then compiled a new kernel, selecting the "Pentium-Pro" processor family instead of the "Athlon/Duron/K7" processor family. The mqtest2 program now works on my Athlon64 X2 and Athlon XP systems, if I use the "Pentium-Pro" kernel.
Here is a summary of what I tried:
Machine #1: AMD Athlon(tm)64 X2 Dual Core Processor 4400+
  OS: Fedora Core 5
  a) Linux SMP 2.6.17.13 (K7 config) / Xenomai 2.2.4 -> mqtest2 fails
  b) Linux SMP 2.6.17.13 (K7 config) / Xenomai trunk r1749 + patch -> mqtest2 fails
     patch = https://mail.gna.org/public/xenomai-core/2006-10/msg00069.html
  c) Linux UP 2.6.17.13 (K7 config) / Xenomai 2.2.4 -> mqtest2 fails
  d) Linux SMP 2.6.17.13 (K7 config) / Xenomai 2.2.1 -> mqtest2 fails
  e) Linux UP 2.6.17.13 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!

Machine #2: AMD Athlon(tm) XP 3200+
  a) Fedora Core 1 / Linux 2.4.32 (K7 config) / Xenomai 2.2.3 -> mqtest2 fails
  b) Fedora Core 5 / Linux 2.6.17.13 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!

Machine #3: Intel Pentium III Mobile CPU
  OS: Debian Unstable (old)
  Linux 2.4.33.3 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!
So, it seems that a bug is introduced when compiling for the AMD K7 family. Is the mqtest2 problem a compiler optimization bug, a Linux bug, a Linux configuration problem, a Xenomai bug, or a problem in my code? Any ideas on how to proceed? I am not familiar with the Xenomai internals or low-level x86 code, but I will do what I can to help debug this.
Does anyone else have an AMD system that can verify my results?
The mqtest2.c program in question was attached here:
https://mail.gna.org/public/xenomai-help/2006-10/msg00147.html
The problem seems to be connected with the size of writes to Xenomai pipes. This example uses POSIX message queues, but I had a similar problem a while back with RTAI pipes. Maybe this tells us the problem is in the nucleus pipe code? Just a guess. The problem seems to affect both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
Thanks,
Jeff
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-01 15:46 [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context] Jeff Webb
@ 2006-11-01 19:42 ` Jan Kiszka
2006-11-01 20:21 ` Jeff Webb
0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-11-01 19:42 UTC (permalink / raw)
To: Jeff Webb; +Cc: Xenomai help
Jeff Webb wrote:
> Jeff Webb wrote:
>> If I run the attached program, I get the following result:
>>
>> [root]# ./mqtest2
>> CPU time limit exceeded
>>
>> The kernel log contains:
>>
>> Oct 25 14:13:03 kernel: invalid use of FPU in Xenomai context at
>> 0x80492f6
>
> I have another piece of information on this problem. I was able to test
> the program on two other machines: an Athlon XP, and a PIII laptop. The
> program works on the PIII, but fails on the Athlon XP. I then compiled
> a new kernel, selecting the "Pentium-Pro" processor family instead of
> the "Athlon/Duron/K7" processor family. The mqtest2 program now works
> on my Athlon64 X2 and Athlon XP systems, if I use the "Pentium-Pro" kernel.
>
> Here is a summary of what I tried:
>
> Machine #1: AMD Athlon(tm)64 X2 Dual Core Processor 4400+
> OS: Fedora Core 5
> a) Linux SMP 2.6.17.13 (K7 config) / Xenomai 2.2.4 -> mqtest2 fails
> b) Linux SMP 2.6.17.13 (K7 config) / Xenomai trunk r1749 + patch ->
> mqtest2 fails
> patch = https://mail.gna.org/public/xenomai-core/2006-10/msg00069.html
> c) Linux UP 2.6.17.13 (K7 config) / Xenomai 2.2.4 -> mqtest2 fails
> d) Linux SMP 2.6.17.13 (K7 config) / Xenomai 2.2.1 -> mqtest2 fails
> e) Linux UP 2.6.17.13 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!
>
> Machine #2: AMD Athlon(tm) XP 3200+
> a) Fedora Core 1 / Linux 2.4.32 (K7 config) / Xenomai 2.2.3 -> mqtest2
> fails
> b) Fedora Core 5 / Linux 2.6.17.13 (ppro config) / Xenomai 2.2.4 ->
> mqtest2 WORKS!
>
> Machine #3: Intel Pentium III Mobile CPU
> OS: Debian Unstable (old)
> Linux 2.4.33.3 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!
>
> So, it seems that a bug is introduced when compiling for the AMD K7
> family. Is the mqtest2 problem a compiler optimization bug, a Linux
> bug, a Linux configuration problem, a Xenomai bug, or a problem in my
> code? Any ideas on how to proceed? I am not familiar with the Xenomai
> internals or low-level x86 code, but I will do what I can to help debug
> this.
>
> Does anyone else have an AMD system that can verify my results?
I have an old Athlon 800. Maybe we are lucky and it exposes the problem
when the kernel is optimised for it. I'm going to give this a try, but
it may take a few days (and a free time slot).
>
> The mqtest2.c program in question was attached here:
> https://mail.gna.org/public/xenomai-help/2006-10/msg00147.html
>
> The problem seems to be connected with the size of writes to Xenomai
> pipes. This example uses POSIX message queues, but I had a similar
> problem a while back with RTAI pipes. Maybe this tells us the problem
> is in the nucleus pipe code? Just a guess. The problem seems to affect
> both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
is still /at least/ one piece missing in the puzzle.
BTW, did you already write what compiler version you are using for these
tests (/me too lazy to search the archives)?
Jan
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-01 19:42 ` Jan Kiszka
@ 2006-11-01 20:21 ` Jeff Webb
2006-11-03 9:45 ` Jan Kiszka
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Webb @ 2006-11-01 20:21 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Xenomai help
Jan Kiszka wrote:
> Jeff Webb wrote:
>> Does anyone else have an AMD system that can verify my results?
>
> I have an old Athlon 800. Maybe we are lucky and it exposes the problem
> when the kernel is optimised for it. I'm going to give this a try, but
> it may take a few days (and a free time slot).
Thank you. I appreciate you giving it a try when you get some free time. I was able to work around the problem by writing the queue data in smaller chunks (or by using an i686 kernel), so I am not in urgent need of an immediate fix. I do think it's important to fix this bug eventually, so I didn't want it to slip through the cracks.
>> The problem seems to be connected with the size of writes to Xenomai
>> pipes. This example uses POSIX message queues, but I had a similar
>> problem a while back with RTAI pipes. Maybe this tells us the problem
>> is in the nucleus pipe code? Just a guess. The problem seems to affect
>> both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
>
> Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
> is still /at least/ one piece missing in the puzzle.
True. It is very strange that the amount of data in the write call ends up affecting the FPU context.
> BTW, did you already write what compiler version you are using for these
> tests (/me too lazy to search the archives)?
I forgot to include this. I compiled with at least three versions of gcc:
Machine #1: FC5    : gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)
Machine #2: FC1    : gcc version 3.3.2 20031022 (Red Hat Linux 3.3.2-1)
            FC5    : gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)
Machine #3: debian : gcc version 3.3.6 (Debian 1:3.3.6-13)
Thanks again,
-Jeff
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-01 20:21 ` Jeff Webb
@ 2006-11-03 9:45 ` Jan Kiszka
2006-11-03 10:01 ` Gilles Chanteperdrix
0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-11-03 9:45 UTC (permalink / raw)
To: Jeff Webb, Philippe Gerum, Gilles Chanteperdrix; +Cc: Xenomai help
Jeff Webb wrote:
> Jan Kiszka wrote:
>> Jeff Webb wrote:
>>> Does anyone else have an AMD system that can verify my results?
>>
>> I have an old Athlon 800. Maybe we are lucky and it exposes the problem
>> when the kernel is optimised for it. I'm going to give this a try, but
>> it may take a few days (and a free time slot).
>
> Thank you. I appreciate you giving it a try when you get some free
> time. I was able to work around the problem by writing the queue data
> in smaller chunks (or use an i686 kernel), so I am not in urgent need of
> an immediate fix. I do think it's important to fix this bug eventually,
> so I didn't want it to slip through the cracks.
>
>>> The problem seems to be connected with the size of writes to Xenomai
>>> pipes. This example uses POSIX message queues, but I had a similar
>>> problem a while back with RTAI pipes. Maybe this tells us the problem
>>> is in the nucleus pipe code? Just a guess. The problem seems to affect
>>> both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
>>
>> Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
>> is still /at least/ one piece missing in the puzzle.
>
> True. It is very strange that the amount of data in the write call ends
> up affecting the FPU context.
I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
http://lxr.free-electrons.com/source/include/asm-i386/string.h#285
It's an optimised memcpy for 3DNow CPUs that is used for blocks >= 512
bytes. It messes up the FPU state and may get trapped by other
issues as well (blind access to "current" in order to test
in_interrupt()). I don't have an answer for this right now beyond "don't
switch on AMD optimisations when using Xenomai". But that's a bit
unsatisfying.
Another way would be to wrap any memcpy access from Xenomai context, but
that's likely impractical (think of all the drivers).
Jan
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-03 9:45 ` Jan Kiszka
@ 2006-11-03 10:01 ` Gilles Chanteperdrix
2006-11-03 10:11 ` Jan Kiszka
0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-11-03 10:01 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Xenomai help
Jan Kiszka wrote:
> Jeff Webb wrote:
>
>>Jan Kiszka wrote:
>>
>>>Jeff Webb wrote:
>>>
>>>>Does anyone else have an AMD system that can verify my results?
>>>
>>>I have an old Athlon 800. Maybe we are lucky and it exposes the problem
>>>when the kernel is optimised for it. I'm going to give this a try, but
>>>it may take a few days (and a free time slot).
>>
>>Thank you. I appreciate you giving it a try when you get some free
>>time. I was able to work around the problem by writing the queue data
>>in smaller chunks (or use an i686 kernel), so I am not in urgent need of
>>an immediate fix. I do think it's important to fix this bug eventually,
>>so I didn't want it to slip through the cracks.
>>
>>
>>>>The problem seems to be connected with the size of writes to Xenomai
>>>>pipes. This example uses POSIX message queues, but I had a similar
>>>>problem a while back with RTAI pipes. Maybe this tells us the problem
>>>>is in the nucleus pipe code? Just a guess. The problem seems to affect
>>>>both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
>>>
>>>Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
>>>is still /at least/ one piece missing in the puzzle.
>>
>>True. It is very strange that the amount of data in the write call ends
>>up affecting the FPU context.
>
>
> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
>
> http://lxr.free-electrons.com/source/include/asm-i386/string.h#285
>
> It's an optimised memcpy for 3DNow CPUs that is used with blocks >= 512
> bytes. It messes up with the FPU state and may get trapped by other
> issues as well (blind access to "current" in order to test
> in_interrupt()). I don't have an answer for this right now beyond "don't
> switch on AMD optimisations when using Xenomai". But that's a bit
> unsatisfying.
>
> Another way would be to wrap any memcpy access from Xenomai context, but
> that's likely impractical (think of all the drivers).
I see other ways to solve this issue:
- either we disable the use of the mmx memcpy in string.h if
ipipe_current_domain is not root
- or we allow the exception to happen for threads in primary mode with
the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
this case.
--
Gilles Chanteperdrix
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-03 10:01 ` Gilles Chanteperdrix
@ 2006-11-03 10:11 ` Jan Kiszka
2006-11-03 10:19 ` Gilles Chanteperdrix
0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-11-03 10:11 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: Xenomai help
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Jeff Webb wrote:
>>
>>> Jan Kiszka wrote:
>>>
>>>> Jeff Webb wrote:
>>>>
>>>>> Does anyone else have an AMD system that can verify my results?
>>>>
>>>> I have an old Athlon 800. Maybe we are lucky and it exposes the problem
>>>> when the kernel is optimised for it. I'm going to give this a try, but
>>>> it may take a few days (and a free time slot).
>>>
>>> Thank you. I appreciate you giving it a try when you get some free
>>> time. I was able to work around the problem by writing the queue data
>>> in smaller chunks (or use an i686 kernel), so I am not in urgent need of
>>> an immediate fix. I do think it's important to fix this bug eventually,
>>> so I didn't want it to slip through the cracks.
>>>
>>>
>>>>> The problem seems to be connected with the size of writes to Xenomai
>>>>> pipes. This example uses POSIX message queues, but I had a similar
>>>>> problem a while back with RTAI pipes. Maybe this tells us the problem
>>>>> is in the nucleus pipe code? Just a guess. The problem seems to
>>>>> affect
>>>>> both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
>>>>
>>>> Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
>>>> is still /at least/ one piece missing in the puzzle.
>>>
>>> True. It is very strange that the amount of data in the write call ends
>>> up affecting the FPU context.
>>
>>
>> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
>>
>> http://lxr.free-electrons.com/source/include/asm-i386/string.h#285
>>
>> It's an optimised memcpy for 3DNow CPUs that is used with blocks >= 512
>> bytes. It messes up with the FPU state and may get trapped by other
>> issues as well (blind access to "current" in order to test
>> in_interrupt()). I don't have an answer for this right now beyond "don't
>> switch on AMD optimisations when using Xenomai". But that's a bit
>> unsatisfying.
>>
>> Another way would be to wrap any memcpy access from Xenomai context, but
>> that's likely impractical (think of all the drivers).
>
> I see other ways to solve this issue:
> - either we disable the use of the mmx memcpy in string.h if
> ipipe_current_domain is not root
This is what came to my mind as well meanwhile.
> - or we allow the exception to happen for threads in primary mode with
> the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
> this case.
Given that this is a special case for a subset of x86[_64] CPUs, I
rather think we should go for the first variant. Should be simpler.
Jan
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-03 10:11 ` Jan Kiszka
@ 2006-11-03 10:19 ` Gilles Chanteperdrix
2006-11-03 12:21 ` Jan Kiszka
0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-11-03 10:19 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Xenomai help
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>
>>Jan Kiszka wrote:
>>
>>>Jeff Webb wrote:
>>>
>>>
>>>>Jan Kiszka wrote:
>>>>
>>>>
>>>>>Jeff Webb wrote:
>>>>>
>>>>>
>>>>>>Does anyone else have an AMD system that can verify my results?
>>>>>
>>>>>I have an old Athlon 800. Maybe we are lucky and it exposes the problem
>>>>>when the kernel is optimised for it. I'm going to give this a try, but
>>>>>it may take a few days (and a free time slot).
>>>>
>>>>Thank you. I appreciate you giving it a try when you get some free
>>>>time. I was able to work around the problem by writing the queue data
>>>>in smaller chunks (or use an i686 kernel), so I am not in urgent need of
>>>>an immediate fix. I do think it's important to fix this bug eventually,
>>>>so I didn't want it to slip through the cracks.
>>>>
>>>>
>>>>
>>>>>>The problem seems to be connected with the size of writes to Xenomai
>>>>>>pipes. This example uses POSIX message queues, but I had a similar
>>>>>>problem a while back with RTAI pipes. Maybe this tells us the problem
>>>>>>is in the nucleus pipe code? Just a guess. The problem seems to
>>>>>>affect
>>>>>>both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
>>>>>
>>>>>Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
>>>>>is still /at least/ one piece missing in the puzzle.
>>>>
>>>>True. It is very strange that the amount of data in the write call ends
>>>>up affecting the FPU context.
>>>
>>>
>>>I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
>>>
>>>http://lxr.free-electrons.com/source/include/asm-i386/string.h#285
>>>
>>>It's an optimised memcpy for 3DNow CPUs that is used with blocks >= 512
>>>bytes. It messes up with the FPU state and may get trapped by other
>>>issues as well (blind access to "current" in order to test
>>>in_interrupt()). I don't have an answer for this right now beyond "don't
>>>switch on AMD optimisations when using Xenomai". But that's a bit
>>>unsatisfying.
>>>
>>>Another way would be to wrap any memcpy access from Xenomai context, but
>>>that's likely impractical (think of all the drivers).
>>
>>I see other ways to solve this issue:
>>- either we disable the use of the mmx memcpy in string.h if
>>ipipe_current_domain is not root
>
>
> This is what came to my mind as well meanwhile.
>
>
>>- or we allow the exception to happen for threads in primary mode with
>>the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
>>this case.
>
>
> Given that this is a special case for a subset of x86[_64] CPUs, I
> rather think we should go for the first variant. Should be simpler.
This second way would not work correctly with kernel-space threads, so,
if we wanted to implement it, we would still need to disable the mmx
memcpy in string.h for kernel-space threads, i.e. if current is not
safe.
--
Gilles Chanteperdrix
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-03 10:19 ` Gilles Chanteperdrix
@ 2006-11-03 12:21 ` Jan Kiszka
2006-11-03 17:22 ` Jeff Webb
2006-11-04 14:30 ` Philippe Gerum
0 siblings, 2 replies; 11+ messages in thread
From: Jan Kiszka @ 2006-11-03 12:21 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: Xenomai help
[-- Attachment #1.1: Type: text/plain, Size: 3193 bytes --]
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>
>>> Jan Kiszka wrote:
>>>
>>>> Jeff Webb wrote:
>>>>
>>>>
>>>>> Jan Kiszka wrote:
>>>>>
>>>>>
>>>>>> Jeff Webb wrote:
>>>>>>
>>>>>>
>>>>>>> Does anyone else have an AMD system that can verify my results?
>>>>>>
>>>>>> I have an old Athlon 800. Maybe we are lucky and it exposes the
>>>>>> problem
>>>>>> when the kernel is optimised for it. I'm going to give this a try,
>>>>>> but
>>>>>> it may take a few days (and a free time slot).
>>>>>
>>>>> Thank you. I appreciate you giving it a try when you get some free
>>>>> time. I was able to work around the problem by writing the queue data
>>>>> in smaller chunks (or use an i686 kernel), so I am not in urgent
>>>>> need of
>>>>> an immediate fix. I do think it's important to fix this bug
>>>>> eventually,
>>>>> so I didn't want it to slip through the cracks.
>>>>>
>>>>>
>>>>>
>>>>>>> The problem seems to be connected with the size of writes to Xenomai
>>>>>>> pipes. This example uses POSIX message queues, but I had a similar
>>>>>>> problem a while back with RTAI pipes. Maybe this tells us the
>>>>>>> problem
>>>>>>> is in the nucleus pipe code? Just a guess. The problem seems to
>>>>>>> affect
>>>>>>> both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
>>>>>>
>>>>>> Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so
>>>>>> there
>>>>>> is still /at least/ one piece missing in the puzzle.
>>>>>
>>>>> True. It is very strange that the amount of data in the write call
>>>>> ends
>>>>> up affecting the FPU context.
>>>>
>>>>
>>>> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
>>>>
>>>> http://lxr.free-electrons.com/source/include/asm-i386/string.h#285
>>>>
>>>> It's an optimised memcpy for 3DNow CPUs that is used with blocks >= 512
>>>> bytes. It messes up with the FPU state and may get trapped by other
>>>> issues as well (blind access to "current" in order to test
>>>> in_interrupt()). I don't have an answer for this right now beyond
>>>> "don't
>>>> switch on AMD optimisations when using Xenomai". But that's a bit
>>>> unsatisfying.
>>>>
>>>> Another way would be to wrap any memcpy access from Xenomai context,
>>>> but
>>>> that's likely impractical (think of all the drivers).
>>>
>>> I see other ways to solve this issue:
>>> - either we disable the use of the mmx memcpy in string.h if
>>> ipipe_current_domain is not root
>>
>>
>> This is what came to my mind as well meanwhile.
>>
>>
>>> - or we allow the exception to happen for threads in primary mode with
>>> the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
>>> this case.
>>
>>
>> Given that this is a special case for a subset of x86[_64] CPUs, I
>> rather think we should go for the first variant. Should be simpler.
>
> This second way would not work correctly with kernel-space threads, so,
> if we wanted to implement it, we would still need to disable the mmx
> memcpy in string.h for kernel-space threads, i.e. if current is not
> safe.
>
True.
This patch fixes the issue for me.
Jan
[-- Attachment #1.2: disable-mmx_memcpy.patch --]
[-- Type: text/plain, Size: 512 bytes --]
---
 arch/i386/lib/mmx.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.17.13/arch/i386/lib/mmx.c
===================================================================
--- linux-2.6.17.13.orig/arch/i386/lib/mmx.c
+++ linux-2.6.17.13/arch/i386/lib/mmx.c
@@ -32,7 +32,7 @@ void *_mmx_memcpy(void *to, const void *
 	void *p;
 	int i;
 
-	if (unlikely(in_interrupt()))
+	if (unlikely(!ipipe_root_domain_p || in_interrupt()))
 		return __memcpy(to, from, len);
 
 	p = to;
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-03 12:21 ` Jan Kiszka
@ 2006-11-03 17:22 ` Jeff Webb
2006-11-06 9:04 ` Gilles Chanteperdrix
2006-11-04 14:30 ` Philippe Gerum
1 sibling, 1 reply; 11+ messages in thread
From: Jeff Webb @ 2006-11-03 17:22 UTC (permalink / raw)
To: Xenomai help
Jan Kiszka wrote:
>>>>> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
> ...
> True.
>
> This patch fixes the issue for me.
Works for me as well on my Athlon64 X2 machine.
Many thanks for hunting this down, Jan.
-Jeff
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-03 12:21 ` Jan Kiszka
2006-11-03 17:22 ` Jeff Webb
@ 2006-11-04 14:30 ` Philippe Gerum
1 sibling, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2006-11-04 14:30 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Xenomai help
On Fri, 2006-11-03 at 13:21 +0100, Jan Kiszka wrote:
[...]
> This patch fixes the issue for me.
>
> Jan
> plain text document attachment (disable-mmx_memcpy.patch)
> ---
> arch/i386/lib/mmx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.17.13/arch/i386/lib/mmx.c
> ===================================================================
> --- linux-2.6.17.13.orig/arch/i386/lib/mmx.c
> +++ linux-2.6.17.13/arch/i386/lib/mmx.c
> @@ -32,7 +32,7 @@ void *_mmx_memcpy(void *to, const void *
> void *p;
> int i;
>
> - if (unlikely(in_interrupt()))
> + if (unlikely(!ipipe_root_domain_p || in_interrupt()))
> return __memcpy(to, from, len);
>
> p = to;
Merged, thanks.
--
Philippe.
* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
2006-11-03 17:22 ` Jeff Webb
@ 2006-11-06 9:04 ` Gilles Chanteperdrix
0 siblings, 0 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-11-06 9:04 UTC (permalink / raw)
To: Jeff Webb; +Cc: Xenomai help
Jeff Webb wrote:
> Jan Kiszka wrote:
>
>>>>>> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
>>
>> ... True.
>>
>> This patch fixes the issue for me.
>
>
> Works for me as well on my Athlon64 X2 machine.
To see whether using this mmx_memcpy is worth the trouble, I wrote a
test program to benchmark __memcpy against _mmx_memcpy. Could you try
it on an AMD machine?
--
Gilles Chanteperdrix
[-- Attachment #2: test_memcpy.c --]
[-- Type: text/x-csrc, Size: 6278 bytes --]
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <sys/io.h>   /* iopl */
#include <sys/mman.h> /* mlockall */

#define unlikely(expr) (__builtin_expect((expr), 0))
#include <asm/processor.h>

#define COUNT 1000
#define SIZE 512

#define hw_cli() \
	__asm__ __volatile__ ("cli")

#define hw_sti() \
	__asm__ __volatile__ ("sti")

void *_mmx_memcpy_prefetch(void *to, const void *from, size_t len);
void *_mmx_memcpy(void *to, const void *from, size_t len);

static inline __attribute__((always_inline)) void * __memcpy(void * to, const void * from, size_t n)
{
	int d0, d1, d2;
	__asm__ __volatile__(
		"rep ; movsl\n\t"
		"movl %4,%%ecx\n\t"
		"andl $3,%%ecx\n\t"
#if 1	/* want to pay 2 byte penalty for a chance to skip microcoded rep? */
		"jz 1f\n\t"
#endif
		"rep ; movsb\n\t"
		"1:"
		: "=&c" (d0), "=&D" (d1), "=&S" (d2)
		: "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
		: "memory");
	return (to);
}

jmp_buf jmpbuf;

void sigill_handler(int sig __attribute__((unused)))
{
	longjmp(jmpbuf, 1);
}
int main(void)
{
	char src[SIZE];
	char dst[SIZE];
	unsigned long long begin, end;
	double d;
	unsigned i, use_prefetch;

	if (iopl(3)) {
		perror("iopl(3)");
		return EXIT_FAILURE;
	}

	if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
		perror("mlockall");
		return EXIT_FAILURE;
	}

	memset(src, '\0', sizeof(src));
	memset(dst, '\0', sizeof(dst));

	if (signal(SIGILL, sigill_handler) == SIG_ERR) {
		perror("signal");
		return EXIT_FAILURE;
	}

	/* Probe for the prefetch instruction: SIGILL means it is unsupported. */
	if (!setjmp(jmpbuf)) {
		use_prefetch = 1;
		__asm__ __volatile__ ("prefetch (%0)"
				      : /* no out */ : "r" (src));
	} else
		use_prefetch = 0;

	if (signal(SIGILL, SIG_DFL) == SIG_ERR) {
		perror("signal");
		return EXIT_FAILURE;
	}

	hw_cli();
	rdtscll(begin);
	for (i = 0; i < COUNT; i++)
		memcpy(dst, src, sizeof(dst));
	rdtscll(end);
	hw_sti();
	printf("libc memcpy: %llu\n", (end - begin)/COUNT);

	hw_cli();
	rdtscll(begin);
	for (i = 0; i < COUNT; i++)
		__memcpy(dst, src, sizeof(dst));
	rdtscll(end);
	hw_sti();
	printf("__memcpy: %llu\n", (end - begin)/COUNT);

	d = 0;
	for (i = 0; i < COUNT; i++)	/* use fpu in order to avoid a fault when
					 * fxsave is called. */
		d += 0.1;

	if (use_prefetch) {
		hw_cli();
		rdtscll(begin);
		for (i = 0; i < COUNT; i++)
			_mmx_memcpy_prefetch(dst, src, sizeof(dst));
		rdtscll(end);
		hw_sti();
		printf("_mmx_memcpy(with prefetch): %llu\n",
		       (end - begin)/COUNT);
	} else {
		hw_cli();
		rdtscll(begin);
		for (i = 0; i < COUNT; i++)
			_mmx_memcpy(dst, src, sizeof(dst));
		rdtscll(end);
		hw_sti();
		printf("_mmx_memcpy(without prefetch): %llu\n",
		       (end - begin)/COUNT);
	}

	printf("d: %g\n", d);	/* Use d to avoid it being optimized out. */

	return EXIT_SUCCESS;
}

__attribute__((noinline)) void *_mmx_memcpy_prefetch(void *to, const void *from, size_t len)
{
	struct i387_fxsave_struct fxsave;
	char pad[15] __attribute__((unused));
	struct i387_fxsave_struct *fpenv =
		(struct i387_fxsave_struct *) (((unsigned) &fxsave + 15) & ~15);
	void *p;
	int i;

	p = to;
	i = len >> 6; /* len/64 */

	__asm__ __volatile__ ("fxsave %0; fnclex":"=m"(*fpenv));

	__asm__ __volatile__ (
		" prefetch (%0)\n"		/* This set is 28 bytes */
		" prefetch 64(%0)\n"
		" prefetch 128(%0)\n"
		" prefetch 192(%0)\n"
		" prefetch 256(%0)\n"
		: /* no out */ : "r" (from) );

	for(; i>5; i--)
	{
		__asm__ __volatile__ (
		" prefetch 320(%0)\n"
		" movq (%0), %%mm0\n"
		" movq 8(%0), %%mm1\n"
		" movq 16(%0), %%mm2\n"
		" movq 24(%0), %%mm3\n"
		" movq %%mm0, (%1)\n"
		" movq %%mm1, 8(%1)\n"
		" movq %%mm2, 16(%1)\n"
		" movq %%mm3, 24(%1)\n"
		" movq 32(%0), %%mm0\n"
		" movq 40(%0), %%mm1\n"
		" movq 48(%0), %%mm2\n"
		" movq 56(%0), %%mm3\n"
		" movq %%mm0, 32(%1)\n"
		" movq %%mm1, 40(%1)\n"
		" movq %%mm2, 48(%1)\n"
		" movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}

	for(; i>0; i--)
	{
		__asm__ __volatile__ (
		" movq (%0), %%mm0\n"
		" movq 8(%0), %%mm1\n"
		" movq 16(%0), %%mm2\n"
		" movq 24(%0), %%mm3\n"
		" movq %%mm0, (%1)\n"
		" movq %%mm1, 8(%1)\n"
		" movq %%mm2, 16(%1)\n"
		" movq %%mm3, 24(%1)\n"
		" movq 32(%0), %%mm0\n"
		" movq 40(%0), %%mm1\n"
		" movq 48(%0), %%mm2\n"
		" movq 56(%0), %%mm3\n"
		" movq %%mm0, 32(%1)\n"
		" movq %%mm1, 40(%1)\n"
		" movq %%mm2, 48(%1)\n"
		" movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}
	/*
	 * Now do the tail of the block
	 */
	__memcpy(to, from, len&63);
	__asm__ __volatile__ ("fxrstor %0" : /* no out */ : "m"(*fpenv));
	return p;
}

__attribute__((noinline)) void *_mmx_memcpy(void *to, const void *from, size_t len)
{
	struct i387_fxsave_struct fxsave;
	char pad[15] __attribute__((unused));
	struct i387_fxsave_struct *fpenv =
		(struct i387_fxsave_struct *) (((unsigned) &fxsave + 15) & ~15);
	void *p;
	int i;

	p = to;
	i = len >> 6; /* len/64 */

	__asm__ __volatile__ ("fxsave %0; fnclex":"=m"(*fpenv));

	for(; i>5; i--)
	{
		__asm__ __volatile__ (
		" movq (%0), %%mm0\n"
		" movq 8(%0), %%mm1\n"
		" movq 16(%0), %%mm2\n"
		" movq 24(%0), %%mm3\n"
		" movq %%mm0, (%1)\n"
		" movq %%mm1, 8(%1)\n"
		" movq %%mm2, 16(%1)\n"
		" movq %%mm3, 24(%1)\n"
		" movq 32(%0), %%mm0\n"
		" movq 40(%0), %%mm1\n"
		" movq 48(%0), %%mm2\n"
		" movq 56(%0), %%mm3\n"
		" movq %%mm0, 32(%1)\n"
		" movq %%mm1, 40(%1)\n"
		" movq %%mm2, 48(%1)\n"
		" movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}

	for(; i>0; i--)
	{
		__asm__ __volatile__ (
		" movq (%0), %%mm0\n"
		" movq 8(%0), %%mm1\n"
		" movq 16(%0), %%mm2\n"
		" movq 24(%0), %%mm3\n"
		" movq %%mm0, (%1)\n"
		" movq %%mm1, 8(%1)\n"
		" movq %%mm2, 16(%1)\n"
		" movq %%mm3, 24(%1)\n"
		" movq 32(%0), %%mm0\n"
		" movq 40(%0), %%mm1\n"
		" movq 48(%0), %%mm2\n"
		" movq 56(%0), %%mm3\n"
		" movq %%mm0, 32(%1)\n"
		" movq %%mm1, 40(%1)\n"
		" movq %%mm2, 48(%1)\n"
		" movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}
	/*
	 * Now do the tail of the block
	 */
	__memcpy(to, from, len&63);
	__asm__ __volatile__ ("fxrstor %0" : /* no out */ : "m"(*fpenv));
	return p;
}