* [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
@ 2006-11-01 15:46 Jeff Webb
  2006-11-01 19:42 ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Webb @ 2006-11-01 15:46 UTC
  To: Xenomai help

Jeff Webb wrote:
> If I run the attached program, I get the following result:
>
>  [root]# ./mqtest2
>  CPU time limit exceeded
>
> The kernel log contains:
>
>  Oct 25 14:13:03 kernel: invalid use of FPU in Xenomai context at 
> 0x80492f6

I have another piece of information on this problem.  I was able to test the program on two other machines: an Athlon XP, and a PIII laptop.  The program works on the PIII, but fails on the Athlon XP.  I then compiled a new kernel, selecting the "Pentium-Pro" processor family instead of the "Athlon/Duron/K7" processor family.  The mqtest2 program now works on my Athlon64 X2 and Athlon XP systems, if I use the "Pentium-Pro" kernel.

Here is a summary of what I tried:

Machine #1: AMD Athlon(tm)64 X2 Dual Core Processor  4400+
        OS: Fedora Core 5
a) Linux SMP 2.6.17.13 (K7 config) / Xenomai 2.2.4 -> mqtest2 fails
b) Linux SMP 2.6.17.13 (K7 config) / Xenomai trunk r1749 + patch -> mqtest2 fails
   patch = https://mail.gna.org/public/xenomai-core/2006-10/msg00069.html
c) Linux UP 2.6.17.13 (K7 config) / Xenomai 2.2.4 -> mqtest2 fails
d) Linux SMP 2.6.17.13 (K7 config) / Xenomai 2.2.1 -> mqtest2 fails
e) Linux UP 2.6.17.13 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!

Machine #2: AMD Athlon(tm) XP 3200+
a) Fedora Core 1 / Linux 2.4.32 (K7 config) / Xenomai 2.2.3 -> mqtest2 fails
b) Fedora Core 5 / Linux 2.6.17.13 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!

Machine #3: Intel Pentium III Mobile CPU
OS: Debian Unstable (old)
Linux 2.4.33.3 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!

So, it seems that a bug is introduced when compiling for the AMD K7 family.  Is the mqtest2 problem a compiler optimization bug, a Linux bug, a Linux configuration problem, a Xenomai bug, or a problem in my code?  Any ideas on how to proceed?  I am not familiar with the Xenomai internals or low-level x86 code, but I will do what I can to help debug this.

Does anyone else have an AMD system that can verify my results?

The mqtest2.c program in question was attached here:
  https://mail.gna.org/public/xenomai-help/2006-10/msg00147.html

The problem seems to be connected with the size of writes to Xenomai pipes.  This example uses POSIX message queues, but I had a similar problem a while back with RTAI pipes.  Maybe this tells us the problem is in the nucleus pipe code?  Just a guess.  The problem seems to affect both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.

Thanks,

Jeff




* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-01 15:46 [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context] Jeff Webb
@ 2006-11-01 19:42 ` Jan Kiszka
  2006-11-01 20:21   ` Jeff Webb
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-11-01 19:42 UTC
  To: Jeff Webb; +Cc: Xenomai help


Jeff Webb wrote:
> [...]
> 
> Does anyone else have an AMD system that can verify my results?

I have an old Athlon 800. Maybe we are lucky and it exposes the problem
when the kernel is optimised for it. I'm going to give this a try, but
it may take a few days (and a free time slot).

> 
> The mqtest2.c program in question was attached here:
>  https://mail.gna.org/public/xenomai-help/2006-10/msg00147.html
> 
> The problem seems to be connected with the size of writes to Xenomai
> pipes.  This example uses POSIX message queues, but I had a similar
> problem a while back with RTAI pipes.  Maybe this tells us the problem
> is in the nucleus pipe code?  Just a guess.  The problem seems to affect
> both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.

Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
is still /at least/ one piece missing in the puzzle.

BTW, did you already write what compiler version you are using for these
tests (/me too lazy to search the archives)?

Jan




* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-01 19:42 ` Jan Kiszka
@ 2006-11-01 20:21   ` Jeff Webb
  2006-11-03  9:45     ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Webb @ 2006-11-01 20:21 UTC
  To: Jan Kiszka; +Cc: Xenomai help

Jan Kiszka wrote:
> Jeff Webb wrote:
>> Does anyone else have an AMD system that can verify my results?
> 
> I have an old Athlon 800. Maybe we are lucky and it exposes the problem
> when the kernel is optimised for it. I'm going to give this a try, but
> it may take a few days (and a free time slot).

Thank you.  I appreciate you giving it a try when you get some free time.  I was able to work around the problem by writing the queue data in smaller chunks (or by using an i686 kernel), so I don't need a fix urgently.  I do think it's important to fix this bug eventually, so I didn't want it to slip through the cracks.
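A minimal sketch of the chunking workaround, assuming that the limit that matters is the 512-byte threshold at which a K7-optimised kernel switches to its MMX copy (the threshold Jan identifies in this thread). `write_chunked`, `chunk_sink` and the test sink are hypothetical helpers, not a Xenomai or POSIX API; in a real program the sink would wrap the application's own send call, for example one `mq_send()` per chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: copies of 512 bytes or more take the MMX path on a K7
 * kernel, so every chunk is kept strictly below that size. */
#define MAX_CHUNK 511

/* Hypothetical sink: sends one chunk, returns 0 on success, -1 on error. */
typedef int (*chunk_sink)(const void *buf, size_t len, void *ctx);

/* Split buf into chunks of at most MAX_CHUNK bytes and pass them to sink
 * in order. Returns the number of chunks sent, or -1 on a sink error. */
static int write_chunked(const void *buf, size_t len,
                         chunk_sink sink, void *ctx)
{
    const unsigned char *p = buf;
    int chunks = 0;

    assert(sink != NULL);
    while (len > 0) {
        size_t n = len < MAX_CHUNK ? len : MAX_CHUNK;

        if (sink(p, n, ctx) < 0)
            return -1;
        p += n;
        len -= n;
        chunks++;
    }
    return chunks;
}

/* Test sink: records the largest chunk and the total byte count. */
struct chunk_stats {
    size_t largest;
    size_t total;
};

static int record_chunk(const void *buf, size_t len, void *ctx)
{
    struct chunk_stats *st = ctx;

    (void)buf;
    if (len > st->largest)
        st->largest = len;
    st->total += len;
    return 0;
}
```

With message queues each chunk becomes a separate message, so the receiver has to reassemble them; that extra work is the price of staying below the threshold.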

>> The problem seems to be connected with the size of writes to Xenomai
>> pipes.  This example uses POSIX message queues, but I had a similar
>> problem a while back with RTAI pipes.  Maybe this tells us the problem
>> is in the nucleus pipe code?  Just a guess.  The problem seems to affect
>> both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
> 
> Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
> is still /at least/ one piece missing in the puzzle.

True.  It is very strange that the amount of data in the write call ends up affecting the FPU context.

> BTW, did you already write what compiler version you are using for these
> tests (/me too lazy to search the archives)?

I forgot to include this.  I compiled with at least three versions of gcc:

Machine #1: FC5 : gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)
Machine #2: FC1 : gcc version 3.3.2 20031022 (Red Hat Linux 3.3.2-1)
            FC5 : gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)
Machine #3: debian : gcc version 3.3.6 (Debian 1:3.3.6-13)

Thanks again,

-Jeff



* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-01 20:21   ` Jeff Webb
@ 2006-11-03  9:45     ` Jan Kiszka
  2006-11-03 10:01       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-11-03  9:45 UTC
  To: Jeff Webb, Philippe Gerum, Gilles Chanteperdrix; +Cc: Xenomai help


Jeff Webb wrote:
> [...]
> 
> True.  It is very strange that the amount of data in the write call ends
> up affecting the FPU context.

I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)

http://lxr.free-electrons.com/source/include/asm-i386/string.h#285

It's an optimised memcpy for 3DNow CPUs that is used for blocks of 512
bytes or more. It messes up the FPU state and may trip over other
issues as well (it blindly accesses "current" in order to test
in_interrupt()). I don't have an answer for this right now beyond "don't
switch on AMD optimisations when using Xenomai". But that's a bit
unsatisfying.
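Jeff's observation that only larger writes fail, and only with K7 kernels, fits the size-based dispatch described above. The following is a user-space model of that dispatch, not kernel code. `SIM_USE_3DNOW` is a hypothetical stand-in for the kernel configuration that enables the 3DNow string routines (CONFIG_X86_USE_3DNOW, assumed here to be selected by the K7 processor choice), and `plain_copy`/`mmx_copy` are stand-ins that only record which path a copy took.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for CONFIG_X86_USE_3DNOW: 1 models a K7 kernel, 0 a PPro one. */
#ifndef SIM_USE_3DNOW
#define SIM_USE_3DNOW 1
#endif

enum copy_path { PATH_PLAIN, PATH_MMX };

static enum copy_path last_path; /* which routine the last copy used */

/* Stand-in for the integer-only __memcpy (rep movsl / rep movsb). */
static void *plain_copy(void *to, const void *from, size_t len)
{
    last_path = PATH_PLAIN;
    return memcpy(to, from, len);
}

/* Stand-in for _mmx_memcpy; the real routine saves the FPU state and
 * copies through MMX registers, which is what Xenomai trips over. */
static void *mmx_copy(void *to, const void *from, size_t len)
{
    last_path = PATH_MMX;
    return memcpy(to, from, len);
}

/* Model of __memcpy3d: blocks below 512 bytes stay on the integer path,
 * larger ones go through the MMX routine when 3DNow support is built in. */
static void *sim_memcpy(void *to, const void *from, size_t len)
{
#if SIM_USE_3DNOW
    if (len >= 512)
        return mmx_copy(to, from, len);
#endif
    return plain_copy(to, from, len);
}
```

Building with `-DSIM_USE_3DNOW=0` models the Pentium-Pro kernel: every size then takes the integer path, matching the configurations that worked.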

Another way would be to wrap any memcpy access from Xenomai context, but
that's likely impractical (think of all the drivers).

Jan




* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-03  9:45     ` Jan Kiszka
@ 2006-11-03 10:01       ` Gilles Chanteperdrix
  2006-11-03 10:11         ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-11-03 10:01 UTC
  To: Jan Kiszka; +Cc: Xenomai help

Jan Kiszka wrote:
> [...]
> 
> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
> 
> http://lxr.free-electrons.com/source/include/asm-i386/string.h#285
> 
> It's an optimised memcpy for 3DNow CPUs that is used for blocks of 512
> bytes or more. It messes up the FPU state and may trip over other
> issues as well (it blindly accesses "current" in order to test
> in_interrupt()). I don't have an answer for this right now beyond "don't
> switch on AMD optimisations when using Xenomai". But that's a bit
> unsatisfying.
> 
> Another way would be to wrap any memcpy access from Xenomai context, but
> that's likely impractical (think of all the drivers).

I see other ways to solve this issue:
- either we disable the use of the mmx memcpy in string.h if
ipipe_current_domain is not root
- or we allow the exception to happen for threads in primary mode with 
the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in 
this case.

-- 
                                                  Gilles Chanteperdrix



* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-03 10:01       ` Gilles Chanteperdrix
@ 2006-11-03 10:11         ` Jan Kiszka
  2006-11-03 10:19           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-11-03 10:11 UTC
  To: Gilles Chanteperdrix; +Cc: Xenomai help


Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> [...]
> 
> I see other ways to solve this issue:
> - either we disable the use of the mmx memcpy in string.h if
> ipipe_current_domain is not root

This is what came to my mind as well meanwhile.

> - or we allow the exception to happen for threads in primary mode with
> the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
> this case.

Given that this is a special case for a subset of x86[_64] CPUs, I
would rather go for the first variant. It should be simpler.

Jan




* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-03 10:11         ` Jan Kiszka
@ 2006-11-03 10:19           ` Gilles Chanteperdrix
  2006-11-03 12:21             ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-11-03 10:19 UTC
  To: Jan Kiszka; +Cc: Xenomai help

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
> 
>>Jan Kiszka wrote:
>>
>>>[...]
>>
>>I see other ways to solve this issue:
>>- either we disable the use of the mmx memcpy in string.h if
>>ipipe_current_domain is not root
> 
> 
> This is what came to my mind as well meanwhile.
> 
> 
>>- or we allow the exception to happen for threads in primary mode with
>>the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
>>this case.
> 
> 
> Given that this is a special case for a subset of x86[_64] CPUs, I
> would rather go for the first variant. It should be simpler.

This second way would not work correctly with kernel-space threads, so,
if we wanted to implement it, we would still need to disable the mmx
memcpy in string.h for kernel-space threads, i.e. if current is not
safe.

-- 
                                                  Gilles Chanteperdrix



* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-03 10:19           ` Gilles Chanteperdrix
@ 2006-11-03 12:21             ` Jan Kiszka
  2006-11-03 17:22               ` Jeff Webb
  2006-11-04 14:30               ` Philippe Gerum
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Kiszka @ 2006-11-03 12:21 UTC
  To: Gilles Chanteperdrix; +Cc: Xenomai help


[-- Attachment #1.1: Type: text/plain, Size: 3193 bytes --]

Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>
>>> [...]
>>>
>>> I see other ways to solve this issue:
>>> - either we disable the use of the mmx memcpy in string.h if
>>> ipipe_current_domain is not root
>>
>>
>> This is what came to my mind as well meanwhile.
>>
>>
>>> - or we allow the exception to happen for threads in primary mode with
>>> the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
>>> this case.
>>
>>
>> Given that this is a special case for a subset of x86[_64] CPUs, I
>> would rather go for the first variant. It should be simpler.
> 
> This second way would not work correctly with kernel-space threads, so,
> if we wanted to implement it, we would still need to disable the mmx
> memcpy in string.h for kernel-space threads, i.e. if current is not
> safe.
> 

True.

This patch fixes the issue for me.

Jan

[-- Attachment #1.2: disable-mmx_memcpy.patch --]
[-- Type: text/plain, Size: 512 bytes --]

---
 arch/i386/lib/mmx.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.17.13/arch/i386/lib/mmx.c
===================================================================
--- linux-2.6.17.13.orig/arch/i386/lib/mmx.c
+++ linux-2.6.17.13/arch/i386/lib/mmx.c
@@ -32,7 +32,7 @@ void *_mmx_memcpy(void *to, const void *
 	void *p;
 	int i;
 
-	if (unlikely(in_interrupt()))
+	if (unlikely(!ipipe_root_domain_p || in_interrupt()))
 		return __memcpy(to, from, len);
 
 	p = to;



* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-03 12:21             ` Jan Kiszka
@ 2006-11-03 17:22               ` Jeff Webb
  2006-11-06  9:04                 ` Gilles Chanteperdrix
  2006-11-04 14:30               ` Philippe Gerum
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff Webb @ 2006-11-03 17:22 UTC
  To: Xenomai help

Jan Kiszka wrote:
>>>>> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
> ... 
> True.
> 
> This patch fixes the issue for me.

Works for me as well on my Athlon64 X2 machine.

Many thanks for hunting this down, Jan.

-Jeff




* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-03 12:21             ` Jan Kiszka
  2006-11-03 17:22               ` Jeff Webb
@ 2006-11-04 14:30               ` Philippe Gerum
  1 sibling, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2006-11-04 14:30 UTC
  To: Jan Kiszka; +Cc: Xenomai help

On Fri, 2006-11-03 at 13:21 +0100, Jan Kiszka wrote:

[...]

> This patch fixes the issue for me.
> 
> Jan
> plain text document attachment (disable-mmx_memcpy.patch)
> ---
>  arch/i386/lib/mmx.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6.17.13/arch/i386/lib/mmx.c
> ===================================================================
> --- linux-2.6.17.13.orig/arch/i386/lib/mmx.c
> +++ linux-2.6.17.13/arch/i386/lib/mmx.c
> @@ -32,7 +32,7 @@ void *_mmx_memcpy(void *to, const void *
>  	void *p;
>  	int i;
>  
> -	if (unlikely(in_interrupt()))
> +	if (unlikely(!ipipe_root_domain_p || in_interrupt()))
>  		return __memcpy(to, from, len);
>  
>  	p = to;

Merged, thanks.

-- 
Philippe.





* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
  2006-11-03 17:22               ` Jeff Webb
@ 2006-11-06  9:04                 ` Gilles Chanteperdrix
  0 siblings, 0 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-11-06  9:04 UTC
  To: Jeff Webb; +Cc: Xenomai help

[-- Attachment #1: Type: text/plain, Size: 457 bytes --]

Jeff Webb wrote:
> Jan Kiszka wrote:
> 
>>>>>> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
>>
>> ... True.
>>
>> This patch fixes the issue for me.
> 
> 
> Works for me as well on my Athlon64 X2 machine.

To see whether using this mmx_memcpy is worth the trouble at all, I
wrote a test program that benchmarks __memcpy against _mmx_memcpy.
Could you try it on an AMD machine?

-- 
                                                  Gilles Chanteperdrix

[-- Attachment #2: test_memcpy.c --]
[-- Type: text/x-csrc, Size: 6278 bytes --]

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <sys/io.h>		/* iopl */
#include <sys/mman.h>		/* mlockall */

#define unlikely(expr) (__builtin_expect((expr), 0))
#include <asm/processor.h>

#define COUNT 1000
#define SIZE 512

#define hw_cli() \
	__asm__ __volatile__ ("cli")

#define hw_sti() \
	__asm__ __volatile__ ("sti")

void *_mmx_memcpy_prefetch(void *to, const void *from, size_t len);

void *_mmx_memcpy(void *to, const void *from, size_t len);

static inline __attribute__((always_inline)) void * __memcpy(void * to, const void * from, size_t n)
{
int d0, d1, d2;
__asm__ __volatile__(
	"rep ; movsl\n\t"
	"movl %4,%%ecx\n\t"
	"andl $3,%%ecx\n\t"
#if 1	/* want to pay 2 byte penalty for a chance to skip microcoded rep? */
	"jz 1f\n\t"
#endif
	"rep ; movsb\n\t"
	"1:"
	: "=&c" (d0), "=&D" (d1), "=&S" (d2)
	: "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
	: "memory");
return (to);
}

sigjmp_buf jmpbuf;

/* SIGILL means the CPU lacks the 3DNow "prefetch" instruction. */
void sigill_handler(int sig __attribute__((unused)))
{
	siglongjmp(jmpbuf, 1);
}

int main(void)
{
	char src[SIZE];
	char dst[SIZE];
	unsigned long long begin, end;
	double d;
	unsigned i, use_prefetch;
	
	/* IOPL 3 is needed to execute cli/sti from user space. */
	if (iopl(3)) {
		perror("iopl(3)");
		return EXIT_FAILURE;
	}

	if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
		perror("mlockall");
		return EXIT_FAILURE;
	}

	memset(src, '\0', sizeof(src));
	memset(dst, '\0', sizeof(dst));

	if (signal(SIGILL, sigill_handler) == SIG_ERR) {
		perror("signal");
		return EXIT_FAILURE;
	}

	/* Probe for "prefetch". sigsetjmp(..., 1) also saves the signal
	 * mask, so SIGILL is unblocked again after siglongjmp. */
	if (!sigsetjmp(jmpbuf, 1)) {
		use_prefetch = 1;

		__asm__ __volatile__ ("prefetch (%0)"
				      : /* no out */ : "r" (src));
	} else
		use_prefetch = 0;

	if (signal(SIGILL, SIG_DFL) == SIG_ERR) {
		perror("signal");
		return EXIT_FAILURE;
	}

	hw_cli();
	rdtscll(begin);
	for (i = 0; i < COUNT; i++)
		memcpy(dst, src, sizeof(dst));
	rdtscll(end);
	hw_sti();
	
	printf("libc memcpy: %llu\n", (end - begin)/COUNT);

	hw_cli();
	rdtscll(begin);
	for (i = 0; i < COUNT; i++)
		__memcpy(dst, src, sizeof(dst));
	rdtscll(end);
	hw_sti();
	
	printf("__memcpy: %llu\n", (end - begin)/COUNT);

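	/* Use the FPU once before timing, so that the device-not-available
	 * trap the kernel uses for lazy FPU switching is taken here rather
	 * than on the first fxsave inside a timed loop. */
	d = 0;
	for (i = 0; i < COUNT; i++)
		d += 0.1;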

	if (use_prefetch) {
		hw_cli();
		rdtscll(begin);
		for (i = 0; i < COUNT; i++)
			_mmx_memcpy_prefetch(dst, src, sizeof(dst));
		rdtscll(end);
		hw_sti();

		printf("_mmx_memcpy(with prefetch): %llu\n",
		       (end - begin)/COUNT);
	} else {
		hw_cli();
		rdtscll(begin);
		for (i = 0; i < COUNT; i++)
			_mmx_memcpy(dst, src, sizeof(dst));
		rdtscll(end);
		hw_sti();

		printf("_mmx_memcpy(without prefetch): %llu\n",
		       (end - begin)/COUNT);
	}

	printf("d: %g\n", d);	/* Use d to avoid it being optimized out. */

	return EXIT_SUCCESS;
}

__attribute__((noinline)) void *_mmx_memcpy_prefetch(void *to, const void *from, size_t len)
{
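	/* fxsave/fxrstor need a 16-byte aligned, 512-byte save area; carve
	 * it out of an oversized buffer so the alignment does not depend on
	 * how the compiler lays out the locals. */
	char fxbuf[sizeof(struct i387_fxsave_struct) + 15];
	struct i387_fxsave_struct *fpenv =
		(struct i387_fxsave_struct *) (((unsigned long) fxbuf + 15) & ~15UL);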
	void *p;
	int i;

	p = to;
	i = len >> 6; /* len/64 */

	__asm__ __volatile__ ("fxsave %0; fnclex":"=m"(*fpenv));

	__asm__ __volatile__ (
		"   prefetch (%0)\n"		/* This set is 28 bytes */
		"   prefetch 64(%0)\n"
		"   prefetch 128(%0)\n"
		"   prefetch 192(%0)\n"
		"   prefetch 256(%0)\n"
		: /* no out */ : "r" (from) );
	
	for(; i>5; i--)
	{
		__asm__ __volatile__ (
		"  prefetch 320(%0)\n"
		"  movq (%0), %%mm0\n"
		"  movq 8(%0), %%mm1\n"
		"  movq 16(%0), %%mm2\n"
		"  movq 24(%0), %%mm3\n"
		"  movq %%mm0, (%1)\n"
		"  movq %%mm1, 8(%1)\n"
		"  movq %%mm2, 16(%1)\n"
		"  movq %%mm3, 24(%1)\n"
		"  movq 32(%0), %%mm0\n"
		"  movq 40(%0), %%mm1\n"
		"  movq 48(%0), %%mm2\n"
		"  movq 56(%0), %%mm3\n"
		"  movq %%mm0, 32(%1)\n"
		"  movq %%mm1, 40(%1)\n"
		"  movq %%mm2, 48(%1)\n"
		"  movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}

	for(; i>0; i--)
	{
		__asm__ __volatile__ (
		"  movq (%0), %%mm0\n"
		"  movq 8(%0), %%mm1\n"
		"  movq 16(%0), %%mm2\n"
		"  movq 24(%0), %%mm3\n"
		"  movq %%mm0, (%1)\n"
		"  movq %%mm1, 8(%1)\n"
		"  movq %%mm2, 16(%1)\n"
		"  movq %%mm3, 24(%1)\n"
		"  movq 32(%0), %%mm0\n"
		"  movq 40(%0), %%mm1\n"
		"  movq 48(%0), %%mm2\n"
		"  movq 56(%0), %%mm3\n"
		"  movq %%mm0, 32(%1)\n"
		"  movq %%mm1, 40(%1)\n"
		"  movq %%mm2, 48(%1)\n"
		"  movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}
	/*
	 *	Now do the tail of the block
	 */
	__memcpy(to, from, len&63);

	__asm__ __volatile__ ("fxrstor %0" : /* no out */ : "m"(*fpenv));

	return p;
}

__attribute__((noinline)) void *_mmx_memcpy(void *to, const void *from, size_t len)
{
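	/* fxsave/fxrstor need a 16-byte aligned, 512-byte save area; carve
	 * it out of an oversized buffer so the alignment does not depend on
	 * how the compiler lays out the locals. */
	char fxbuf[sizeof(struct i387_fxsave_struct) + 15];
	struct i387_fxsave_struct *fpenv =
		(struct i387_fxsave_struct *) (((unsigned long) fxbuf + 15) & ~15UL);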
	void *p;
	int i;

	p = to;
	i = len >> 6; /* len/64 */

	__asm__ __volatile__ ("fxsave %0; fnclex":"=m"(*fpenv));

	for(; i>5; i--)
	{
		__asm__ __volatile__ (
		"  movq (%0), %%mm0\n"
		"  movq 8(%0), %%mm1\n"
		"  movq 16(%0), %%mm2\n"
		"  movq 24(%0), %%mm3\n"
		"  movq %%mm0, (%1)\n"
		"  movq %%mm1, 8(%1)\n"
		"  movq %%mm2, 16(%1)\n"
		"  movq %%mm3, 24(%1)\n"
		"  movq 32(%0), %%mm0\n"
		"  movq 40(%0), %%mm1\n"
		"  movq 48(%0), %%mm2\n"
		"  movq 56(%0), %%mm3\n"
		"  movq %%mm0, 32(%1)\n"
		"  movq %%mm1, 40(%1)\n"
		"  movq %%mm2, 48(%1)\n"
		"  movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}

	for(; i>0; i--)
	{
		__asm__ __volatile__ (
		"  movq (%0), %%mm0\n"
		"  movq 8(%0), %%mm1\n"
		"  movq 16(%0), %%mm2\n"
		"  movq 24(%0), %%mm3\n"
		"  movq %%mm0, (%1)\n"
		"  movq %%mm1, 8(%1)\n"
		"  movq %%mm2, 16(%1)\n"
		"  movq %%mm3, 24(%1)\n"
		"  movq 32(%0), %%mm0\n"
		"  movq 40(%0), %%mm1\n"
		"  movq 48(%0), %%mm2\n"
		"  movq 56(%0), %%mm3\n"
		"  movq %%mm0, 32(%1)\n"
		"  movq %%mm1, 40(%1)\n"
		"  movq %%mm2, 48(%1)\n"
		"  movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}
	/*
	 *	Now do the tail of the block
	 */
	__memcpy(to, from, len&63);

	__asm__ __volatile__ ("fxrstor %0" : /* no out */ : "m"(*fpenv));

	return p;
}


end of thread

Thread overview: 11+ messages
2006-11-01 15:46 [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context] Jeff Webb
2006-11-01 19:42 ` Jan Kiszka
2006-11-01 20:21   ` Jeff Webb
2006-11-03  9:45     ` Jan Kiszka
2006-11-03 10:01       ` Gilles Chanteperdrix
2006-11-03 10:11         ` Jan Kiszka
2006-11-03 10:19           ` Gilles Chanteperdrix
2006-11-03 12:21             ` Jan Kiszka
2006-11-03 17:22               ` Jeff Webb
2006-11-06  9:04                 ` Gilles Chanteperdrix
2006-11-04 14:30               ` Philippe Gerum
