* [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
@ 2006-11-01 15:46 Jeff Webb
2006-11-01 19:42 ` Jan Kiszka
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Webb @ 2006-11-01 15:46 UTC (permalink / raw)
To: Xenomai help
Jeff Webb wrote:
> If I run the attached program, I get the following result:
>
> [root]# ./mqtest2
> CPU time limit exceeded
>
> The kernel log contains:
>
> Oct 25 14:13:03 kernel: invalid use of FPU in Xenomai context at
> 0x80492f6
I have another piece of information on this problem. I was able to test the program on two other machines: an Athlon XP, and a PIII laptop. The program works on the PIII, but fails on the Athlon XP. I then compiled a new kernel, selecting the "Pentium-Pro" processor family instead of the "Athlon/Duron/K7" processor family. The mqtest2 program now works on my Athlon64 X2 and Athlon XP systems, if I use the "Pentium-Pro" kernel.
Here is a summary of what I tried:
Machine #1: AMD Athlon(tm)64 X2 Dual Core Processor 4400+
OS: Fedora Core 5
a) Linux SMP 2.6.17.13 (K7 config) / Xenomai 2.2.4 -> mqtest2 fails
b) Linux SMP 2.6.17.13 (K7 config) / Xenomai trunk r1749 + patch -> mqtest2 fails
patch = https://mail.gna.org/public/xenomai-core/2006-10/msg00069.html
c) Linux UP 2.6.17.13 (K7 config) / Xenomai 2.2.4 -> mqtest2 fails
d) Linux SMP 2.6.17.13 (K7 config) / Xenomai 2.2.1 -> mqtest2 fails
e) Linux UP 2.6.17.13 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!
Machine #2: AMD Athlon(tm) XP 3200+
a) Fedora Core 1 / Linux 2.4.32 (K7 config) / Xenomai 2.2.3 -> mqtest2 fails
b) Fedora Core 5 / Linux 2.6.17.13 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!
Machine #3: Intel Pentium III Mobile CPU
OS: Debian Unstable (old)
Linux 2.4.33.3 (ppro config) / Xenomai 2.2.4 -> mqtest2 WORKS!
So, it seems that a bug is introduced when compiling for the AMD K7 family. Is the mqtest2 problem a compiler optimization bug, a Linux bug, a Linux configuration problem, a Xenomai bug, or a problem in my code? Any ideas on how to proceed? I am not familiar with the Xenomai internals or low-level x86 code, but I will do what I can to help debug this.
Does anyone else have an AMD system that can verify my results?
The mqtest2.c program in question was attached here:
https://mail.gna.org/public/xenomai-help/2006-10/msg00147.html
The problem seems to be connected with the size of writes to Xenomai pipes. This example uses POSIX message queues, but I had a similar problem a while back with RTAI pipes. Maybe this tells us the problem is in the nucleus pipe code? Just a guess. The problem seems to affect both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.
Thanks,
Jeff

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Jan Kiszka @ 2006-11-01 19:42 UTC
To: Jeff Webb; +Cc: Xenomai help

Jeff Webb wrote:
> [...]
> Does anyone else have an AMD system that can verify my results?

I have an old Athlon 800. Maybe we are lucky and it exposes the problem
when the kernel is optimised for it. I'm going to give this a try, but
it may take a few days (and a free time slot).

> [...]
> The problem seems to be connected with the size of writes to Xenomai
> pipes. This example uses POSIX message queues, but I had a similar
> problem a while back with RTAI pipes. Maybe this tells us the problem
> is in the nucleus pipe code? Just a guess. The problem seems to affect
> both 2.4 and 2.6 systems, and goes back to at least Xenomai 2.2.1.

Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
is still /at least/ one piece missing in the puzzle.

BTW, did you already write what compiler version you are using for these
tests (/me too lazy to search the archives)?

Jan

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Jeff Webb @ 2006-11-01 20:21 UTC
To: Jan Kiszka; +Cc: Xenomai help

Jan Kiszka wrote:
> Jeff Webb wrote:
>> Does anyone else have an AMD system that can verify my results?
>
> I have an old Athlon 800. Maybe we are lucky and it exposes the problem
> when the kernel is optimised for it. I'm going to give this a try, but
> it may take a few days (and a free time slot).

Thank you. I appreciate you giving it a try when you get some free
time. I was able to work around the problem by writing the queue data
in smaller chunks (or by using an i686 kernel), so I am not in urgent
need of a fix. I do think it's important to fix this bug eventually,
so I didn't want it to slip through the cracks.

>> The problem seems to be connected with the size of writes to Xenomai
>> pipes. [...]
>
> Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
> is still /at least/ one piece missing in the puzzle.

True. It is very strange that the amount of data in the write call ends
up affecting the FPU context.

> BTW, did you already write what compiler version you are using for these
> tests (/me too lazy to search the archives)?

I forgot to include this. I compiled with at least three versions of gcc:

Machine #1:
  FC5    : gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)
Machine #2:
  FC1    : gcc version 3.3.2 20031022 (Red Hat Linux 3.3.2-1)
  FC5    : gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)
Machine #3:
  debian : gcc version 3.3.6 (Debian 1:3.3.6-13)

Thanks again,
-Jeff
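The chunked-write workaround Jeff mentions can be sketched as follows. This is a minimal illustration under stated assumptions: the 512-byte limit comes from Jan's later analysis in this thread, the 256-byte chunk size is an arbitrary choice, and `write_chunked`, `sink_fn` and the demo `collector` sink are hypothetical names. In a real program the sink would wrap a call such as mq_send() on an already-open queue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed chunk size: kept below the 512-byte threshold at which the
 * K7/3DNow kernel build switches from __memcpy to _mmx_memcpy. */
#define CHUNK_SIZE 256

/* Hypothetical sink callback; a real program might wrap mq_send() here. */
typedef int (*sink_fn)(void *ctx, const char *buf, size_t len);

/* Hand buf to the sink in pieces of at most CHUNK_SIZE bytes.
 * Returns 0 on success, or the first non-zero value the sink returns. */
static int write_chunked(sink_fn sink, void *ctx, const char *buf, size_t len)
{
    assert(sink != NULL);
    while (len > 0) {
        size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;
        int rc = sink(ctx, buf, n);
        if (rc != 0)
            return rc;
        buf += n;
        len -= n;
    }
    return 0;
}

/* Demo sink: reassembles the chunks in memory and records their sizes. */
struct collector {
    char data[2048];
    size_t used;
    size_t max_chunk;
    int chunks;
};

static int collect(void *ctx, const char *buf, size_t len)
{
    struct collector *c = ctx;
    if (c->used + len > sizeof(c->data))
        return -1; /* would overflow the demo buffer */
    memcpy(c->data + c->used, buf, len);
    c->used += len;
    if (len > c->max_chunk)
        c->max_chunk = len;
    c->chunks++;
    return 0;
}
```

With a message queue, each chunk becomes a separate message, so the receiving side has to reassemble them (not shown here).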

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Jan Kiszka @ 2006-11-03 9:45 UTC
To: Jeff Webb, Philippe Gerum, Gilles Chanteperdrix; +Cc: Xenomai help

Jeff Webb wrote:
> Jan Kiszka wrote:
>> Maybe, maybe not. Pipes remain fairly unrelated to FPU usage, so there
>> is still /at least/ one piece missing in the puzzle.
>
> True. It is very strange that the amount of data in the write call ends
> up affecting the FPU context.

I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)

http://lxr.free-electrons.com/source/include/asm-i386/string.h#285

It's an optimised memcpy for 3DNow CPUs that is used with blocks >= 512
bytes. It messes with the FPU state and may run into other issues as
well (blind access to "current" in order to test in_interrupt()). I
don't have an answer for this right now beyond "don't switch on AMD
optimisations when using Xenomai". But that's a bit unsatisfying.

Another way would be to wrap any memcpy access from Xenomai context, but
that's likely impractical (think of all the drivers).

Jan
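The size dependence Jeff observed follows from the dispatch Jan describes. Below is a user-space model of it, not kernel code: `copy_path_unpatched` and the `copy_path` enum are hypothetical names, and the 512-byte threshold and the `in_interrupt()` bail-out are taken from Jan's description of __memcpy3d and _mmx_memcpy. The model only reports which routine a copy would reach. As far as the 2.6.17 sources go, the MMX path brackets the copy with kernel_fpu_begin()/kernel_fpu_end(), which rely on Linux task state, and that is where a caller running in the Xenomai domain gets into trouble.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold from the thread: blocks >= 512 bytes take the MMX path. */
#define MMX_COPY_THRESHOLD 512

/* Hypothetical labels for the two copy routines. */
enum copy_path {
    COPY_REP_MOVS, /* __memcpy: plain integer "rep movs" copy */
    COPY_MMX       /* _mmx_memcpy body: touches FPU/MMX state */
};

/* Model of the unpatched dispatch on a K7/3DNow kernel build:
 * __memcpy3d picks a routine by size, then _mmx_memcpy falls back
 * only when in_interrupt() is true. Nothing here looks at which
 * I-pipe domain (Linux or Xenomai) the caller runs in. */
static enum copy_path copy_path_unpatched(size_t len, bool in_irq)
{
    if (len < MMX_COPY_THRESHOLD)
        return COPY_REP_MOVS;
    if (in_irq)
        return COPY_REP_MOVS;
    return COPY_MMX;
}
```

In this model a 511-byte write and a 512-byte write take different paths, which is consistent with the message-size dependence reported earlier in the thread.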

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Gilles Chanteperdrix @ 2006-11-03 10:01 UTC
To: Jan Kiszka; +Cc: Xenomai help

Jan Kiszka wrote:
> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
>
> [...]
>
> Another way would be to wrap any memcpy access from Xenomai context, but
> that's likely impractical (think of all the drivers).

I see other ways to solve this issue:
- either we disable the use of the mmx memcpy in string.h if
  ipipe_current_domain is not root
- or we allow the exception to happen for threads in primary mode with
  the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
  this case.

-- 
Gilles Chanteperdrix

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Jan Kiszka @ 2006-11-03 10:11 UTC
To: Gilles Chanteperdrix; +Cc: Xenomai help

Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> [...]
>
> I see other ways to solve this issue:
> - either we disable the use of the mmx memcpy in string.h if
> ipipe_current_domain is not root

This is what came to my mind as well in the meantime.

> - or we allow the exception to happen for threads in primary mode with
> the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
> this case.

Given that this is a special case for a subset of x86[_64] CPUs, I
rather think we should go for the first variant. Should be simpler.

Jan

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Gilles Chanteperdrix @ 2006-11-03 10:19 UTC
To: Jan Kiszka; +Cc: Xenomai help

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> [...]
>> - or we allow the exception to happen for threads in primary mode with
>> the XNFPU bit set and call xnarch_restore_fpu in xnpod_fault_handler in
>> this case.
>
> Given that this is a special case for a subset of x86[_64] CPUs, I
> rather think we should go for the first variant. Should be simpler.

This second way would not work correctly with kernel-space threads, so,
if we wanted to implement it, we would still need to disable the mmx
memcpy in string.h for kernel-space threads, i.e. if current is not
safe.

-- 
Gilles Chanteperdrix
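Gilles's second option, and why it falls short for kernel-space threads, can be summarised as a small decision table. This is a hypothetical model, not Xenomai code: the struct fields stand in for "running in primary mode", "XNFPU bit set" and "kernel-space thread", and the return values only name what xnpod_fault_handler would do under the proposal as described in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the thread that took the FPU fault. */
struct rt_thread_model {
    bool primary_mode;  /* running in the Xenomai (non-root) domain */
    bool xnfpu;         /* thread has the XNFPU bit set */
    bool kernel_space;  /* kernel-space thread: Linux "current" is not its own */
};

enum fpu_fault_action {
    FAULT_UNCHANGED,          /* proposal does not apply; existing handling */
    FAULT_RESTORE_FPU,        /* call xnarch_restore_fpu() and resume */
    FAULT_NEEDS_MEMCPY_GUARD  /* proposal cannot help; avoid the mmx memcpy */
};

/* Model of option 2, including Gilles's caveat that kernel-space
 * threads would still need the memcpy guard because "current" is
 * not safe for them. */
static enum fpu_fault_action
proposed_fpu_fault_action(const struct rt_thread_model *t)
{
    if (!t->primary_mode || !t->xnfpu)
        return FAULT_UNCHANGED;
    if (t->kernel_space)
        return FAULT_NEEDS_MEMCPY_GUARD;
    return FAULT_RESTORE_FPU;
}
```

The third outcome is what made the first option attractive: the guard in the memcpy path is needed in any case, so it might as well cover every non-root caller.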

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Jan Kiszka @ 2006-11-03 12:21 UTC
To: Gilles Chanteperdrix; +Cc: Xenomai help

Gilles Chanteperdrix wrote:
> [...]
> This second way would not work correctly with kernel-space threads, so,
> if we wanted to implement it, we would still need to disable the mmx
> memcpy in string.h for kernel-space threads, i.e. if current is not
> safe.

True.

This patch fixes the issue for me.

Jan

[-- Attachment: disable-mmx_memcpy.patch --]
---
 arch/i386/lib/mmx.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.17.13/arch/i386/lib/mmx.c
===================================================================
--- linux-2.6.17.13.orig/arch/i386/lib/mmx.c
+++ linux-2.6.17.13/arch/i386/lib/mmx.c
@@ -32,7 +32,7 @@ void *_mmx_memcpy(void *to, const void *
 	void *p;
 	int i;
 
-	if (unlikely(in_interrupt()))
+	if (unlikely(!ipipe_root_domain_p || in_interrupt()))
 		return __memcpy(to, from, len);
 
 	p = to;
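The effect of the one-line patch can be modelled the same way. Again this is a hypothetical user-space model: the `root_domain` flag stands in for the I-pipe predicate `ipipe_root_domain_p` used in the patch, `copy_path_patched` is an illustrative name, and the threshold is the 512 bytes from the thread. With the patch, a large copy issued from the Xenomai domain falls back to the plain __memcpy, while Linux callers in process context keep the MMX path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold from the thread: blocks >= 512 bytes reach _mmx_memcpy. */
#define MMX_COPY_THRESHOLD 512

enum copy_path { COPY_REP_MOVS, COPY_MMX };

/* Model of __memcpy3d plus the patched _mmx_memcpy check:
 *   if (unlikely(!ipipe_root_domain_p || in_interrupt()))
 *           return __memcpy(to, from, len);                  */
static enum copy_path copy_path_patched(size_t len, bool root_domain, bool in_irq)
{
    if (len < MMX_COPY_THRESHOLD)
        return COPY_REP_MOVS;   /* small block: _mmx_memcpy is never called */
    if (!root_domain || in_irq)
        return COPY_REP_MOVS;   /* patched fallback to __memcpy */
    return COPY_MMX;            /* Linux process context: MMX copy as before */
}
```

The design keeps the optimisation for ordinary Linux code and only changes behaviour for callers outside the root domain, which matches Jan's preference for the simpler first variant.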

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Jeff Webb @ 2006-11-03 17:22 UTC
To: Xenomai help

Jan Kiszka wrote:
>>>>> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
> ...
> True.
>
> This patch fixes the issue for me.

Works for me as well on my Athlon64 X2 machine. Many thanks for hunting
this down, Jan.

-Jeff

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Gilles Chanteperdrix @ 2006-11-06 9:04 UTC
To: Jeff Webb; +Cc: Xenomai help

Jeff Webb wrote:
> Jan Kiszka wrote:
>
>>>>>> I found the reason: "3-dimensional" memcpy (__memcpy3d/_mmx_memcpy)
>>
>> ... True.
>>
>> This patch fixes the issue for me.
>
>
> Works for me as well on my Athlon64 X2 machine.

To see if trying to use this mmx_memcpy is worth the trouble, I made a
test program to benchmark __memcpy versus _mmx_memcpy. Could you try it
on AMD?

-- 
Gilles Chanteperdrix

[-- Attachment: test_memcpy.c --]
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <sys/io.h>		/* iopl */
#include <sys/mman.h>		/* mlockall */

#define unlikely(expr) (__builtin_expect((expr), 0))
#include <asm/processor.h>

#define COUNT 1000
#define SIZE 512

/* Timed loops run with interrupts disabled (requires iopl(3)). */
#define hw_cli() \
	__asm__ __volatile__ ("cli")
#define hw_sti() \
	__asm__ __volatile__ ("sti")

void *_mmx_memcpy_prefetch(void *to, const void *from, size_t len);
void *_mmx_memcpy(void *to, const void *from, size_t len);

static inline __attribute__((always_inline))
void * __memcpy(void * to, const void * from, size_t n)
{
	int d0, d1, d2;
	__asm__ __volatile__(
		"rep ; movsl\n\t"
		"movl %4,%%ecx\n\t"
		"andl $3,%%ecx\n\t"
#if 1	/* want to pay 2 byte penalty for a chance to skip microcoded rep? */
		"jz 1f\n\t"
#endif
		"rep ; movsb\n\t"
		"1:"
		: "=&c" (d0), "=&D" (d1), "=&S" (d2)
		: "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
		: "memory");
	return (to);
}

jmp_buf jmpbuf;

void sigill_handler(int sig __attribute__((unused)))
{
	longjmp(jmpbuf, 1);
}

int main(void)
{
	char src[SIZE];
	char dst[SIZE];
	unsigned long long begin, end;
	double d;
	unsigned i, use_prefetch;

	if (iopl(3)) {
		perror("iopl(3)");
		return EXIT_FAILURE;
	}

	if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
		perror("mlockall");
		return EXIT_FAILURE;
	}

	memset(src, '\0', sizeof(src));
	memset(dst, '\0', sizeof(dst));

	/* Probe for the 3DNow! "prefetch" instruction: SIGILL means unsupported. */
	if (signal(SIGILL, sigill_handler) == SIG_ERR) {
		perror("signal");
		return EXIT_FAILURE;
	}

	if (!setjmp(jmpbuf)) {
		use_prefetch = 1;
		__asm__ __volatile__ ("prefetch (%0)" : /* no out */ : "r" (src));
	} else
		use_prefetch = 0;

	if (signal(SIGILL, SIG_DFL) == SIG_ERR) {
		perror("signal");
		return EXIT_FAILURE;
	}

	hw_cli();
	rdtscll(begin);
	for (i = 0; i < COUNT; i++)
		memcpy(dst, src, sizeof(dst));
	rdtscll(end);
	hw_sti();
	printf("libc memcpy: %llu\n", (end - begin)/COUNT);

	hw_cli();
	rdtscll(begin);
	for (i = 0; i < COUNT; i++)
		__memcpy(dst, src, sizeof(dst));
	rdtscll(end);
	hw_sti();
	printf("__memcpy: %llu\n", (end - begin)/COUNT);

	d = 0;
	for (i = 0; i < COUNT; i++)
		/* use fpu in order to avoid a fault when
		 * fxsave is called. */
		d += 0.1;

	if (use_prefetch) {
		hw_cli();
		rdtscll(begin);
		for (i = 0; i < COUNT; i++)
			_mmx_memcpy_prefetch(dst, src, sizeof(dst));
		rdtscll(end);
		hw_sti();
		printf("_mmx_memcpy(with prefetch): %llu\n", (end - begin)/COUNT);
	} else {
		hw_cli();
		rdtscll(begin);
		for (i = 0; i < COUNT; i++)
			_mmx_memcpy(dst, src, sizeof(dst));
		rdtscll(end);
		hw_sti();
		printf("_mmx_memcpy(without prefetch): %llu\n", (end - begin)/COUNT);
	}

	printf("d: %g\n", d); /* Use d to avoid it being optimized out. */

	return EXIT_SUCCESS;
}

__attribute__((noinline))
void *_mmx_memcpy_prefetch(void *to, const void *from, size_t len)
{
	struct i387_fxsave_struct fxsave;
	char pad[15] __attribute__((unused));
	struct i387_fxsave_struct *fpenv = (struct i387_fxsave_struct *)
		(((unsigned) &fxsave + 15) & ~15);
	void *p;
	int i;

	p = to;
	i = len >> 6; /* len/64 */

	__asm__ __volatile__ ("fxsave %0; fnclex":"=m"(*fpenv));

	__asm__ __volatile__ (
		"   prefetch (%0)\n"	/* This set is 28 bytes */
		"   prefetch 64(%0)\n"
		"   prefetch 128(%0)\n"
		"   prefetch 192(%0)\n"
		"   prefetch 256(%0)\n"
		: /* no out */ : "r" (from) );

	for(; i>5; i--)
	{
		__asm__ __volatile__ (
		"  prefetch 320(%0)\n"
		"  movq (%0), %%mm0\n"
		"  movq 8(%0), %%mm1\n"
		"  movq 16(%0), %%mm2\n"
		"  movq 24(%0), %%mm3\n"
		"  movq %%mm0, (%1)\n"
		"  movq %%mm1, 8(%1)\n"
		"  movq %%mm2, 16(%1)\n"
		"  movq %%mm3, 24(%1)\n"
		"  movq 32(%0), %%mm0\n"
		"  movq 40(%0), %%mm1\n"
		"  movq 48(%0), %%mm2\n"
		"  movq 56(%0), %%mm3\n"
		"  movq %%mm0, 32(%1)\n"
		"  movq %%mm1, 40(%1)\n"
		"  movq %%mm2, 48(%1)\n"
		"  movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}

	for(; i>0; i--)
	{
		__asm__ __volatile__ (
		"  movq (%0), %%mm0\n"
		"  movq 8(%0), %%mm1\n"
		"  movq 16(%0), %%mm2\n"
		"  movq 24(%0), %%mm3\n"
		"  movq %%mm0, (%1)\n"
		"  movq %%mm1, 8(%1)\n"
		"  movq %%mm2, 16(%1)\n"
		"  movq %%mm3, 24(%1)\n"
		"  movq 32(%0), %%mm0\n"
		"  movq 40(%0), %%mm1\n"
		"  movq 48(%0), %%mm2\n"
		"  movq 56(%0), %%mm3\n"
		"  movq %%mm0, 32(%1)\n"
		"  movq %%mm1, 40(%1)\n"
		"  movq %%mm2, 48(%1)\n"
		"  movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}
	/*
	 *	Now do the tail of the block
	 */
	__memcpy(to, from, len&63);
	__asm__ __volatile__ ("fxrstor %0" : /* no out */ : "m"(*fpenv));
	return p;
}

__attribute__((noinline))
void *_mmx_memcpy(void *to, const void *from, size_t len)
{
	struct i387_fxsave_struct fxsave;
	char pad[15] __attribute__((unused));
	struct i387_fxsave_struct *fpenv = (struct i387_fxsave_struct *)
		(((unsigned) &fxsave + 15) & ~15);
	void *p;
	int i;

	p = to;
	i = len >> 6; /* len/64 */

	__asm__ __volatile__ ("fxsave %0; fnclex":"=m"(*fpenv));

	for(; i>5; i--)
	{
		__asm__ __volatile__ (
		"  movq (%0), %%mm0\n"
		"  movq 8(%0), %%mm1\n"
		"  movq 16(%0), %%mm2\n"
		"  movq 24(%0), %%mm3\n"
		"  movq %%mm0, (%1)\n"
		"  movq %%mm1, 8(%1)\n"
		"  movq %%mm2, 16(%1)\n"
		"  movq %%mm3, 24(%1)\n"
		"  movq 32(%0), %%mm0\n"
		"  movq 40(%0), %%mm1\n"
		"  movq 48(%0), %%mm2\n"
		"  movq 56(%0), %%mm3\n"
		"  movq %%mm0, 32(%1)\n"
		"  movq %%mm1, 40(%1)\n"
		"  movq %%mm2, 48(%1)\n"
		"  movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}

	for(; i>0; i--)
	{
		__asm__ __volatile__ (
		"  movq (%0), %%mm0\n"
		"  movq 8(%0), %%mm1\n"
		"  movq 16(%0), %%mm2\n"
		"  movq 24(%0), %%mm3\n"
		"  movq %%mm0, (%1)\n"
		"  movq %%mm1, 8(%1)\n"
		"  movq %%mm2, 16(%1)\n"
		"  movq %%mm3, 24(%1)\n"
		"  movq 32(%0), %%mm0\n"
		"  movq 40(%0), %%mm1\n"
		"  movq 48(%0), %%mm2\n"
		"  movq 56(%0), %%mm3\n"
		"  movq %%mm0, 32(%1)\n"
		"  movq %%mm1, 40(%1)\n"
		"  movq %%mm2, 48(%1)\n"
		"  movq %%mm3, 56(%1)\n"
		: /* no out */ : "r" (from), "r" (to) : "memory");
		from+=64;
		to+=64;
	}
	/*
	 *	Now do the tail of the block
	 */
	__memcpy(to, from, len&63);
	__asm__ __volatile__ ("fxrstor %0" : /* no out */ : "m"(*fpenv));
	return p;
}
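Both MMX routines in the test program need a 16-byte-aligned area for fxsave, which they obtain by rounding the address of `fxsave` up to the next multiple of 16. The program casts the address to `unsigned`, which only works on 32-bit i386, and relies on the separate `pad[15]` array sitting right after `fxsave`, which C does not guarantee. A portable sketch of the same rounding follows; `align_up`, `struct fxsave_buf` and `fxsave_area` are hypothetical helpers, and the 512-byte size is that of the x86 FXSAVE image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round p up to the next multiple of align (a power of two). */
static void *align_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t a = (uintptr_t)p;
    return (void *)((a + (align - 1)) & ~(uintptr_t)(align - 1));
}

/* FXSAVE writes a 512-byte image that must be 16-byte aligned.
 * Reserving 15 spare bytes inside the same object guarantees that
 * an aligned 512-byte area fits, whatever the stack layout. */
#define FXSAVE_SIZE 512
struct fxsave_buf {
    unsigned char raw[FXSAVE_SIZE + 15];
};

static void *fxsave_area(struct fxsave_buf *b)
{
    return align_up(b->raw, 16);
}
```

Keeping the slack inside one object makes the rounding well defined regardless of how the compiler arranges local variables.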

* Re: [Fwd: Re: [Xenomai-help] invalid use of FPU in Xenomai context]
From: Philippe Gerum @ 2006-11-04 14:30 UTC
To: Jan Kiszka; +Cc: Xenomai help

On Fri, 2006-11-03 at 13:21 +0100, Jan Kiszka wrote:
[...]
> This patch fixes the issue for me.
>
> Jan
> plain text document attachment (disable-mmx_memcpy.patch)
> ---
>  arch/i386/lib/mmx.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.17.13/arch/i386/lib/mmx.c
> ===================================================================
> --- linux-2.6.17.13.orig/arch/i386/lib/mmx.c
> +++ linux-2.6.17.13/arch/i386/lib/mmx.c
> @@ -32,7 +32,7 @@ void *_mmx_memcpy(void *to, const void *
>  	void *p;
>  	int i;
>  
> -	if (unlikely(in_interrupt()))
> +	if (unlikely(!ipipe_root_domain_p || in_interrupt()))
>  		return __memcpy(to, from, len);
>  
>  	p = to;

Merged, thanks.

-- 
Philippe.