* 2.4.x vs 2.6.x: denormal handling and audio performance
From: Fernando Pablo Lopez-Lezcano @ 2004-08-09 19:19 UTC (permalink / raw)
To: linux-kernel; +Cc: jackit-devel, Lee Revell
Hi all, I've been trying to track weird behavior I'm experiencing when
trying to use 2.6.x for "pro audio" applications and I think I have
something to report (and some questions).
First, the environment. I'm running the Jack low latency server on top
of two different software installs on the same hardware, one is FC1 +
2.4.26 + low latency and preemption patches, the other is FC2 + 2.6.7
rc2-mm2 + voluntary preemption O3. They are different hard disks swapped
into the same P4 laptop. Both are running the same source code versions
of all the audio programs that I use to test (but _not_ the same
binaries, each one is built in the environment it runs on).
One example app that illustrates the problem well is Freqtweak, a
frequency domain sound processing application that includes controlled
feedback of the frequency domain bins. If I connect a source to it and
feed samples and then disconnect it (i.e. feed silence), and let the
processed samples inside Freqtweak slowly decay to zero, at some point
the processing load of the app goes up _drastically_. It eventually
gobbles up all the cpu if left alone. If I feed samples to it again, the
load immediately goes back to normal. This only happens in the 2.6.x
based environment. It does not happen in the 2.4.x case.
My guess is that denormals are handled differently in my two
environments: in the 2.4.x case they are converted to zeros, in the
2.6.x case they stay as denormals. To check how much load denormals
create, Steve Harris wrote a small app and tested it on several processors:
On Mon, 2004-08-09 at 03:03, Steve Harris wrote:
> Source code + binary can be found at:
> http://www.ecs.soton.ac.uk/~swh/denormal-finder/
>
> I tried on a few different machines:
> PIII running 2.6 (glibc 2.3.3)
> Pentium M (PIII derived) running 2.6 (glibc 2.3.3)
> Athlon XP running 2.4 (glibc 2.3.2)
> Xeon P4 running 2.4 (glibc 2.3.2)
> Opteron running 2.6 (glibc 2.3.2)
>
> In all cases the denormal values are the same, apparently regardless of
> whether SSE or 387 instructions are used:
> lower bound on normals = 1.17549e-38
> upper bound on denormals = 1.17549e-38
> lower bound on denormals = 7.00649e-46
> upper bound on zeros = 7.00649e-46
>
> tried with -march=i686 -msse and without, so I guess gcc doesn't disable
> the denormal handling by default with SSE.
>
> what does vary is the time taken to process denormals relative to normals:
> PIII 38x
> AthlonXP 63x
> Opteron 71x (32bit binary)
> PM 78x (SSE / i387)
> PM 95x (SSE2)
> Opteron 104x (64bit binary)
> Xeon 191x
>
> So, this doesn't really shed any light on Fernando's problem, but it's still
> interesting.
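As a rough illustration of the bounds above, the following C sketch classifies a single-precision value as denormal and counts how many passes a decaying feedback value needs before it leaves the normal range. It is not Steve's denormal-finder; the function names `is_denormal` and `steps_until_denormal` are hypothetical, and the code assumes IEEE 754 single precision with the FPU in its default mode (neither flush-to-zero nor denormals-are-zero enabled).

```c
#include <float.h>   /* FLT_MIN: smallest normal float, about 1.17549e-38 */

/* Hypothetical helper: 1 if x is a nonzero value below the normal range. */
static int is_denormal(float x)
{
    float mag = x < 0.0f ? -x : x;
    return mag > 0.0f && mag < FLT_MIN;
}

/*
 * Hypothetical helper: model a feedback path that scales its state by
 * `gain` (0 < gain < 1) on every pass once the input has gone silent.
 * Returns the number of passes until the state leaves the normal range,
 * or -1 if that does not happen within max_steps.
 */
static int steps_until_denormal(float start, float gain, int max_steps)
{
    volatile float state = start;   /* volatile: force real float stores */
    int steps;

    for (steps = 0; steps < max_steps; steps++) {
        float v = state;
        float mag = v < 0.0f ? -v : v;
        if (mag < FLT_MIN)          /* denormal, or already zero */
            return steps;
        state = v * gain;
    }
    return -1;
}
```

With a gain of 0.5, a value starting at 1.0 leaves the normal range after 127 passes. Gains closer to 1.0, typical of a slow decay, take correspondingly longer to get there.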
I added these two cases this morning:
On Mon, 2004-08-09 at 10:49, Fernando Pablo Lopez-Lezcano wrote:
> Athlon64 3000+, 2.6 (glibc 2.3.3, FC2): 27.5x
>
> P4 Mobile, 2.6 (glibc 2.3.3, FC2): 191x
> P4 Mobile, 2.4 (glibc 2.3.2, FC1): 315x
>
> So, there is a difference in the runtime configuration of the FPU. Both
> P4 Mobile examples are exactly the same hardware :-)
And here is what I think is happening:
> But the problem, I think, is not the time it takes to process denormals
> but whether those denormals get converted to zeros by the FPU or not.
>
> Apparently there is a "denormals-are-zero" flag in the MXCSR register
> (introduced in the later P4 and Xeon processors)[*]. My guess is that it is
> being initialized differently. In 2.4.x/FC1 denormals get zeroed and
> don't generate extra cpu load, in 2.6.x/FC2 denormals stay denormals and
> are recirculated in algorithms that have feedback (and that's why the
> load stays high in my tests).
>
> As far as I can tell this is being set in arch/i386/kernel/i387.c and
> the code for 2.4 and 2.6 _is_ different... I also don't know if
> something else is changing that setting later.
>
> A good test would be to be able to set and reset this setting
> globally...
So, is there a way to do this? (toggle denormals-are-zero)
Is this setting indeed different on 2.4 and 2.6? (denormals-are-zero).
If the default setting is the same on both 2.4 and 2.6, where would this
be changed if the kernel is not responsible for the change in behavior
I observe?
-- Fernando
> [*] See this:
> http://gcc.gnu.org/ml/gcc/2001-07/msg02162.html
> http://lkml.org/lkml/2003/5/9/144
* Re: 2.4.x vs 2.6.x: denormal handling and audio performance
From: Lee Revell @ 2004-08-10 1:00 UTC (permalink / raw)
To: Fernando Pablo Lopez-Lezcano; +Cc: linux-kernel, jackit-devel
On Mon, 2004-08-09 at 15:19, Fernando Pablo Lopez-Lezcano wrote:
> Hi all, I've been trying to track weird behavior I'm experiencing when
> trying to use 2.6.x for "pro audio" applications and I think I have
> something to report (and some questions).
>
> First, the environment. I'm running the Jack low latency server on top
> of two different software installs on the same hardware, one is FC1 +
> 2.4.26 + low latency and preemption patches, the other is FC2 + 2.6.7
> rc2-mm2 + voluntary preemption O3. They are different hard disks swapped
> into the same P4 laptop. Both are running the same source code versions
> of all the audio programs that I use to test (but _not_ the same
> binaries, each one is built in the environment it runs on).
>
Have you tried using the exact same binaries under both 2.4 and 2.6?
This would rule out a compiler issue.
In case anyone thinks this is an application bug, here are some links
pertaining to the P4 denormals-are-zero issue, these were at the bottom
of Fernando's post:
http://gcc.gnu.org/ml/gcc/2001-07/msg02162.html
http://lkml.org/lkml/2003/5/9/144
Lee
* Re: 2.4.x vs 2.6.x: denormal handling and audio performance
From: Lee Revell @ 2004-08-10 1:09 UTC (permalink / raw)
To: Fernando Pablo Lopez-Lezcano; +Cc: linux-kernel, jackit-devel
On Mon, 2004-08-09 at 21:00, Lee Revell wrote:
> On Mon, 2004-08-09 at 15:19, Fernando Pablo Lopez-Lezcano wrote:
> > Hi all, I've been trying to track weird behavior I'm experiencing when
> > trying to use 2.6.x for "pro audio" applications
>
> In case anyone thinks this is an application bug, here are some links
> pertaining to the P4 denormals-are-zero issue
If the denormals-are-zero bit is the issue then I believe this patch
should fix the problem.
Lee
--- arch/i386/kernel/i387_orig.c 2004-08-09 21:06:00.000000000 -0400
+++ arch/i386/kernel/i387.c 2004-08-09 21:06:05.000000000 -0400
@@ -34,7 +34,7 @@
memset(&current->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
asm volatile("fxsave %0" : : "m" (current->thread.i387.fxsave));
mask = current->thread.i387.fxsave.mxcsr_mask;
- if (mask == 0) mask = 0x0000ffbf;
+ if (mask == 0) mask = 0x0000ffff;
}
mxcsr_feature_mask &= mask;
stts();
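The two constants in the patch differ only in bit 6 of MXCSR, which is the denormals-are-zero (DAZ) bit. A small C check of that arithmetic, using hypothetical names (`MXCSR_DAZ`, `mask_allows_daz`, `fallback_mask_diff`) that are not taken from the kernel source:

```c
#define MXCSR_DAZ 0x0040u   /* bit 6 of MXCSR: denormals-are-zero */

/* Hypothetical helper: does a given MXCSR feature mask permit the DAZ bit? */
static int mask_allows_daz(unsigned int mask)
{
    return (mask & MXCSR_DAZ) != 0;
}

/* The only bit that differs between the old and new fallback masks. */
static unsigned int fallback_mask_diff(void)
{
    return 0x0000ffbfu ^ 0x0000ffffu;
}
```

The fallback value is only used when FXSAVE reports a zero `mxcsr_mask`, so on CPUs that report a nonzero mask the patch changes nothing.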
* Re: 2.4.x vs 2.6.x: denormal handling and audio performance
From: Fernando Pablo Lopez-Lezcano @ 2004-08-10 2:35 UTC (permalink / raw)
To: Lee Revell; +Cc: linux-kernel, jackit-devel
On Mon, 2004-08-09 at 18:00, Lee Revell wrote:
> On Mon, 2004-08-09 at 15:19, Fernando Pablo Lopez-Lezcano wrote:
> > Hi all, I've been trying to track weird behavior I'm experiencing when
> > trying to use 2.6.x for "pro audio" applications and I think I have
> > something to report (and some questions).
> >
> > First, the environment. I'm running the Jack low latency server on top
> > of two different software installs on the same hardware, one is FC1 +
> > 2.4.26 + low latency and preemption patches, the other is FC2 + 2.6.7
> > rc2-mm2 + voluntary preemption O3. They are different hard disks swapped
> > into the same P4 laptop. Both are running the same source code versions
> > of all the audio programs that I use to test (but _not_ the same
> > binaries, each one is built in the environment it runs on).
>
> Have you tried using the exact same binaries under both 2.4 and 2.6?
> This would rule out a compiler issue.
I finally managed to boot the 2.6 kernel on top of FC1, but I can't run
jack with realtime priority; somehow the kernel is not happy with
something in the system, and some jack client applications just hang.
Anyway, running under 2.6 without realtime makes the numbers fluctuate a
lot more but I see the same effect. After a while freqtweak samples
decay to close to zero and it starts using a lot of cpu. Starting
hydrogen again (the drum machine feeding freqtweak in my tests)
immediately solves the problem.
[some time later]
_BUT_
[frantically trying to find a big brown bag to hide in... sigh...]
I retested with 2.4.x under FC1 yet again and I do see the same
effect... argh... [*]
> In case anyone thinks this is an application bug, here are some links
> pertaining to the P4 denormals-are-zero issue, these were at the bottom
> of Fernando's post:
>
> http://gcc.gnu.org/ml/gcc/2001-07/msg02162.html
> http://lkml.org/lkml/2003/5/9/144
So, this is good in a way. I've been trying to find some code snippet
that would enable me to change the DAZ flag in MXCSR, without luck so
far... I know this is probably OT but would anyone out there know how to
do this, or where to find useful information?
-- Fernando
[*] the test conditions in my two systems were not _exactly_ the same (I
should know by now that is bad), so that in the FC1 case the decay into
denormals was taking a lot longer. I have to learn to be more patient.
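On the question of changing the DAZ flag from user space: one possible approach, sketched here under assumptions, is to use the SSE intrinsics `_mm_getcsr()` and `_mm_setcsr()` from `<xmmintrin.h>`. The helper names below are hypothetical. The sketch assumes an x86 CPU that supports DAZ (loading MXCSR with an unsupported bit set raises a fault), and it only affects SSE arithmetic, not x87 instructions. MXCSR is per-thread state, so the call would have to be made in the thread that does the audio processing.

```c
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr (SSE intrinsics) */

#define MXCSR_DAZ 0x0040u   /* bit 6: treat denormal inputs as zero */
#define MXCSR_FTZ 0x8000u   /* bit 15: flush denormal results to zero */

/*
 * Hypothetical helper: set DAZ and FTZ for the calling thread.
 * Returns the previous MXCSR value so the caller can restore it.
 * Assumes the CPU supports DAZ; otherwise _mm_setcsr would fault.
 */
static unsigned int denormals_off(void)
{
    unsigned int old = _mm_getcsr();
    _mm_setcsr(old | MXCSR_DAZ | MXCSR_FTZ);
    return old;
}

/* Hypothetical helper: put back an MXCSR value saved by denormals_off(). */
static void denormals_restore(unsigned int old)
{
    _mm_setcsr(old);
}

/* Multiply through volatiles so the compiler cannot fold it at build time. */
static float scaled(float x, float k)
{
    volatile float a = x, b = k;
    volatile float r = a * b;
    return r;
}
```

A test along the lines discussed in the thread would call `denormals_off()` at the start of the processing thread and compare CPU load during a long decay with and without it. With the flags set, a product such as `scaled(2e-38f, 0.25f)`, which would otherwise be a denormal, comes back as zero when the compiler uses SSE for float math.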
* Re: 2.4.x vs 2.6.x: denormal handling and audio performance
From: Lee Revell @ 2004-08-10 5:28 UTC (permalink / raw)
To: Fernando Pablo Lopez-Lezcano; +Cc: linux-kernel, jackit-devel
On Mon, 2004-08-09 at 22:35, Fernando Pablo Lopez-Lezcano wrote:
> _BUT_
>
> [frantically trying to find a big brown bag to hide in... sigh...]
>
> I retested with 2.4.x under FC1 yet again and I do see the same
> effect... argh... [*]
>
Regardless, these numbers still are interesting. Would any kernel
developer care to explain them? It looks like a 50% difference in
performance from 2.4 to 2.6:
On Mon, 2004-08-09 at 10:49, Fernando Pablo Lopez-Lezcano wrote:
> Athlon64 3000+, 2.6 (glibc 2.3.3, FC2): 27.5x
>
> P4 Mobile, 2.6 (glibc 2.3.3, FC2): 191x
> P4 Mobile, 2.4 (glibc 2.3.2, FC1): 315x
Lee