* [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-05 12:11 ` [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more) Will Deacon
@ 2015-01-05 15:34 ` Pavel Machek
2015-01-05 16:24 ` Nicolas Pitre
2015-01-06 4:09 ` Nicolas Pitre
2 siblings, 0 replies; 9+ messages in thread
From: Pavel Machek @ 2015-01-05 15:34 UTC (permalink / raw)
To: linux-arm-kernel
On Mon 2015-01-05 12:11:36, Will Deacon wrote:
> Hi all,
>
> Sorry for the late reply, it seems that neither myself or the
> arm-linux-kernel list were on CC for this thread.
Sorry about that. I was pretty sure I cc-ed you, but apparently did
not.
> Pavel: do you have something we can run to observe the problem?
Debian 7.7: mpg123, anything pyaudio based, at least. (install
python-pyaudio, try to run examples).
I hit it with:
#!/usr/bin/env python
# Written by Yu-Jie Lin
# Public Domain
#
# Deps: PyAudio, NumPy, and Matplotlib
# Blog:
# http://blog.yjl.im/2012/11/frequency-spectrum-of-sound-using.html
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import pyaudio
import struct
import wave
...
When I decided to investigate.
> Finally, the revert doesn't have a Cc stable tag which is probably needed
> if it's going to land in 3.19 final, irrespective of whether we agree that
> it's the right way forward.
Yes, pushing is to stable is good idea. Will you take care?
Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-05 12:11 ` [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more) Will Deacon
2015-01-05 15:34 ` Pavel Machek
@ 2015-01-05 16:24 ` Nicolas Pitre
2015-01-06 4:09 ` Nicolas Pitre
2 siblings, 0 replies; 9+ messages in thread
From: Nicolas Pitre @ 2015-01-05 16:24 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 5 Jan 2015, Will Deacon wrote:
> On Mon, Jan 05, 2015 at 04:51:31AM +0000, Nicolas Pitre wrote:
> > On Sun, 4 Jan 2015, Russell King - ARM Linux wrote:
> > > I encourage you *not* to back down like this. Linus is right in so far
> > > as the regressions issue, but he is *totally* wrong to do the revert,
> > > which IMHO has been done out of nothing more than spite.
> > >
> > > Either *with or without* the revert, the issue still remains, and needs
> > > to be addressed properly.
> > >
> > > With the revert in place, we now have insanely small bogomips values
> > > reported via /proc/cpuinfo when hardware timers are used. That needs
> > > fixing.
> >
> > Here's my take on it. Taking a step back, it was stupid to mix bogomips
> > with timer based delays.
>
> Well, bogomips is directly related to loops_per_jiffy so I don't think the
> mechanism is "stupid";
It is stupid to use loops_per_jiffy for timer based delay loops. The
timer based loop simply polls the timer until the desired time has
passed. Adding a loop count on top is completely artificial (may be
justified to avoid timer wraparounds) but bares no relationship with
loops_per_jiffy. Therefore determining loops_per_jiffy based on a timer
frequency is wrong.
> the issue is that userspace makes assumptions
> (bogus or otherwise) about the relation of bogomips to cpu frequency which
> have historically been good enough. More below...
Absolutely. And that's what my patch is all about: restoring that "good
enough" for user space (mis)purpose.
> > ----- >8
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Date: Sun, 4 Jan 2015 22:28:58 -0500
> > Subject: [PATCH] ARM: disentangle timer based delays and bogomips calibration
> >
> > The bogomips value is a pseudo CPU speed value originally used to calibrate
> > loop-based small delays. It is also exported to user space through the
> > /proc filesystem and some user space apps started relying on it.
> >
> > Modern hardware can vary their CPU clock at run time making the bogomips
> > value less reliable for delay purposes. With the advent of high resolution
> > timers, small delays migrated to timer polling loops instead. Strangely
> > enough, the bogomips value calibration became timer based too, making it
> > way more bogus than it already was as a CPU speed representation and people
> > using it via /proc/cpuinfo started complaining.
>
> As you pointed out previously, these complaints were what prompted us to
> revisit the bogomips reporting. The class of application using the value
> was very much things like "How fast is my AwesomePhone-9000?":
>
> https://play.google.com/store/apps/details?id=com.securiteinfo.android.bogomips&hl=en_GB
> https://bugs.launchpad.net/ubuntu/+source/byobu/+bug/675442
>
> so actually, having *these* applications either exit early with an
> "unable to calculate CPU frequency" message or print something like "CPU
> freq: unknown" is arguably the right thing to do.
Don't you dare! Linus will shut you up. The sacred rule: "We don't
break user space, period" irrespective of the nefarious application
purpose for bogomips.
> What Pavel is now reporting is different; some useful piece of
> software has lost core functionality.
>
> Now, even with the loop-based bogomips values we have the following
> (user-visible) problems:
>
> (1) It's not portable between microarchitectures (for example, some
> CPUs can give you double the value if they predict the backwards
> branch in the calibration loop)
Who cares?
> (2) It's not reliable in the face of frequency scaling
loops_per_jiffy is already scaled accordingly. Sure it is racy but
that's what non timer based delay loop using platforms have to live with
already. For /proc/cpuinfo purposes that ought to be more than good
enough. The MHz on X86 that some applications use in place of the
bogomips when available has the same issue.
> (3) It's not reliable in the face of heterogeneous systems (big.LITTLE)
Actually, it is. With my patch I do get different values in
/proc/cpuinfo for the A15's and the A7's which is kind of expected.
> (4) The lpj calculation is susceptible to overflow, limiting the maximum
> delay value depending on the CPU performance
That's an orthogonal issue that can be fixed separately.
> Essentially, the value is only really useful on relatively low-clocked,
> in-order, uniprocessor systems (like the one where Pavel reported the bug).
Sure. Still on other systems it is some kind of ballpark figure that
prevents applications from breaking.
> > Since it was wrong for user space to rely on a "bogus" mips value to start
> > with, the initial responce from kernel people was to remove it. This broke
> > user space even more as some applications then refused to run altogether.
> > The bogomips export was therefore reinstated in commit 4bf9636c39 ("Revert
> > 'ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo'").
>
> Actually, our initial response was to report a dummy value iirc. I remember
> even making it selectable in kconfig, but it bordered on the absurd. It's
> worth noting that, with the current revert in place, the value reported
> is now basically selectable via the "clock-frequency" property on the
> arch_timer node for systems using the timer implementation.
Which is even more absurd, hence my patch.
> > Because the reported bogomips is orders of magnitude away from the
> > traditionally expected value for a given CPU when timer based delays are
> > in use, and because lumping bogomips and timer based delay loops is rather
> > senseless anyway, let's calibrate bogomips using a CPU loop all the time
> > even when timer based delays are available. Timer based delays don't
> > need any calibration and /proc/cpuinfo will provide somewhat sensible
> > values again.
> >
> > In practice, calls to __delay() will now always use the CPU based loop.
> > Things remain unchanged for udelay() and its derivatives.
>
> Given that we have a hard limit of 3355 bogomips in our calibration code,
> could we not just report that instead? We already have all of the issues I
> highlighted above and the systems that are going to be hit by this are the
> more recent (more performant) cores that will be approaching this maximum
> value anyway.
I suggested 1.00 before in this thread. I also asked if 10, 100 or 1000
were any better. Apparently none of them were. The least controvertial
value is certainly a runtime determined one. The hard limit is
a rather weak excuse that can be fixed.
> We also need something we can port to the arm64 compat layer, so a constant
> would be easier there too, doesn't require the calibration delay at boot
> and doesn't break __delay.
That's a weak excuse too.
> One thing we're missing from all of this is what happens if Pavel's testcase
> is executed on a system using a timer-backed delay? If the program chokes
> on the next line anyway, then we could consider only advertising the
> bogomips line when the loop-based delay is in use.
Won't fix the current user space issue on timer-based-delay systems so
this brings no good.
Nicolas
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-05 12:11 ` [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more) Will Deacon
2015-01-05 15:34 ` Pavel Machek
2015-01-05 16:24 ` Nicolas Pitre
@ 2015-01-06 4:09 ` Nicolas Pitre
2015-01-06 13:01 ` Arnd Bergmann
2 siblings, 1 reply; 9+ messages in thread
From: Nicolas Pitre @ 2015-01-06 4:09 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 5 Jan 2015, Will Deacon wrote:
> On Mon, Jan 05, 2015 at 04:51:31AM +0000, Nicolas Pitre wrote:
> > Because the reported bogomips is orders of magnitude away from the
> > traditionally expected value for a given CPU when timer based delays are
> > in use, and because lumping bogomips and timer based delay loops is rather
> > senseless anyway, let's calibrate bogomips using a CPU loop all the time
> > even when timer based delays are available. Timer based delays don't
> > need any calibration and /proc/cpuinfo will provide somewhat sensible
> > values again.
> >
> > In practice, calls to __delay() will now always use the CPU based loop.
> > Things remain unchanged for udelay() and its derivatives.
>
> Given that we have a hard limit of 3355 bogomips in our calibration code,
> could we not just report that instead? We already have all of the issues I
> highlighted above and the systems that are going to be hit by this are the
> more recent (more performant) cores that will be approaching this maximum
> value anyway.
I already replied to the other issues in my previous email.
Now here's the bogomips hard limit gone:
----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Mon, 5 Jan 2015 22:43:56 -0500
Subject: [PATCH] ARM: loop_udelay: remove bogomips value limitation
Now that we don't support ARMv3 anymore, the loop based delay code can
convert microsecs into number of loops using a 64-bit multiplication.
This allows us to lift the hard limit of 3355 on the bogomips value with
loops_per_jiffy that can safely span the full 32-bit range.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7feeb163f5..287c96ad93 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -10,8 +10,8 @@
#include <asm/param.h> /* HZ */
#define MAX_UDELAY_MS 2
-#define UDELAY_MULT ((UL(2199023) * HZ) >> 11)
-#define UDELAY_SHIFT 30
+#define UDELAY_MULT UL(2047 * HZ + 483648 * HZ / 1000000)
+#define UDELAY_SHIFT 31
#ifndef __ASSEMBLY__
@@ -33,7 +33,7 @@ extern struct arm_delay_ops {
* it, it means that you're calling udelay() with an out of range value.
*
* With currently imposed limits, this means that we support a max delay
- * of 2000us. Further limits: HZ<=1000 and bogomips<=3355
+ * of 2000us. Further limits: HZ<=1000
*/
extern void __bad_udelay(void);
diff --git a/arch/arm/lib/delay-loop.S b/arch/arm/lib/delay-loop.S
index 518bf6e93f..bcc23c5760 100644
--- a/arch/arm/lib/delay-loop.S
+++ b/arch/arm/lib/delay-loop.S
@@ -17,7 +17,6 @@
/*
* r0 <= 2000
- * lpj <= 0x01ffffff (max. 3355 bogomips)
* HZ <= 1000
*/
@@ -25,16 +24,11 @@ ENTRY(__loop_udelay)
ldr r2, .LC1
mul r0, r2, r0
ENTRY(__loop_const_udelay) @ 0 <= r0 <= 0x7fffff06
- mov r1, #-1
ldr r2, .LC0
- ldr r2, [r2] @ max = 0x01ffffff
- add r0, r0, r1, lsr #32-14
- mov r0, r0, lsr #14 @ max = 0x0001ffff
- add r2, r2, r1, lsr #32-10
- mov r2, r2, lsr #10 @ max = 0x00007fff
- mul r0, r2, r0 @ max = 2^32-1
- add r0, r0, r1, lsr #32-6
- movs r0, r0, lsr #6
+ ldr r2, [r2]
+ umull r1, r0, r2, r0
+ adds r1, r1, #0xffffffff
+ adcs r0, r0, r0
reteq lr
/*
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-06 4:09 ` Nicolas Pitre
@ 2015-01-06 13:01 ` Arnd Bergmann
2015-01-06 20:50 ` Nicolas Pitre
0 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2015-01-06 13:01 UTC (permalink / raw)
To: linux-arm-kernel
On Monday 05 January 2015 23:09:00 Nicolas Pitre wrote:
>
> Now here's the bogomips hard limit gone:
>
> ----- >8
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Mon, 5 Jan 2015 22:43:56 -0500
> Subject: [PATCH] ARM: loop_udelay: remove bogomips value limitation
>
> Now that we don't support ARMv3 anymore, the loop based delay code can
> convert microsecs into number of loops using a 64-bit multiplication.
> This allows us to lift the hard limit of 3355 on the bogomips value with
> loops_per_jiffy that can safely span the full 32-bit range.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
I think we still build RPC with gcc -march=armv3. Is that a problem
with this patch?
Arnd
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-06 13:01 ` Arnd Bergmann
@ 2015-01-06 20:50 ` Nicolas Pitre
2015-01-06 21:27 ` Nicolas Pitre
0 siblings, 1 reply; 9+ messages in thread
From: Nicolas Pitre @ 2015-01-06 20:50 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 6 Jan 2015, Arnd Bergmann wrote:
> On Monday 05 January 2015 23:09:00 Nicolas Pitre wrote:
> >
> > Now here's the bogomips hard limit gone:
> >
> > ----- >8
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Date: Mon, 5 Jan 2015 22:43:56 -0500
> > Subject: [PATCH] ARM: loop_udelay: remove bogomips value limitation
> >
> > Now that we don't support ARMv3 anymore, the loop based delay code can
> > convert microsecs into number of loops using a 64-bit multiplication.
> > This allows us to lift the hard limit of 3355 on the bogomips value with
> > loops_per_jiffy that can safely span the full 32-bit range.
> >
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
>
> I think we still build RPC with gcc -march=armv3. Is that a problem
> with this patch?
It is.
Tangential question: does anyone still own a working RPC?
For sure we no longer support the RPC unless it is fitted with a SA110.
And IIRC the reason why -march=armv3 is used in that case has to do with
the RPC memory bus not able to accommodate the SA110's LDRH/STRH
instructions. However it should be able to execute UMULL regardless.
Now I could add ".arch armv7-a" in the file to make it compile for RPC.
I can't just make it ".arch armv4" as this prevents Thumb2 compilation
for that file.
Other ideas?
Nicolas
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-06 20:50 ` Nicolas Pitre
@ 2015-01-06 21:27 ` Nicolas Pitre
2015-01-06 21:37 ` Arnd Bergmann
0 siblings, 1 reply; 9+ messages in thread
From: Nicolas Pitre @ 2015-01-06 21:27 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 6 Jan 2015, Nicolas Pitre wrote:
> On Tue, 6 Jan 2015, Arnd Bergmann wrote:
>
> > I think we still build RPC with gcc -march=armv3. Is that a problem
> > with this patch?
>
> It is.
>
> Tangential question: does anyone still own a working RPC?
>
> For sure we no longer support the RPC unless it is fitted with a SA110.
> And IIRC the reason why -march=armv3 is used in that case has to do with
> the RPC memory bus not able to accommodate the SA110's LDRH/STRH
> instructions. However it should be able to execute UMULL regardless.
>
> Now I could add ".arch armv7-a" in the file to make it compile for RPC.
> I can't just make it ".arch armv4" as this prevents Thumb2 compilation
> for that file.
>
> Other ideas?
I settled on the following:
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 0573faab96..0eb8de1c6f 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -40,7 +40,10 @@ else
lib-y += io-readsw-armv4.o io-writesw-armv4.o
endif
-lib-$(CONFIG_ARCH_RPC) += ecard.o io-acorn.o floppydma.o
+ifeq ($(CONFIG_ARCH_RPC),y)
+ lib-y += ecard.o io-acorn.o floppydma.o
+ AFLAGS_delay-loop.o += -march=armv4
+endif
$(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
$(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
That'll make things easier to sort out in the context of
$tangential_question.
Nicolas
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-06 21:27 ` Nicolas Pitre
@ 2015-01-06 21:37 ` Arnd Bergmann
0 siblings, 0 replies; 9+ messages in thread
From: Arnd Bergmann @ 2015-01-06 21:37 UTC (permalink / raw)
To: linux-arm-kernel
On Tuesday 06 January 2015 16:27:05 Nicolas Pitre wrote:
> Tangential question: does anyone still own a working RPC?
I'm pretty sure that Russell still has one. As far as I know there
are quite a lot of RPC in working condition, but most of them are
running RiscOS.
> -lib-$(CONFIG_ARCH_RPC) += ecard.o io-acorn.o floppydma.o
> +ifeq ($(CONFIG_ARCH_RPC),y)
> + lib-y += ecard.o io-acorn.o floppydma.o
> + AFLAGS_delay-loop.o += -march=armv4
> +endif
Looks good to me.
Arnd
^ permalink raw reply [flat|nested] 9+ messages in thread