* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-05 4:51 ` Nicolas Pitre
@ 2015-01-05 12:11 ` Will Deacon
2015-01-05 15:34 ` Pavel Machek
2015-01-05 16:24 ` Nicolas Pitre
2015-01-07 18:11 ` Catalin Marinas
2015-01-08 22:49 ` Pavel Machek
2 siblings, 2 replies; 66+ messages in thread
From: Will Deacon @ 2015-01-05 12:11 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Russell King - ARM Linux, Linus Torvalds, Pavel Machek,
Marc Zyngier, kernel list, linux-arm-kernel
Hi all,
Sorry for the late reply, it seems that neither myself or the
arm-linux-kernel list were on CC for this thread.
On Mon, Jan 05, 2015 at 04:51:31AM +0000, Nicolas Pitre wrote:
> On Sun, 4 Jan 2015, Russell King - ARM Linux wrote:
> > I encourage you *not* to back down like this. Linus is right in so far
> > as the regressions issue, but he is *totally* wrong to do the revert,
> > which IMHO has been done out of nothing more than spite.
> >
> > Either *with or without* the revert, the issue still remains, and needs
> > to be addressed properly.
> >
> > With the revert in place, we now have insanely small bogomips values
> > reported via /proc/cpuinfo when hardware timers are used. That needs
> > fixing.
>
> Here's my take on it. Taking a step back, it was stupid to mix bogomips
> with timer based delays.
Well, bogomips is directly related to loops_per_jiffy so I don't think the
mechanism is "stupid"; the issue is that userspace makes assumptions
(bogus or otherwise) about the relation of bogomips to cpu frequency which
have historically been good enough. More below...
> ----- >8
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Sun, 4 Jan 2015 22:28:58 -0500
> Subject: [PATCH] ARM: disentangle timer based delays and bogomips calibration
>
> The bogomips value is a pseudo CPU speed value originally used to calibrate
> loop-based small delays. It is also exported to user space through the
> /proc filesystem and some user space apps started relying on it.
>
> Modern hardware can vary their CPU clock at run time making the bogomips
> value less reliable for delay purposes. With the advent of high resolution
> timers, small delays migrated to timer polling loops instead. Strangely
> enough, the bogomips value calibration became timer based too, making it
> way more bogus than it already was as a CPU speed representation and people
> using it via /proc/cpuinfo started complaining.
As you pointed out previously, these complaints were what prompted us to
revisit the bogomips reporting. The class of application using the value
was very much things like "How fast is my AwesomePhone-9000?":
https://play.google.com/store/apps/details?id=com.securiteinfo.android.bogomips&hl=en_GB
https://bugs.launchpad.net/ubuntu/+source/byobu/+bug/675442
so actually, having *these* applications either exit early with an
"unable to calculate CPU frequency" message or print something like "CPU
freq: unknown" is arguably the right thing to do. What Pavel is now
reporting is different; some useful piece of software has lost core
functionality.
Now, even with the loop-based bogomips values we have the following
(user-visible) problems:
(1) It's not portable between microarchitectures (for example, some
CPUs can give you double the value if they predict the backwards
branch in the calibration loop)
(2) It's not reliable in the face of frequency scaling
(3) It's not reliable in the face of heterogeneous systems (big.LITTLE)
(4) The lpj calculation is susceptible to overflow, limiting the maximum
delay value depending on the CPU performance
Essentially, the value is only really useful on relatively low-clocked,
in-order, uniprocessor systems (like the one where Pavel reported the bug).
> Since it was wrong for user space to rely on a "bogus" mips value to start
> with, the initial responce from kernel people was to remove it. This broke
> user space even more as some applications then refused to run altogether.
> The bogomips export was therefore reinstated in commit 4bf9636c39 ("Revert
> 'ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo'").
Actually, our initial response was to report a dummy value iirc. I remember
even making it selectable in kconfig, but it bordered on the absurd. It's
worth noting that, with the current revert in place, the value reported
is now basically selectable via the "clock-frequency" property on the
arch_timer node for systems using the timer implementation.
> Because the reported bogomips is orders of magnitude away from the
> traditionally expected value for a given CPU when timer based delays are
> in use, and because lumping bogomips and timer based delay loops is rather
> senseless anyway, let's calibrate bogomips using a CPU loop all the time
> even when timer based delays are available. Timer based delays don't
> need any calibration and /proc/cpuinfo will provide somewhat sensible
> values again.
>
> In practice, calls to __delay() will now always use the CPU based loop.
> Things remain unchanged for udelay() and its derivatives.
Given that we have a hard limit of 3355 bogomips in our calibration code,
could we not just report that instead? We already have all of the issues I
highlighted above and the systems that are going to be hit by this are the
more recent (more performant) cores that will be approaching this maximum
value anyway.
We also need something we can port to the arm64 compat layer, so a constant
would be easier there too, doesn't require the calibration delay at boot
and doesn't break __delay.
One thing we're missing from all of this is what happens if Pavel's testcase
is executed on a system using a timer-backed delay? If the program chokes
on the next line anyway, then we could consider only advertising the
bogomips line when the loop-based delay is in use.
Pavel: do you have something we can run to observe the problem?
Finally, the revert doesn't have a Cc stable tag which is probably needed
if it's going to land in 3.19 final, irrespective of whether we agree that
it's the right way forward.
Will
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-05 12:11 ` Will Deacon
@ 2015-01-05 15:34 ` Pavel Machek
2015-01-05 16:24 ` Nicolas Pitre
1 sibling, 0 replies; 66+ messages in thread
From: Pavel Machek @ 2015-01-05 15:34 UTC (permalink / raw)
To: Will Deacon
Cc: Nicolas Pitre, Russell King - ARM Linux, Linus Torvalds,
Marc Zyngier, kernel list, linux-arm-kernel
On Mon 2015-01-05 12:11:36, Will Deacon wrote:
> Hi all,
>
> Sorry for the late reply, it seems that neither myself or the
> arm-linux-kernel list were on CC for this thread.
Sorry about that. I was pretty sure I cc-ed you, but apparently did
not.
> Pavel: do you have something we can run to observe the problem?
Debian 7.7: mpg123, anything pyaudio based, at least. (install
python-pyaudio, try to run examples).
I hit it with:
#!/usr/bin/env python
# Written by Yu-Jie Lin
# Public Domain
#
# Deps: PyAudio, NumPy, and Matplotlib
# Blog:
# http://blog.yjl.im/2012/11/frequency-spectrum-of-sound-using.html
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import pyaudio
import struct
import wave
...
When I decided to investigate.
> Finally, the revert doesn't have a Cc stable tag which is probably needed
> if it's going to land in 3.19 final, irrespective of whether we agree that
> it's the right way forward.
Yes, pushing is to stable is good idea. Will you take care?
Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-05 12:11 ` Will Deacon
2015-01-05 15:34 ` Pavel Machek
@ 2015-01-05 16:24 ` Nicolas Pitre
1 sibling, 0 replies; 66+ messages in thread
From: Nicolas Pitre @ 2015-01-05 16:24 UTC (permalink / raw)
To: Will Deacon
Cc: Russell King - ARM Linux, Linus Torvalds, Pavel Machek,
Marc Zyngier, kernel list, linux-arm-kernel
On Mon, 5 Jan 2015, Will Deacon wrote:
> On Mon, Jan 05, 2015 at 04:51:31AM +0000, Nicolas Pitre wrote:
> > On Sun, 4 Jan 2015, Russell King - ARM Linux wrote:
> > > I encourage you *not* to back down like this. Linus is right in so far
> > > as the regressions issue, but he is *totally* wrong to do the revert,
> > > which IMHO has been done out of nothing more than spite.
> > >
> > > Either *with or without* the revert, the issue still remains, and needs
> > > to be addressed properly.
> > >
> > > With the revert in place, we now have insanely small bogomips values
> > > reported via /proc/cpuinfo when hardware timers are used. That needs
> > > fixing.
> >
> > Here's my take on it. Taking a step back, it was stupid to mix bogomips
> > with timer based delays.
>
> Well, bogomips is directly related to loops_per_jiffy so I don't think the
> mechanism is "stupid";
It is stupid to use loops_per_jiffy for timer based delay loops. The
timer based loop simply polls the timer until the desired time has
passed. Adding a loop count on top is completely artificial (may be
justified to avoid timer wraparounds) but bares no relationship with
loops_per_jiffy. Therefore determining loops_per_jiffy based on a timer
frequency is wrong.
> the issue is that userspace makes assumptions
> (bogus or otherwise) about the relation of bogomips to cpu frequency which
> have historically been good enough. More below...
Absolutely. And that's what my patch is all about: restoring that "good
enough" for user space (mis)purpose.
> > ----- >8
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Date: Sun, 4 Jan 2015 22:28:58 -0500
> > Subject: [PATCH] ARM: disentangle timer based delays and bogomips calibration
> >
> > The bogomips value is a pseudo CPU speed value originally used to calibrate
> > loop-based small delays. It is also exported to user space through the
> > /proc filesystem and some user space apps started relying on it.
> >
> > Modern hardware can vary their CPU clock at run time making the bogomips
> > value less reliable for delay purposes. With the advent of high resolution
> > timers, small delays migrated to timer polling loops instead. Strangely
> > enough, the bogomips value calibration became timer based too, making it
> > way more bogus than it already was as a CPU speed representation and people
> > using it via /proc/cpuinfo started complaining.
>
> As you pointed out previously, these complaints were what prompted us to
> revisit the bogomips reporting. The class of application using the value
> was very much things like "How fast is my AwesomePhone-9000?":
>
> https://play.google.com/store/apps/details?id=com.securiteinfo.android.bogomips&hl=en_GB
> https://bugs.launchpad.net/ubuntu/+source/byobu/+bug/675442
>
> so actually, having *these* applications either exit early with an
> "unable to calculate CPU frequency" message or print something like "CPU
> freq: unknown" is arguably the right thing to do.
Don't you dare! Linus will shut you up. The sacred rule: "We don't
break user space, period" irrespective of the nefarious application
purpose for bogomips.
> What Pavel is now reporting is different; some useful piece of
> software has lost core functionality.
>
> Now, even with the loop-based bogomips values we have the following
> (user-visible) problems:
>
> (1) It's not portable between microarchitectures (for example, some
> CPUs can give you double the value if they predict the backwards
> branch in the calibration loop)
Who cares?
> (2) It's not reliable in the face of frequency scaling
loops_per_jiffy is already scaled accordingly. Sure it is racy but
that's what non timer based delay loop using platforms have to live with
already. For /proc/cpuinfo purposes that ought to be more than good
enough. The MHz on X86 that some applications use in place of the
bogomips when available has the same issue.
> (3) It's not reliable in the face of heterogeneous systems (big.LITTLE)
Actually, it is. With my patch I do get different values in
/proc/cpuinfo for the A15's and the A7's which is kind of expected.
> (4) The lpj calculation is susceptible to overflow, limiting the maximum
> delay value depending on the CPU performance
That's an orthogonal issue that can be fixed separately.
> Essentially, the value is only really useful on relatively low-clocked,
> in-order, uniprocessor systems (like the one where Pavel reported the bug).
Sure. Still on other systems it is some kind of ballpark figure that
prevents applications from breaking.
> > Since it was wrong for user space to rely on a "bogus" mips value to start
> > with, the initial responce from kernel people was to remove it. This broke
> > user space even more as some applications then refused to run altogether.
> > The bogomips export was therefore reinstated in commit 4bf9636c39 ("Revert
> > 'ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo'").
>
> Actually, our initial response was to report a dummy value iirc. I remember
> even making it selectable in kconfig, but it bordered on the absurd. It's
> worth noting that, with the current revert in place, the value reported
> is now basically selectable via the "clock-frequency" property on the
> arch_timer node for systems using the timer implementation.
Which is even more absurd, hence my patch.
> > Because the reported bogomips is orders of magnitude away from the
> > traditionally expected value for a given CPU when timer based delays are
> > in use, and because lumping bogomips and timer based delay loops is rather
> > senseless anyway, let's calibrate bogomips using a CPU loop all the time
> > even when timer based delays are available. Timer based delays don't
> > need any calibration and /proc/cpuinfo will provide somewhat sensible
> > values again.
> >
> > In practice, calls to __delay() will now always use the CPU based loop.
> > Things remain unchanged for udelay() and its derivatives.
>
> Given that we have a hard limit of 3355 bogomips in our calibration code,
> could we not just report that instead? We already have all of the issues I
> highlighted above and the systems that are going to be hit by this are the
> more recent (more performant) cores that will be approaching this maximum
> value anyway.
I suggested 1.00 before in this thread. I also asked if 10, 100 or 1000
were any better. Apparently none of them were. The least controvertial
value is certainly a runtime determined one. The hard limit is
a rather weak excuse that can be fixed.
> We also need something we can port to the arm64 compat layer, so a constant
> would be easier there too, doesn't require the calibration delay at boot
> and doesn't break __delay.
That's a weak excuse too.
> One thing we're missing from all of this is what happens if Pavel's testcase
> is executed on a system using a timer-backed delay? If the program chokes
> on the next line anyway, then we could consider only advertising the
> bogomips line when the loop-based delay is in use.
Won't fix the current user space issue on timer-based-delay systems so
this brings no good.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-05 4:51 ` Nicolas Pitre
2015-01-05 12:11 ` Will Deacon
@ 2015-01-07 18:11 ` Catalin Marinas
2015-01-07 18:47 ` Linus Torvalds
2015-01-07 18:50 ` Nicolas Pitre
2015-01-08 22:49 ` Pavel Machek
2 siblings, 2 replies; 66+ messages in thread
From: Catalin Marinas @ 2015-01-07 18:11 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Russell King - ARM Linux, Linus Torvalds, Pavel Machek,
Marc Zyngier, kernel list
On 5 January 2015 at 04:51, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Sun, 4 Jan 2015 22:28:58 -0500
> Subject: [PATCH] ARM: disentangle timer based delays and bogomips calibration
>
> The bogomips value is a pseudo CPU speed value originally used to calibrate
> loop-based small delays. It is also exported to user space through the
> /proc filesystem and some user space apps started relying on it.
>
> Modern hardware can vary their CPU clock at run time making the bogomips
> value less reliable for delay purposes. With the advent of high resolution
> timers, small delays migrated to timer polling loops instead. Strangely
> enough, the bogomips value calibration became timer based too, making it
> way more bogus than it already was as a CPU speed representation and people
> using it via /proc/cpuinfo started complaining.
>
> Since it was wrong for user space to rely on a "bogus" mips value to start
> with, the initial responce from kernel people was to remove it. This broke
> user space even more as some applications then refused to run altogether.
> The bogomips export was therefore reinstated in commit 4bf9636c39 ("Revert
> 'ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo'").
>
> Because the reported bogomips is orders of magnitude away from the
> traditionally expected value for a given CPU when timer based delays are
> in use, and because lumping bogomips and timer based delay loops is rather
> senseless anyway, let's calibrate bogomips using a CPU loop all the time
> even when timer based delays are available. Timer based delays don't
> need any calibration and /proc/cpuinfo will provide somewhat sensible
> values again.
>
> In practice, calls to __delay() will now always use the CPU based loop.
> Things remain unchanged for udelay() and its derivatives.
I think this makes sense since __delay() expects the number of loops
as argument rather than a duration scaled by some factor (based on the
generic timer frequency).
FWIW:
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Minor comment below:
> unsigned long calibrate_delay_is_known(void)
> {
> delay_calibrated = true;
> - return lpj_fine;
> +
> + /* calibrate bogomips even when timer based delays are used */
> + return 0;
> }
Do we need to remove delay_calibrated = true as well? We do it further
down again in calibration_delay_done() .
--
Catalin
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 18:11 ` Catalin Marinas
@ 2015-01-07 18:47 ` Linus Torvalds
2015-01-07 19:00 ` Nicolas Pitre
2015-01-07 18:50 ` Nicolas Pitre
1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2015-01-07 18:47 UTC (permalink / raw)
To: Catalin Marinas
Cc: Nicolas Pitre, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On Wed, Jan 7, 2015 at 10:11 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
>
> I think this makes sense since __delay() expects the number of loops
> as argument rather than a duration scaled by some factor (based on the
> generic timer frequency).
Gaah. You guys make no sense at all.
No, __delay() does not expect "number of loops".
It doesn't do that on x86, and it doesn't even do it on arm.
It *literally* does exactly the reverse of what you say it does.
__delay() very much expects a "duration scaled by some factor". The
factor being "loops_per_jiffy" (ok, so it's a "reverse factor", but
you get the idea).
And this is very much exactly why bogomips is that "2 *
loops_per_jiffy * HZ / 1000000".
Let's break it down:
- "loops_per_jiffy" is just the scale factor for __udelay(). Ignore
the "loops" part of the name, since it's purely historical. It comes
from the original "decrement and jump" implementation, but that hasn't
been true on x86 for over 15 years.
- the "2x" factor is also purely historical, and comes from the same
"decrement and jump" thing - it used to be two instructions. But
again, it hasn't been true for a *looong* time, but it is part of the
bogosity of bogomips.
- the "*HZ" is the "per jiffy" part.
- the "1000000" is the "MHz" part. The clock had better tick at a
megahertz or more, since the whole point of this is to do "udelay()".
Really. The original commit that removed bogomips from ARM was
*wrong*. Because that
per_cpu(cpu_data, i).loops_per_jiffy / (500000UL/HZ
calculation is in fact the very DEFINITION of bogomips (it's just
written in a way to not overflow).
Seriously.
The reason I reverted that commit is because it was crap. When Pavel
made the report, I looked at the code, and immediately reverted it.
For good reason. Exactly because that original patch was WRONG.
The old bogomips computation in arch/arm/kernel/setup.c is exactly the
right thing to do.
And every time somebody thinks it has to be about "loops" or about
"cpu frequency", I can only say "STOP IT". It hasn't been about loops
or CPU frequency for ages. I don't know when we switched it to "wait
for the timer", because it predates my history that starts in early
2002. So it's at *least* 13 years old.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 18:47 ` Linus Torvalds
@ 2015-01-07 19:00 ` Nicolas Pitre
2015-01-07 19:36 ` Linus Torvalds
0 siblings, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2015-01-07 19:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Catalin Marinas, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On Wed, 7 Jan 2015, Linus Torvalds wrote:
> On Wed, Jan 7, 2015 at 10:11 AM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> >
> > I think this makes sense since __delay() expects the number of loops
> > as argument rather than a duration scaled by some factor (based on the
> > generic timer frequency).
>
> Gaah. You guys make no sense at all.
>
> No, __delay() does not expect "number of loops".
>
> It doesn't do that on x86, and it doesn't even do it on arm.
>
> It *literally* does exactly the reverse of what you say it does.
>
> __delay() very much expects a "duration scaled by some factor". The
> factor being "loops_per_jiffy" (ok, so it's a "reverse factor", but
> you get the idea).
>
> And this is very much exactly why bogomips is that "2 *
> loops_per_jiffy * HZ / 1000000".
I think we're all saying more or less the same thing.
The bogomips *reporting* is back on ARM so user space won't break.
We'll make sure it is scaled properly so not to have orders of magnitude
discrepancy whether the timer based or the CPU based loop is used for
the purpose of making people feel good.
Any disagreement?
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 19:00 ` Nicolas Pitre
@ 2015-01-07 19:36 ` Linus Torvalds
2015-01-07 20:34 ` Nicolas Pitre
2015-01-07 22:24 ` Catalin Marinas
0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2015-01-07 19:36 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Catalin Marinas, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On Wed, Jan 7, 2015 at 11:00 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>
> We'll make sure it is scaled properly so not to have orders of magnitude
> discrepancy whether the timer based or the CPU based loop is used for
> the purpose of making people feel good.
Why?
You'd basically be lying. And it might actually hide real problems.
If the scaling hides the fact that the timer source cannot do a good
job at microsecond resolution delays, then it's not just lying, it's
lying in ways that hide real issues. So why should that kind of
behavior be encouraged? The actual *real* unscaled resolution of the
timer is valid and real information.
Random scaling like that would be *bad*, in other words.
This whole thread has wasted more time than the whole original
argument for wasted time ever was. I still have no idea what the
original argument was, and why you guys want some made-up and
incorrect feel-good number rather than just document the fact that the
bogomips is simple dependent on the clocksource you use for delays.
That kind of documentation wouldn't be lying, wouldn't be misleading,
and wouldn't waste peoples time.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 19:36 ` Linus Torvalds
@ 2015-01-07 20:34 ` Nicolas Pitre
2015-01-07 20:53 ` Russell King - ARM Linux
[not found] ` <CA+55aFwuO2g1S-bY96V28crMWj+dKXWANzbP28JQjBdTg0rV0w@mail.gmail.com>
2015-01-07 22:24 ` Catalin Marinas
1 sibling, 2 replies; 66+ messages in thread
From: Nicolas Pitre @ 2015-01-07 20:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Catalin Marinas, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On Wed, 7 Jan 2015, Linus Torvalds wrote:
> On Wed, Jan 7, 2015 at 11:00 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >
> > We'll make sure it is scaled properly so not to have orders of magnitude
> > discrepancy whether the timer based or the CPU based loop is used for
> > the purpose of making people feel good.
>
> Why?
>
> You'd basically be lying. And it might actually hide real problems.
> If the scaling hides the fact that the timer source cannot do a good
> job at microsecond resolution delays, then it's not just lying, it's
> lying in ways that hide real issues. So why should that kind of
> behavior be encouraged? The actual *real* unscaled resolution of the
> timer is valid and real information.
I think you are missing something fundamental in this thread.
On ARM, when the timer is used to provide small delays, it is typically
the ARM architected timer which by definition must have a constant input
clock in the MHz range. This timer clock has *nothing* to do with
whatever CPU clock you might be using. On the system I have here, the
CPU clock is 2GHz and the timer used for delays is 24MHz. If the CPU
clock is scaled down to 180MHz the timer clock remains at 24MHz.
The implementation of udelay() in this case is basically doing:
void udelay(unsigned long usecs)
{
unsigned long timer_ticks = usecs * (24000000 / 1000000);
unsigned long start = read_timer_count();
while (read_timer_count() - start < timer_ticks);
}
Some other systems might as well have a completely different timer clock
based on what its hardware designers thought was best, or based on what
they smoked the night before. There is no calibrating of the timer
input clock: it is just set and the timer clock rate for a given
hardware implementation is advertised by the firmware. No calibration
is necessary. No calibration would even be possible if that's the only
time source on the system.
Now tell me what this means for bogomips? Nothing. Absolutely nothing.
Deriving a bogomips number from a 24MHz timer clock that bares no
relationship with the CPU clock is completely useless. We might as well
hardcode a constant 1.00 and be done with it. Or hash the machine name
and add the RTC time and report that. At least the later would have
some entertainment value.
What I'm suggesting is to leave the timer based udelay() alone as it
doesn't need any loops_per_jiffy or whatnot to operate. Then, for the
semi-entertaining value of having *something* displayed alongside
"bogomips" in /proc/cpuinfo I'm suggesting to simply calibrate
loops_per_jiffy the traditional way in all cases, whether or not a
timer based udelay is in use.
Why you might have having a problem with that is beyond my
understanding.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 20:34 ` Nicolas Pitre
@ 2015-01-07 20:53 ` Russell King - ARM Linux
2015-01-07 21:15 ` Nicolas Pitre
2015-01-07 22:14 ` Catalin Marinas
[not found] ` <CA+55aFwuO2g1S-bY96V28crMWj+dKXWANzbP28JQjBdTg0rV0w@mail.gmail.com>
1 sibling, 2 replies; 66+ messages in thread
From: Russell King - ARM Linux @ 2015-01-07 20:53 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Linus Torvalds, Catalin Marinas, Pavel Machek, Marc Zyngier,
kernel list
On Wed, Jan 07, 2015 at 03:34:42PM -0500, Nicolas Pitre wrote:
> On Wed, 7 Jan 2015, Linus Torvalds wrote:
>
> > On Wed, Jan 7, 2015 at 11:00 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > >
> > > We'll make sure it is scaled properly so not to have orders of magnitude
> > > discrepancy whether the timer based or the CPU based loop is used for
> > > the purpose of making people feel good.
> >
> > Why?
> >
> > You'd basically be lying. And it might actually hide real problems.
> > If the scaling hides the fact that the timer source cannot do a good
> > job at microsecond resolution delays, then it's not just lying, it's
> > lying in ways that hide real issues. So why should that kind of
> > behavior be encouraged? The actual *real* unscaled resolution of the
> > timer is valid and real information.
>
> I think you are missing something fundamental in this thread.
I think what Linus is trying to tell us is that:
1. Where the kernel uses a software loop for implementing delays,
the kernel bogomips gives us a calibration of that loop.
2. Where the kernel uses a hardware timer for implementing delays,
the kernel bogomips gives us a calibration of that hardware timer.
And it doesn't matter whether or not that timer has anything to do with
the raw CPU speed.
In other words, bogomips is a statement about the accuracy of the
internal kernel mechanism being used for delays, nothing more, nothing
less.
Now, if I understand Linus correctly, what irks him is when someone
upgrades a kernel on a platform, and some userland breaks. That's
something which I've said multiple times I don't have a problem
agreeing with, and I suspect no one in this thread would disagree
that this is a serious failing, and one which needs fixing ASAP.
However, if running userland on platform A works, and but it doesn't
work on platform B. The breakage may well be due to platform A reporting
300 bogomips because it's using the kernel software loop, and platform
B reporting 6 bogomips because its using a hardware timer, but the CPU
is actually faster. However, this is not a kernel problem, and it
certainly is not a regression. It's a userspace bug which needs
userspace to fix.
Does that make the difference clear?
--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 20:53 ` Russell King - ARM Linux
@ 2015-01-07 21:15 ` Nicolas Pitre
2015-01-09 22:54 ` Steven Rostedt
2015-01-07 22:14 ` Catalin Marinas
1 sibling, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2015-01-07 21:15 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Linus Torvalds, Catalin Marinas, Pavel Machek, Marc Zyngier,
kernel list
On Wed, 7 Jan 2015, Russell King - ARM Linux wrote:
> On Wed, Jan 07, 2015 at 03:34:42PM -0500, Nicolas Pitre wrote:
> > On Wed, 7 Jan 2015, Linus Torvalds wrote:
> >
> > > On Wed, Jan 7, 2015 at 11:00 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > >
> > > > We'll make sure it is scaled properly so not to have orders of magnitude
> > > > discrepancy whether the timer based or the CPU based loop is used for
> > > > the purpose of making people feel good.
> > >
> > > Why?
> > >
> > > You'd basically be lying. And it might actually hide real problems.
> > > If the scaling hides the fact that the timer source cannot do a good
> > > job at microsecond resolution delays, then it's not just lying, it's
> > > lying in ways that hide real issues. So why should that kind of
> > > behavior be encouraged? The actual *real* unscaled resolution of the
> > > timer is valid and real information.
> >
> > I think you are missing something fundamental in this thread.
>
> I think what Linus is trying to tell us is that:
>
> 1. Where the kernel uses a software loop for implementing delays,
> the kernel bogomips gives us a calibration of that loop.
>
> 2. Where the kernel uses a hardware timer for implementing delays,
> the kernel bogomips gives us a calibration of that hardware timer.
>
> And it doesn't matter whether or not that timer has anything to do with
> the raw CPU speed.
>
> In other words, bogomips is a statement about the accuracy of the
> internal kernel mechanism being used for delays, nothing more, nothing
> less.
I'm not clear if that's actually what Linus is trying to tell us.
But that statement makes no sense at all. Why would user space care
about kernel internal mechanism for small delays? This hardly has any
influence on user space and I just can't imagine any user space code
consuming the bogomips value for that purpose.
What user space did with bogomips, though, is to get some relative
measure of the CPU speed for whatever calibration purposes, or simply
for pretty printing.
> Now, if I understand Linus correctly, what irks him is when someone
> upgrades a kernel on a platform, and some userland breaks. That's
> something which I've said multiple times I don't have a problem
> agreeing with, and I suspect no one in this thread would disagree
> that this is a serious failing, and one which needs fixing ASAP.
I agree too. And that is fixed in mainline with commit 4bf9636c39.
> However, if running userland on platform A works, and but it doesn't
> work on platform B. The breakage may well be due to platform A reporting
> 300 bogomips because it's using the kernel software loop, and platform
> B reporting 6 bogomips because its using a hardware timer, but the CPU
> is actually faster. However, this is not a kernel problem, and it
> certainly is not a regression. It's a userspace bug which needs
> userspace to fix.
There I disagree. In the spirit of "the kernel shall never break user
space ever" I'd say that the kernel is simply doing a poor job at
providing user space with a value that won't break user space
expectations. And since it is not that hard to do (I made a patch
already) I'd say we have less to lose by fixing it than keeping a
totally senseless value around.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 21:15 ` Nicolas Pitre
@ 2015-01-09 22:54 ` Steven Rostedt
0 siblings, 0 replies; 66+ messages in thread
From: Steven Rostedt @ 2015-01-09 22:54 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Russell King - ARM Linux, Linus Torvalds, Catalin Marinas,
Pavel Machek, Marc Zyngier, kernel list
On Wed, Jan 07, 2015 at 04:15:00PM -0500, Nicolas Pitre wrote:
>
> > However, if running userland on platform A works, and but it doesn't
> > work on platform B. The breakage may well be due to platform A reporting
> > 300 bogomips because it's using the kernel software loop, and platform
> > B reporting 6 bogomips because its using a hardware timer, but the CPU
> > is actually faster. However, this is not a kernel problem, and it
> > certainly is not a regression. It's a userspace bug which needs
> > userspace to fix.
>
> There I disagree. In the spirit of "the kernel shall never break user
> space ever" I'd say that the kernel is simply doing a poor job at
> providing user space with a value that won't break user space
> expectations. And since it is not that hard to do (I made a patch
> already) I'd say we have less to lose by fixing it than keeping a
> totally senseless value around.
It's not that the kernel shall never break userspace. It must not cause
userspace regressions.
If application A worked on box X and they upgrade the kernel and then
application A no longer works. That's a regression, and must be fixed.
Now if I understand what Russell stated, if application A works on box X
and you move to box Y using the same kernel, and application A no longer
works, that's not a regression with the kernel (unless it use to work
on box Y). If it never worked on box Y, it's a platform issue and
application A is not robust enough to deal with it. AKA, not a kernel
bug.
Now, it gets interesting if a fix was made that lets application A work
on box Y, but that fix broked application B on box X. Reverting that
fix will cause a regression for A or Y, but the fix itself caused a
regression for B on X. In that case we need a new fix (which may be
the case we are here).
-- Steve
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 20:53 ` Russell King - ARM Linux
2015-01-07 21:15 ` Nicolas Pitre
@ 2015-01-07 22:14 ` Catalin Marinas
2015-01-08 0:05 ` Linus Torvalds
2015-01-08 10:39 ` Russell King - ARM Linux
1 sibling, 2 replies; 66+ messages in thread
From: Catalin Marinas @ 2015-01-07 22:14 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Nicolas Pitre, Linus Torvalds, Pavel Machek, Marc Zyngier,
kernel list
On 7 January 2015 at 20:53, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> I think what Linus is trying to tell us is that:
>
> 1. Where the kernel uses a software loop for implementing delays,
> the kernel bogomips gives us a calibration of that loop.
Fine.
> 2. Where the kernel uses a hardware timer for implementing delays,
> the kernel bogomips gives us a calibration of that hardware timer.
Fine as well.
> And it doesn't matter whether or not that timer has anything to do with
> the raw CPU speed.
Indeed.
> In other words, bogomips is a statement about the accuracy of the
> internal kernel mechanism being used for delays, nothing more, nothing
> less.
And for whatever reason, some user space program thinks it can read
something meaningful out of this number and use it in user space. But
the consensus is that even if user space is badly implemented, we do
*not* break it. I agree.
> Now, if I understand Linus correctly, what irks him is when someone
> upgrades a kernel on a platform, and some userland breaks. That's
> something which I've said multiple times I don't have a problem
> agreeing with, and I suspect no one in this thread would disagree
> that this is a serious failing, and one which needs fixing ASAP.
Agree. But I assume you refer to the fact that we removed the BogoMIPS
reporting. It's fine to have it reverted.
> However, if running userland on platform A works, and but it doesn't
> work on platform B. The breakage may well be due to platform A reporting
> 300 bogomips because it's using the kernel software loop, and platform
> B reporting 6 bogomips because its using a hardware timer, but the CPU
> is actually faster. However, this is not a kernel problem, and it
> certainly is not a regression. It's a userspace bug which needs
> userspace to fix.
We need to look back at the point we added timer-based delay about 2.5
years ago. Prior to commit d0a533b18235d362, platform A reported
bogomips 300. After that commit, the *same* platform A (not B),
started reported 6.
Is the above considered user breakage? That's what Nico is trying to
solve. If we are fine with it, than we can close this thread, no
further changes needed.
We can document it as Linus suggests and say that prior to whatever
version we had 2.5 years ago, BogoMIPS was based on a busy loop. In
more recent kernels, it is based on a timer delay. User space should
make use of such information when interpreting BogoMIPS.
--
Catalin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 22:14 ` Catalin Marinas
@ 2015-01-08 0:05 ` Linus Torvalds
2015-01-08 0:45 ` Nicolas Pitre
2015-01-08 10:39 ` Russell King - ARM Linux
1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2015-01-08 0:05 UTC (permalink / raw)
To: Catalin Marinas
Cc: Russell King - ARM Linux, Nicolas Pitre, Pavel Machek,
Marc Zyngier, kernel list
On Wed, Jan 7, 2015 at 2:14 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> We need to look back at the point we added timer-based delay about 2.5
> years ago. Prior to commit d0a533b18235d362, platform A reported
> bogomips 300. After that commit, the *same* platform A (not B),
> started reported 6.
>
> Is the above considered user breakage?
Things change. The only thing that is considered "user breakage" is
when something actually doesn't work any more.
That has always been the rule. It's not that the kernel ABI (with all
the system calls, all the /proc files, all the ioctl's, etc) is set in
stone and "sacred". Absolutely anything can be changed, wildly.
But if it turns out that applications (or hardware) that people use
end up breaking noticeably, then *that* is a regression.
And the important part there are those weasel-words: "that people use"
and "noticeably".
For example, a test-suite giving a different result is *not* a
regression, although it should obviously be considered a big red flag.
So if somebody tells you that some test-suite shows that some ABI
changed, at the very least you should be very nervous about things.
But if that same test-suite result is then used in a production
environment as part of some actual user flow, and it breaks that user
model, then it suddenly becomes a regression. So the very definition
of "regression" is not really about the API changing, but about
breaking peoples existing setups. Of course, if you never change any
API that is visible to user space, you can never create that kind of
regression, so they are _related_, but some people confuse the two.
They are still very different.
Similarly, theoretical arguments of "so-and-so wouldn't work after
this change" are just that - theoretical arguments. It's something to
worry about, but it's not an actual *regression* until it causes
problems.
For an extreme example of this: we can remove support for whole
platforms and architectures, and sometimes we do. It clearly
completely breaks support for the hardware in question - but it only
counts as a regression if anybody notices and cares. There may still
be active users of that platform that provably cannot possibly work at
all any more, but if they never upgrade the kernel, then it's still
not a regression.
In this case, pretty much all of /proc/cpuinfo is mainly
"informational". Maybe there are applications that show it, but more
likely you have people who ssh in and just do
cat /proc/cpuinfo
to see what kind of system they are running on. That's the main point
of much of /proc, and things like /proc/cpuinfo in particular.
Now *main* point doesn't necessarily mean "only point". There clearly
are binaries parsing it. Some do it to figure out how many CPU's the
system has, often simply because using /proc is simple from various
scripting environments, for example. So while most of /proc/cpuinfo is
clearly for human consumption, it's also understandable that some
parts of it might matter for people.
And quite frankly, I personally think that any program that parses
/proc/cpuinfo in order to find the bogomips value and use it for
anything is just clearly insane. Why would you ever do that? It makes
no sense. It's crazy. Apparently the crazy audio library didn't even
do it in a meaningful way, and the use of that value seems to be
pretty much random, and the actual value likely doesn't really even
*matter*.
But the rule for "regression" has never been about sanity, or about
whether the ABI changes. There are tons of horribly insane user
programs. Parsing /proc to find bogomips may be insane and odd, but
it's certainly not the worst kind of diseased code I've ever heard
about. We have had major programs that literally depended on totally
insane small details that were never intentional, and just happened to
have some particular implementation detail. And then the
implementation changed, and the interface ostensibly did exactly the
same thing, but because it did it with some meaningless difference
that couldn't *possibly* matter in any sane situation, it caused a
regression.
So the kernel regression rules are very strict in that it's the
absolute #1 rule in kernel development, but at the same time, they are
about as lax as they can possibly be: an interface change is only a
regression if somebody notices.
Changing the bogomips value - even radically - or removing it entirely
isn't a regression in itself.
And in this case, I do suspect that the *actual* value really almost
doesn't matter. It looks more like some internal badly done hint for
some buffer size or other. It is possible that wild fluctuations could
be noticeable, but it's fairly unlikely.
The other "good news" in this area is that I suspect that the random
ARM platforms that actually changed 2.5 years ago are not very widely
used any more. So not only does the actual real value probably not
matter much to begin with, but the platforms where it really changed
are probably not a major issue.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 0:05 ` Linus Torvalds
@ 2015-01-08 0:45 ` Nicolas Pitre
2015-01-08 0:57 ` Linus Torvalds
0 siblings, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2015-01-08 0:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Catalin Marinas, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On Wed, 7 Jan 2015, Linus Torvalds wrote:
> In this case, pretty much all of /proc/cpuinfo is mainly
> "informational". Maybe there are applications that show it, but more
> likely you have people who ssh in and just do
>
> cat /proc/cpuinfo
>
> to see what kind of system they are running on. That's the main point
> of much of /proc, and things like /proc/cpuinfo in particular.
Would you mind if on ARM we used the bogomips line as an informative
approximation for the CPU speed? After all, this is the meaning it had
for nearly 20 years. And unlike on X86 we don't have the actual CPU
clock in there. And that's still the meaning it has on most ARM systems
out there.
> Changing the bogomips value - even radically - or removing it entirely
> isn't a regression in itself.
>
> And in this case, I do suspect that the *actual* value really almost
> doesn't matter. It looks more like some internal badly done hint for
> some buffer size or other. It is possible that wild fluctuations could
> be noticeable, but it's fairly unlikely.
In that case nobody would complain if on some systems we make it back to
the traditional value. It took over a year before problem report about
its disappearance reached us.
On other systems it will remain the same because its meaning still
hasn't changed. But on the whole it'll have a coherent meaning.
> The other "good news" in this area is that I suspect that the random
> ARM platforms that actually changed 2.5 years ago are not very widely
> used any more.
I beg to differ. My primary test platform is one of them and it has the
latest incarnation of the ARM 32-bit CPU architecture.
Rather, the good news is that not that many user space things rely on
the bogomips value. Or maybe people don't upgrade their kernels that
often e.g. the latest OpenWRT stable version from 3 months ago ships
with a 3.10 kernel. Oh and all the Ubuntu releases for ARM in the last
year shipped with kernels that don't advertise the bogomips and passed
QA. So if we bring it back now, better make it usefully informative at
least for humans.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 0:45 ` Nicolas Pitre
@ 2015-01-08 0:57 ` Linus Torvalds
2015-01-08 4:56 ` Nicolas Pitre
0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2015-01-08 0:57 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Catalin Marinas, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On Wed, Jan 7, 2015 at 4:45 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>
> Would you mind if on ARM we used the bogomips line as an informative
> approximation for the CPU speed? After all, this is the meaning it had
> for nearly 20 years. And unlike on X86 we don't have the actual CPU
> clock in there. And that's still the meaning it has on most ARM systems
> out there.
Yes, I actually would mind, unless you have a damn good reason for it.
On x86, we have bogomips, but we also have this line:
cpu MHz : 2275.109
and I really don't see why you should lie in your /proc/cpuinfo.
Really.
Just give the real information. Don't lie.
Quite frankly, the *only* actual real reason I've heard from you for
not having the real bogomips there is "waste of time".
And this whole thread has been *nothing* but waste of time. But it has
been *you* wasting time, and that original commit. If you had just
left it alone, and had just let the revert do its job, none of this
waste of time would have happened.
So quite frankly, my patience for you arguing "wasting time" is pretty
damn low. I think your arguments are crap, I still think your NAK was
*way* out of line, and I think it's completely *insane* to lie about
bogomips. It's disasteful, it's dishonest, and there's no reason for
it.
If you want to give people a sense of the CPU frequency, then dammit,
just *do* that. Add that "cpu MHz" line that we already have on x86.
Seriously, what kind of *insane* argument can you really marshal for
lying to users?
And if you want users to know the CPU frequency, then why the f*ck
would you call it "bogomips" and confuse them?
Christ, this whole thing is annoying. I really find it *offensive* how
you want to basically lie to users.
Stop this idiocy. Really. There is no excuse.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 0:57 ` Linus Torvalds
@ 2015-01-08 4:56 ` Nicolas Pitre
2015-01-08 5:04 ` Linus Torvalds
0 siblings, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2015-01-08 4:56 UTC (permalink / raw)
To: Linus Torvalds
Cc: Catalin Marinas, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On Wed, 7 Jan 2015, Linus Torvalds wrote:
> On Wed, Jan 7, 2015 at 4:45 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >
> > Would you mind if on ARM we used the bogomips line as an informative
> > approximation for the CPU speed? After all, this is the meaning it had
> > for nearly 20 years. And unlike on X86 we don't have the actual CPU
> > clock in there. And that's still the meaning it has on most ARM systems
> > out there.
>
> Yes, I actually would mind, unless you have a damn good reason for it.
Consistency.
> On x86, we have bogomips, but we also have this line:
>
> cpu MHz : 2275.109
On ARM we just can't find the CPU clock in a generic way. Yes, this is
part of the ARM hardware environment where every implementer can do
whatever they want with things that are not architected. That's not
something we kernel developers can do anything about.
What we can do in a generic way, though, is counting the number of loops
the CPU can perform during a jiffy.
> and I really don't see why you should lie in your /proc/cpuinfo.
You keep harping on with that statement. Could you just tell me _what_
we would be lying about and to _whom_? It's probably the third time I'm
asking this with no rational answer so far.
What I'm telling you is that a kernel on a machine that used to show 800
bogomips and suddenly start showing 6 bogomips is lying. But for some
contrived reasons that's fine with you.
> Quite frankly, the *only* actual real reason I've heard from you for
> not having the real bogomips there is "waste of time".
Quite the reverse. In fact this is getting hilarious. Either you keep
on understanding all I say only half way, or you purposely twist my
words. What kind of game are you playing?
What I said is that:
1) The user space bogomips reporting on ARM, for the best part of the
last 20 years, was based on the number of loops the CPU can perform
during a jiffy. Given its history that's what I'd call the real
bogomips.
2) Because of some relatively recent internal kernel changes, the
bogomips reporting became totally useless for its consumers as it was
orders of magnitude smaller than it used to be. Like moving from 800
to 6 on the same hardware. Those consumers started complaining.
3) We told those consumers: bogomips is evil, stop wasting everyone's
time and don't use it, because a value of 8 is no longer the real
bogomips. It was unreliable before, now it is meaningless. Go consume
another metric instead.
4) Complaints still came by, so we decided to solve the issue by simply
removing the meaningless bogomips reporting altogether. This worked
wonderfully for more than a year, at least from our perspective.
5) The lack of a bogomips reporting broke some user space applications.
This is an unacceptable regression and the meaningless bogomips was
reinstated as in (3).
6) Now I'm claiming that if we're going to need the bogomips reporting
not to break user space, we should at least go back to the real
bogomips as it has been for the best part of the last 20 years i.e
like in (1), not the meaningless one.
For (6) I have the patch, it is non intrusive, and doesn't touch generic
code at all. The diffstat for my latest version is:
2 files changed, 2 insertions(+), 15 deletions(-)
So, instead of telling me _why_ the above is not the sanest thing to do,
you come up with diatribes like:
> And this whole thread has been *nothing* but waste of time. But it has
> been *you* wasting time, and that original commit. If you had just
> left it alone, and had just let the revert do its job, none of this
> waste of time would have happened.
You are just full of b/s. I'm bringing rational arguments to this
discussion, and when you can't debunk them you have nothing better than
accusing me of wasting time with a stream of emotions. Sorry but I'm
holding you up to better standards.
Sure my NAK on the revert was premature. I was envisioning a revert
that would also include (6) above. But the issue is no longer about the
revert. We can do (6) separately.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 4:56 ` Nicolas Pitre
@ 2015-01-08 5:04 ` Linus Torvalds
2015-01-08 5:54 ` Nicolas Pitre
0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2015-01-08 5:04 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Catalin Marinas, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On Wed, Jan 7, 2015 at 8:56 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>>
>> Yes, I actually would mind, unless you have a damn good reason for it.
>
> Consistency.
Fuck no.
"Completely made up number that you cannot explain" is not consistency.
The *current* situation is consistency. I can - and have in this very
thread - explained what bogomips is in just a sentence or two.
You are just making shit up. Bad shit. Get off the drugs, because it's
not the good kind.
> On ARM we just can't find the CPU clock in a generic way.
Cry me a river.
So you want to make bogomips a totally random number, that has no
meaning, no correlation to any clocksource, and no correlation to cpu
frequency either?
And you argue for this though exactly WHAT? "Consistency".
Bullshit.
This whole thread is now marked as "muted" for me, because I can't
take the BS any more. You make no sense.
You guys playing games with bogomips was what broke things in the
first place, and now you want to play *more* games?
You're crazy. Go away. Or don't. I won't be seeing your emails anyway,
so why would I care?
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 5:04 ` Linus Torvalds
@ 2015-01-08 5:54 ` Nicolas Pitre
0 siblings, 0 replies; 66+ messages in thread
From: Nicolas Pitre @ 2015-01-08 5:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Catalin Marinas, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On Wed, 7 Jan 2015, Linus Torvalds wrote:
> On Wed, Jan 7, 2015 at 8:56 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >>
> >> Yes, I actually would mind, unless you have a damn good reason for it.
> >
> > Consistency.
>
> Fuck no.
>
> "Completely made up number that you cannot explain" is not consistency.
Again this statement. I'm not against it as this is a true statement.
I really wonder how you can describe my intent with that statement
though. We must be living in different universes. This is so wrong as
not to be funny anymore.
If you don't want to see my reply that's fine. It'll be there for the
posterity at least. If That's because you're just too proud to concede
I'm right then this is very sad.
All I'm trying to tell you is that 6 *is* a "Completely made up number
that you cannot explain" for user space and that's what we have right
now in mainline for some ARM machines.
What I'm advocating for is a number that is _not_ completely made up.
I'm advocating for a bogomips which meaning is well known and dates back
to early Linux releases: the number of loops performed by the CPU during
a jiffy, scaled to a second worth of jiffies, divided by 2 because there
are 2 instructions in that loop. Incidentally this very definition is
the one you provided yourself in this thread. That's what I want, yet
you qualify this as a "Completely made up number that you cannot
explain".
> So you want to make bogomips a totally random number, that has no
> meaning, no correlation to any clocksource, and no correlation to cpu
> frequency either?
Yeah right. I challenge you to quote anything I said in my previous
email where I clearly explained everything in 6 points what I want.
Anything you may quote to substantiate that statement of yours about
what I want. Everyone following this thread knows you can't.
But you probably didn't even read it. It might hurt too much.
Sheesh.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 22:14 ` Catalin Marinas
2015-01-08 0:05 ` Linus Torvalds
@ 2015-01-08 10:39 ` Russell King - ARM Linux
2015-01-08 15:44 ` Vince Weaver
1 sibling, 1 reply; 66+ messages in thread
From: Russell King - ARM Linux @ 2015-01-08 10:39 UTC (permalink / raw)
To: Catalin Marinas
Cc: Nicolas Pitre, Linus Torvalds, Pavel Machek, Marc Zyngier,
kernel list
On Wed, Jan 07, 2015 at 10:14:07PM +0000, Catalin Marinas wrote:
> On 7 January 2015 at 20:53, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > Now, if I understand Linus correctly, what irks him is when someone
> > upgrades a kernel on a platform, and some userland breaks. That's
> > something which I've said multiple times I don't have a problem
> > agreeing with, and I suspect no one in this thread would disagree
> > that this is a serious failing, and one which needs fixing ASAP.
>
> Agree. But I assume you refer to the fact that we removed the BogoMIPS
> reporting. It's fine to have it reverted.
>
> > However, if running userland on platform A works, and but it doesn't
> > work on platform B. The breakage may well be due to platform A reporting
> > 300 bogomips because it's using the kernel software loop, and platform
> > B reporting 6 bogomips because its using a hardware timer, but the CPU
> > is actually faster. However, this is not a kernel problem, and it
> > certainly is not a regression. It's a userspace bug which needs
> > userspace to fix.
>
> We need to look back at the point we added timer-based delay about 2.5
> years ago. Prior to commit d0a533b18235d362, platform A reported
> bogomips 300. After that commit, the *same* platform A (not B),
> started reported 6.
Correct.
> Is the above considered user breakage? That's what Nico is trying to
> solve. If we are fine with it, than we can close this thread, no
> further changes needed.
It's not a regression - yet. No one has shown that userspace has broken
according to the definition of the first quote above, and that's the
whole point.
> We can document it as Linus suggests and say that prior to whatever
> version we had 2.5 years ago, BogoMIPS was based on a busy loop. In
> more recent kernels, it is based on a timer delay. User space should
> make use of such information when interpreting BogoMIPS.
We don't need to document it - we just need to point people at this
URL:
http://en.wikipedia.org/wiki/BogoMips
which already describes it fully, including the existing ARM timer
behaviour.
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 10:39 ` Russell King - ARM Linux
@ 2015-01-08 15:44 ` Vince Weaver
2015-01-08 16:19 ` Catalin Marinas
2015-01-08 16:32 ` Russell King - ARM Linux
0 siblings, 2 replies; 66+ messages in thread
From: Vince Weaver @ 2015-01-08 15:44 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Catalin Marinas, Nicolas Pitre, Linus Torvalds, Pavel Machek,
Marc Zyngier, kernel list, vincent.weaver
On Thu, 8 Jan 2015, Russell King - ARM Linux wrote:
> It's not a regression - yet. No one has shown that userspace has broken
> according to the definition of the first quote above, and that's the
> whole point.
How much does one have to regress before it is a problem? I have two
projects I've worked on that "broke" due to this issue.
They were minor breakages though.
The "linux_logo" userspace sysinfo tool broke to the extent that it
was parsing for the bogomips string in /proc/cpuinfo and printed poorly
formatted and/or corrupted text info to screen when it couldn't find it.
The "PAPI" library had some really ancient (and poorly
thought-out) fallback code that would try to estimate MHz from bogomips
if a MHz value was not available via the traditional methods. This
failed after the change too, but not many people use PAPI on ARM so it
wasn't that big an issue.
I noticed these problems early, even before the change hit mainline.
But when I complained I was told in no uncertain terms that the ARM
maintainers were tired of hearing about bogomips issues and nothing
could be done to stop the change from getting in.
Vince
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 15:44 ` Vince Weaver
@ 2015-01-08 16:19 ` Catalin Marinas
2015-01-08 16:34 ` Russell King - ARM Linux
2015-01-08 16:32 ` Russell King - ARM Linux
1 sibling, 1 reply; 66+ messages in thread
From: Catalin Marinas @ 2015-01-08 16:19 UTC (permalink / raw)
To: Vince Weaver
Cc: Russell King - ARM Linux, nicolas.pitre@linaro.org,
Linus Torvalds, Pavel Machek, Marc Zyngier, kernel list,
vincent.weaver@maine.edu
On Thu, Jan 08, 2015 at 03:44:53PM +0000, Vince Weaver wrote:
> On Thu, 8 Jan 2015, Russell King - ARM Linux wrote:
>
> > It's not a regression - yet. No one has shown that userspace has broken
> > according to the definition of the first quote above, and that's the
> > whole point.
>
> How much does one have to regress before it is a problem? I have two
> projects I've worked on that "broke" due to this issue.
With the revert in place, now you get the bogomips value. Just don't
assume anything about it.
> They were minor breakages though.
>
> The "linux_logo" userspace sysinfo tool broke to the extent that it
> was parsing for the bogomips string in /proc/cpuinfo and printed poorly
> formatted and/or corrupted text info to screen when it couldn't find it.
We now have the bogomips string back, so this problem is solved.
> The "PAPI" library had some really ancient (and poorly
> thought-out) fallback code that would try to estimate MHz from bogomips
> if a MHz value was not available via the traditional methods. This
> failed after the change too, but not many people use PAPI on ARM so it
> wasn't that big an issue.
>
> I noticed these problems early, even before the change hit mainline.
> But when I complained I was told in no uncertain terms that the ARM
> maintainers were tired of hearing about bogomips issues and nothing
> could be done to stop the change from getting in.
There were many complaints, from marketing people to some Linux users
who had a "feeling" that their platform just got much slower after the
delay loop change. Since this patch didn't gain much traction:
https://lkml.org/lkml/2013/5/3/405
we decided to remove it completely so that we stop complaints that
bogomips does not match the CPU frequency. Unfortunately, we broke the
ABI.
Now the bogomips is back, but we are going to ignore anyone asking about
its value as it can be either the speed of a busy delay loop or the
generic timer frequency (completely independent; which one depends on
kernel version and SoC).
I really think we should stop this thread. User ABI breakage fixed now.
--
Catalin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 16:19 ` Catalin Marinas
@ 2015-01-08 16:34 ` Russell King - ARM Linux
2015-01-08 16:41 ` Catalin Marinas
0 siblings, 1 reply; 66+ messages in thread
From: Russell King - ARM Linux @ 2015-01-08 16:34 UTC (permalink / raw)
To: Catalin Marinas
Cc: Vince Weaver, nicolas.pitre@linaro.org, Linus Torvalds,
Pavel Machek, Marc Zyngier, kernel list, vincent.weaver@maine.edu
On Thu, Jan 08, 2015 at 04:19:08PM +0000, Catalin Marinas wrote:
> I really think we should stop this thread. User ABI breakage fixed now.
No. Vince's email shows a more serious problem.
The ABI breakage issue was reported before the patches were apparently
merged, but that information seems to have been lost. We need to
understand how that happened so similar instances don't happen in the
future.
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 16:34 ` Russell King - ARM Linux
@ 2015-01-08 16:41 ` Catalin Marinas
2015-01-08 16:57 ` Russell King - ARM Linux
0 siblings, 1 reply; 66+ messages in thread
From: Catalin Marinas @ 2015-01-08 16:41 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Vince Weaver, nicolas.pitre@linaro.org, Linus Torvalds,
Pavel Machek, Marc Zyngier, kernel list, vincent.weaver@maine.edu
On Thu, Jan 08, 2015 at 04:34:50PM +0000, Russell King - ARM Linux wrote:
> On Thu, Jan 08, 2015 at 04:19:08PM +0000, Catalin Marinas wrote:
> > I really think we should stop this thread. User ABI breakage fixed now.
>
> No. Vince's email shows a more serious problem.
>
> The ABI breakage issue was reported before the patches were apparently
> merged, but that information seems to have been lost. We need to
> understand how that happened so similar instances don't happen in the
> future.
FYI, searching for "Vince Weaver" and "bogomips" found this:
https://lkml.org/lkml/2013/7/11/454
--
Catalin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 16:41 ` Catalin Marinas
@ 2015-01-08 16:57 ` Russell King - ARM Linux
2015-01-08 17:01 ` Catalin Marinas
` (2 more replies)
0 siblings, 3 replies; 66+ messages in thread
From: Russell King - ARM Linux @ 2015-01-08 16:57 UTC (permalink / raw)
To: Catalin Marinas
Cc: Vince Weaver, nicolas.pitre@linaro.org, Linus Torvalds,
Pavel Machek, Marc Zyngier, kernel list, vincent.weaver@maine.edu
On Thu, Jan 08, 2015 at 04:41:12PM +0000, Catalin Marinas wrote:
> On Thu, Jan 08, 2015 at 04:34:50PM +0000, Russell King - ARM Linux wrote:
> > On Thu, Jan 08, 2015 at 04:19:08PM +0000, Catalin Marinas wrote:
> > > I really think we should stop this thread. User ABI breakage fixed now.
> >
> > No. Vince's email shows a more serious problem.
> >
> > The ABI breakage issue was reported before the patches were apparently
> > merged, but that information seems to have been lost. We need to
> > understand how that happened so similar instances don't happen in the
> > future.
>
> FYI, searching for "Vince Weaver" and "bogomips" found this:
>
> https://lkml.org/lkml/2013/7/11/454
Gah.
It looks like the report was made by hanging it into a totally different
thread to the patches which caused the problem, and which went nowhere
near the ARM kernel mailing lists.
At least that explains why people like me who really need to know this
stuff were not aware of the issue.
One thing I still can't get my head around though is... Vince reported
the breakage on 11 July 2013, but the patch is dated 30 August and I
merged it 2 September. So, how did Vince know about it to report the
ABI breakage? Had it been sitting somewhere else?
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 16:57 ` Russell King - ARM Linux
@ 2015-01-08 17:01 ` Catalin Marinas
2015-01-08 17:39 ` Vince Weaver
2015-01-08 17:22 ` Vince Weaver
2015-01-08 22:46 ` Pavel Machek
2 siblings, 1 reply; 66+ messages in thread
From: Catalin Marinas @ 2015-01-08 17:01 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Vince Weaver, nicolas.pitre@linaro.org, Linus Torvalds,
Pavel Machek, Marc Zyngier, kernel list, vincent.weaver@maine.edu
On Thu, Jan 08, 2015 at 04:57:05PM +0000, Russell King - ARM Linux wrote:
> On Thu, Jan 08, 2015 at 04:41:12PM +0000, Catalin Marinas wrote:
> > On Thu, Jan 08, 2015 at 04:34:50PM +0000, Russell King - ARM Linux wrote:
> > > On Thu, Jan 08, 2015 at 04:19:08PM +0000, Catalin Marinas wrote:
> > > > I really think we should stop this thread. User ABI breakage fixed now.
> > >
> > > No. Vince's email shows a more serious problem.
> > >
> > > The ABI breakage issue was reported before the patches were apparently
> > > merged, but that information seems to have been lost. We need to
> > > understand how that happened so similar instances don't happen in the
> > > future.
> >
> > FYI, searching for "Vince Weaver" and "bogomips" found this:
> >
> > https://lkml.org/lkml/2013/7/11/454
>
> Gah.
>
> It looks like the report was made by hanging it into a totally different
> thread to the patches which caused the problem, and which went nowhere
> near the ARM kernel mailing lists.
>
> At least that explains why people like me who really need to know this
> stuff were not aware of the issue.
>
> One thing I still can't get my head around though is... Vince reported
> the breakage on 11 July 2013, but the patch is dated 30 August and I
> merged it 2 September. So, how did Vince know about it to report the
> ABI breakage? Had it been sitting somewhere else?
Well, the patch may have been on the list, acked, before July but it
only went to the patch system on the 30th of August. Vince probably
concluded that it will go in once acked on the list.
--
Catalin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 17:01 ` Catalin Marinas
@ 2015-01-08 17:39 ` Vince Weaver
0 siblings, 0 replies; 66+ messages in thread
From: Vince Weaver @ 2015-01-08 17:39 UTC (permalink / raw)
To: Catalin Marinas
Cc: Russell King - ARM Linux, Vince Weaver, nicolas.pitre@linaro.org,
Linus Torvalds, Pavel Machek, Marc Zyngier, kernel list,
vincent.weaver@maine.edu
On Thu, 8 Jan 2015, Catalin Marinas wrote:
> On Thu, Jan 08, 2015 at 04:57:05PM +0000, Russell King - ARM Linux wrote:
> > On Thu, Jan 08, 2015 at 04:41:12PM +0000, Catalin Marinas wrote:
> > > On Thu, Jan 08, 2015 at 04:34:50PM +0000, Russell King - ARM Linux wrote:
> > > > On Thu, Jan 08, 2015 at 04:19:08PM +0000, Catalin Marinas wrote:
> > > > > I really think we should stop this thread. User ABI breakage fixed now.
> > > >
> > > > No. Vince's email shows a more serious problem.
> > > >
> > > > The ABI breakage issue was reported before the patches were apparently
> > > > merged, but that information seems to have been lost. We need to
> > > > understand how that happened so similar instances don't happen in the
> > > > future.
> > >
> > > FYI, searching for "Vince Weaver" and "bogomips" found this:
> > >
> > > https://lkml.org/lkml/2013/7/11/454
> >
> > Gah.
> >
> > It looks like the report was made by hanging it into a totally different
> > thread to the patches which caused the problem, and which went nowhere
> > near the ARM kernel mailing lists.
> >
> > At least that explains why people like me who really need to know this
> > stuff were not aware of the issue.
> >
> > One thing I still can't get my head around though is... Vince reported
> > the breakage on 11 July 2013, but the patch is dated 30 August and I
> > merged it 2 September. So, how did Vince know about it to report the
> > ABI breakage? Had it been sitting somewhere else?
>
> Well, the patch may have been on the list, acked, before July but it
> only went to the patch system on the 30th of August. Vince probably
> concluded that it will go in once acked on the list.
Yes, it looks like it went out on the linux-arm-kernel list on
3 May 2013.
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-May/166728.html
I don't follow that list though, so someone must have pointed me to the
posting.
Though there must have been something else too, I definitely recall
reading a long rant about how annoying it was that everyone was reporting
bogomips value errors and this was the best way to handle things.
All of the postings of the patch I can find are much more laid back.
Mostly why I didn't push the issue though was all the pushback I've gotten
over the years for trying to get perf_event ABI breaks reverted (I've
never successfully managed to get that to happen).
Vince
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 16:57 ` Russell King - ARM Linux
2015-01-08 17:01 ` Catalin Marinas
@ 2015-01-08 17:22 ` Vince Weaver
2015-01-08 22:46 ` Pavel Machek
2 siblings, 0 replies; 66+ messages in thread
From: Vince Weaver @ 2015-01-08 17:22 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Catalin Marinas, Vince Weaver, nicolas.pitre@linaro.org,
Linus Torvalds, Pavel Machek, Marc Zyngier, kernel list,
vincent.weaver@maine.edu
On Thu, 8 Jan 2015, Russell King - ARM Linux wrote:
> On Thu, Jan 08, 2015 at 04:41:12PM +0000, Catalin Marinas wrote:
> > On Thu, Jan 08, 2015 at 04:34:50PM +0000, Russell King - ARM Linux wrote:
> > > On Thu, Jan 08, 2015 at 04:19:08PM +0000, Catalin Marinas wrote:
> > > > I really think we should stop this thread. User ABI breakage fixed now.
> > >
> > > No. Vince's email shows a more serious problem.
> > >
> > > The ABI breakage issue was reported before the patches were apparently
> > > merged, but that information seems to have been lost. We need to
> > > understand how that happened so similar instances don't happen in the
> > > future.
> >
> > FYI, searching for "Vince Weaver" and "bogomips" found this:
> >
> > https://lkml.org/lkml/2013/7/11/454
>
> Gah.
>
> It looks like the report was made by hanging it into a totally different
> thread to the patches which caused the problem, and which went nowhere
> near the ARM kernel mailing lists.
yes, though by that point I had thought it was a done deal and so was
mostly complaining about it with regard to another issue.
I usually don't hang out on the ARM lists, but perf-event related ones,
which was why I was primarily discussing the issue with Will Deacon
in that case.
> One thing I still can't get my head around though is... Vince reported
> the breakage on 11 July 2013, but the patch is dated 30 August and I
> merged it 2 September. So, how did Vince know about it to report the
> ABI breakage? Had it been sitting somewhere else?
That's a good question. Someone definitely had told me that is was going
to happen earlier, but I can't remember where now. Definitely someone
with enough authority that I thought it wasn't worth the trouble
complaining (I already seemingly have half the perf_event people annoyed
with me on any given day, the last thing I wanted to do was get all of
the ARM devs mad at me too).
Vince
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 16:57 ` Russell King - ARM Linux
2015-01-08 17:01 ` Catalin Marinas
2015-01-08 17:22 ` Vince Weaver
@ 2015-01-08 22:46 ` Pavel Machek
2015-01-09 9:49 ` Catalin Marinas
2 siblings, 1 reply; 66+ messages in thread
From: Pavel Machek @ 2015-01-08 22:46 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Catalin Marinas, Vince Weaver, nicolas.pitre@linaro.org,
Linus Torvalds, Marc Zyngier, kernel list,
vincent.weaver@maine.edu
On Thu 2015-01-08 16:57:05, Russell King - ARM Linux wrote:
> On Thu, Jan 08, 2015 at 04:41:12PM +0000, Catalin Marinas wrote:
> > On Thu, Jan 08, 2015 at 04:34:50PM +0000, Russell King - ARM Linux wrote:
> > > On Thu, Jan 08, 2015 at 04:19:08PM +0000, Catalin Marinas wrote:
> > > > I really think we should stop this thread. User ABI breakage fixed now.
> > >
> > > No. Vince's email shows a more serious problem.
> > >
> > > The ABI breakage issue was reported before the patches were apparently
> > > merged, but that information seems to have been lost. We need to
> > > understand how that happened so similar instances don't happen in the
> > > future.
> >
> > FYI, searching for "Vince Weaver" and "bogomips" found this:
> >
> > https://lkml.org/lkml/2013/7/11/454
>
> Gah.
>
> It looks like the report was made by hanging it into a totally different
> thread to the patches which caused the problem, and which went nowhere
> near the ARM kernel mailing lists.
This went at least to Will Deacon and Nicolas Pitre.
https://lkml.org/lkml/2013/5/15/217
So AFAICT yes, we knowingly broke at least
com.securiteinfo.android.bogomips application.
> At least that explains why people like me who really need to know this
> stuff were not aware of the issue.
>
> One thing I still can't get my head around though is... Vince reported
> the breakage on 11 July 2013, but the patch is dated 30 August and I
> merged it 2 September. So, how did Vince know about it to report the
> ABI breakage? Had it been sitting somewhere else?
There was discussion on lkml at May 2013, clearly marked "ARM:". I
don't see enough headers to know what else was copied.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 22:46 ` Pavel Machek
@ 2015-01-09 9:49 ` Catalin Marinas
0 siblings, 0 replies; 66+ messages in thread
From: Catalin Marinas @ 2015-01-09 9:49 UTC (permalink / raw)
To: Pavel Machek
Cc: Russell King - ARM Linux, Vince Weaver, nicolas.pitre@linaro.org,
Linus Torvalds, Marc Zyngier, kernel list,
vincent.weaver@maine.edu
On Thu, Jan 08, 2015 at 10:46:14PM +0000, Pavel Machek wrote:
> On Thu 2015-01-08 16:57:05, Russell King - ARM Linux wrote:
> > On Thu, Jan 08, 2015 at 04:41:12PM +0000, Catalin Marinas wrote:
> > > On Thu, Jan 08, 2015 at 04:34:50PM +0000, Russell King - ARM Linux wrote:
> > > > On Thu, Jan 08, 2015 at 04:19:08PM +0000, Catalin Marinas wrote:
> > > > > I really think we should stop this thread. User ABI breakage fixed now.
> > > >
> > > > No. Vince's email shows a more serious problem.
> > > >
> > > > The ABI breakage issue was reported before the patches were apparently
> > > > merged, but that information seems to have been lost. We need to
> > > > understand how that happened so similar instances don't happen in the
> > > > future.
> > >
> > > FYI, searching for "Vince Weaver" and "bogomips" found this:
> > >
> > > https://lkml.org/lkml/2013/7/11/454
> >
> > Gah.
> >
> > It looks like the report was made by hanging it into a totally different
> > thread to the patches which caused the problem, and which went nowhere
> > near the ARM kernel mailing lists.
>
> This went at least to Will Deacon and Nicolas Pitre.
>
> https://lkml.org/lkml/2013/5/15/217
>
> So AFAICT yes, we knowingly broke at least
> com.securiteinfo.android.bogomips application.
Hmm, "BogoMIPS is a tool used to create a web page about Android
Benchmarks". And a stupid table here:
https://www.securiteinfo.com/divers/android_bogomips_benchmark.shtml.
Even though it's against the user ABI kernel policy, I must say I'm glad
we broke this one ;).
pyaudio is a valid complaint though (and not about the bogomips value
but about the presence of the "bogomips" string).
--
Catalin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-08 15:44 ` Vince Weaver
2015-01-08 16:19 ` Catalin Marinas
@ 2015-01-08 16:32 ` Russell King - ARM Linux
1 sibling, 0 replies; 66+ messages in thread
From: Russell King - ARM Linux @ 2015-01-08 16:32 UTC (permalink / raw)
To: Vince Weaver
Cc: Catalin Marinas, Nicolas Pitre, Linus Torvalds, Pavel Machek,
Marc Zyngier, kernel list, vincent.weaver
On Thu, Jan 08, 2015 at 10:44:53AM -0500, Vince Weaver wrote:
> On Thu, 8 Jan 2015, Russell King - ARM Linux wrote:
>
> > It's not a regression - yet. No one has shown that userspace has broken
> > according to the definition of the first quote above, and that's the
> > whole point.
>
> How much does one have to regress before it is a problem? I have two
> projects I've worked on that "broke" due to this issue.
If it's something which worked before and fails after, that's enough.
> They were minor breakages though.
>
> The "linux_logo" userspace sysinfo tool broke to the extent that it
> was parsing for the bogomips string in /proc/cpuinfo and printed poorly
> formatted and/or corrupted text info to screen when it couldn't find it.
>
> The "PAPI" library had some really ancient (and poorly
> thought-out) fallback code that would try to estimate MHz from bogomips
> if a MHz value was not available via the traditional methods. This
> failed after the change too, but not many people use PAPI on ARM so it
> wasn't that big an issue.
>
> I noticed these problems early, even before the change hit mainline.
> But when I complained I was told in no uncertain terms that the ARM
> maintainers were tired of hearing about bogomips issues and nothing
> could be done to stop the change from getting in.
Sorry about that.
However, I'm having a hard time finding your report. Do you have a
message id for the report?
For the record, I have no knowledge of this report having been made.
My linux-arm-kernel web archives bring up nothing either.
My mailbox record going back to the start of 2013 (which includes all
linux-arm-kernel mailing list traffic and any traffic to _this_ address,
and covers six months before the commit happened), I only have messages
with these subject lines authored by you:
2013:
Subject: Re: [Ksummit-2013-discuss] [ARM ATTEND] catching up on exploit mitigations
Subject: Re: NSA310 + DT
2014:
Subject: Re: CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and bcm2835_defconfig
Subject: Re: CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and bcm2835_defconfig
Subject: Re: CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and bcm2835_defconfig
Subject: [bisected] broken make-kpkg kernel build in 3.15-rc1
Subject: Re: [bisected] broken make-kpkg kernel build in 3.15-rc1
Subject: Re: [bisected] broken make-kpkg kernel build in 3.15-rc1
Subject: Re: [bisected] broken make-kpkg kernel build in 3.15-rc1
Subject: Re: [bisected] broken make-kpkg kernel build in 3.15-rc1
Subject: Re: [bisected] broken make-kpkg kernel build in 3.15-rc1
Subject: [tip:perf/core] perf/ARM: Use common PMU interrupt disabled code
For the thread where the offending patch was posted in 2013, my mailbox
gives me:
4963 Jun 20 Will Deacon ( 36) [PATCH v2 0/2] ARM: Remove any correlatio
4964 Jun 20 Will Deacon ( 67) ├─>[PATCH v2 1/2] ARM: delay: don't bothe
4965 Jun 20 Will Deacon ( 50) ├─>[PATCH v2 2/2] init: calibrate: don't
4966 Jun 20 Nicolas Pitre ( 27) ├─>Re: [PATCH v2 0/2] ARM: Remove any cor
4967 Jun 21 Will Deacon ( 29) │ └─>
4968 Jun 21 Marc Zyngier ( 27) └─>
Did you copy linux-arm-kernel with your report?
--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <CA+55aFwuO2g1S-bY96V28crMWj+dKXWANzbP28JQjBdTg0rV0w@mail.gmail.com>]
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
[not found] ` <CA+55aFwuO2g1S-bY96V28crMWj+dKXWANzbP28JQjBdTg0rV0w@mail.gmail.com>
@ 2015-01-07 21:29 ` Nicolas Pitre
[not found] ` <CA+55aFyrNE9qqBR9Khbj=TuAnjA+UzUhNxFz==SqKuiG5q3uMQ@mail.gmail.com>
0 siblings, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2015-01-07 21:29 UTC (permalink / raw)
To: Linus Torvalds
Cc: Pavel Machek, Marc Zyngier, kernel list, Catalin Marinas,
Russell King - ARM Linux
On Wed, 7 Jan 2015, Linus Torvalds wrote:
> On Jan 7, 2015 12:34 PM, "Nicolas Pitre" <nicolas.pitre@linaro.org> wrote:
> >
> > I think you are missing something fundamental in this thread.
>
> No I'm not.
>
> Dammit, I understand. You guys are the ones who are confused. You have a
> clock, you use it, and you report *that* clock for bogomips.
>
> It doesn't matter that it's just 10MHz rather than the CPU clock.
>
> Lying about it is *not* fine, for all the reasons I've already wasted too
> much time trying to explain.
Lying to whom? And for what purpose?
> Christ, stop with the idiocy already.
Come on. You're the one coming up with idiotic reasoning for the user
space visible bogomips number. Give me at least one logical and
rational reason why user space should get a timer clock value
inconsequential to user space code through this interface and I'll shut
up. Otherwise I'll interpret your heating up as your inability to
provide such a reason.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 19:36 ` Linus Torvalds
2015-01-07 20:34 ` Nicolas Pitre
@ 2015-01-07 22:24 ` Catalin Marinas
1 sibling, 0 replies; 66+ messages in thread
From: Catalin Marinas @ 2015-01-07 22:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Nicolas Pitre, Russell King - ARM Linux, Pavel Machek,
Marc Zyngier, kernel list
On 7 January 2015 at 19:36, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> This whole thread has wasted more time than the whole original
> argument for wasted time ever was. I still have no idea what the
> original argument was, and why you guys want some made-up and
> incorrect feel-good number rather than just document the fact that the
> bogomips is simple dependent on the clocksource you use for delays.
One reason - consistency with kernels running on the *same* hardware
prior to commit d0a533b18235d362 (ARM: 7452/1: delay: allow
timer-based delay implementation to be selected). You may see BogoMIPS
of 800 prior to that commit dropping to 6 after that commit on the
*same* piece of hardware. If we don't break any user assumptions here,
no further changes needed.
--
Catalin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-07 18:11 ` Catalin Marinas
2015-01-07 18:47 ` Linus Torvalds
@ 2015-01-07 18:50 ` Nicolas Pitre
1 sibling, 0 replies; 66+ messages in thread
From: Nicolas Pitre @ 2015-01-07 18:50 UTC (permalink / raw)
To: Catalin Marinas
Cc: Russell King - ARM Linux, Linus Torvalds, Pavel Machek,
Marc Zyngier, kernel list
On Wed, 7 Jan 2015, Catalin Marinas wrote:
> On 5 January 2015 at 04:51, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Date: Sun, 4 Jan 2015 22:28:58 -0500
> > Subject: [PATCH] ARM: disentangle timer based delays and bogomips calibration
> >
> > The bogomips value is a pseudo CPU speed value originally used to calibrate
> > loop-based small delays. It is also exported to user space through the
> > /proc filesystem and some user space apps started relying on it.
> >
> > Modern hardware can vary their CPU clock at run time making the bogomips
> > value less reliable for delay purposes. With the advent of high resolution
> > timers, small delays migrated to timer polling loops instead. Strangely
> > enough, the bogomips value calibration became timer based too, making it
> > way more bogus than it already was as a CPU speed representation and people
> > using it via /proc/cpuinfo started complaining.
> >
> > Since it was wrong for user space to rely on a "bogus" mips value to start
> > with, the initial responce from kernel people was to remove it. This broke
> > user space even more as some applications then refused to run altogether.
> > The bogomips export was therefore reinstated in commit 4bf9636c39 ("Revert
> > 'ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo'").
> >
> > Because the reported bogomips is orders of magnitude away from the
> > traditionally expected value for a given CPU when timer based delays are
> > in use, and because lumping bogomips and timer based delay loops is rather
> > senseless anyway, let's calibrate bogomips using a CPU loop all the time
> > even when timer based delays are available. Timer based delays don't
> > need any calibration and /proc/cpuinfo will provide somewhat sensible
> > values again.
> >
> > In practice, calls to __delay() will now always use the CPU based loop.
> > Things remain unchanged for udelay() and its derivatives.
>
> I think this makes sense since __delay() expects the number of loops
> as argument rather than a duration scaled by some factor (based on the
> generic timer frequency).
>
> FWIW:
>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Thanks.
> Minor comment below:
>
> > unsigned long calibrate_delay_is_known(void)
> > {
> > delay_calibrated = true;
> > - return lpj_fine;
> > +
> > + /* calibrate bogomips even when timer based delays are used */
> > + return 0;
> > }
>
> Do we need to remove delay_calibrated = true as well? We do it further
> down again in calibration_delay_done() .
The logic for delay_calibrated seemed to prevent changes to lpj in case
a better timer source got registered after boot, however commit 6f3d90e5
made that irrelevant. So perhaps delay_calibrated can go now unless
there are concerns about possible races if a better timer gets
registered and called with arguments for the previous one or the like.
In which case I think such a change would be best isolated in a separate
patch.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] Revert 9fc2105aeaaf56b0cf75296a84702d0f9e64437b to fix pyaudio (and probably more)
2015-01-05 4:51 ` Nicolas Pitre
2015-01-05 12:11 ` Will Deacon
2015-01-07 18:11 ` Catalin Marinas
@ 2015-01-08 22:49 ` Pavel Machek
2 siblings, 0 replies; 66+ messages in thread
From: Pavel Machek @ 2015-01-08 22:49 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Russell King - ARM Linux, Linus Torvalds, Marc Zyngier,
kernel list
On Sun 2015-01-04 23:51:31, Nicolas Pitre wrote:
> On Sun, 4 Jan 2015, Russell King - ARM Linux wrote:
>
> > On Sun, Jan 04, 2015 at 04:20:57PM -0500, Nicolas Pitre wrote:
> > > On Sun, 4 Jan 2015, Linus Torvalds wrote:
> > >
> > > > On Sun, Jan 4, 2015 at 12:56 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > > >
> > > > > It wasted a lot of people's time before by simply being there and wrong
> > > > > before it was removed. It's only a matter of whose time you want to
> > > > > waste. Really.
> > > >
> > > > Really. Shut up.
> > > >
> > > > The whole "no regressions" thing is very much about the fact that we
> > > > don't waste users time.
> > >
> > > I was talking about users time all along.
> > >
> > > Never mind. I'm sorry for the NAK and sorry for attempting to start a
> > > discussion to find a better replacement.
> >
> > Nico,
> >
> > I encourage you *not* to back down like this. Linus is right in so far
> > as the regressions issue, but he is *totally* wrong to do the revert,
> > which IMHO has been done out of nothing more than spite.
> >
> > Either *with or without* the revert, the issue still remains, and needs
> > to be addressed properly.
> >
> > With the revert in place, we now have insanely small bogomips values
> > reported via /proc/cpuinfo when hardware timers are used. That needs
> > fixing.
>
> Here's my take on it. Taking a step back, it was stupid to mix bogomips
> with timer based delays.
>
> ----- >8
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Sun, 4 Jan 2015 22:28:58 -0500
> Subject: [PATCH] ARM: disentangle timer based delays and bogomips calibration
>
> The bogomips value is a pseudo CPU speed value originally used to calibrate
> loop-based small delays. It is also exported to user space through the
> /proc filesystem and some user space apps started relying on it.
>
> Modern hardware can vary their CPU clock at run time making the bogomips
> value less reliable for delay purposes. With the advent of high resolution
> timers, small delays migrated to timer polling loops instead. Strangely
> enough, the bogomips value calibration became timer based too, making it
> way more bogus than it already was as a CPU speed representation and people
> using it via /proc/cpuinfo started complaining.
>
> Since it was wrong for user space to rely on a "bogus" mips value to start
> with, the initial responce from kernel people was to remove it. This broke
> user space even more as some applications then refused to run altogether.
> The bogomips export was therefore reinstated in commit 4bf9636c39 ("Revert
> 'ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo'").
>
> Because the reported bogomips is orders of magnitude away from the
> traditionally expected value for a given CPU when timer based delays are
> in use, and because lumping bogomips and timer based delay loops is rather
> senseless anyway, let's calibrate bogomips using a CPU loop all the time
> even when timer based delays are available. Timer based delays don't
> need any calibration and /proc/cpuinfo will provide somewhat sensible
> values again.
>
> In practice, calls to __delay() will now always use the CPU based loop.
> Things remain unchanged for udelay() and its derivatives.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
(No, I did not test it, but going to "historical" value should not
hurt, and it actually makes code shorter).
> diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
> index dff714d886..7feeb163f5 100644
> --- a/arch/arm/include/asm/delay.h
> +++ b/arch/arm/include/asm/delay.h
> @@ -21,13 +21,12 @@ struct delay_timer {
> };
>
> extern struct arm_delay_ops {
> - void (*delay)(unsigned long);
> void (*const_udelay)(unsigned long);
> void (*udelay)(unsigned long);
> unsigned long ticks_per_jiffy;
> } arm_delay_ops;
>
> -#define __delay(n) arm_delay_ops.delay(n)
> +#define __delay(n) __loop_delay(n)
>
> /*
> * This function intentionally does not exist; if you see references to
> @@ -63,7 +62,6 @@ extern void __loop_udelay(unsigned long usecs);
> extern void __loop_const_udelay(unsigned long);
>
> /* Delay-loop timer registration. */
> -#define ARCH_HAS_READ_CURRENT_TIMER
> extern void register_current_timer_delay(const struct delay_timer *timer);
>
> #endif /* __ASSEMBLY__ */
> diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
> index 312d43eb68..d958886874 100644
> --- a/arch/arm/lib/delay.c
> +++ b/arch/arm/lib/delay.c
> @@ -30,7 +30,6 @@
> * Default to the loop-based delay implementation.
> */
> struct arm_delay_ops arm_delay_ops = {
> - .delay = __loop_delay,
> .const_udelay = __loop_const_udelay,
> .udelay = __loop_udelay,
> };
> @@ -86,12 +85,8 @@ void __init register_current_timer_delay(const struct delay_timer *timer)
> if (!delay_calibrated && (!delay_res || (res < delay_res))) {
> pr_info("Switching to timer-based delay loop, resolution %lluns\n", res);
> delay_timer = timer;
> - lpj_fine = timer->freq / HZ;
> delay_res = res;
> -
> - /* cpufreq may scale loops_per_jiffy, so keep a private copy */
> - arm_delay_ops.ticks_per_jiffy = lpj_fine;
> - arm_delay_ops.delay = __timer_delay;
> + arm_delay_ops.ticks_per_jiffy = timer->freq / HZ;
> arm_delay_ops.const_udelay = __timer_const_udelay;
> arm_delay_ops.udelay = __timer_udelay;
> } else {
> @@ -102,7 +97,9 @@ void __init register_current_timer_delay(const struct delay_timer *timer)
> unsigned long calibrate_delay_is_known(void)
> {
> delay_calibrated = true;
> - return lpj_fine;
> +
> + /* calibrate bogomips even when timer based delays are used */
> + return 0;
> }
>
> void calibration_delay_done(void)
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 66+ messages in thread