* [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
@ 2011-06-20 11:40 ` Russell King - ARM Linux
0 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 11:40 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jun 20, 2011 at 04:55:43PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>>> Yes. It's because of interrupt and the CPU active-online
>>> race.
>>
>> I don't see that as a conclusion from this dump.
>>
>>> Here is the crash log..
>>> [ 21.025451] CPU1: Booted secondary processor
>>> [ 21.025451] CPU1: Unknown IPI message 0x1
>>> [ 21.029113] Switched to NOHz mode on CPU #1
>>> [ 21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>>
>> That's the xtime seqlock. We're trying to update the xtime from CPU1,
>> which is not yet online and not yet active. That's fine, we're just
>> spinning on the spinlock here, waiting for the other CPUs to release
>> it.
>>
>> But what this is saying is that the other CPUs aren't releasing it.
>> The cpu hotplug code doesn't hold the seqlock either. So who else is
>> holding this lock, causing CPU1 to time out on it.
>>
>> The other thing is that this is only supposed to trigger after about
>> one second:
>>
>> 	u64 loops = loops_per_jiffy * HZ;
>> 	for (i = 0; i < loops; i++) {
>> 		if (arch_spin_trylock(&lock->raw_lock))
>> 			return;
>> 		__delay(1);
>> 	}
>>
>> which from the timings you have at the beginning of your printk lines
>> is clearly not the case - it's more like 61us.
>>
>> Are you running with those h/w timer delay patches?
> Nope.
Ok. So loops_per_jiffy must be too small. My guess is you're using an
older kernel without 71c696b1 (calibrate: extract fall-back calculation
into own helper).
The delay calibration code used to start out by setting:
	loops_per_jiffy = (1<<12);
This will shorten the delay right down, and that's probably causing these
false spinlock lockup bug dumps.
Arranging for IRQs to be disabled across the delay calibration just avoids
the issue by preventing any spinlock being taken.
The reason that CPU#0 also complains about spinlock lockup is that for
some reason CPU#1 never finishes its calibration, and so the loop also
times out early on CPU#0.
Of course, fiddling with this global variable in this way is _not_ a good
idea while other CPUs are running and using that variable.
We could also do with implementing trigger_all_cpu_backtrace() to get
backtraces from the other CPUs when spinlock lockup happens...
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-20 11:40 ` Russell King - ARM Linux
@ 2011-06-20 11:51 ` Santosh Shilimkar
-1 siblings, 0 replies; 67+ messages in thread
From: Santosh Shilimkar @ 2011-06-20 11:51 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 04:55:43PM +0530, Santosh Shilimkar wrote:
>> On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
>>> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>>>> Yes. It's because of interrupt and the CPU active-online
>>>> race.
>>>
>>> I don't see that as a conclusion from this dump.
>>>
>>>> Here is the crash log..
>>>> [ 21.025451] CPU1: Booted secondary processor
>>>> [ 21.025451] CPU1: Unknown IPI message 0x1
>>>> [ 21.029113] Switched to NOHz mode on CPU #1
>>>> [ 21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>>>
>>> That's the xtime seqlock. We're trying to update the xtime from CPU1,
>>> which is not yet online and not yet active. That's fine, we're just
>>> spinning on the spinlock here, waiting for the other CPUs to release
>>> it.
>>>
>>> But what this is saying is that the other CPUs aren't releasing it.
>>> The cpu hotplug code doesn't hold the seqlock either. So who else is
>>> holding this lock, causing CPU1 to time out on it.
>>>
>>> The other thing is that this is only supposed to trigger after about
>>> one second:
>>>
>>> 	u64 loops = loops_per_jiffy * HZ;
>>> 	for (i = 0; i < loops; i++) {
>>> 		if (arch_spin_trylock(&lock->raw_lock))
>>> 			return;
>>> 		__delay(1);
>>> 	}
>>>
>>> which from the timings you have at the beginning of your printk lines
>>> is clearly not the case - it's more like 61us.
>>>
>>> Are you running with those h/w timer delay patches?
>> Nope.
>
> Ok. So loops_per_jiffy must be too small. My guess is you're using an
> older kernel without 71c696b1 (calibrate: extract fall-back calculation
> into own helper).
>
I am on v3.0-rc3+ (latest mainline), and the above commit is already
part of it.
> The delay calibration code used to start out by setting:
>
> loops_per_jiffy = (1<<12);
>
> This will shorten the delay right down, and that's probably causing these
> false spinlock lockup bug dumps.
>
> Arranging for IRQs to be disabled across the delay calibration just avoids
> the issue by preventing any spinlock being taken.
>
> The reason that CPU#0 also complains about spinlock lockup is that for
> some reason CPU#1 never finishes its calibration, and so the loop also
> times out early on CPU#0.
>
I am not sure, but what I think is happening is this: as soon as
interrupts start firing, the scheduler, as part of IRQ handling, tries
to enqueue the softIRQ thread for the newly booted CPU, since it sees
that the CPU is active and ready. But that fails, and both CPUs
eventually lock up. I may be wrong here, though.
> Of course, fiddling with this global variable in this way is _not_ a good
> idea while other CPUs are running and using that variable.
>
> We could also do with implementing trigger_all_cpu_backtrace() to get
> backtraces from the other CPUs when spinlock lockup happens...
Any pointers on the other question about "why we need to enable
interrupts before the CPU is ready?"
Regards
Santosh
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-20 11:51 ` Santosh Shilimkar
@ 2011-06-20 12:19 ` Russell King - ARM Linux
-1 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 12:19 UTC (permalink / raw)
To: Santosh Shilimkar
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On Mon, Jun 20, 2011 at 05:21:48PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 04:55:43PM +0530, Santosh Shilimkar wrote:
>>> On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
>>>> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>>>>> Yes. It's because of interrupt and the CPU active-online
>>>>> race.
>>>>
>>>> I don't see that as a conclusion from this dump.
>>>>
>>>>> Here is the crash log..
>>>>> [ 21.025451] CPU1: Booted secondary processor
>>>>> [ 21.025451] CPU1: Unknown IPI message 0x1
>>>>> [ 21.029113] Switched to NOHz mode on CPU #1
>>>>> [ 21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>>>>
>>>> That's the xtime seqlock. We're trying to update the xtime from CPU1,
>>>> which is not yet online and not yet active. That's fine, we're just
>>>> spinning on the spinlock here, waiting for the other CPUs to release
>>>> it.
>>>>
>>>> But what this is saying is that the other CPUs aren't releasing it.
>>>> The cpu hotplug code doesn't hold the seqlock either. So who else is
>>>> holding this lock, causing CPU1 to time out on it.
>>>>
>>>> The other thing is that this is only supposed to trigger after about
>>>> one second:
>>>>
>>>> 	u64 loops = loops_per_jiffy * HZ;
>>>> 	for (i = 0; i < loops; i++) {
>>>> 		if (arch_spin_trylock(&lock->raw_lock))
>>>> 			return;
>>>> 		__delay(1);
>>>> 	}
>>>>
>>>> which from the timings you have at the beginning of your printk lines
>>>> is clearly not the case - it's more like 61us.
>>>>
>>>> Are you running with those h/w timer delay patches?
>>> Nope.
>>
>> Ok. So loops_per_jiffy must be too small. My guess is you're using an
>> older kernel without 71c696b1 (calibrate: extract fall-back calculation
>> into own helper).
>>
> I am on V3.0-rc3+(latest mainline) and the above commit is already
> part of it.
>
>> The delay calibration code used to start out by setting:
>>
>> loops_per_jiffy = (1<<12);
>>
>> This will shorten the delay right down, and that's probably causing these
>> false spinlock lockup bug dumps.
>>
>> Arranging for IRQs to be disabled across the delay calibration just avoids
>> the issue by preventing any spinlock being taken.
>>
>> The reason that CPU#0 also complains about spinlock lockup is that for
>> some reason CPU#1 never finishes its calibration, and so the loop also
>> times out early on CPU#0.
>>
> I am not sure but what I think is happening is as soon as interrupts
> start firing, as part of IRQ handling, scheduler will try to
> enqueue softIRQ thread for newly booted CPU since it sees that
> it's active and ready. But that's failing and both CPU's
> eventually lock-up. But I may be wrong here.
Even if that happens, there is NO WAY that the spinlock lockup detector
should report lockup in anything under 1s.
>> Of course, fiddling with this global variable in this way is _not_ a good
>> idea while other CPUs are running and using that variable.
>>
>> We could also do with implementing trigger_all_cpu_backtrace() to get
>> backtraces from the other CPUs when spinlock lockup happens...
>
> Any pointers on the other question about "why we need to enable
> interrupts before the CPU is ready?"
To ensure that things like the delay loop calibration and twd calibration
can run, though that looks like it'll run happily enough with the boot
CPU updating jiffies.
However, I'm still not taking your patch because I believe it's just
papering over the real issue, which is not as you describe.
You first need to work out why the spinlock lockup detection is firing
after just 61us rather than the full 1s and fix that.
You then need to work out whether you really do have spinlock lockup,
and if so, why. Implementing trigger_all_cpu_backtrace() may help to
find out what CPU#0 is doing, though we can only do that with IRQs on,
and so would be fragile.
We can test whether CPU#0 is going off to do something else while CPU#1
is being brought up, by adding a preempt_disable() / preempt_enable()
in __cpu_up() to prevent the wait-for-cpu#1-online being preempted by
other threads - I suspect you'll still see spinlock lockup on the
xtime seqlock on CPU#1 though. That would suggest a coherency issue.
Finally, how are you provoking this - and what kernel configuration are
you using?
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-20 12:19 ` Russell King - ARM Linux
@ 2011-06-20 12:27 ` Santosh Shilimkar
-1 siblings, 0 replies; 67+ messages in thread
From: Santosh Shilimkar @ 2011-06-20 12:27 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On 6/20/2011 5:49 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 05:21:48PM +0530, Santosh Shilimkar wrote:
>> On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
[...]
>>
>> Any pointers on the other question about "why we need to enable
>> interrupts before the CPU is ready?"
>
> To ensure that things like the delay loop calibration and twd calibration
> can run, though that looks like it'll run happily enough with the boot
> CPU updating jiffies.
>
I guessed as much and had the same point as above: calibration will
still work.
> However, I'm still not taking your patch because I believe it's just
> papering over the real issue, which is not as you describe.
>
> You first need to work out why the spinlock lockup detection is firing
> after just 61us rather than the full 1s and fix that.
>
This is possibly because of my script which doesn't wait for 1
second.
> You then need to work out whether you really do have spinlock lockup,
> and if so, why. Implementing trigger_all_cpu_backtrace() may help to
> find out what CPU#0 is doing, though we can only do that with IRQs on,
> and so would be fragile.
>
> We can test whether CPU#0 is going off to do something else while CPU#1
> is being brought up, by adding a preempt_disable() / preempt_enable()
> in __cpu_up() to prevent the wait-for-cpu#1-online being preempted by
> other threads - I suspect you'll still see spinlock lockup on the
> xtime seqlock on CPU#1 though. That would suggest a coherency issue.
>
> Finally, how are you provoking this - and what kernel configuration are
> you using?
Latest mainline kernel with omap2plus_defconfig, and the simple script
below to trigger the failure.
-------------
while true
do
	echo 0 > /sys/devices/system/cpu/cpu1/online
	echo 1 > /sys/devices/system/cpu/cpu1/online
done
Regards
Santosh
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-20 12:27 ` Santosh Shilimkar
@ 2011-06-20 12:57 ` Russell King - ARM Linux
-1 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 12:57 UTC (permalink / raw)
To: Santosh Shilimkar
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On Mon, Jun 20, 2011 at 05:57:01PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 5:49 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 05:21:48PM +0530, Santosh Shilimkar wrote:
>>> On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
>
> [...]
>
>>>
>>> Any pointers on the other question about "why we need to enable
>>> interrupts before the CPU is ready?"
>>
>> To ensure that things like the delay loop calibration and twd calibration
>> can run, though that looks like it'll run happily enough with the boot
>> CPU updating jiffies.
>>
> I guessed it and had same point as above. Calibration will still
> work.
>
>> However, I'm still not taking your patch because I believe it's just
>> papering over the real issue, which is not as you describe.
>>
>> You first need to work out why the spinlock lockup detection is firing
>> after just 61us rather than the full 1s and fix that.
>>
> This is possibly because of my script which doesn't wait for 1
> second.
How could a userspace script affect the internal behaviour of
spin_lock() and the spinlock lockup detector?
> Latest mainline kernel with omap2plus_defconfig and below simple script
> to trigger the failure.
>
> -------------
> while true
> do
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 1 > /sys/devices/system/cpu/cpu1/online
> done
Thanks, I'll give it a go here and see if I can debug it further.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-20 11:40 ` Russell King - ARM Linux
@ 2011-06-20 14:23 ` Russell King - ARM Linux
-1 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 14:23 UTC (permalink / raw)
To: Santosh Shilimkar
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:
> Ok. So loops_per_jiffy must be too small. My guess is you're using an
> older kernel without 71c696b1 (calibrate: extract fall-back calculation
> into own helper).
Right, this commit above helps show the problem - and it's fairly subtle.
It's a race condition. Let's first look at the spinlock debugging code.
It does this:
static void __spin_lock_debug(raw_spinlock_t *lock)
{
	u64 i;
	u64 loops = loops_per_jiffy * HZ;
	for (;;) {
		for (i = 0; i < loops; i++) {
			if (arch_spin_trylock(&lock->raw_lock))
				return;
			__delay(1);
		}
		/* print warning */
	}
}
If loops_per_jiffy is zero, we never try to grab the spinlock, because
we never enter the inner for loop. We immediately print a warning,
and re-execute the outer loop for ever, resulting in the CPU locking up
in this condition.
In theory, we should never see a zero loops_per_jiffy value, because it
represents the number of loops __delay() needs to delay by one jiffy and
clearly zero makes no sense.
However, calibrate_delay() does this (which x86 and ARM call on secondary
CPU startup):
calibrate_delay()
{
	...
	if (preset_lpj) {
	} else if ((!printed) && lpj_fine) {
	} else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
	} else {
		/* approximation/convergence stuff */
	}
}
Now, before 71c696b, this used to be:
	} else {
		loops_per_jiffy = (1<<12);
So the window between calibrate_delay_direct() returning and setting
loops_per_jiffy to zero, and the re-initialization of loops_per_jiffy
was relatively short (maybe even the compiler optimized away the zero
write.)
However, after 71c696b, this now does:
 	} else {
 		if (!printed)
 			pr_info("Calibrating delay loop... ");
+		loops_per_jiffy = calibrate_delay_converge();
So, as loops_per_jiffy is not local to this function, the compiler has
to write out that zero value, before calling calibrate_delay_converge(),
and loops_per_jiffy only becomes non-zero _after_ calibrate_delay_converge()
has returned. This opens the window and allows the spinlock debugging
code to explode.
This patch closes the window completely, by writing to loops_per_jiffy
only when we have a real value for it.
This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
without this it fails with spinlock lockup and rcu problems.
init/calibrate.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/init/calibrate.c b/init/calibrate.c
index 2568d22..aae2f40 100644
--- a/init/calibrate.c
+++ b/init/calibrate.c
@@ -245,30 +245,32 @@ static unsigned long __cpuinit calibrate_delay_converge(void)
 
 void __cpuinit calibrate_delay(void)
 {
+	unsigned long lpj;
 	static bool printed;
 
 	if (preset_lpj) {
-		loops_per_jiffy = preset_lpj;
+		lpj = preset_lpj;
 		if (!printed)
 			pr_info("Calibrating delay loop (skipped) "
 				"preset value.. ");
 	} else if ((!printed) && lpj_fine) {
-		loops_per_jiffy = lpj_fine;
+		lpj = lpj_fine;
 		pr_info("Calibrating delay loop (skipped), "
 			"value calculated using timer frequency.. ");
-	} else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
+	} else if ((lpj = calibrate_delay_direct()) != 0) {
 		if (!printed)
 			pr_info("Calibrating delay using timer "
 				"specific routine.. ");
 	} else {
 		if (!printed)
 			pr_info("Calibrating delay loop... ");
-		loops_per_jiffy = calibrate_delay_converge();
+		lpj = calibrate_delay_converge();
 	}
 	if (!printed)
 		pr_cont("%lu.%02lu BogoMIPS (lpj=%lu)\n",
-			loops_per_jiffy/(500000/HZ),
-			(loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
+			lpj/(500000/HZ),
+			(lpj/(5000/HZ)) % 100, lpj);
 
+	loops_per_jiffy = lpj;
 	printed = true;
 }
^ permalink raw reply related [flat|nested] 67+ messages in thread* [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
@ 2011-06-20 14:23 ` Russell King - ARM Linux
0 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 14:23 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:
> Ok. So loops_per_jiffy must be too small. My guess is you're using an
> older kernel without 71c696b1 (calibrate: extract fall-back calculation
> into own helper).
Right, this commit above helps show the problem - and it's fairly subtle.
It's a race condition. Let's first look at the spinlock debugging code.
It does this:
static void __spin_lock_debug(raw_spinlock_t *lock)
{
u64 i;
u64 loops = loops_per_jiffy * HZ;
for (;;) {
for (i = 0; i < loops; i++) {
if (arch_spin_trylock(&lock->raw_lock))
return;
__delay(1);
}
/* print warning */
}
}
If loops_per_jiffy is zero, we never try to grab the spinlock, because
we never enter the inner for loop. We immediately print a warning,
and re-execute the outer loop for ever, resulting in the CPU locking up
in this condition.
In theory, we should never see a zero loops_per_jiffy value, because it
represents the number of loops __delay() needs to delay by one jiffy and
clearly zero makes no sense.
However, calibrate_delay() does this (which x86 and ARM call on secondary
CPU startup):

	calibrate_delay()
	{
		...
		if (preset_lpj) {
		} else if ((!printed) && lpj_fine) {
		} else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
		} else {
			/* approximation/convergence stuff */
		}
	}

Now, before 71c696b, this used to be:

	} else {
		loops_per_jiffy = (1<<12);

So the window between calibrate_delay_direct() returning and setting
loops_per_jiffy to zero, and the re-initialization of loops_per_jiffy
was relatively short (maybe even the compiler optimized away the zero
write.)
However, after 71c696b, this now does:

	} else {
		if (!printed)
			pr_info("Calibrating delay loop... ");
+		loops_per_jiffy = calibrate_delay_converge();

So, as loops_per_jiffy is not local to this function, the compiler has
to write out that zero value, before calling calibrate_delay_converge(),
and loops_per_jiffy only becomes non-zero _after_ calibrate_delay_converge()
has returned. This opens the window and allows the spinlock debugging
code to explode.
This patch closes the window completely, by writing to loops_per_jiffy
only when we have a real value for it.
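
The difference between the two shapes can be sketched in user space. This is a hedged illustration only: the *_stub helpers, seen_during_converge, calibrate_racy() and calibrate_fixed() are hypothetical names, the returned values are made up, and the direct calibration is assumed to fail (return 0). The convergence stub records the value a concurrent reader of loops_per_jiffy would have seen while it runs.

```c
/* Hypothetical stand-ins for the kernel global and the calibration helpers. */
static unsigned long loops_per_jiffy = 4096;	/* some earlier, valid value */
static unsigned long seen_during_converge;	/* what another CPU could read */

/* Assume timer-based calibration is unavailable: report failure with 0. */
static unsigned long calibrate_delay_direct_stub(void)
{
	return 0;
}

/* Slow fallback; records the global as a concurrent reader would see it. */
static unsigned long calibrate_delay_converge_stub(void)
{
	seen_during_converge = loops_per_jiffy;
	return 5000;	/* made-up result */
}

/* Old shape: the failed direct result (0) is stored in the global first. */
static void calibrate_racy(void)
{
	if ((loops_per_jiffy = calibrate_delay_direct_stub()) != 0)
		return;
	loops_per_jiffy = calibrate_delay_converge_stub();
}

/* Patched shape: work on a local, write the global once at the end. */
static void calibrate_fixed(void)
{
	unsigned long lpj;

	if ((lpj = calibrate_delay_direct_stub()) == 0)
		lpj = calibrate_delay_converge_stub();
	loops_per_jiffy = lpj;
}
```

In the racy variant the global reads as zero for the whole duration of the convergence call. In the fixed variant a reader sees either the old value or the new one, never the intermediate zero.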
This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
without this it fails with spinlock lockup and rcu problems.
init/calibrate.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/init/calibrate.c b/init/calibrate.c
index 2568d22..aae2f40 100644
--- a/init/calibrate.c
+++ b/init/calibrate.c
@@ -245,30 +245,32 @@ static unsigned long __cpuinit calibrate_delay_converge(void)
void __cpuinit calibrate_delay(void)
{
+ unsigned long lpj;
static bool printed;
if (preset_lpj) {
- loops_per_jiffy = preset_lpj;
+ lpj = preset_lpj;
if (!printed)
pr_info("Calibrating delay loop (skipped) "
"preset value.. ");
} else if ((!printed) && lpj_fine) {
- loops_per_jiffy = lpj_fine;
+ lpj = lpj_fine;
pr_info("Calibrating delay loop (skipped), "
"value calculated using timer frequency.. ");
- } else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
+ } else if ((lpj = calibrate_delay_direct()) != 0) {
if (!printed)
pr_info("Calibrating delay using timer "
"specific routine.. ");
} else {
if (!printed)
pr_info("Calibrating delay loop... ");
- loops_per_jiffy = calibrate_delay_converge();
+ lpj = calibrate_delay_converge();
}
if (!printed)
pr_cont("%lu.%02lu BogoMIPS (lpj=%lu)\n",
- loops_per_jiffy/(500000/HZ),
- (loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
+ lpj/(500000/HZ),
+ (lpj/(5000/HZ)) % 100, lpj);
+ loops_per_jiffy = lpj;
printed = true;
}
^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-20 14:23 ` Russell King - ARM Linux
@ 2011-06-20 14:54 ` Santosh Shilimkar
-1 siblings, 0 replies; 67+ messages in thread
From: Santosh Shilimkar @ 2011-06-20 14:54 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:
>> Ok. So loops_per_jiffy must be too small. My guess is you're using an
>> older kernel without 71c696b1 (calibrate: extract fall-back calculation
>> into own helper).
>
> Right, this commit above helps show the problem - and it's fairly subtle.
>
> It's a race condition. Let's first look at the spinlock debugging code.
> It does this:
>
> static void __spin_lock_debug(raw_spinlock_t *lock)
> {
> u64 i;
> u64 loops = loops_per_jiffy * HZ;
>
> for (;;) {
> for (i = 0; i< loops; i++) {
> if (arch_spin_trylock(&lock->raw_lock))
> return;
> __delay(1);
> }
> /* print warning */
> }
> }
>
> If loops_per_jiffy is zero, we never try to grab the spinlock, because
> we never enter the inner for loop. We immediately print a warning,
> and re-execute the outer loop for ever, resulting in the CPU locking up
> in this condition.
>
> In theory, we should never see a zero loops_per_jiffy value, because it
> represents the number of loops __delay() needs to delay by one jiffy and
> clearly zero makes no sense.
>
> However, calibrate_delay() does this (which x86 and ARM call on secondary
> CPU startup):
>
> calibrate_delay()
> {
> ...
> if (preset_lpj) {
> } else if ((!printed)&& lpj_fine) {
> } else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
> } else {
> /* approximation/convergence stuff */
> }
> }
>
> Now, before 71c696b, this used to be:
>
> } else {
> loops_per_jiffy = (1<<12);
>
> So the window between calibrate_delay_direct() returning and setting
> loops_per_jiffy to zero, and the re-initialization of loops_per_jiffy
> was relatively short (maybe even the compiler optimized away the zero
> write.)
>
> However, after 71c696b, this now does:
>
> } else {
> if (!printed)
> pr_info("Calibrating delay loop... ");
> + loops_per_jiffy = calibrate_delay_converge();
>
> So, as loops_per_jiffy is not local to this function, the compiler has
> to write out that zero value, before calling calibrate_delay_converge(),
> and loops_per_jiffy only becomes non-zero _after_ calibrate_delay_converge()
> has returned. This opens the window and allows the spinlock debugging
> code to explode.
>
> This patch closes the window completely, by only writing to loops_per_jiffy
> only when we have a real value for it.
>
> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
> without this it fails with spinlock lockup and rcu problems.
>
> init/calibrate.c | 14 ++++++++------
> 1 files changed, 8 insertions(+), 6 deletions(-)
>
I am away from my board now. Will test this change.
btw, the online-active race is still open even with this patch
and should still be fixed.
Regards
Santosh
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-20 14:54 ` Santosh Shilimkar
@ 2011-06-20 15:01 ` Russell King - ARM Linux
-1 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 15:01 UTC (permalink / raw)
To: Santosh Shilimkar
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On Mon, Jun 20, 2011 at 08:24:33PM +0530, Santosh Shilimkar wrote:
> I am away from my board now. Will test this change.
> btw, the online-active race is still open even with this patch close
> and should be fixed.
I have yet to see any evidence of that race - I've been running your
test loop for about an hour so far on Versatile Express and nothing
yet.
That's not to say that we shouldn't wait for the active mask to become
true before calling schedule(), but I don't think it's as big a deal as
you're suggesting it is.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-20 15:01 ` Russell King - ARM Linux
@ 2011-06-20 15:10 ` Santosh Shilimkar
-1 siblings, 0 replies; 67+ messages in thread
From: Santosh Shilimkar @ 2011-06-20 15:10 UTC (permalink / raw)
To: Russell King - ARM Linux, Thomas Gleixner
Cc: Peter Zijlstra, linux-omap, linux-kernel, linux-arm-kernel
On 6/20/2011 8:31 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 08:24:33PM +0530, Santosh Shilimkar wrote:
>> I am away from my board now. Will test this change.
>> btw, the online-active race is still open even with this patch close
>> and should be fixed.
>
> I have yet to see any evidence of that race - I've been running your
> test loop for about an hour so far on Versatile Express and nothing
> yet.
>
In that case my script was just exposing the calibrate() code race
condition.
> That's not to say that we shouldn't wait for the active mask to become
> true before calling schedule(), but I don't think its as big a deal as
> you're suggesting it is.
I am not expert enough to trigger that exact online-to-active race, but
I am sure the race needs to be fixed.
Maybe Thomas can suggest a test case to expose that race. From
his fix for x86, it appeared that the race was indeed hit in some
sequence.
Regards,
Santosh
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-20 14:54 ` Santosh Shilimkar
@ 2011-06-21 9:08 ` Santosh Shilimkar
-1 siblings, 0 replies; 67+ messages in thread
From: Santosh Shilimkar @ 2011-06-21 9:08 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
Russell,
On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:
[....]
>> So, as loops_per_jiffy is not local to this function, the compiler has
>> to write out that zero value, before calling calibrate_delay_converge(),
>> and loops_per_jiffy only becomes non-zero _after_
>> calibrate_delay_converge()
>> has returned. This opens the window and allows the spinlock debugging
>> code to explode.
>>
>> This patch closes the window completely, by only writing to
>> loops_per_jiffy
>> only when we have a real value for it.
>>
>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>> without this it fails with spinlock lockup and rcu problems.
>>
>> init/calibrate.c | 14 ++++++++------
>> 1 files changed, 8 insertions(+), 6 deletions(-)
>>
> I am away from my board now. Will test this change.
Have tested your change and it seems to fix the crash I
was observing. Are you planning to send this fix for rc5?
> btw, the online-active race is still open even with this patch close
> and should be fixed.
>
The only remaining problem is waiting for the active mask before
marking the CPU online. Shall I refresh my patch with only
this change, then?
Regards
Santosh
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-21 9:08 ` Santosh Shilimkar
@ 2011-06-21 10:00 ` Russell King - ARM Linux
-1 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-21 10:00 UTC (permalink / raw)
To: Santosh Shilimkar
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On Tue, Jun 21, 2011 at 02:38:34PM +0530, Santosh Shilimkar wrote:
> Russell,
>
> On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
>> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>>> So, as loops_per_jiffy is not local to this function, the compiler has
>>> to write out that zero value, before calling calibrate_delay_converge(),
>>> and loops_per_jiffy only becomes non-zero _after_
>>> calibrate_delay_converge()
>>> has returned. This opens the window and allows the spinlock debugging
>>> code to explode.
>>>
>>> This patch closes the window completely, by only writing to
>>> loops_per_jiffy
>>> only when we have a real value for it.
>>>
>>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>>> without this it fails with spinlock lockup and rcu problems.
>>>
>>> init/calibrate.c | 14 ++++++++------
>>> 1 files changed, 8 insertions(+), 6 deletions(-)
>>>
>> I am away from my board now. Will test this change.
> Have tested your change and it seems to fix the crash I
> was observing. Are you planning to send this fix for rc5?
Yes. I think sending CPUs into infinite loops in the spinlock code is
definitely sufficiently serious that it needs to go to Linus ASAP.
It'd be nice to have a tested-by line though.
>> btw, the online-active race is still open even with this patch close
>> and should be fixed.
>>
> The only problem remains is waiting for active mask before
> marking CPU online. Shall I refresh my patch with only
> this change then ?
I already have that as a separate change.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-21 10:00 ` Russell King - ARM Linux
@ 2011-06-21 10:17 ` Santosh Shilimkar
-1 siblings, 0 replies; 67+ messages in thread
From: Santosh Shilimkar @ 2011-06-21 10:17 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On 6/21/2011 3:30 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 02:38:34PM +0530, Santosh Shilimkar wrote:
>> Russell,
>>
>> On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
>>> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>>>> So, as loops_per_jiffy is not local to this function, the compiler has
>>>> to write out that zero value, before calling calibrate_delay_converge(),
>>>> and loops_per_jiffy only becomes non-zero _after_
>>>> calibrate_delay_converge()
>>>> has returned. This opens the window and allows the spinlock debugging
>>>> code to explode.
>>>>
>>>> This patch closes the window completely, by only writing to
>>>> loops_per_jiffy
>>>> only when we have a real value for it.
>>>>
>>>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>>>> without this it fails with spinlock lockup and rcu problems.
>>>>
>>>> init/calibrate.c | 14 ++++++++------
>>>> 1 files changed, 8 insertions(+), 6 deletions(-)
>>>>
>>> I am away from my board now. Will test this change.
>> Have tested your change and it seems to fix the crash I
>> was observing. Are you planning to send this fix for rc5?
>
> Yes. I think sending CPUs into infinite loops in the spinlock code is
> definitely sufficiently serious that it needs to go to Linus ASAP.
> It'd be nice to have a tested-by line though.
>
Sure.
>>> btw, the online-active race is still open even with this patch close
>>> and should be fixed.
>>>
>> The only problem remains is waiting for active mask before
>> marking CPU online. Shall I refresh my patch with only
>> this change then ?
>
> I already have that as a separate change.
Can you point me to both of these commits so that I have
them in my tree for testing?
Thanks for the help.
Regards
Santosh
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-21 10:17 ` Santosh Shilimkar
@ 2011-06-21 10:19 ` Russell King - ARM Linux
-1 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-21 10:19 UTC (permalink / raw)
To: Santosh Shilimkar
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On Tue, Jun 21, 2011 at 03:47:04PM +0530, Santosh Shilimkar wrote:
> On 6/21/2011 3:30 PM, Russell King - ARM Linux wrote:
>> On Tue, Jun 21, 2011 at 02:38:34PM +0530, Santosh Shilimkar wrote:
>>> Russell,
>>>
>>> On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
>>>> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>>>>> So, as loops_per_jiffy is not local to this function, the compiler has
>>>>> to write out that zero value, before calling calibrate_delay_converge(),
>>>>> and loops_per_jiffy only becomes non-zero _after_
>>>>> calibrate_delay_converge()
>>>>> has returned. This opens the window and allows the spinlock debugging
>>>>> code to explode.
>>>>>
>>>>> This patch closes the window completely, by only writing to
>>>>> loops_per_jiffy
>>>>> only when we have a real value for it.
>>>>>
>>>>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>>>>> without this it fails with spinlock lockup and rcu problems.
>>>>>
>>>>> init/calibrate.c | 14 ++++++++------
>>>>> 1 files changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>> I am away from my board now. Will test this change.
>>> Have tested your change and it seems to fix the crash I
>>> was observing. Are you planning to send this fix for rc5?
>>
>> Yes. I think sending CPUs into infinite loops in the spinlock code is
>> definitely sufficiently serious that it needs to go to Linus ASAP.
>> It'd be nice to have a tested-by line though.
>>
> Sure.
>
>>>> btw, the online-active race is still open even with this patch close
>>>> and should be fixed.
>>>>
>>> The only problem remains is waiting for active mask before
>>> marking CPU online. Shall I refresh my patch with only
>>> this change then ?
>>
>> I already have that as a separate change.
> Can you point me to both of these commits so that I have
> them in my tree for testing.
I won't be committing the init/calibrate.c change to a git tree - it
isn't ARM stuff so it goes in patch form.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-21 10:19 ` Russell King - ARM Linux
@ 2011-06-21 10:21 ` Santosh Shilimkar
-1 siblings, 0 replies; 67+ messages in thread
From: Santosh Shilimkar @ 2011-06-21 10:21 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 03:47:04PM +0530, Santosh Shilimkar wrote:
[...]
>>> I already have that as a separate change.
>> Can you point me to both of these commits so that I have
>> them in my tree for testing.
>
> I won't be committing the init/calibrate.c change to a git tree - it
> isn't ARM stuff so it goes in patch form.
Patches with change log would be fine as well.
Regards
Santosh
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-21 10:21 ` Santosh Shilimkar
@ 2011-06-21 10:26 ` Russell King - ARM Linux
-1 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-21 10:26 UTC (permalink / raw)
To: Santosh Shilimkar
Cc: Peter Zijlstra, Thomas Gleixner, linux-omap, linux-kernel,
linux-arm-kernel
On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
>> On Tue, Jun 21, 2011 at 03:47:04PM +0530, Santosh Shilimkar wrote:
>
> [...]
>
>>>> I already have that as a separate change.
>>> Can you point me to both of these commits so that I have
>>> them in my tree for testing.
>>
>> I won't be committing the init/calibrate.c change to a git tree - it
>> isn't ARM stuff so it goes in patch form.
>
> Patches with change log would be fine as well.
The answer is not at the moment, but maybe soon.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-21 10:26 ` Russell King - ARM Linux
(?)
@ 2011-06-21 20:16 ` Stephen Boyd
-1 siblings, 0 replies; 67+ messages in thread
From: Stephen Boyd @ 2011-06-21 20:16 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Peter Zijlstra, linux-kernel, Santosh Shilimkar, Thomas Gleixner,
linux-omap, linux-arm-kernel
On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
>> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
>>> I won't be committing the init/calibrate.c change to a git tree - it
>>> isn't ARM stuff so it goes in patch form.
>> Patches with change log would be fine as well.
> The answer is not at the moment, but maybe soon.
Should we send those two patches to the stable trees as well? They seem
to fix issues with cpu onlining that have existed for a long time.
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-21 20:16 ` Stephen Boyd
@ 2011-06-21 23:10 ` Russell King - ARM Linux
-1 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-21 23:10 UTC (permalink / raw)
To: Stephen Boyd
Cc: Santosh Shilimkar, Peter Zijlstra, Thomas Gleixner, linux-omap,
linux-kernel, linux-arm-kernel
On Tue, Jun 21, 2011 at 01:16:47PM -0700, Stephen Boyd wrote:
> On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
> > On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
> >> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
> >>> I won't be committing the init/calibrate.c change to a git tree - it
> >>> isn't ARM stuff so it goes in patch form.
> >> Patches with change log would be fine as well.
> > The answer is not at the moment, but maybe soon.
>
> Should we send those two patches to the stable trees as well? They seem
> to fix issues with cpu onlining that have existed for a long time.
Looks to me like the problem was introduced for 2.6.39-rc1, so we
should probably get the fix into the 2.6.39-stable tree too.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-21 23:10 ` Russell King - ARM Linux
(?)
@ 2011-06-22 0:06 ` Stephen Boyd
-1 siblings, 0 replies; 67+ messages in thread
From: Stephen Boyd @ 2011-06-22 0:06 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Peter Zijlstra, linux-kernel, Santosh Shilimkar, Thomas Gleixner,
linux-omap, linux-arm-kernel
On 06/21/2011 04:10 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 01:16:47PM -0700, Stephen Boyd wrote:
>> On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
>>> On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
>>>> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
>>>>> I won't be committing the init/calibrate.c change to a git tree - it
>>>>> isn't ARM stuff so it goes in patch form.
>>>> Patches with change log would be fine as well.
>>> The answer is not at the moment, but maybe soon.
>> Should we send those two patches to the stable trees as well? They seem
>> to fix issues with cpu onlining that have existed for a long time.
> Looks to me like the problem was introduced for 2.6.39-rc1, so we
> should probably get the fix into the 2.6.39-stable tree too.
Are we talking about the loops_per_jiffy problem or the cpu_active
problem? I would think the cpu_active problem has been there since SMP
support was added to ARM and the loops_per_jiffy problem has been there
(depending on the compiler) since 8a9e1b0 ([PATCH] Platform SMIs and
their interferance with tsc based delay calibration, 2005-06-23).
So pretty much every stable tree would want both of these patches.
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
2011-06-22 0:06 ` Stephen Boyd
@ 2011-06-22 10:06 ` Russell King - ARM Linux
-1 siblings, 0 replies; 67+ messages in thread
From: Russell King - ARM Linux @ 2011-06-22 10:06 UTC (permalink / raw)
To: Stephen Boyd
Cc: Santosh Shilimkar, Peter Zijlstra, Thomas Gleixner, linux-omap,
linux-kernel, linux-arm-kernel
On Tue, Jun 21, 2011 at 05:06:58PM -0700, Stephen Boyd wrote:
> On 06/21/2011 04:10 PM, Russell King - ARM Linux wrote:
> > On Tue, Jun 21, 2011 at 01:16:47PM -0700, Stephen Boyd wrote:
> >> On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
> >>> On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
> >>>> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
> >>>>> I won't be committing the init/calibrate.c change to a git tree - it
> >>>>> isn't ARM stuff so it goes in patch form.
> >>>> Patches with change log would be fine as well.
> >>> The answer is not at the moment, but maybe soon.
> >> Should we send those two patches to the stable trees as well? They seem
> >> to fix issues with cpu onlining that have existed for a long time.
> > Looks to me like the problem was introduced for 2.6.39-rc1, so we
> > should probably get the fix into the 2.6.39-stable tree too.
>
> Are we talking about the loops_per_jiffy problem or the cpu_active
> problem? I would think the cpu_active problem has been there since SMP
> support was added to ARM and the loops_per_jiffy problem has been there
> (depending on the compiler) since 8a9e1b0 ([PATCH] Platform SMIs and
> their interferance with tsc based delay calibration, 2005-06-23).
The cpu_active problem hasn't actually caused any symptoms on ARM, so
it's low priority. It's only a problem which should be sorted in
-stable _if_ someone reports that it has caused a problem. Up until
Santosh's patch, no one has done so, and I've not seen any problems
on any of my ARM SMP platforms coming from it.
As for the loops_per_jiffy, it isn't a problem before the commit ID
I pointed out - I've checked the assembly, and the compiler optimizes
away the initialization of loops_per_jiffy to zero - the first write
is when it's set to (1<<12). Take a moment to think about this:
	if ((loops_per_jiffy = 0) != 0) {
	} else {
		loops_per_jiffy = 1<<12;
		...
	}
Any compiler worth talking about is going to optimize away the initial
constant write to loops_per_jiffy there provided loops_per_jiffy is not
volatile.
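
A minimal user-space sketch of the same shape, for anyone who wants to
check the claim against their own toolchain. The names direct_calibration()
and calibrate_sketch() are hypothetical stand-ins, not kernel functions, and
the condition is written as `!= 0` here so that the fallback branch is the
one taken when the direct value is zero. Whether the zero store really
disappears depends on the compiler and optimization level; inspecting the
generated assembly (for example with gcc -O2 -S) is the way to confirm it.

```c
/* Stand-in for the kernel global; deliberately NOT volatile. */
static unsigned long loops_per_jiffy;

/*
 * Hypothetical stand-in for a direct calibration method that is not
 * available on this platform and therefore reports 0.
 */
static inline unsigned long direct_calibration(void)
{
	return 0;
}

/* Same shape as the snippet above: assign inside the condition, then fall back. */
static void calibrate_sketch(void)
{
	if ((loops_per_jiffy = direct_calibration()) != 0) {
		/* A direct value was available; nothing more to do. */
	} else {
		/*
		 * The 0 stored by the condition is overwritten here before
		 * anything reads it, so an optimizing compiler is free to
		 * drop that earlier store.
		 */
		loops_per_jiffy = 1UL << 12;
	}
}
```

After calibrate_sketch() returns, loops_per_jiffy holds 4096 regardless of
whether the intermediate zero store was emitted; the difference only matters
to another CPU reading the variable in between.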
So, although it's not desirable for older kernels to have their lpj
overwritten in this way, it doesn't cause the spinlock debugging code
to explode.
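
To put rough numbers on that: the spinlock debug loop quoted earlier in the
thread tries the lock loops_per_jiffy * HZ times before reporting a lockup.
The sketch below only reproduces that multiplication. The helper name
lockup_loop_budget() is hypothetical, and HZ = 100 in the usage note is an
assumed value, not one taken from the reported system.

```c
#include <stdint.h>

/*
 * Number of trylock attempts the debug loop makes before giving up,
 * following the "loops = loops_per_jiffy * HZ" expression quoted earlier.
 * Hypothetical helper for illustration only.
 */
static uint64_t lockup_loop_budget(unsigned long lpj, unsigned int hz)
{
	return (uint64_t)lpj * hz;
}
```

With the assumed HZ of 100, lockup_loop_budget(1UL << 12, 100) gives 409600
attempts, which is short but still a real wait. lockup_loop_budget(0, 100)
gives 0, meaning the loop body would never run, so a zero value that is
visible to another CPU would be far more harmful than the (1<<12) starting
point.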
This can be shown to be correct because there hasn't been any problem
with ARM secondary CPU bringup until recently.
Plus, the previous version of the code requires significant changes to
sort the problem out.
So, the lpj patch will only sensibly apply to 2.6.39-rc1 and later,
and so it's only going to be submitted for 2.6.39-stable. For previous
kernels, the risk of changing it outweighs by several orders of
magnitude any benefit coming from the change.
^ permalink raw reply [flat|nested] 67+ messages in thread