* 2.6.32.21 - uptime related crashes?
@ 2011-04-28  8:26 Nikola Ciprich
  2011-04-28 18:34 ` [stable] " Willy Tarreau
  0 siblings, 1 reply; 58+ messages in thread
From: Nikola Ciprich @ 2011-04-28  8:26 UTC (permalink / raw)
  To: linux-kernel mlist; +Cc: linux-stable mlist

[-- Attachment #1: Type: text/plain, Size: 3798 bytes --]

Hello everybody,

I'm trying to solve a strange issue: today my fourth machine running
2.6.32.21 crashed. What makes the cases similar, apart from the same
kernel version, is that all the boxes had very similar uptimes: 214, 216,
216, and 224 days. This might just be a coincidence, but I think it might
be important.
Unfortunately I only have backtraces of two of the crashes (and those are
trimmed, sorry), and they do not look as similar as I'd like, but maybe
there is still something in common:

[<ffffffff81120cc7>] pollwake+0x57/0x60
[<ffffffff81046720>] ? default_wake_function+0x0/0x10
[<ffffffff8103683a>] __wake_up_common+0x5a/0x90
[<ffffffff8103a313>] __wake_up+0x43/0x70
[<ffffffffa0321573>] process_masterspan+0x643/0x670 [dahdi]
[<ffffffffa0326595>] coretimer_func+0x135/0x1d0 [dahdi]
[<ffffffff8105d74d>] run_timer_softirq+0x15d/0x320
[<ffffffffa0326460>] ? coretimer_func+0x0/0x1d0 [dahdi]
[<ffffffff8105690c>] __do_softirq+0xcc/0x220
[<ffffffff8100c40c>] call_softirq+0x1c/0x30
[<ffffffff8100e3ba>] do_softirq+0x4a/0x80
[<ffffffff810567c7>] irq_exit+0x87/0x90
[<ffffffff8100d7b7>] do_IRQ+0x77/0xf0
[<ffffffff8100bc53>] ret_from_intr+0x0/0xa
<EOI> [<ffffffffa019e556>] ? acpi_idle_enter_bm+0x273/0x2a1 [processor]
[<ffffffffa019e54c>] ? acpi_idle_enter_bm+0x269/0x2a1 [processor]
[<ffffffff81280095>] ? cpuidle_idle_call+0xa5/0x150
[<ffffffff8100a18f>] ? cpu_idle+0x4f/0x90
[<ffffffff81323c95>] ? rest_init+0x75/0x80
[<ffffffff81582d7f>] ? start_kernel+0x2ef/0x390
[<ffffffff81582271>] ? x86_64_start_reservations+0x81/0xc0
[<ffffffff81582386>] ? x86_64_start_kernel+0xd6/0x100

this box (actually two of the crashed ones) is using the dahdi_dummy
module to generate timing for an asterisk SW pbx, so maybe it's related
to that.

[<ffffffff810a5063>] handle_IRQ_event+0x63/0x1c0
[<ffffffff810a71ae>] handle_edge_irq+0xce/0x160
[<ffffffff8100e1bf>] handle_irq+0x1f/0x30
[<ffffffff8100d7ae>] do_IRQ+0x6e/0xf0
[<ffffffff8100bc53>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff8133?f?f>] ? _spin_unlock_irq+0xf/0x40
[<ffffffff81337f79>] ? _spin_unlock_irq+0x9/0x40
[<ffffffff81064b9a>] ? exit_signals+0x8a/0x130
[<ffffffff8105372e>] ? do_exit+0x7e/0x7d0
[<ffffffff8100f8a7>] ? oops_end+0xa7/0xb0
[<ffffffff8100faa6>] ? die+0x56/0x90
[<ffffffff8100c810>] ? do_trap+0x130/0x150
[<ffffffff8100ccca>] ? do_divide_error+0x8a/0xa0
[<ffffffff8103d227>] ? find_busiest_group+0x3d7/0xa00
[<ffffffff8104400b>] ? cpuacct_charge+0x6b/0x90
[<ffffffff8100c045>] ? divide_error+0x15/0x20
[<ffffffff8103d227>] ? find_busiest_group+0x3d7/0xa00
[<ffffffff8103cfff>] ? find_busiest_group+0x1af/0xa00
[<ffffffff81335483>] ? thread_return+0x4ce/0x7bb
[<ffffffff8133bec5>] ? do_nanosleep+0x75/0x30
[<ffffffff810?1?4e>] ? hrtimer_nanosleep+0x9e/0x120
[<ffffffff810?08f0>] ? hrtimer_wakeup+0x0/0x30
[<ffffffff810?183f>] ? sys_nanosleep+0x6f/0x80

the other two boxes don't use it. The only similarity I see is that both
traces look IRQ-handling related; otherwise they don't have much in
common.
Does anybody have an idea where I should look? Of course I should update
all those boxes to (at least) the latest 2.6.32.x, and I'll do that for
sure, but I'd first like to know where the problem was, whether it has
been fixed, or how to fix it...
I'd be grateful for any help...

BR

nik

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:  +420 596 603 142
fax:   +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-28  8:26 2.6.32.21 - uptime related crashes? Nikola Ciprich
@ 2011-04-28 18:34 ` Willy Tarreau
  2011-04-29 10:02   ` Nikola Ciprich
  2011-05-13 22:08   ` Nicolas Carlier
  0 siblings, 2 replies; 58+ messages in thread
From: Willy Tarreau @ 2011-04-28 18:34 UTC (permalink / raw)
  To: Nikola Ciprich
  Cc: linux-kernel mlist, linux-stable mlist, Hervé Commowick

Hello Nikola,

On Thu, Apr 28, 2011 at 10:26:25AM +0200, Nikola Ciprich wrote:
> Hello everybody,
>
> I'm trying to solve a strange issue: today my fourth machine running
> 2.6.32.21 crashed. What makes the cases similar, apart from the same
> kernel version, is that all the boxes had very similar uptimes: 214,
> 216, 216, and 224 days. This might just be a coincidence, but I think
> it might be important.

Interestingly, one of our customers just had two machines that crashed
yesterday after 212 days and 212 days + 20h respectively. They were
running Debian's 2.6.32-bpo.5-amd64, which is based on 2.6.32.23 AIUI.

The crash looks very similar to the following bug which we have updated:

  https://bugzilla.kernel.org/show_bug.cgi?id=16991

(bugzilla doesn't appear to respond as I'm posting this mail).

The top of your output is missing. In our case, as in the reports on the
bug above, there was a divide-by-zero error. Did you happen to spot this
one too, or do you just not know? I observe "divide_error+0x15/0x20" in
one of your reports, so it's possible that it matches the same pattern,
at least for one trace. Just in case, it would be nice to feed the
bugzilla entry above.

> Unfortunately I only have backtraces of two of the crashes (and those
> are trimmed, sorry), and they do not look as similar as I'd like, but
> maybe there is still something in common:
>
> [<ffffffff81120cc7>] pollwake+0x57/0x60
> [<ffffffff81046720>] ? default_wake_function+0x0/0x10
> [<ffffffff8103683a>] __wake_up_common+0x5a/0x90
> [<ffffffff8103a313>] __wake_up+0x43/0x70
> [<ffffffffa0321573>] process_masterspan+0x643/0x670 [dahdi]
> [<ffffffffa0326595>] coretimer_func+0x135/0x1d0 [dahdi]
> [<ffffffff8105d74d>] run_timer_softirq+0x15d/0x320
> [<ffffffffa0326460>] ? coretimer_func+0x0/0x1d0 [dahdi]
> [<ffffffff8105690c>] __do_softirq+0xcc/0x220
> [<ffffffff8100c40c>] call_softirq+0x1c/0x30
> [<ffffffff8100e3ba>] do_softirq+0x4a/0x80
> [<ffffffff810567c7>] irq_exit+0x87/0x90
> [<ffffffff8100d7b7>] do_IRQ+0x77/0xf0
> [<ffffffff8100bc53>] ret_from_intr+0x0/0xa
> <EOI> [<ffffffffa019e556>] ? acpi_idle_enter_bm+0x273/0x2a1 [processor]
> [<ffffffffa019e54c>] ? acpi_idle_enter_bm+0x269/0x2a1 [processor]
> [<ffffffff81280095>] ? cpuidle_idle_call+0xa5/0x150
> [<ffffffff8100a18f>] ? cpu_idle+0x4f/0x90
> [<ffffffff81323c95>] ? rest_init+0x75/0x80
> [<ffffffff81582d7f>] ? start_kernel+0x2ef/0x390
> [<ffffffff81582271>] ? x86_64_start_reservations+0x81/0xc0
> [<ffffffff81582386>] ? x86_64_start_kernel+0xd6/0x100
>
> this box (actually two of the crashed ones) is using the dahdi_dummy
> module to generate timing for an asterisk SW pbx, so maybe it's related
> to that.
>
> [<ffffffff810a5063>] handle_IRQ_event+0x63/0x1c0
> [<ffffffff810a71ae>] handle_edge_irq+0xce/0x160
> [<ffffffff8100e1bf>] handle_irq+0x1f/0x30
> [<ffffffff8100d7ae>] do_IRQ+0x6e/0xf0
> [<ffffffff8100bc53>] ret_from_intr+0x0/0xa
> <EOI> [<ffffffff8133?f?f>] ? _spin_unlock_irq+0xf/0x40
> [<ffffffff81337f79>] ? _spin_unlock_irq+0x9/0x40
> [<ffffffff81064b9a>] ? exit_signals+0x8a/0x130
> [<ffffffff8105372e>] ? do_exit+0x7e/0x7d0
> [<ffffffff8100f8a7>] ? oops_end+0xa7/0xb0
> [<ffffffff8100faa6>] ? die+0x56/0x90
> [<ffffffff8100c810>] ? do_trap+0x130/0x150
> [<ffffffff8100ccca>] ? do_divide_error+0x8a/0xa0
> [<ffffffff8103d227>] ? find_busiest_group+0x3d7/0xa00
> [<ffffffff8104400b>] ? cpuacct_charge+0x6b/0x90
> [<ffffffff8100c045>] ? divide_error+0x15/0x20
> [<ffffffff8103d227>] ? find_busiest_group+0x3d7/0xa00
> [<ffffffff8103cfff>] ? find_busiest_group+0x1af/0xa00
> [<ffffffff81335483>] ? thread_return+0x4ce/0x7bb
> [<ffffffff8133bec5>] ? do_nanosleep+0x75/0x30
> [<ffffffff810?1?4e>] ? hrtimer_nanosleep+0x9e/0x120
> [<ffffffff810?08f0>] ? hrtimer_wakeup+0x0/0x30
> [<ffffffff810?183f>] ? sys_nanosleep+0x6f/0x80
>
> the other two boxes don't use it. The only similarity I see is that
> both traces look IRQ-handling related; otherwise they don't have much
> in common.
> Does anybody have an idea where I should look? Of course I should
> update all those boxes to (at least) the latest 2.6.32.x, and I'll do
> that for sure, but I'd first like to know where the problem was,
> whether it has been fixed, or how to fix it...
> I'd be grateful for any help...

There were quite a bunch of scheduler updates recently. We may be lucky
and hope for the bug to have vanished with the changes, but we may as
well see the same crash in 7 months :-/

My coworker Hervé (CC'd), who worked on the issue, suggests that we
might have something which goes wrong past a certain uptime (e.g. 212
days), which needs a special event to be triggered (I/O, process
exiting, etc...). I think this makes quite some sense.

Could you check your CONFIG_HZ so that we could convert those uptimes to
jiffies? Maybe this will ring a bell in someone's head :-/

Best regards,
Willy

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-28 18:34 ` [stable] " Willy Tarreau
@ 2011-04-29 10:02   ` Nikola Ciprich
  2011-04-30  9:36     ` Willy Tarreau
  2011-05-06  3:12     ` [stable] " Hidetoshi Seto
  1 sibling, 2 replies; 58+ messages in thread
From: Nikola Ciprich @ 2011-04-29 10:02 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel mlist, linux-stable mlist, Hervé Commowick, seto.hidetoshi

[-- Attachment #1: Type: text/plain, Size: 8125 bytes --]

(another CC added)

Hello Willy!

I made some statistics of our servers regarding kernel version and
uptime. Here are some of my thoughts:
- I'm 100% sure this problem wasn't present in kernels <= 2.6.30.x
  (we've got a lot of boxes with uptimes >600 days)
- I'm 90% sure this problem also wasn't present in 2.6.32.16 (we've got
  6 boxes running for 235 to 280 days)

What I'm not sure about is whether this is present in 2.6.32.19; I have
2 boxes running 2.6.32.19 for 238 days and one 2.6.32.20 for 216 days.
I also have a bunch of 2.6.32.23 boxes, which are now getting close to
200 days of uptime. But I suspect this really is the first problematic
version; more on that later.

First, regarding your question about CONFIG_HZ: we use the 250 HZ
setting, which leads me to the following:
250 * 60 * 60 * 24 * 199 = 4298400000, which is a value a little over
2**32! So maybe some unsigned long variable might overflow? Does this
make sense?

And as to my suspicion about 2.6.32.19, there is one commit which may be
related:

commit 0cf55e1ec08bb5a22e068309e2d8ba1180ab4239
Author: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date:   Wed Dec 2 17:28:07 2009 +0900

    sched, cputime: Introduce thread_group_times()

    This is a real fix for problem of utime/stime values decreasing
    described in the thread:

        http://lkml.org/lkml/2009/11/3/522

    Now cputime is accounted in the following way:

    - {u,s}time in task_struct are increased every time when the thread
      is interrupted by a tick (timer interrupt).

    - When a thread exits, its {u,s}time are added to signal->{u,s}time,
      after adjusted by task_times().

    - When all threads in a thread_group exits, accumulated {u,s}time
      (and also c{u,s}time) in signal struct are added to c{u,s}time
      in signal struct of the group's parent.
    .
    .
    .

I haven't studied this in detail yet, but it seems to me it might really
be related. Hidetoshi-san, do you have an opinion about this? Could this
somehow either create or invoke the problem with overflow of some
variable which would lead to division by zero or similar problems?

Any other thoughts?

best regards

nik

On Thu, Apr 28, 2011 at 08:34:34PM +0200, Willy Tarreau wrote:
> Hello Nikola,
>
> On Thu, Apr 28, 2011 at 10:26:25AM +0200, Nikola Ciprich wrote:
> > Hello everybody,
> >
> > I'm trying to solve a strange issue: today my fourth machine running
> > 2.6.32.21 crashed. What makes the cases similar, apart from the same
> > kernel version, is that all the boxes had very similar uptimes: 214,
> > 216, 216, and 224 days. This might just be a coincidence, but I
> > think it might be important.
>
> Interestingly, one of our customers just had two machines that crashed
> yesterday after 212 days and 212 days + 20h respectively. They were
> running Debian's 2.6.32-bpo.5-amd64, which is based on 2.6.32.23 AIUI.
>
> The crash looks very similar to the following bug which we have updated:
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=16991
>
> (bugzilla doesn't appear to respond as I'm posting this mail).
>
> The top of your output is missing. In our case, as in the reports on
> the bug above, there was a divide-by-zero error. Did you happen to
> spot this one too, or do you just not know? I observe
> "divide_error+0x15/0x20" in one of your reports, so it's possible that
> it matches the same pattern, at least for one trace. Just in case, it
> would be nice to feed the bugzilla entry above.
>
> > Unfortunately I only have backtraces of two of the crashes (and
> > those are trimmed, sorry), and they do not look as similar as I'd
> > like, but maybe there is still something in common:
> >
> > [<ffffffff81120cc7>] pollwake+0x57/0x60
> > [<ffffffff81046720>] ? default_wake_function+0x0/0x10
> > [<ffffffff8103683a>] __wake_up_common+0x5a/0x90
> > [<ffffffff8103a313>] __wake_up+0x43/0x70
> > [<ffffffffa0321573>] process_masterspan+0x643/0x670 [dahdi]
> > [<ffffffffa0326595>] coretimer_func+0x135/0x1d0 [dahdi]
> > [<ffffffff8105d74d>] run_timer_softirq+0x15d/0x320
> > [<ffffffffa0326460>] ? coretimer_func+0x0/0x1d0 [dahdi]
> > [<ffffffff8105690c>] __do_softirq+0xcc/0x220
> > [<ffffffff8100c40c>] call_softirq+0x1c/0x30
> > [<ffffffff8100e3ba>] do_softirq+0x4a/0x80
> > [<ffffffff810567c7>] irq_exit+0x87/0x90
> > [<ffffffff8100d7b7>] do_IRQ+0x77/0xf0
> > [<ffffffff8100bc53>] ret_from_intr+0x0/0xa
> > <EOI> [<ffffffffa019e556>] ? acpi_idle_enter_bm+0x273/0x2a1 [processor]
> > [<ffffffffa019e54c>] ? acpi_idle_enter_bm+0x269/0x2a1 [processor]
> > [<ffffffff81280095>] ? cpuidle_idle_call+0xa5/0x150
> > [<ffffffff8100a18f>] ? cpu_idle+0x4f/0x90
> > [<ffffffff81323c95>] ? rest_init+0x75/0x80
> > [<ffffffff81582d7f>] ? start_kernel+0x2ef/0x390
> > [<ffffffff81582271>] ? x86_64_start_reservations+0x81/0xc0
> > [<ffffffff81582386>] ? x86_64_start_kernel+0xd6/0x100
> >
> > this box (actually two of the crashed ones) is using the dahdi_dummy
> > module to generate timing for an asterisk SW pbx, so maybe it's
> > related to that.
> >
> > [<ffffffff810a5063>] handle_IRQ_event+0x63/0x1c0
> > [<ffffffff810a71ae>] handle_edge_irq+0xce/0x160
> > [<ffffffff8100e1bf>] handle_irq+0x1f/0x30
> > [<ffffffff8100d7ae>] do_IRQ+0x6e/0xf0
> > [<ffffffff8100bc53>] ret_from_intr+0x0/0xa
> > <EOI> [<ffffffff8133?f?f>] ? _spin_unlock_irq+0xf/0x40
> > [<ffffffff81337f79>] ? _spin_unlock_irq+0x9/0x40
> > [<ffffffff81064b9a>] ? exit_signals+0x8a/0x130
> > [<ffffffff8105372e>] ? do_exit+0x7e/0x7d0
> > [<ffffffff8100f8a7>] ? oops_end+0xa7/0xb0
> > [<ffffffff8100faa6>] ? die+0x56/0x90
> > [<ffffffff8100c810>] ? do_trap+0x130/0x150
> > [<ffffffff8100ccca>] ? do_divide_error+0x8a/0xa0
> > [<ffffffff8103d227>] ? find_busiest_group+0x3d7/0xa00
> > [<ffffffff8104400b>] ? cpuacct_charge+0x6b/0x90
> > [<ffffffff8100c045>] ? divide_error+0x15/0x20
> > [<ffffffff8103d227>] ? find_busiest_group+0x3d7/0xa00
> > [<ffffffff8103cfff>] ? find_busiest_group+0x1af/0xa00
> > [<ffffffff81335483>] ? thread_return+0x4ce/0x7bb
> > [<ffffffff8133bec5>] ? do_nanosleep+0x75/0x30
> > [<ffffffff810?1?4e>] ? hrtimer_nanosleep+0x9e/0x120
> > [<ffffffff810?08f0>] ? hrtimer_wakeup+0x0/0x30
> > [<ffffffff810?183f>] ? sys_nanosleep+0x6f/0x80
> >
> > the other two boxes don't use it. The only similarity I see is that
> > both traces look IRQ-handling related; otherwise they don't have
> > much in common.
> > Does anybody have an idea where I should look? Of course I should
> > update all those boxes to (at least) the latest 2.6.32.x, and I'll
> > do that for sure, but I'd first like to know where the problem was,
> > whether it has been fixed, or how to fix it...
> > I'd be grateful for any help...
>
> There were quite a bunch of scheduler updates recently. We may be
> lucky and hope for the bug to have vanished with the changes, but we
> may as well see the same crash in 7 months :-/
>
> My coworker Hervé (CC'd), who worked on the issue, suggests that we
> might have something which goes wrong past a certain uptime (e.g. 212
> days), which needs a special event to be triggered (I/O, process
> exiting, etc...). I think this makes quite some sense.
>
> Could you check your CONFIG_HZ so that we could convert those uptimes
> to jiffies? Maybe this will ring a bell in someone's head :-/
>
> Best regards,
> Willy
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:  +420 596 603 142
fax:   +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-29 10:02 ` Nikola Ciprich
@ 2011-04-30  9:36   ` Willy Tarreau
  2011-04-30 11:22     ` Henrique de Moraes Holschuh
  ` (2 more replies)
  2011-05-06  3:12   ` [stable] " Hidetoshi Seto
  1 sibling, 3 replies; 58+ messages in thread
From: Willy Tarreau @ 2011-04-30  9:36 UTC (permalink / raw)
  To: Nikola Ciprich
  Cc: linux-kernel mlist, linux-stable mlist, Hervé Commowick, seto.hidetoshi

Hello Nikola,

On Fri, Apr 29, 2011 at 12:02:00PM +0200, Nikola Ciprich wrote:
> (another CC added)
>
> Hello Willy!
>
> I made some statistics of our servers regarding kernel version and
> uptime. Here are some of my thoughts:
> - I'm 100% sure this problem wasn't present in kernels <= 2.6.30.x
>   (we've got a lot of boxes with uptimes >600 days)
> - I'm 90% sure this problem also wasn't present in 2.6.32.16 (we've
>   got 6 boxes running for 235 to 280 days)

OK, those are all precious pieces of information.

> What I'm not sure about is whether this is present in 2.6.32.19; I
> have 2 boxes running 2.6.32.19 for 238 days and one 2.6.32.20 for 216
> days.
> I also have a bunch of 2.6.32.23 boxes, which are now getting close to
> 200 days of uptime. But I suspect this really is the first problematic
> version; more on that later.
> First, regarding your question about CONFIG_HZ: we use the 250 HZ
> setting, which leads me to the following:
> 250 * 60 * 60 * 24 * 199 = 4298400000, which is a value a little over
> 2**32! So maybe some unsigned long variable might overflow? Does this
> make sense?

Yes, of course it makes sense; that was also my worry. 2^32 jiffies at
250 Hz is slightly less than 199 days. Maybe an overflow somewhere keeps
propagating wrong results through some computations. I remember having
encountered a lot of funny things when trying to get 2.4 past the
497-day limit using the jiffies64 patch, so I would not be surprised at
all if we're in a similar situation here.

Also, I've checked the Debian kernel config where we had the divide
overflow and it was running at 250 Hz too.

> And as to my suspicion about 2.6.32.19, there is one commit which may
> be related:
>
> commit 0cf55e1ec08bb5a22e068309e2d8ba1180ab4239
> Author: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> Date:   Wed Dec 2 17:28:07 2009 +0900
>
>     sched, cputime: Introduce thread_group_times()
>
>     This is a real fix for problem of utime/stime values decreasing
>     described in the thread:
>
>         http://lkml.org/lkml/2009/11/3/522
>
>     Now cputime is accounted in the following way:
>
>     - {u,s}time in task_struct are increased every time when the thread
>       is interrupted by a tick (timer interrupt).
>
>     - When a thread exits, its {u,s}time are added to signal->{u,s}time,
>       after adjusted by task_times().
>
>     - When all threads in a thread_group exits, accumulated {u,s}time
>       (and also c{u,s}time) in signal struct are added to c{u,s}time
>       in signal struct of the group's parent.
>     .
>     .
>     .
>
> I haven't studied this in detail yet, but it seems to me it might
> really be related. Hidetoshi-san, do you have an opinion about this?
> Could this somehow either create or invoke the problem with overflow
> of some variable which would lead to division by zero or similar
> problems?
>
> Any other thoughts?

There was a kernel parameter in the past that was used to make jiffies
wrap a few minutes after boot; maybe we should revive it to try to
reproduce without waiting 7 new months :-/

Last, the "advantage" with a suspected regression in a stable series is
that there are a lot fewer patches to test.

Regards,
Willy

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-30  9:36 ` Willy Tarreau
@ 2011-04-30 11:22   ` Henrique de Moraes Holschuh
  2011-04-30 11:54     ` Willy Tarreau
  2011-04-30 12:02   ` Nikola Ciprich
  2011-04-30 17:39   ` Faidon Liambotis
  2 siblings, 1 reply; 58+ messages in thread
From: Henrique de Moraes Holschuh @ 2011-04-30 11:22 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Nikola Ciprich, linux-kernel mlist, linux-stable mlist, Hervé Commowick, seto.hidetoshi

On Sat, 30 Apr 2011, Willy Tarreau wrote:
> There was a kernel parameter in the past that was used to make jiffies
> wrap a few minutes after boot; maybe we should revive it to try to
> reproduce without waiting 7 new months :-/

IMHO, such an option should be given a permanent home in KCONFIG, and it
should be enabled by default.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-30 11:22 ` Henrique de Moraes Holschuh
@ 2011-04-30 11:54   ` Willy Tarreau
  2011-04-30 12:32     ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 58+ messages in thread
From: Willy Tarreau @ 2011-04-30 11:54 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: seto.hidetoshi, Hervé Commowick, linux-kernel mlist, linux-stable mlist

On Sat, Apr 30, 2011 at 08:22:44AM -0300, Henrique de Moraes Holschuh wrote:
> On Sat, 30 Apr 2011, Willy Tarreau wrote:
> > There was a kernel parameter in the past that was used to make
> > jiffies wrap a few minutes after boot; maybe we should revive it to
> > try to reproduce without waiting 7 new months :-/
>
> IMHO, such an option should be given a permanent home in KCONFIG, and
> it should be enabled by default.

Not exactly, as it affects uptime.

Willy

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-30 11:54 ` Willy Tarreau
@ 2011-04-30 12:32   ` Henrique de Moraes Holschuh
  0 siblings, 0 replies; 58+ messages in thread
From: Henrique de Moraes Holschuh @ 2011-04-30 12:32 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: seto.hidetoshi, Hervé Commowick, linux-kernel mlist, linux-stable mlist

On Sat, 30 Apr 2011, Willy Tarreau wrote:
> On Sat, Apr 30, 2011 at 08:22:44AM -0300, Henrique de Moraes Holschuh wrote:
> > On Sat, 30 Apr 2011, Willy Tarreau wrote:
> > > There was a kernel parameter in the past that was used to make
> > > jiffies wrap a few minutes after boot; maybe we should revive it
> > > to try to reproduce without waiting 7 new months :-/
> >
> > IMHO, such an option should be given a permanent home in KCONFIG,
> > and it should be enabled by default.
>
> Not exactly, as it affects uptime.

Well, the offset to the real uptime is known; is there a way to fudge it
back so that it affects only the uptime reporting?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-30  9:36 ` Willy Tarreau
@ 2011-04-30 12:02   ` Nikola Ciprich
  2011-04-30 15:57     ` Greg KH
  0 siblings, 1 reply; 58+ messages in thread
From: Nikola Ciprich @ 2011-04-30 12:02 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel mlist, linux-stable mlist, Hervé Commowick, seto.hidetoshi

[-- Attachment #1: Type: text/plain, Size: 1098 bytes --]

> There was a kernel parameter in the past that was used to make jiffies
> wrap a few minutes after boot; maybe we should revive it to try to
> reproduce without waiting 7 new months :-/

hmm, I've googled this up, and it seems to have been a 2.5.x patch, so
it will certainly need some porting...
I'll try to have a look at it tonight and report back...

> Last, the "advantage" with a suspected regression in a stable series
> is that there are a lot fewer patches to test.
>
> Regards,
> Willy
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:  +420 596 603 142
fax:   +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-30 12:02 ` Nikola Ciprich
@ 2011-04-30 15:57   ` Greg KH
  2011-04-30 16:08     ` Randy Dunlap
  0 siblings, 1 reply; 58+ messages in thread
From: Greg KH @ 2011-04-30 15:57 UTC (permalink / raw)
  To: Nikola Ciprich
  Cc: Willy Tarreau, seto.hidetoshi, Hervé Commowick, linux-kernel mlist, linux-stable mlist

On Sat, Apr 30, 2011 at 02:02:05PM +0200, Nikola Ciprich wrote:
> > There was a kernel parameter in the past that was used to make
> > jiffies wrap a few minutes after boot; maybe we should revive it to
> > try to reproduce without waiting 7 new months :-/
> hmm, I've googled this up, and it seems to have been a 2.5.x patch, so
> it will certainly need some porting...
> I'll try to have a look at it tonight and report back...

I don't think any patch is needed, I thought we did that by default now,
but I can't seem to find the code where it happens...

odd.

greg k-h

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-30 15:57 ` Greg KH
@ 2011-04-30 16:08   ` Randy Dunlap
  2011-04-30 16:49     ` Willy Tarreau
  0 siblings, 1 reply; 58+ messages in thread
From: Randy Dunlap @ 2011-04-30 16:08 UTC (permalink / raw)
  To: Greg KH
  Cc: Nikola Ciprich, Willy Tarreau, seto.hidetoshi, Hervé Commowick, linux-kernel mlist, linux-stable mlist

On Sat, 30 Apr 2011 08:57:07 -0700 Greg KH wrote:

> On Sat, Apr 30, 2011 at 02:02:05PM +0200, Nikola Ciprich wrote:
> > > There was a kernel parameter in the past that was used to make
> > > jiffies wrap a few minutes after boot; maybe we should revive it
> > > to try to reproduce without waiting 7 new months :-/
> > hmm, I've googled this up, and it seems to have been a 2.5.x patch,
> > so it will certainly need some porting...
> > I'll try to have a look at it tonight and report back...
>
> I don't think any patch is needed, I thought we did that by default
> now, but I can't seem to find the code where it happens...
>
> odd.

linux/jiffies.h:

/*
 * Have the 32 bit jiffies value wrap 5 minutes after boot
 * so jiffies wrap bugs show up earlier.
 */
#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))

and kernel/timer.c:

u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-30 16:08 ` Randy Dunlap
@ 2011-04-30 16:49   ` Willy Tarreau
  2011-04-30 18:14     ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 58+ messages in thread
From: Willy Tarreau @ 2011-04-30 16:49 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Greg KH, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, linux-kernel mlist, linux-stable mlist

On Sat, Apr 30, 2011 at 09:08:05AM -0700, Randy Dunlap wrote:
> On Sat, 30 Apr 2011 08:57:07 -0700 Greg KH wrote:
> > I don't think any patch is needed, I thought we did that by default
> > now, but I can't seem to find the code where it happens...
> >
> > odd.
>
> linux/jiffies.h:
>
> /*
>  * Have the 32 bit jiffies value wrap 5 minutes after boot
>  * so jiffies wrap bugs show up earlier.
>  */
> #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
>
> and kernel/timer.c:
>
> u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;

Thanks Randy.

So that would mean that wrapping jiffies should be unrelated to the
reported panics. Let's wait for Hidetoshi-san's analysis then.

Regards,
Willy

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes?
  2011-04-30 16:49 ` Willy Tarreau
@ 2011-04-30 18:14   ` Henrique de Moraes Holschuh
  0 siblings, 0 replies; 58+ messages in thread
From: Henrique de Moraes Holschuh @ 2011-04-30 18:14 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Randy Dunlap, Greg KH, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, linux-kernel mlist, linux-stable mlist

On Sat, 30 Apr 2011, Willy Tarreau wrote:
> So that would mean that wrapping jiffies should be unrelated to the
> reported panics. Let's wait for Hidetoshi-san's analysis then.

Err, it actually could be a problem when it wraps twice, OR it could be
related to conditions that didn't happen yet right after boot. But
that's less likely, of course.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-04-30 9:36 ` Willy Tarreau 2011-04-30 11:22 ` Henrique de Moraes Holschuh 2011-04-30 12:02 ` Nikola Ciprich @ 2011-04-30 17:39 ` Faidon Liambotis 2011-04-30 20:14 ` Willy Tarreau 2011-06-28 2:25 ` john stultz 2 siblings, 2 replies; 58+ messages in thread From: Faidon Liambotis @ 2011-04-30 17:39 UTC (permalink / raw) To: linux-kernel, stable Cc: Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Willy Tarreau, Randy Dunlap, Greg KH, Ben Hutchings, Apollon Oikonomopoulos If I may add some, hopefully useful, information to the thread: At work we have an HP c7000 blade enclosure. It's populated with 8 ProLiant BL460c G1 (Xeon E5405, constant_tsc but not nonstop_tsc) and 4 ProLiant BL460c G6 (Xeon E5504, constant_tsc + nonstop_tsc). All were booted at the same day with Debian's 2.6.32-23~bpo50+1 kernel, i.e. upstream 2.6.32.21. We too experienced problems with just the G6 blades at near 215 days uptime (on the 19th of April), all at the same time. From our investigation, it seems that their cpu_clocks jumped suddenly far in the future and then almost immediately rolled over due to wrapping around 64-bits. Although all of their (G6s) clocks wrapped around *at the same time*, only one of them actually crashed at the time, with a second one crashing just a few days later, on the 28th. Three of them had the following on their logs: Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers present Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 stuck for 17163091968s! [kvm:25913] [...] Apr 19 10:15:42 hn-05 kernel: [18446743935.447275] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b Apr 19 10:18:32 hn-05 kernel: [ 31.587025] bond0.13: received packet with own address as source address (full oops at the end of this mail) Note that the 17163091968s time stuck was *the same* (+/- 1s) in *all three of them*. 
What's also very strange, although we're not very sure if it's related, is that when we took the first line of the above log entry and subtracted 17966378.581971 from Apr 18 20:56:07, this resulted in a boot time that differed several hours from the actual boot time (the latter being verified both with syslog and /proc/stat's btime, which were both in agreement). This was verified post-mortem too, with the date checked to be correct. IOW, we have some serious clock drift (calculated at runtime to be ~0.1s/min) on these machines that hasn't been made apparent, probably since they all run NTP. Moreover, we also saw that drift on other machines (1U, different vendor, different data center, E5520 CPUs) but not with the G1 blades. Our investigation showed that the drift is there constantly (it's not a one-time event) but we're not really sure if it's related to the Apr 18th jump event. Note that the drift is there even if we boot with "clocksource=hpet" but disappears when booted with "notsc". Also note that we've verified with 2.6.38 and the drift is still there. Regards, Faidon 1: The full backtrace is: Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers present Apr 19 06:25:02 hn-05 kernel: imklog 3.18.6, log source = /proc/kmsg started. Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 stuck for 17163091968s! 
[kvm:25913] Apr 19 10:15:42 hn-05 kernel: [18446743935.447056] Modules linked in: xt_pkttype ext2 tun kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_ defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf loop snd_pcm snd_ti mer snd soundcore bnx2x psmouse snd_page_alloc serio_raw sd_mod crc_t10dif hpilo ipmi_si ipmi_msghandler container evdev crc32c pcspkr shpchp pci_hotplug libcrc32c power_meter mdio button processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod usbhid hid cciss ehci_hcd uhci_hcd qla2xxx scsi_transport_fc usbcore nls_base tg3 libphy scsi_ tgt scsi_mod thermal fan thermal_sys [last unloaded: scsi_wait_scan] Apr 19 10:15:42 hn-05 kernel: [18446743935.447111] CPU 4: Apr 19 10:15:42 hn-05 kernel: [18446743935.447112] Modules linked in: xt_pkttype ext2 tun kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_ defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf loop snd_pcm snd_timer snd soundcore bnx2x psmouse snd_page_alloc serio_raw sd_mod crc_t10dif hpilo ipmi_si ipmi_msghandler container evdev crc32c pcspkr shpchp pci_hotplug libcrc32c power_meter mdio button processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod usbhid hid cciss ehci_hcd uhci_hcd qla2xxx scsi_transport_fc usbcore nls_base tg3 libphy scsi_tgt scsi_mod thermal fan thermal_sys [last unloaded: scsi_wait_scan] Apr 19 10:15:42 hn-05 kernel: [18446743935.447154] Pid: 25913, comm: kvm Not tainted 2.6.32-bpo.5-amd64 #1 ProLiant BL460c G6 Apr 19 10:15:42 hn-05 kernel: [18446743935.447157] RIP: 0010:[<ffffffffa02e209a>] [<ffffffffa02e209a>] kvm_arch_vcpu_ioctl_run+0x785/0xa44 [kvm] Apr 19 10:15:42 hn-05 kernel: [18446743935.447177] RSP: 
0018:ffff88070065fd38 EFLAGS: 00000202 Apr 19 10:15:42 hn-05 kernel: [18446743935.447179] RAX: ffff88070065ffd8 RBX: ffff8804154ba860 RCX: ffff8804154ba860 Apr 19 10:15:42 hn-05 kernel: [18446743935.447182] RDX: ffff8804154ba8b9 RSI: ffff81004c818b10 RDI: ffff8100291a7d48 Apr 19 10:15:42 hn-05 kernel: [18446743935.447184] RBP: ffffffff8101166e R08: 0000000000000000 R09: 0000000000000000 Apr 19 10:15:42 hn-05 kernel: [18446743935.447187] R10: 00007f1b2a2af078 R11: ffffffff802f3405 R12: 0000000000000001 Apr 19 10:15:42 hn-05 kernel: [18446743935.447189] R13: 00000000154ba8b8 R14: ffff8804136ac000 R15: ffff8804154bbd48 Apr 19 10:15:42 hn-05 kernel: [18446743935.447192] FS: 0000000042cbb950(0000) GS:ffff88042e440000(0000) knlGS:0000000000000000 Apr 19 10:15:42 hn-05 kernel: [18446743935.447195] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 Apr 19 10:15:42 hn-05 kernel: [18446743935.447197] CR2: 0000000000000008 CR3: 0000000260c41000 CR4: 00000000000026e0 Apr 19 10:15:42 hn-05 kernel: [18446743935.447200] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 19 10:15:42 hn-05 kernel: [18446743935.447202] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 19 10:15:42 hn-05 kernel: [18446743935.447205] Call Trace: Apr 19 10:15:42 hn-05 kernel: [18446743935.447215] [<ffffffffa02e2004>] ? kvm_arch_vcpu_ioctl_run+0x6ef/0xa44 [kvm] Apr 19 10:15:42 hn-05 kernel: [18446743935.447224] [<ffffffff8100f79c>] ? __switch_to+0x285/0x297 Apr 19 10:15:42 hn-05 kernel: [18446743935.447231] [<ffffffffa02d49d1>] ? kvm_vcpu_ioctl+0xf1/0x4e6 [kvm] Apr 19 10:15:42 hn-05 kernel: [18446743935.447237] [<ffffffff810240da>] ? lapic_next_event+0x18/0x1d Apr 19 10:15:42 hn-05 kernel: [18446743935.447245] [<ffffffff8106fb77>] ? tick_dev_program_event+0x2d/0x95 Apr 19 10:15:42 hn-05 kernel: [18446743935.447251] [<ffffffff81047e29>] ? finish_task_switch+0x3a/0xaf Apr 19 10:15:42 hn-05 kernel: [18446743935.447258] [<ffffffff810f9f5a>] ? 
vfs_ioctl+0x21/0x6c Apr 19 10:15:42 hn-05 kernel: [18446743935.447261] [<ffffffff810fa4a8>] ? do_vfs_ioctl+0x48d/0x4cb Apr 19 10:15:42 hn-05 kernel: [18446743935.447268] [<ffffffff81063fcd>] ? sys_timer_settime+0x233/0x283 Apr 19 10:15:42 hn-05 kernel: [18446743935.447272] [<ffffffff810fa537>] ? sys_ioctl+0x51/0x70 Apr 19 10:15:42 hn-05 kernel: [18446743935.447275] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b Apr 19 10:18:32 hn-05 kernel: [ 31.587025] bond0.13: received packet with own address as source address ^ permalink raw reply [flat|nested] 58+ messages in thread
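Faidon's ~0.1 s/min drift figure is worth expressing in the units NTP daemons use (a back-of-the-envelope sketch; the 500 ppm figure is ntpd's conventional maximum slew rate, not something measured on these boxes):

```python
# Express the observed drift (~0.1 s per minute) in parts per million.
drift_ppm = 0.1 / 60 * 1e6
print(round(drift_ppm))  # 1667 ppm

# ntpd will normally only slew out up to ~500 ppm of frequency error,
# so a clock this far off would have to be stepped rather than slewed.
print(drift_ppm > 500)  # True
```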
* Re: 2.6.32.21 - uptime related crashes? 2011-04-30 17:39 ` Faidon Liambotis @ 2011-04-30 20:14 ` Willy Tarreau 2011-05-14 19:04 ` Nikola Ciprich 2011-06-28 2:25 ` john stultz 1 sibling, 1 reply; 58+ messages in thread From: Willy Tarreau @ 2011-04-30 20:14 UTC (permalink / raw) To: Faidon Liambotis Cc: linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Randy Dunlap, Greg KH, Ben Hutchings, Apollon Oikonomopoulos Hi Faidon, On Sat, Apr 30, 2011 at 08:39:05PM +0300, Faidon Liambotis wrote: > If I may add some, hopefully useful, information to the thread: > > At work we have an HP c7000 blade enclosure. It's populated with 8 ProLiant > BL460c G1 (Xeon E5405, constant_tsc but not nonstop_tsc) and 4 ProLiant > BL460c > G6 (Xeon E5504, constant_tsc + nonstop_tsc). All were booted at the same day > with Debian's 2.6.32-23~bpo50+1 kernel, i.e. upstream 2.6.32.21. > > We too experienced problems with just the G6 blades at near 215 days > uptime (on the 19th of April), all at the same time. From our > investigation, it seems that their cpu_clocks jumped suddenly far in the > future and then almost immediately rolled over due to wrapping around > 64-bits. > > Although all of their (G6s) clocks wrapped around *at the same time*, only > one > of them actually crashed at the time, with a second one crashing just a few > days later, on the 28th. > > Three of them had the following on their logs: > Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers > present > Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 > stuck for 17163091968s! [kvm:25913] > [...] > Apr 19 10:15:42 hn-05 kernel: [18446743935.447275] [<ffffffff81010b42>] ? > system_call_fastpath+0x16/0x1b > Apr 19 10:18:32 hn-05 kernel: [ 31.587025] bond0.13: received packet with > own address as source address > (full oops at the end of this mail) > > Note that the 17163091968s time stuck was *the same* (+/- 1s) in *all > three of them*. 
> > What's also very strange, although we're not very sure if related, is > that when we took the first line of the above log entry and substracted > 17966378.581971 from Apr 18 20:56:07, this resulted in a boot-time that > differed several hours from the actual boot time (the latter being > verified both with syslog and /proc/bstat's btime, which were both in > agreement). This was verified post-mortem too, with the date checked to > be correct. Well, your report is by far the most complete of our 3 since you managed to capture this trace of the wrapping time. I don't know if it means anything to anyone, but 17163091968 is exactly 0x3FF000000, or 1023*2^24. Too round to be a coincidence. This number of seconds corresponds to 1000 * 2^32 jiffies at 250 Hz. > IOW, we have some serious clock drift (calcuated it at runtime to > ~0.1s/min) on these machines that hasn't been made apparent probably > since they all run NTP. Moreover, we also saw that drift on other > machines (1U, different vendor, different data center, E5520 CPUs) but > not with the G1 blades. Our investigation showed that the drift is there > constantly (it's not a one-time event) but we're not really sure if it's > related with the Apr 18th jump event. Note that the drift is there even > if we boot with "clocksource=hpet" but disappears when booted with > "notsc". Also note that we've verified with 2.6.38 and the drift is > still there. > > Regards, > Faidon > > 1: The full backtrace is: > Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers > present > Apr 19 06:25:02 hn-05 kernel: imklog 3.18.6, log source = /proc/kmsg > started. > Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 > stuck for 17163091968s! 
[kvm:25913] > Apr 19 10:15:42 hn-05 kernel: [18446743935.447056] Modules linked in: > xt_pkttype ext2 tun kvm_intel kvm nf_conntrack_ipv6 ip6table_filter > ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_ > defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q > garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff > ipmi_devintf loop snd_pcm snd_ti > mer snd soundcore bnx2x psmouse snd_page_alloc serio_raw sd_mod crc_t10dif > hpilo ipmi_si ipmi_msghandler container evdev crc32c pcspkr shpchp > pci_hotplug libcrc32c power_meter mdio > button processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log > dm_snapshot dm_mod usbhid hid cciss ehci_hcd uhci_hcd qla2xxx > scsi_transport_fc usbcore nls_base tg3 libphy scsi_ > tgt scsi_mod thermal fan thermal_sys [last unloaded: scsi_wait_scan] > Apr 19 10:15:42 hn-05 kernel: [18446743935.447111] CPU 4: > Apr 19 10:15:42 hn-05 kernel: [18446743935.447112] Modules linked in: > xt_pkttype ext2 tun kvm_intel kvm nf_conntrack_ipv6 ip6table_filter > ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_ > defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q > garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff > ipmi_devintf loop snd_pcm snd_timer snd soundcore bnx2x psmouse > snd_page_alloc serio_raw sd_mod crc_t10dif hpilo ipmi_si ipmi_msghandler > container evdev crc32c pcspkr shpchp pci_hotplug libcrc32c power_meter mdio > button processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log > dm_snapshot dm_mod usbhid hid cciss ehci_hcd uhci_hcd qla2xxx > scsi_transport_fc usbcore nls_base tg3 libphy scsi_tgt scsi_mod thermal fan > thermal_sys [last unloaded: scsi_wait_scan] > Apr 19 10:15:42 hn-05 kernel: [18446743935.447154] Pid: 25913, comm: kvm > Not tainted 2.6.32-bpo.5-amd64 #1 ProLiant BL460c G6 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447157] RIP: > 0010:[<ffffffffa02e209a>] [<ffffffffa02e209a>] > kvm_arch_vcpu_ioctl_run+0x785/0xa44 [kvm] > Apr 19 
10:15:42 hn-05 kernel: [18446743935.447177] RSP: > 0018:ffff88070065fd38 EFLAGS: 00000202 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447179] RAX: ffff88070065ffd8 > RBX: ffff8804154ba860 RCX: ffff8804154ba860 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447182] RDX: ffff8804154ba8b9 > RSI: ffff81004c818b10 RDI: ffff8100291a7d48 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447184] RBP: ffffffff8101166e > R08: 0000000000000000 R09: 0000000000000000 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447187] R10: 00007f1b2a2af078 > R11: ffffffff802f3405 R12: 0000000000000001 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447189] R13: 00000000154ba8b8 > R14: ffff8804136ac000 R15: ffff8804154bbd48 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447192] FS: > 0000000042cbb950(0000) GS:ffff88042e440000(0000) knlGS:0000000000000000 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447195] CS: 0010 DS: 002b ES: > 002b CR0: 0000000080050033 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447197] CR2: 0000000000000008 > CR3: 0000000260c41000 CR4: 00000000000026e0 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447200] DR0: 0000000000000000 > DR1: 0000000000000000 DR2: 0000000000000000 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447202] DR3: 0000000000000000 > DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447205] Call Trace: > Apr 19 10:15:42 hn-05 kernel: [18446743935.447215] [<ffffffffa02e2004>] ? > kvm_arch_vcpu_ioctl_run+0x6ef/0xa44 [kvm] > Apr 19 10:15:42 hn-05 kernel: [18446743935.447224] [<ffffffff8100f79c>] ? > __switch_to+0x285/0x297 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447231] [<ffffffffa02d49d1>] ? > kvm_vcpu_ioctl+0xf1/0x4e6 [kvm] > Apr 19 10:15:42 hn-05 kernel: [18446743935.447237] [<ffffffff810240da>] ? > lapic_next_event+0x18/0x1d > Apr 19 10:15:42 hn-05 kernel: [18446743935.447245] [<ffffffff8106fb77>] ? > tick_dev_program_event+0x2d/0x95 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447251] [<ffffffff81047e29>] ? 
> finish_task_switch+0x3a/0xaf > Apr 19 10:15:42 hn-05 kernel: [18446743935.447258] [<ffffffff810f9f5a>] ? > vfs_ioctl+0x21/0x6c > Apr 19 10:15:42 hn-05 kernel: [18446743935.447261] [<ffffffff810fa4a8>] ? > do_vfs_ioctl+0x48d/0x4cb > Apr 19 10:15:42 hn-05 kernel: [18446743935.447268] [<ffffffff81063fcd>] ? > sys_timer_settime+0x233/0x283 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447272] [<ffffffff810fa537>] ? > sys_ioctl+0x51/0x70 > Apr 19 10:15:42 hn-05 kernel: [18446743935.447275] [<ffffffff81010b42>] ? > system_call_fastpath+0x16/0x1b > Apr 19 10:18:32 hn-05 kernel: [ 31.587025] bond0.13: received packet with > own address as source address Regards, Willy ^ permalink raw reply [flat|nested] 58+ messages in thread
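One mechanism consistent with a cpu_clock that jumps far into the future at roughly this uptime is a 64-bit multiply overflow in the x86 TSC sched_clock path, which converts cycles to nanoseconds as ns = (cycles * cyc2ns_scale) >> 10 with cyc2ns_scale = (10^6 << 10) / tsc_khz. A sketch of the arithmetic, using a hypothetical 2.4 GHz TSC (the exact frequency of the affected Xeons may differ):

```python
# When does cycles * scale first exceed 2^64 in the cyc2ns conversion?
# tsc_khz = 2_400_000 (2.4 GHz) is a hypothetical example frequency.
CYC2NS_SCALE_FACTOR = 10
tsc_khz = 2_400_000
scale = (1_000_000 << CYC2NS_SCALE_FACTOR) // tsc_khz  # ns per cycle, pre-shifted

overflow_cycles = 2**64 // scale                 # first product too big for 64 bits
overflow_days = overflow_cycles / (tsc_khz * 1000) / 86400
print(round(overflow_days, 1))  # ~208.8 days, close to the ~215-day uptimes reported
```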
* Re: 2.6.32.21 - uptime related crashes? 2011-04-30 20:14 ` Willy Tarreau @ 2011-05-14 19:04 ` Nikola Ciprich 2011-05-14 20:45 ` Willy Tarreau 2011-05-15 22:56 ` Faidon Liambotis 0 siblings, 2 replies; 58+ messages in thread From: Nikola Ciprich @ 2011-05-14 19:04 UTC (permalink / raw) To: Willy Tarreau Cc: Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Randy Dunlap, Greg KH, Ben Hutchings, Apollon Oikonomopoulos, chronidev [-- Attachment #1: Type: text/plain, Size: 10222 bytes --] Hello gentlemen, Nicolas, thanks for the further report; it contradicts my theory that the problem appeared somewhere around 2.6.32.16. Now I think I know why several of my other machines running 2.6.32.x for a long time didn't crash: I checked the bugzilla entry for (I believe) the same problem here: https://bugzilla.kernel.org/show_bug.cgi?id=16991 and Peter Zijlstra asked there whether the reporters' systems were running some RT tasks. Then I realised that all four of my crashed boxes were pacemaker/corosync clusters, and pacemaker uses lots of RT priority tasks. So I believe this is important, and might be the reason why the other machines seem to be running rock solid - they are not running any RT tasks. It also might help with hunting this bug. Is anybody of you also running RT priority tasks on the afflicted systems, or did the problem also occur without them? Cheers! n. PS: Hidetoshi-san - btw, (late) thanks for your reply and confirmation that your patch should not be the cause of this problem. I'm now sure it must have emerged sooner, and I'm sorry for the false accusation :) On Sat, Apr 30, 2011 at 10:14:36PM +0200, Willy Tarreau wrote: > Hi Faidon, > > On Sat, Apr 30, 2011 at 08:39:05PM +0300, Faidon Liambotis wrote: > > If I may add some, hopefully useful, information to the thread: > > > > At work we have an HP c7000 blade enclosure. 
It's populated with 8 ProLiant > > BL460c G1 (Xeon E5405, constant_tsc but not nonstop_tsc) and 4 ProLiant > > BL460c > > G6 (Xeon E5504, constant_tsc + nonstop_tsc). All were booted at the same day > > with Debian's 2.6.32-23~bpo50+1 kernel, i.e. upstream 2.6.32.21. > > > > We too experienced problems with just the G6 blades at near 215 days > > uptime (on the 19th of April), all at the same time. From our > > investigation, it seems that their cpu_clocks jumped suddenly far in the > > future and then almost immediately rolled over due to wrapping around > > 64-bits. > > > > Although all of their (G6s) clocks wrapped around *at the same time*, only > > one > > of them actually crashed at the time, with a second one crashing just a few > > days later, on the 28th. > > > > Three of them had the following on their logs: > > Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers > > present > > Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 > > stuck for 17163091968s! [kvm:25913] > > [...] > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447275] [<ffffffff81010b42>] ? > > system_call_fastpath+0x16/0x1b > > Apr 19 10:18:32 hn-05 kernel: [ 31.587025] bond0.13: received packet with > > own address as source address > > (full oops at the end of this mail) > > > > Note that the 17163091968s time stuck was *the same* (+/- 1s) in *all > > three of them*. > > > > What's also very strange, although we're not very sure if related, is > > that when we took the first line of the above log entry and substracted > > 17966378.581971 from Apr 18 20:56:07, this resulted in a boot-time that > > differed several hours from the actual boot time (the latter being > > verified both with syslog and /proc/bstat's btime, which were both in > > agreement). This was verified post-mortem too, with the date checked to > > be correct. > > Well, your report is by far the most complete of our 3 since you managed > to get this trace of wraping time. 
> > I don't know if it means anything to anyone, but 17163091968 is exactly > 0x3FF000000, or 1023*2^24. Too round to be a coincidence. This number > of seconds corresponds to 1000 * 2^32 jiffies at 250 Hz. > > > IOW, we have some serious clock drift (calcuated it at runtime to > > ~0.1s/min) on these machines that hasn't been made apparent probably > > since they all run NTP. Moreover, we also saw that drift on other > > machines (1U, different vendor, different data center, E5520 CPUs) but > > not with the G1 blades. Our investigation showed that the drift is there > > constantly (it's not a one-time event) but we're not really sure if it's > > related with the Apr 18th jump event. Note that the drift is there even > > if we boot with "clocksource=hpet" but disappears when booted with > > "notsc". Also note that we've verified with 2.6.38 and the drift is > > still there. > > > > Regards, > > Faidon > > > > 1: The full backtrace is: > > Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers > > present > > Apr 19 06:25:02 hn-05 kernel: imklog 3.18.6, log source = /proc/kmsg > > started. > > Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 > > stuck for 17163091968s! 
[kvm:25913] > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447056] Modules linked in: > > xt_pkttype ext2 tun kvm_intel kvm nf_conntrack_ipv6 ip6table_filter > > ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_ > > defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q > > garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff > > ipmi_devintf loop snd_pcm snd_ti > > mer snd soundcore bnx2x psmouse snd_page_alloc serio_raw sd_mod crc_t10dif > > hpilo ipmi_si ipmi_msghandler container evdev crc32c pcspkr shpchp > > pci_hotplug libcrc32c power_meter mdio > > button processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log > > dm_snapshot dm_mod usbhid hid cciss ehci_hcd uhci_hcd qla2xxx > > scsi_transport_fc usbcore nls_base tg3 libphy scsi_ > > tgt scsi_mod thermal fan thermal_sys [last unloaded: scsi_wait_scan] > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447111] CPU 4: > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447112] Modules linked in: > > xt_pkttype ext2 tun kvm_intel kvm nf_conntrack_ipv6 ip6table_filter > > ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_ > > defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q > > garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff > > ipmi_devintf loop snd_pcm snd_timer snd soundcore bnx2x psmouse > > snd_page_alloc serio_raw sd_mod crc_t10dif hpilo ipmi_si ipmi_msghandler > > container evdev crc32c pcspkr shpchp pci_hotplug libcrc32c power_meter mdio > > button processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log > > dm_snapshot dm_mod usbhid hid cciss ehci_hcd uhci_hcd qla2xxx > > scsi_transport_fc usbcore nls_base tg3 libphy scsi_tgt scsi_mod thermal fan > > thermal_sys [last unloaded: scsi_wait_scan] > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447154] Pid: 25913, comm: kvm > > Not tainted 2.6.32-bpo.5-amd64 #1 ProLiant BL460c G6 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447157] RIP: > > 0010:[<ffffffffa02e209a>] 
[<ffffffffa02e209a>] > > kvm_arch_vcpu_ioctl_run+0x785/0xa44 [kvm] > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447177] RSP: > > 0018:ffff88070065fd38 EFLAGS: 00000202 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447179] RAX: ffff88070065ffd8 > > RBX: ffff8804154ba860 RCX: ffff8804154ba860 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447182] RDX: ffff8804154ba8b9 > > RSI: ffff81004c818b10 RDI: ffff8100291a7d48 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447184] RBP: ffffffff8101166e > > R08: 0000000000000000 R09: 0000000000000000 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447187] R10: 00007f1b2a2af078 > > R11: ffffffff802f3405 R12: 0000000000000001 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447189] R13: 00000000154ba8b8 > > R14: ffff8804136ac000 R15: ffff8804154bbd48 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447192] FS: > > 0000000042cbb950(0000) GS:ffff88042e440000(0000) knlGS:0000000000000000 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447195] CS: 0010 DS: 002b ES: > > 002b CR0: 0000000080050033 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447197] CR2: 0000000000000008 > > CR3: 0000000260c41000 CR4: 00000000000026e0 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447200] DR0: 0000000000000000 > > DR1: 0000000000000000 DR2: 0000000000000000 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447202] DR3: 0000000000000000 > > DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447205] Call Trace: > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447215] [<ffffffffa02e2004>] ? > > kvm_arch_vcpu_ioctl_run+0x6ef/0xa44 [kvm] > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447224] [<ffffffff8100f79c>] ? > > __switch_to+0x285/0x297 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447231] [<ffffffffa02d49d1>] ? > > kvm_vcpu_ioctl+0xf1/0x4e6 [kvm] > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447237] [<ffffffff810240da>] ? 
> > lapic_next_event+0x18/0x1d > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447245] [<ffffffff8106fb77>] ? > > tick_dev_program_event+0x2d/0x95 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447251] [<ffffffff81047e29>] ? > > finish_task_switch+0x3a/0xaf > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447258] [<ffffffff810f9f5a>] ? > > vfs_ioctl+0x21/0x6c > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447261] [<ffffffff810fa4a8>] ? > > do_vfs_ioctl+0x48d/0x4cb > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447268] [<ffffffff81063fcd>] ? > > sys_timer_settime+0x233/0x283 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447272] [<ffffffff810fa537>] ? > > sys_ioctl+0x51/0x70 > > Apr 19 10:15:42 hn-05 kernel: [18446743935.447275] [<ffffffff81010b42>] ? > > system_call_fastpath+0x16/0x1b > > Apr 19 10:18:32 hn-05 kernel: [ 31.587025] bond0.13: received packet with > > own address as source address > > Regards, > Willy > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- ------------------------------------- Ing. Nikola CIPRICH LinuxBox.cz, s.r.o. 28. rijna 168, 709 01 Ostrava tel.: +420 596 603 142 fax: +420 596 621 273 mobil: +420 777 093 799 www.linuxbox.cz mobil servis: +420 737 238 656 email servis: servis@linuxbox.cz ------------------------------------- [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
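For anyone wanting to answer Nikola's question on their own boxes, real-time tasks can be listed by reading /proc directly (a sketch; per proc(5), field 40 of /proc/&lt;pid&gt;/stat is rt_priority, which is 0 for non-RT tasks):

```python
# Sketch: list tasks that currently have a nonzero rt_priority,
# by parsing /proc/<pid>/stat (field 40 is rt_priority, per proc(5)).
import os

def rt_tasks():
    tasks = []
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open(f'/proc/{pid}/stat') as f:
                data = f.read()
        except OSError:      # task exited while we were scanning
            continue
        # comm may itself contain spaces or parens; split after the last ')'.
        rest = data.rsplit(')', 1)[1].split()
        rt_priority = int(rest[37])   # field 40 overall; fields 3+ live in `rest`
        if rt_priority > 0:
            comm = data[data.index('(') + 1:data.rindex(')')]
            tasks.append((int(pid), comm, rt_priority))
    return tasks

print(rt_tasks())
```

On a typical box this picks up kernel threads such as the migration/N threads, plus anything (like corosync or multipathd) that sets SCHED_FIFO/SCHED_RR priorities.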
* Re: 2.6.32.21 - uptime related crashes? 2011-05-14 19:04 ` Nikola Ciprich @ 2011-05-14 20:45 ` Willy Tarreau 2011-05-14 20:59 ` Ben Hutchings 2011-05-14 23:13 ` Nicolas Carlier 2011-05-15 22:56 ` Faidon Liambotis 1 sibling, 2 replies; 58+ messages in thread From: Willy Tarreau @ 2011-05-14 20:45 UTC (permalink / raw) To: Nikola Ciprich Cc: Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Randy Dunlap, Greg KH, Ben Hutchings, Apollon Oikonomopoulos, chronidev Hi, On Sat, May 14, 2011 at 09:04:23PM +0200, Nikola Ciprich wrote: > Hello gentlemans, > Nicolas, thanks for further report, it contradicts my theory that problem occured somewhere during 2.6.32.16. Well, I'd like to be sure what kernel we're talking about. Nicolas said "2.6.32.8 Debian Kernel", but I suspect it's "2.6.32-8something" instead. Nicolas, could you please report the exact version as indicated by "uname -a" ? > Now I think I know why several of my other machines running 2.6.32.x for long time didn't crashed: > > I checked bugzilla entry for (I believe the same) problem here: > https://bugzilla.kernel.org/show_bug.cgi?id=16991 > and Peter Zijlstra asked there, whether reporters systems were running some RT tasks. Then I realised that all of my four crashed boxes were pacemaker/corosync clusters and pacemaker uses lots of RT priority tasks. So I believe this is important, and might be reason why other machines seem to be running rock solid - they are not running any RT tasks. > It also might help with hunting this bug. Is somebody of You also running some RT priority tasks on inflicted systems, or problem also occured without it? No, our customer who had two of these boxes crash at the same time was not running any RT task to the best of my knowledge. Cheers, Willy ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-05-14 20:45 ` Willy Tarreau @ 2011-05-14 20:59 ` Ben Hutchings 2011-05-14 23:13 ` Nicolas Carlier 1 sibling, 0 replies; 58+ messages in thread From: Ben Hutchings @ 2011-05-14 20:59 UTC (permalink / raw) To: Willy Tarreau Cc: Nikola Ciprich, Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Randy Dunlap, Greg KH, Apollon Oikonomopoulos, chronidev [-- Attachment #1: Type: text/plain, Size: 703 bytes --] On Sat, 2011-05-14 at 22:45 +0200, Willy Tarreau wrote: > Hi, > > On Sat, May 14, 2011 at 09:04:23PM +0200, Nikola Ciprich wrote: > > Hello gentlemans, > > Nicolas, thanks for further report, it contradicts my theory that problem occured somewhere during 2.6.32.16. > > Well, I'd like to be sure what kernel we're talking about. Nicolas said > "2.6.32.8 Debian Kernel", but I suspect it's "2.6.32-8something" instead. > Nicolas, could you please report the exact version as indicated by "uname -a" ? [...] Actually you need to use 'cat /proc/version' (or dpkg) to get the full version. Ben. -- Ben Hutchings Once a job is fouled up, anything done to improve it makes it worse. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-05-14 20:45 ` Willy Tarreau 2011-05-14 20:59 ` Ben Hutchings @ 2011-05-14 23:13 ` Nicolas Carlier 1 sibling, 0 replies; 58+ messages in thread From: Nicolas Carlier @ 2011-05-14 23:13 UTC (permalink / raw) To: Willy Tarreau Cc: Nikola Ciprich, Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Randy Dunlap, Greg KH, Ben Hutchings, Apollon Oikonomopoulos Hi, On Sat, May 14, 2011 at 10:45 PM, Willy Tarreau <w@1wt.eu> wrote: > Hi, > > On Sat, May 14, 2011 at 09:04:23PM +0200, Nikola Ciprich wrote: >> Hello gentlemans, >> Nicolas, thanks for further report, it contradicts my theory that problem occured somewhere during 2.6.32.16. > > Well, I'd like to be sure what kernel we're talking about. Nicolas said > "2.6.32.8 Debian Kernel", but I suspect it's "2.6.32-8something" instead. > Nicolas, could you please report the exact version as indicated by "uname -a" ? Sorry, I can't provide more information on this version because I don't use it anymore; I can only correct myself: it was not a 2.6.32.8 kernel but a 2.6.32.7 backported Debian kernel, which had been recompiled. Because of this problem I took the opportunity to move to a 2.6.32.26 kernel; however, as there was nothing in the changelog or bugzilla about the resolution of this issue, we applied the patch from the bugzilla entry that revealed this problem: https://bugzilla.kernel.org/show_bug.cgi?id=16991#c17 > >> Now I think I know why several of my other machines running 2.6.32.x for long time didn't crashed: >> >> I checked bugzilla entry for (I believe the same) problem here: >> https://bugzilla.kernel.org/show_bug.cgi?id=16991 >> and Peter Zijlstra asked there, whether reporters systems were running some RT tasks. Then I realised that all of my four crashed boxes were pacemaker/corosync clusters and pacemaker uses lots of RT priority tasks. 
So I believe this is important, and might be reason why other machines seem to be running rock solid - they are not running any RT tasks. >> It also might help with hunting this bug. Is somebody of You also running some RT priority tasks on inflicted systems, or problem also occured without it? > > No, our customer who had two of these boxes crash at the same time was > not running any RT task to the best of my knowledge. > Regards, -- Nicolas Carlier ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-05-14 19:04 ` Nikola Ciprich 2011-05-14 20:45 ` Willy Tarreau @ 2011-05-15 22:56 ` Faidon Liambotis 2011-05-16 6:49 ` Apollon Oikonomopoulos 1 sibling, 1 reply; 58+ messages in thread From: Faidon Liambotis @ 2011-05-15 22:56 UTC (permalink / raw) To: Nikola Ciprich Cc: Willy Tarreau, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Randy Dunlap, Greg KH, Ben Hutchings, Apollon Oikonomopoulos, chronidev On Sat, May 14, 2011 at 09:04:23PM +0200, Nikola Ciprich wrote: > Nicolas, thanks for further report, it contradicts my theory that > problem occured somewhere during 2.6.32.16. Now I think I know why > several of my other machines running 2.6.32.x for long time didn't > crashed: > > I checked bugzilla entry for (I believe the same) problem here: > https://bugzilla.kernel.org/show_bug.cgi?id=16991 I don't think that bug is related; I, for one, haven't seen any backtrace that is similar to the above or that involves a divide by zero. > and Peter Zijlstra asked there, whether reporters systems were running > some RT tasks. Then I realised that all of my four crashed boxes were > pacemaker/corosync clusters and pacemaker uses lots of RT priority > tasks. So I believe this is important, and might be reason why other > machines seem to be running rock solid - they are not running any RT > tasks. It also might help with hunting this bug. Is somebody of You > also running some RT priority tasks on inflicted systems, or problem > also occured without it? No, no RT tasks here. The boxes in my case were just running a lot of kvm processes. Regards, Faidon ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-05-15 22:56 ` Faidon Liambotis @ 2011-05-16 6:49 ` Apollon Oikonomopoulos 0 siblings, 0 replies; 58+ messages in thread From: Apollon Oikonomopoulos @ 2011-05-16 6:49 UTC (permalink / raw) To: Faidon Liambotis Cc: Nikola Ciprich, Willy Tarreau, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Randy Dunlap, Greg KH, Ben Hutchings, chronidev Hello all, On 01:56 Mon 16 May , Faidon Liambotis wrote: > > and Peter Zijlstra asked there, whether reporters systems were running > > some RT tasks. Then I realised that all of my four crashed boxes were > > pacemaker/corosync clusters and pacemaker uses lots of RT priority > > tasks. So I believe this is important, and might be reason why other > > machines seem to be running rock solid - they are not running any RT > > tasks. It also might help with hunting this bug. Is somebody of You > > also running some RT priority tasks on inflicted systems, or problem > > also occured without it? > > No, no RT tasks here. The boxes in my case were just running a lot of > kvm processes. Actually we are running multipathd, which is an RT process and heavily loaded on these particular systems. Regards, Apollon ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-04-30 17:39 ` Faidon Liambotis 2011-04-30 20:14 ` Willy Tarreau @ 2011-06-28 2:25 ` john stultz 2011-06-28 5:17 ` Willy Tarreau 2011-07-06 6:15 ` Andrew Morton 1 sibling, 2 replies; 58+ messages in thread From: john stultz @ 2011-06-28 2:25 UTC (permalink / raw) To: Faidon Liambotis Cc: linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Willy Tarreau, Randy Dunlap, Greg KH, Ben Hutchings, Apollon Oikonomopoulos On Sat, Apr 30, 2011 at 10:39 AM, Faidon Liambotis <paravoid@debian.org> wrote: > We too experienced problems with just the G6 blades at near 215 days uptime > (on the 19th of April), all at the same time. From our investigation, it > seems that their cpu_clocks jumped suddenly far in the future and then > almost immediately rolled over due to wrapping around 64-bits. > > Although all of their (G6s) clocks wrapped around *at the same time*, only > one > of them actually crashed at the time, with a second one crashing just a few > days later, on the 28th. > > Three of them had the following on their logs: > Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers > present > Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 > stuck for 17163091968s! [kvm:25913] So, did this issue ever get any traction or get resolved? From the softlockup message, I suspect we hit a multiply overflow in the underlying sched_clock() implementation. Because the goal of sched_clock is to be very fast, lightweight and safe from locking issues (so it can be called anywhere) handling transient corner cases internally has been avoided as they would require costly locking and extra overhead. Because of this, sched_clock users should be cautious to be robust in the face of transient errors. Peter: I wonder if the soft lockup code should be using the (hopefully) more robust timekeeping code (ie: get_seconds) for its get_timestamp function? 
I'd worry that you might have issues catching cases where the system was locked up so the timekeeping accounting code didn't get to run, but you have the same problem in the jiffies based sched_clock code as well (since timekeeping increments jiffies in most cases). That said, I didn't see from any of the backtraces in this thread why the system actually crashed. The softlockup message on its own shouldn't do that, so I suspect there's still a related issue somewhere else here. thanks -john ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-06-28 2:25 ` john stultz @ 2011-06-28 5:17 ` Willy Tarreau 2011-06-28 6:19 ` Apollon Oikonomopoulos 2011-07-06 6:15 ` Andrew Morton 1 sibling, 1 reply; 58+ messages in thread From: Willy Tarreau @ 2011-06-28 5:17 UTC (permalink / raw) To: john stultz Cc: Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Randy Dunlap, Greg KH, Ben Hutchings, Apollon Oikonomopoulos On Mon, Jun 27, 2011 at 07:25:31PM -0700, john stultz wrote: > On Sat, Apr 30, 2011 at 10:39 AM, Faidon Liambotis <paravoid@debian.org> wrote: > > We too experienced problems with just the G6 blades at near 215 days uptime > > (on the 19th of April), all at the same time. From our investigation, it > > seems that their cpu_clocks jumped suddenly far in the future and then > > almost immediately rolled over due to wrapping around 64-bits. > > > > Although all of their (G6s) clocks wrapped around *at the same time*, only > > one > > of them actually crashed at the time, with a second one crashing just a few > > days later, on the 28th. > > > > Three of them had the following on their logs: > > Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers > > present > > Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 > > stuck for 17163091968s! [kvm:25913] > > So, did this issue ever get any traction or get resolved? I'm not aware of any news on the subject unfortunately. We asked our customer to reboot both machines one week apart so that in 6 months they don't crash at the same time :-/ (...) > That said, I didn't see from any of the backtraces in this thread why > the system actually crashed. The softlockup message on its own > shouldn't do that, so I suspect there's still a related issue > somewhere else here. One of the traces clearly showed that the kernel's uptime had wrapped or jumped, because the uptime suddenly jumped forwards to something like 2^32/HZ seconds IIRC. 
Thus it is possible that we have two bugs, one on the clock making it jump forwards and one somewhere else causing an overflow when the clock jumps too far. Regards, Willy ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-06-28 5:17 ` Willy Tarreau @ 2011-06-28 6:19 ` Apollon Oikonomopoulos 0 siblings, 0 replies; 58+ messages in thread From: Apollon Oikonomopoulos @ 2011-06-28 6:19 UTC (permalink / raw) To: Willy Tarreau Cc: john stultz, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Randy Dunlap, Greg KH, Ben Hutchings On 07:17 Tue 28 Jun , Willy Tarreau wrote: > On Mon, Jun 27, 2011 at 07:25:31PM -0700, john stultz wrote: > > That said, I didn't see from any of the backtraces in this thread why > > the system actually crashed. The softlockup message on its own > > shouldn't do that, so I suspect there's still a related issue > > somewhere else here. > > One of the traces clearly showed that the kernel's uptime had wrapped > or jumped, because the uptime suddenly jumped forwards to something > like 2^32/HZ seconds IIRC. > > Thus it is possible that we have two bugs, one on the clock making it > jump forwards and one somewhere else causing an overflow when the clock > jumps too far. Our last machine with wrapped time crashed 1 month ago, almost 1 month after the time wrap. One thing I noticed, was that although the machine seemed healthy apart from the time-wrap, there seemed to be random scheduling glitches, which were mostly visible as high ping times to the KVM guests running on the machine. Unfortunately I don't have any exact numbers, so I suppose the best I can do is describe what we saw. All scheduler statistics under /proc/sched_debug on the host seemed normal, however pinging a VM from outside would give random spikes in the order of hundreds of ms among the usual 1-2 ms times. Moving the VM to another host would restore sane ping times and any other VM moved to this host would exhibit the same behaviour. Ping times to the host itself from outside were stable. 
This was also accompanied by bad I/O performance in the KVM guests themselves and the strange effect that the total CPU time on the VM's munin graphs would add to less than 100% * #CPUs. Neither the host nor the guests were experiencing heavy load. As a side note, this was similar to the behaviour we had experienced once when some of multipathd's path checkers (which are RT tasks IIRC) had crashed, although this time restarting multipathd didn't help. Regards, Apollon ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-06-28 2:25 ` john stultz 2011-06-28 5:17 ` Willy Tarreau @ 2011-07-06 6:15 ` Andrew Morton 2011-07-12 1:18 ` MINOURA Makoto / 箕浦 真 1 sibling, 1 reply; 58+ messages in thread From: Andrew Morton @ 2011-07-06 6:15 UTC (permalink / raw) To: john stultz Cc: Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Willy Tarreau, Randy Dunlap, Greg KH, Ben Hutchings, Apollon Oikonomopoulos On Mon, 27 Jun 2011 19:25:31 -0700 john stultz <johnstul@us.ibm.com> wrote: > On Sat, Apr 30, 2011 at 10:39 AM, Faidon Liambotis <paravoid@debian.org> wrote: > > We too experienced problems with just the G6 blades at near 215 days uptime > > (on the 19th of April), all at the same time. From our investigation, it > > seems that their cpu_clocks jumped suddenly far in the future and then > > almost immediately rolled over due to wrapping around 64-bits. > > > > Although all of their (G6s) clocks wrapped around *at the same time*, only > > one > > of them actually crashed at the time, with a second one crashing just a few > > days later, on the 28th. > > > > Three of them had the following on their logs: > > Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers > > present > > Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 > > stuck for 17163091968s! [kvm:25913] > > So, did this issue ever get any traction or get resolved? > https://bugzilla.kernel.org/show_bug.cgi?id=37382 is similar - a divide-by-zero in update_sg_lb_stats() after 209 days uptime. Can we change this stuff so that the timers wrap after 10 minutes uptime, like INITIAL_JIFFIES? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-06 6:15 ` Andrew Morton @ 2011-07-12 1:18 ` MINOURA Makoto / 箕浦 真 2011-07-12 1:40 ` john stultz 0 siblings, 1 reply; 58+ messages in thread From: MINOURA Makoto / 箕浦 真 @ 2011-07-12 1:18 UTC (permalink / raw) To: Andrew Morton Cc: john stultz, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Willy Tarreau, Rand We're experiencing similar but slightly different problems. Some KVM hosts crash after 210-220 days of uptime. Some of them hit divide-by-zero, but one of them shows: [671528.8780080] BUG: soft lockup - CPU#4 stuck for 61s! [kvm:11131] (sorry we have no full crash message including the backtrace) The host kernel is 2.6.32.11-based (ubuntu 2.6.32-22-server, 2.6.32-22.36). I'm not sure but probably the task scheduler is confused by the sched_clock overflow? Thanks, -- Minoura Makoto <minoura@valinux.co.jp> ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-12 1:18 ` MINOURA Makoto / 箕浦 真 @ 2011-07-12 1:40 ` john stultz 2011-07-12 2:49 ` MINOURA Makoto / 箕浦 真 0 siblings, 1 reply; 58+ messages in thread From: john stultz @ 2011-07-12 1:40 UTC (permalink / raw) To: MINOURA Makoto / 箕浦 真 Cc: Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Willy Tarreau, Rand On Tue, 2011-07-12 at 10:18 +0900, MINOURA Makoto / 箕浦 真 wrote: > We're experiencing similar but slightly different > problems. Some KVM hosts crash after 210-220 uptime. > Some of them hits divide-by-zero, but one of them shows: > > [671528.8780080] BUG: soft lockup - CPU#4 stuck for 61s! [kvm:11131] > > (sorry we have no full crash message including the backtrace) > > The host kernel is 2.6.32.11-based (ubuntu 2.6.32-22-server, > 2.6.32-22.36). > > I'm not sure but probably the task scheduler is confusing by > the sched_clock overflow? I'm working on a debug patch that will hopefully trip sched_clock overflows very early to see if we can't shake these issues out. thanks -john ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-12 1:40 ` john stultz @ 2011-07-12 2:49 ` MINOURA Makoto / 箕浦 真 2011-07-12 4:19 ` Willy Tarreau 0 siblings, 1 reply; 58+ messages in thread From: MINOURA Makoto / 箕浦 真 @ 2011-07-12 2:49 UTC (permalink / raw) To: john stultz Cc: Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Willy Tarreau, Rand |> In <1310434819.30337.21.camel@work-vm> |> john stultz <johnstul@us.ibm.com> wrote: > I'm working on a debug patch that will hopefully trip sched_clock > overflows very early to see if we can't shake these issues out. Thanks. I hope that'll help. -- Minoura Makoto <minoura@valinux.co.jp> ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-12 2:49 ` MINOURA Makoto / 箕浦 真 @ 2011-07-12 4:19 ` Willy Tarreau 2011-07-15 0:35 ` john stultz 0 siblings, 1 reply; 58+ messages in thread From: Willy Tarreau @ 2011-07-12 4:19 UTC (permalink / raw) To: MINOURA Makoto / 箕浦 真 Cc: john stultz, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand On Tue, Jul 12, 2011 at 11:49:57AM +0900, MINOURA Makoto / 箕浦 真 wrote: > > |> In <1310434819.30337.21.camel@work-vm> > |> john stultz <johnstul@us.ibm.com> wrote: > > > I'm working on a debug patch that will hopefully trip sched_clock > > overflows very early to see if we can't shake these issues out. > > Thanks. I hope that'll help. That certainly will. What currently makes this bug hard to fix is that it takes around 7 months to test a possible fix. We don't even know if more recent kernels are affected, it's possible that 2.6.32-stable is the only one that people don't reboot for an update in 7 months :-/ We need to make this bug more visible, so John's patch will be very welcome ! Regards, Willy ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-12 4:19 ` Willy Tarreau @ 2011-07-15 0:35 ` john stultz 2011-07-15 8:30 ` Peter Zijlstra 2011-07-15 10:01 ` Peter Zijlstra 0 siblings, 2 replies; 58+ messages in thread From: john stultz @ 2011-07-15 0:35 UTC (permalink / raw) To: Willy Tarreau, Peter Zijlstra, Ingo Molnar Cc: MINOURA Makoto / 箕浦 真, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand On Tue, 2011-07-12 at 06:19 +0200, Willy Tarreau wrote: > On Tue, Jul 12, 2011 at 11:49:57AM +0900, MINOURA Makoto / 箕浦 真 wrote: > > > > |> In <1310434819.30337.21.camel@work-vm> > > |> john stultz <johnstul@us.ibm.com> wrote: > > > > > I'm working on a debug patch that will hopefully trip sched_clock > > > overflows very early to see if we can't shake these issues out. > > > > Thanks. I hope that'll help. > > That certainly will. What currently makes this bug hard to fix is that > it takes around 7 months to test a possible fix. We don't even know if > more recent kernels are affected, it's possible that 2.6.32-stable is > the only one that people don't reboot for an update in 7 months :-/ > > We need to make this bug more visible, so John's patch will be very > welcome ! Ok. So this might be long. So I scratched out a hack that trips the multiplication overflows to happen after 5 minutes, but found that for my systems, it didn't cause any strange behavior at all. The reason being that on my system sched_clock_stable is not set, so there are extra corrective layers going on. This likely shows why this issue crops up on some boxes but not others. It's likely only x86 systems with both X86_FEATURE_TSC_RELIABLE and X86_FEATURE_CONSTANT_TSC are affected. Even forcing sched_clock_stable on, I still didn't see any crashes or oopses with mainline. So I then back ported to 2.6.31, and there I could only trigger softlockup watchdog warnings (but not crashes). 
None the less, adding further debug messages, I did see some odd behavior from sched_clock around the time the multiplication overflows happen. Earlier in this thread, one reporter had the following in their logs when they hit a softlockup warning: Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers present Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 stuck for 17163091968s! [kvm:25913] ... Apr 19 10:18:32 hn-05 kernel: [ 31.587025] bond0.13: received packet with own address as source address This was a 2ghz box, so working backward, we know the cyc2ns scale value is 512, so the cyc2ns equation is (cyc*512)>>10 Well, 64bits divided by 512 is ~36028797018963967 cycles, which is ~18014398 seconds or ~208 days (which lines up closely to the 17966378 printk time above from 20:56:07 - the day before the overflow). So on this machine we should expect to see the sched_clock values get close to 18014398 seconds, and then drop down to a small number and grow again. Which happens as we see at 10:18:32 timestamp above. The oddball bit is why the large [18446743935.365550] printk time in-between when the softlockup occurred? Since we're unsigned and shifting down by 10, we shouldn't be seeing such large numbers! Well, I reproduced this as well periodically, and it ends up sched_clock is doing more than (cyc*512)>>10. Because the freq might change, there's also a normalization done by adding a cyc2ns_offset value. Well, at boot, this value is miscalculated on the first cpu, and we end up with a negative value in cyc2ns_offset. So what we're seeing is the overflow, which is expected, but then the subtraction rolls the value under and back into a very large number. So yea, the multiplication overflow is probably hard enough for sched_clock users to handle, however the extra spike caused by the subtraction makes it even more complicated. 
You can find my current forced-overflow patch and my cyc2ns_offset initialization fix here (Caution, I may re-base these branches!): http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=shortlog;h=refs/heads/dev/uptime-crash However, as I mentioned in the cyc2ns_offset fix, there's still the chance for the same thing to occur in other cases where the cpufreq does change. And this suggests we really need a deeper fix here. One thought I've had is to rework the sched_clock implementation to be a bit more like the timekeeping code. However, this requires keeping track of a bit more data, which then requires fancy atomic updates to structures using rcu (since we can't do locking with sched_clock), and probably has some extra overhead. That said, it avoids having to do the funky offset based normalization and if we add a very rare periodic timer to do accumulation, we can avoid the multiplication overflow all together. I've put together a first pitch at this here (boots, but not robustly tested): http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=shortlog;h=refs/heads/dev/sched_clock-rework Peter/Ingo: Can you take a look at the above and let me know if you find it too disagreeable? thanks -john ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-15 0:35 ` john stultz @ 2011-07-15 8:30 ` Peter Zijlstra 2011-07-15 10:02 ` Peter Zijlstra 2011-07-15 10:01 ` Peter Zijlstra 1 sibling, 1 reply; 58+ messages in thread From: Peter Zijlstra @ 2011-07-15 8:30 UTC (permalink / raw) To: john stultz Cc: Willy Tarreau, Ingo Molnar, MINOURA Makoto / 箕浦 真, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand On Thu, 2011-07-14 at 17:35 -0700, john stultz wrote: > > Peter/Ingo: Can you take a look at the above and let me know if you find > it too disagreeable? I'm not quite sure of the calling conditions of set_cyc2ns_scale(), but there's two sites in there that do:

+	local_irq_save(flags);

+	data = kmalloc(sizeof(*data), GFP_KERNEL);

+	local_irq_restore(flags);

Clearly that's not going to work well at all. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-15 8:30 ` Peter Zijlstra @ 2011-07-15 10:02 ` Peter Zijlstra 2011-07-15 18:03 ` john stultz 0 siblings, 1 reply; 58+ messages in thread From: Peter Zijlstra @ 2011-07-15 10:02 UTC (permalink / raw) To: john stultz Cc: Willy Tarreau, Ingo Molnar, MINOURA Makoto / 箕浦 真, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand On Fri, 2011-07-15 at 10:30 +0200, Peter Zijlstra wrote: > On Thu, 2011-07-14 at 17:35 -0700, john stultz wrote: > > > > Peter/Ingo: Can you take a look at the above and let me know if you find > > it too disagreeable? > > I'm not quite sure of the calling conditions of set_cyc2ns_scale(), but > there's two sites in there that do:

> +	local_irq_save(flags);
>
> +	data = kmalloc(sizeof(*data), GFP_KERNEL);
>
> +	local_irq_restore(flags);

> Clearly that's not going to work well at all. Furthermore there is a distinct lack of error handling there, it assumes the allocation always succeeds. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-15 10:02 ` Peter Zijlstra @ 2011-07-15 18:03 ` john stultz 0 siblings, 0 replies; 58+ messages in thread From: john stultz @ 2011-07-15 18:03 UTC (permalink / raw) To: Peter Zijlstra Cc: Willy Tarreau, Ingo Molnar, MINOURA Makoto / 箕浦 真, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand On Fri, 2011-07-15 at 12:02 +0200, Peter Zijlstra wrote: > On Fri, 2011-07-15 at 10:30 +0200, Peter Zijlstra wrote: > > On Thu, 2011-07-14 at 17:35 -0700, john stultz wrote: > > > > > > Peter/Ingo: Can you take a look at the above and let me know if you find > > > it too disagreeable? > > > > I'm not quite sure of the calling conditions of set_cyc2ns_scale(), but > > there's two sites in there that do:

> > +	local_irq_save(flags);
> >
> > +	data = kmalloc(sizeof(*data), GFP_KERNEL);
> >
> > +	local_irq_restore(flags);

> > Clearly that's not going to work well at all. > > Furthermore there is a distinct lack of error handling there, it assumes > the allocation always succeeds. Yes, both of those issues are embarrassing. Thanks for pointing them out. I'll take another swing at it. thanks -john ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-15 0:35 ` john stultz 2011-07-15 8:30 ` Peter Zijlstra @ 2011-07-15 10:01 ` Peter Zijlstra 2011-07-15 17:59 ` john stultz 1 sibling, 1 reply; 58+ messages in thread From: Peter Zijlstra @ 2011-07-15 10:01 UTC (permalink / raw) To: john stultz Cc: Willy Tarreau, Ingo Molnar, MINOURA Makoto / 箕浦 真, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand On Thu, 2011-07-14 at 17:35 -0700, john stultz wrote: > > Peter/Ingo: Can you take a look at the above and let me know if you find > it too disagreeable?

+static unsigned long long __cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns = 0;
+	struct x86_sched_clock_data *data;
+	int cpu = smp_processor_id();
+
+	rcu_read_lock();
+	data = rcu_dereference(per_cpu(cpu_sched_clock_data, cpu));
+
+	if (unlikely(!data))
+		goto out;
+
+	ns = ((cyc - data->base_cycles) * data->mult) >> CYC2NS_SCALE_FACTOR;
+	ns += data->accumulated_ns;
+out:
+	rcu_read_unlock();
+	return ns;
+}

The way I read that we're still not wrapping properly if freq scaling 'never' happens. Because then we're wrapping on accumulated_ns + 2^54. Something like resetting base, and adding ns to accumulated_ns and returning the latter would make more sense. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-15 10:01 ` Peter Zijlstra @ 2011-07-15 17:59 ` john stultz 2011-07-21 7:22 ` Ingo Molnar 0 siblings, 1 reply; 58+ messages in thread From: john stultz @ 2011-07-15 17:59 UTC (permalink / raw) To: Peter Zijlstra Cc: Willy Tarreau, Ingo Molnar, MINOURA Makoto / 箕浦 真, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand On Fri, 2011-07-15 at 12:01 +0200, Peter Zijlstra wrote: > On Thu, 2011-07-14 at 17:35 -0700, john stultz wrote: > > > > Peter/Ingo: Can you take a look at the above and let me know if you find > > it too disagreeable?

> +static unsigned long long __cycles_2_ns(unsigned long long cyc)
> +{
> +	unsigned long long ns = 0;
> +	struct x86_sched_clock_data *data;
> +	int cpu = smp_processor_id();
> +
> +	rcu_read_lock();
> +	data = rcu_dereference(per_cpu(cpu_sched_clock_data, cpu));
> +
> +	if (unlikely(!data))
> +		goto out;
> +
> +	ns = ((cyc - data->base_cycles) * data->mult) >> CYC2NS_SCALE_FACTOR;
> +	ns += data->accumulated_ns;
> +out:
> +	rcu_read_unlock();
> +	return ns;
> +}

> > The way I read that we're still not wrapping properly if freq scaling > 'never' happens. Right, this doesn't address the mult overflow behavior. As I mentioned in the patch, the rework allows for solving that in the future using a (possibly very rare) timer that would accumulate cycles to ns. This rework just really addresses the multiplication overflow->negative roll under that currently occurs with the cyc2ns_offset value. > Because then we're wrapping on accumulated_ns + 2^54. > > Something like resetting base, and adding ns to accumulated_ns and > returning the latter would make more sense. Although we have to update the base_cycles and accumulated_ns atomically, so it's probably not something to do in the sched_clock path. thanks -john ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-15 17:59 ` john stultz @ 2011-07-21 7:22 ` Ingo Molnar 2011-07-21 12:24 ` Peter Zijlstra 2011-07-21 19:53 ` john stultz 0 siblings, 2 replies; 58+ messages in thread From: Ingo Molnar @ 2011-07-21 7:22 UTC (permalink / raw) To: john stultz Cc: Peter Zijlstra, Willy Tarreau, MINOURA Makoto / 箕浦 真, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand * john stultz <johnstul@us.ibm.com> wrote: > On Fri, 2011-07-15 at 12:01 +0200, Peter Zijlstra wrote: > > On Thu, 2011-07-14 at 17:35 -0700, john stultz wrote: > > > > > > Peter/Ingo: Can you take a look at the above and let me know if you find > > > it too disagreeable?

> > +static unsigned long long __cycles_2_ns(unsigned long long cyc)
> > +{
> > +	unsigned long long ns = 0;
> > +	struct x86_sched_clock_data *data;
> > +	int cpu = smp_processor_id();
> > +
> > +	rcu_read_lock();
> > +	data = rcu_dereference(per_cpu(cpu_sched_clock_data, cpu));
> > +
> > +	if (unlikely(!data))
> > +		goto out;
> > +
> > +	ns = ((cyc - data->base_cycles) * data->mult) >> CYC2NS_SCALE_FACTOR;
> > +	ns += data->accumulated_ns;
> > +out:
> > +	rcu_read_unlock();
> > +	return ns;
> > +}

> > The way I read that we're still not wrapping properly if freq scaling > > 'never' happens. > > Right, this doesn't address the mult overflow behavior. As I mentioned > in the patch that the rework allows for solving that in the future using > a (possibly very rare) timer that would accumulate cycles to ns. > > This rework just really addresses the multiplication overflow->negative > roll under that currently occurs with the cyc2ns_offset value. > > > Because then we're wrapping on accumulated_ns + 2^54. > > > > Something like resetting base, and adding ns to accumulated_ns and > > returning the latter would make more sense. 
> > Although we have to update the base_cycles and accumulated_ns > atomically, so its probably not something to do in the sched_clock path. Ping, what's going on with this bug? Systems are crashing so we need a quick fix ASAP ... Thanks, Ingo ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-21 7:22 ` Ingo Molnar @ 2011-07-21 12:24 ` Peter Zijlstra 2011-07-21 12:50 ` Nikola Ciprich 1 sibling, 1 reply; 58+ messages in thread From: Peter Zijlstra @ 2011-07-21 12:24 UTC (permalink / raw) To: Ingo Molnar Cc: john stultz, Willy Tarreau, MINOURA Makoto, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand On Thu, 2011-07-21 at 09:22 +0200, Ingo Molnar wrote: > > Ping, what's going on with this bug? Systems are crashing so we need > a quick fix ASAP ... Something as simple as the below ought to cure things for now. Once we get __cycles_2_ns() fixed up we can enable it again. (patch against -tip, .32 code is different but equally simple to fix)

---
Subject: x86, intel: Don't mark sched_clock() as stable

Because the x86 sched_clock() implementation wraps at 54 bits and the scheduler code assumes it wraps at the full 64bits we can get into trouble after 208 days (~7 months) of uptime.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/kernel/cpu/intel.c | 7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ed6086e..c8dc48b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -91,8 +91,15 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
+		/*
+		 * Unfortunately our __cycles_2_ns() implementation makes
+		 * the raw sched_clock() interface wrap at 54-bits, which
+		 * makes it unsuitable for direct use, so disable this
+		 * for now.
+		 *
 		if (!check_tsc_unstable())
 			sched_clock_stable = 1;
+		 */
 	}

 	/*

^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-21 12:24 ` Peter Zijlstra @ 2011-07-21 12:50 ` Nikola Ciprich 2011-07-21 12:53 ` Peter Zijlstra 0 siblings, 1 reply; 58+ messages in thread From: Nikola Ciprich @ 2011-07-21 12:50 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, john stultz, Willy Tarreau, MINOURA Makoto, Andrew Morton, Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Rand, Nikola Ciprich [-- Attachment #1: Type: text/plain, Size: 2130 bytes --] Hi, thanks for the patch! I'll put this on our testing boxes... Are You going to push this upstream so we can ask Greg to push this to -stable? or do You plan to wait for more complex patch? n. On Thu, Jul 21, 2011 at 02:24:58PM +0200, Peter Zijlstra wrote: > On Thu, 2011-07-21 at 09:22 +0200, Ingo Molnar wrote: > > > > Ping, what's going on with this bug? Systems are crashing so we need > > a quick fix ASAP ... > > Something as simple as the below ought to cure things for now. Once we > get __cycles_2_ns() fixed up we can enable it again. > > (patch against -tip, .32 code is different but equally simple to fix) > > --- > Subject: x86, intel: Don't mark sched_clock() as stable > > Because the x86 sched_clock() implementation wraps at 54 bits and the > scheduler code assumes it wraps at the full 64bits we can get into > trouble after 208 days (~7 months) of uptime. 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > arch/x86/kernel/cpu/intel.c | 7 +++++++ > 1 files changed, 7 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c > index ed6086e..c8dc48b 100644 > --- a/arch/x86/kernel/cpu/intel.c > +++ b/arch/x86/kernel/cpu/intel.c > @@ -91,8 +91,15 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c) > if (c->x86_power & (1 << 8)) { > set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC); > set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC); > + /* > + * Unfortunately our __cycles_2_ns() implementation makes > + * the raw sched_clock() interface wrap at 54-bits, which > + * makes it unsuitable for direct use, so disable this > + * for now. > + * > if (!check_tsc_unstable()) > sched_clock_stable = 1; > + */ > } > > /* > > -- ------------------------------------- Ing. Nikola CIPRICH LinuxBox.cz, s.r.o. 28. rijna 168, 709 01 Ostrava tel.: +420 596 603 142 fax: +420 596 621 273 mobil: +420 777 093 799 www.linuxbox.cz mobil servis: +420 737 238 656 email servis: servis@linuxbox.cz ------------------------------------- [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-21 12:50 ` Nikola Ciprich @ 2011-07-21 12:53 ` Peter Zijlstra 2011-07-21 18:45 ` Ingo Molnar 2011-07-21 19:25 ` Nikola Ciprich 0 siblings, 2 replies; 58+ messages in thread From: Peter Zijlstra @ 2011-07-21 12:53 UTC (permalink / raw) To: Nikola Ciprich Cc: Ingo Molnar, john stultz, Willy Tarreau, MINOURA Makoto, Andrew Morton, Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Rand On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote: > thanks for the patch! I'll put this on our testing boxes... With a patch that frobs the starting value close to overflowing I hope, otherwise we'll not hear from you in like 7 months ;-) > Are You going to push this upstream so we can ask Greg to push this to > -stable? Yeah, I think we want to commit this with a -stable tag, Ingo? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-21 12:53 ` Peter Zijlstra @ 2011-07-21 18:45 ` Ingo Molnar 2011-07-21 19:32 ` Nikola Ciprich 2011-08-25 18:56 ` Faidon Liambotis 2011-07-21 19:25 ` Nikola Ciprich 1 sibling, 2 replies; 58+ messages in thread From: Ingo Molnar @ 2011-07-21 18:45 UTC (permalink / raw) To: Peter Zijlstra Cc: Nikola Ciprich, john stultz, Willy Tarreau, MINOURA Makoto, Andrew Morton, Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Rand * Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote: > > thanks for the patch! I'll put this on our testing boxes... > > With a patch that frobs the starting value close to overflowing I hope, > otherwise we'll not hear from you in like 7 months ;-) > > > Are You going to push this upstream so we can ask Greg to push this to > > -stable? > > Yeah, I think we want to commit this with a -stable tag, Ingo? yeah - and we also want a Reported-by tag and an explanation of how it can crash and why it matters in practice. I can then stick it into the urgent branch for Linus. (probably will only hit upstream in the merge window though.) Thanks, Ingo ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-21 18:45 ` Ingo Molnar @ 2011-07-21 19:32 ` Nikola Ciprich 2011-08-25 18:56 ` Faidon Liambotis 1 sibling, 0 replies; 58+ messages in thread From: Nikola Ciprich @ 2011-07-21 19:32 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, john stultz, Willy Tarreau, MINOURA Makoto, Andrew Morton, Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Rand, Nikola Ciprich, Petr Kopecký [-- Attachment #1: Type: text/plain, Size: 1443 bytes --] > yeah - and we also want a Reported-by tag and an explanation of how > it can crash and why it matters in practice. I can then stick it into > the urgent branch for Linus. (probably will only hit upstream in the > merge window though.) Hello Ingo, well, I guess You can add me as reporter, but this has been independently reported by others as well, as the bug got hit by quite a lot of people... I'm afraid I won't add much to technical description of how this crashes the machine apart from what has been discussed in this thread. But the reason why this hurts us a lot is that it seems systems running RT tasks are affected in particular, and many of our crashed machines were failover clusters running pacemaker/corosync (which runs a lot of RT processes). And it really sucks, when both nodes of "high-availability" system crash in the same time :( So we were then forced to plan preventive restarts of some of those critical systems just to be sure they don't end up badly.. thanks to You all for taking a look at this! cheers! nik > > Thanks, > > Ingo > -- ------------------------------------- Ing. Nikola CIPRICH LinuxBox.cz, s.r.o. 28. 
rijna 168, 709 01 Ostrava tel.: +420 596 603 142 fax: +420 596 621 273 mobil: +420 777 093 799 www.linuxbox.cz mobil servis: +420 737 238 656 email servis: servis@linuxbox.cz ------------------------------------- [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-21 18:45 ` Ingo Molnar 2011-07-21 19:32 ` Nikola Ciprich @ 2011-08-25 18:56 ` Faidon Liambotis 2011-08-30 22:38 ` [stable] " Greg KH 1 sibling, 1 reply; 58+ messages in thread From: Faidon Liambotis @ 2011-08-25 18:56 UTC (permalink / raw) To: linux-kernel Cc: Peter Zijlstra, Nikola Ciprich, john stultz, Willy Tarreau, MINOURA Makoto, Andrew Morton, Ingo Molnar, stable, seto.hidetoshi, Hervé Commowick, Rand On Thu, Jul 21, 2011 at 08:45:25PM +0200, Ingo Molnar wrote: > * Peter Zijlstra <peterz@infradead.org> wrote: > > > On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote: > > > thanks for the patch! I'll put this on our testing boxes... > > > > With a patch that frobs the starting value close to overflowing I hope, > > otherwise we'll not hear from you in like 7 months ;-) > > > > > Are You going to push this upstream so we can ask Greg to push this to > > > -stable? > > > > Yeah, I think we want to commit this with a -stable tag, Ingo? > > yeah - and we also want a Reported-by tag and an explanation of how > it can crash and why it matters in practice. I can then stick it into > the urgent branch for Linus. (probably will only hit upstream in the > merge window though.) Has this been pushed or has the problem been solved somehow? Time is against us on this bug as more boxes will crash as they reach 200 days of uptime... In any case, feel free to use me as a Reported-by, my full report of the problem being <20110430173905.GA25641@tty.gr>. FWIW and if I understand correctly, my symptoms were caused by *two* different bugs: a) the 54 bits wraparound at 208 days that Peter fixed above, b) a kernel crash at ~215 days related to RT tasks, fixed by 305e6835e05513406fa12820e40e4a8ecb63743c (already in -stable). Regards, Faidon ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-08-25 18:56 ` Faidon Liambotis @ 2011-08-30 22:38 ` Greg KH 2011-09-04 23:26 ` Faidon Liambotis 0 siblings, 1 reply; 58+ messages in thread From: Greg KH @ 2011-08-30 22:38 UTC (permalink / raw) To: Faidon Liambotis Cc: linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable, Hervé Commowick, john stultz, Rand, Andrew Morton, Willy Tarreau On Thu, Aug 25, 2011 at 09:56:16PM +0300, Faidon Liambotis wrote: > On Thu, Jul 21, 2011 at 08:45:25PM +0200, Ingo Molnar wrote: > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote: > > > > thanks for the patch! I'll put this on our testing boxes... > > > > > > With a patch that frobs the starting value close to overflowing I hope, > > > otherwise we'll not hear from you in like 7 months ;-) > > > > > > > Are You going to push this upstream so we can ask Greg to push this to > > > > -stable? > > > > > > Yeah, I think we want to commit this with a -stable tag, Ingo? > > > > yeah - and we also want a Reported-by tag and an explanation of how > > it can crash and why it matters in practice. I can then stick it into > > the urgent branch for Linus. (probably will only hit upstream in the > > merge window though.) > > Has this been pushed or has the problem been solved somehow? Time is > against us on this bug as more boxes will crash as they reach 200 days > of uptime... > > In any case, feel free to use me as a Reported-by, my full report of the > problem being <20110430173905.GA25641@tty.gr>. > > FWIW and if I understand correctly, my symptoms were caused by *two* > different bugs: > a) the 54 bits wraparound at 208 days that Peter fixed above, > b) a kernel crash at ~215 days related to RT tasks, fixed by > 305e6835e05513406fa12820e40e4a8ecb63743c (already in -stable). So, what do I do here as part of the .32-longterm kernel? 
Is there a fix that is in Linus's tree that I need to apply here? confused, greg k-h ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-08-30 22:38 ` [stable] " Greg KH @ 2011-09-04 23:26 ` Faidon Liambotis 2011-10-23 18:31 ` Ruben Kerkhof 0 siblings, 1 reply; 58+ messages in thread From: Faidon Liambotis @ 2011-09-04 23:26 UTC (permalink / raw) To: Greg KH Cc: linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable, Hervé Commowick, john stultz, Rand, Andrew Morton, Willy Tarreau On Tue, Aug 30, 2011 at 03:38:29PM -0700, Greg KH wrote: > On Thu, Aug 25, 2011 at 09:56:16PM +0300, Faidon Liambotis wrote: > > On Thu, Jul 21, 2011 at 08:45:25PM +0200, Ingo Molnar wrote: > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > > On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote: > > > > > thanks for the patch! I'll put this on our testing boxes... > > > > > > > > With a patch that frobs the starting value close to overflowing I hope, > > > > otherwise we'll not hear from you in like 7 months ;-) > > > > > > > > > Are You going to push this upstream so we can ask Greg to push this to > > > > > -stable? > > > > > > > > Yeah, I think we want to commit this with a -stable tag, Ingo? > > > > > > yeah - and we also want a Reported-by tag and an explanation of how > > > it can crash and why it matters in practice. I can then stick it into > > > the urgent branch for Linus. (probably will only hit upstream in the > > > merge window though.) > > > > Has this been pushed or has the problem been solved somehow? Time is > > against us on this bug as more boxes will crash as they reach 200 days > > of uptime... > > > > In any case, feel free to use me as a Reported-by, my full report of the > > problem being <20110430173905.GA25641@tty.gr>. 
> > > > FWIW and if I understand correctly, my symptoms were caused by *two* > > different bugs: > > a) the 54 bits wraparound at 208 days that Peter fixed above, > > b) a kernel crash at ~215 days related to RT tasks, fixed by > > 305e6835e05513406fa12820e40e4a8ecb63743c (already in -stable). > > So, what do I do here as part of the .32-longterm kernel? Is there a > fix that is in Linus's tree that I need to apply here? > > confused, Is this even pushed upstream? I checked Linus' tree and the proposed patch is *not* merged there. I'm not really sure if it was fixed some other way, though. I thought this was intended to be an "urgent" fix or something? Regards, Faidon ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-09-04 23:26 ` Faidon Liambotis @ 2011-10-23 18:31 ` Ruben Kerkhof 2011-10-23 22:07 ` Greg KH 2011-10-25 22:44 ` john stultz 0 siblings, 2 replies; 58+ messages in thread From: Ruben Kerkhof @ 2011-10-23 18:31 UTC (permalink / raw) To: Greg KH Cc: linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable, Hervé Commowick, john stultz, Rand, Andrew Morton, Willy Tarreau, Faidon Liambotis On Mon, Sep 5, 2011 at 01:26, Faidon Liambotis <paravoid@debian.org> wrote: > On Tue, Aug 30, 2011 at 03:38:29PM -0700, Greg KH wrote: >> On Thu, Aug 25, 2011 at 09:56:16PM +0300, Faidon Liambotis wrote: >> > On Thu, Jul 21, 2011 at 08:45:25PM +0200, Ingo Molnar wrote: >> > > * Peter Zijlstra <peterz@infradead.org> wrote: >> > > >> > > > On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote: >> > > > > thanks for the patch! I'll put this on our testing boxes... >> > > > >> > > > With a patch that frobs the starting value close to overflowing I hope, >> > > > otherwise we'll not hear from you in like 7 months ;-) >> > > > >> > > > > Are You going to push this upstream so we can ask Greg to push this to >> > > > > -stable? >> > > > >> > > > Yeah, I think we want to commit this with a -stable tag, Ingo? >> > > >> > > yeah - and we also want a Reported-by tag and an explanation of how >> > > it can crash and why it matters in practice. I can then stick it into >> > > the urgent branch for Linus. (probably will only hit upstream in the >> > > merge window though.) >> > >> > Has this been pushed or has the problem been solved somehow? Time is >> > against us on this bug as more boxes will crash as they reach 200 days >> > of uptime... >> > >> > In any case, feel free to use me as a Reported-by, my full report of the >> > problem being <20110430173905.GA25641@tty.gr>. 
>> > >> > FWIW and if I understand correctly, my symptoms were caused by *two* >> > different bugs: >> > a) the 54 bits wraparound at 208 days that Peter fixed above, >> > b) a kernel crash at ~215 days related to RT tasks, fixed by >> > 305e6835e05513406fa12820e40e4a8ecb63743c (already in -stable). >> >> So, what do I do here as part of the .32-longterm kernel? Is there a >> fix that is in Linus's tree that I need to apply here? >> >> confused, > > Is this even pushed upstream? I checked Linus' tree and the proposed > patch is *not* merged there. I'm not really sure if it was fixed some > other way, though. I thought this was intended to be an "urgent" fix or > something? > > Regards, > Faidon I just had two crashes on two different machines, both with an uptime of 208 days. Both were 5520's running 2.6.34.8, but with a CONFIG_HZ of 1000 2011-10-23T16:49:18.618029+02:00 phy001 kernel: BUG: soft lockup - CPU#0 stuck for 17163091968s! [qemu-kvm:16949] 2011-10-23T16:49:18.618054+02:00 phy001 kernel: Modules linked in: xt_limit ebt_log ebt_limit ebt_arp ebtable_filter ebtable_nat ebtables ufs nls_utf8 tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm ioatdma i2c_i801 igb iTCO_wdt dca iTCO_vendor_support serio_raw i2c_core 3w_9xxx [last unloaded: scsi_wait_scan] 2011-10-23T16:49:18.618060+02:00 phy001 kernel: CPU 0 2011-10-23T16:49:18.618068+02:00 phy001 kernel: Modules linked in: xt_limit ebt_log ebt_limit ebt_arp ebtable_filter ebtable_nat ebtables ufs nls_utf8 tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm ioatdma i2c_i801 igb iTCO_wdt dca iTCO_vendor_support serio_raw i2c_core 3w_9xxx [last unloaded: scsi_wait_scan] 2011-10-23T16:49:18.618072+02:00 phy001 kernel: 2011-10-23T16:49:18.618077+02:00 phy001 kernel: Pid: 16949, 
comm: qemu-kvm Tainted: G M 2.6.34.8-68.local.fc13.x86_64 #1 X8DTU/X8DTU 2011-10-23T16:49:18.618083+02:00 phy001 kernel: RIP: 0010:[<ffffffffa007f92f>] [<ffffffffa007f92f>] kvm_arch_vcpu_ioctl_run+0x764/0xa74 [kvm] 2011-10-23T16:49:18.618086+02:00 phy001 kernel: RSP: 0018:ffff880bafa29d18 EFLAGS: 00000202 2011-10-23T16:49:18.618088+02:00 phy001 kernel: RAX: ffff880002000000 RBX: ffff880bafa29dc8 RCX: ffff8805e45128a0 2011-10-23T16:49:18.618091+02:00 phy001 kernel: RDX: 000000000000cb80 RSI: 0000000004b2a3a0 RDI: 000000000b630000 2011-10-23T16:49:18.618093+02:00 phy001 kernel: RBP: ffffffff8100a60e R08: 000000000000002b R09: 00000000760d0735 2011-10-23T16:49:18.618095+02:00 phy001 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 2011-10-23T16:49:18.618097+02:00 phy001 kernel: R13: ffff880bafa29cc8 R14: ffffffffa007b536 R15: ffff880bafa29ca8 2011-10-23T16:49:18.618100+02:00 phy001 kernel: FS: 00007fe92cd38700(0000) GS:ffff880002000000(0000) knlGS:fffff880009b8000 2011-10-23T16:49:18.618102+02:00 phy001 kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 2011-10-23T16:49:18.618104+02:00 phy001 kernel: CR2: 00000000c1a00044 CR3: 00000006b3f2e000 CR4: 00000000000026e0 2011-10-23T16:49:18.618107+02:00 phy001 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 2011-10-23T16:49:18.618109+02:00 phy001 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 2011-10-23T16:49:18.618112+02:00 phy001 kernel: Process qemu-kvm (pid: 16949, threadinfo ffff880bafa28000, task ffff880c242e0000) 2011-10-23T16:49:18.618114+02:00 phy001 kernel: Stack: 2011-10-23T16:49:18.618116+02:00 phy001 kernel: ffff88077b1a3ca8 ffffffff81d3cf38 ffff8805e4513f00 ffff880c242e0000 2011-10-23T16:49:18.618119+02:00 phy001 kernel: <0> ffff880c242e0000 ffff880bafa29fd8 ffff8805e4513ef8 0000000000015fd0 2011-10-23T16:49:18.618121+02:00 phy001 kernel: <0> 000000000000cb80 ffff880c242e0000 ffff880bafa28000 ffff880ab43f4038 
2011-10-23T16:49:18.618123+02:00 phy001 kernel: Call Trace: 2011-10-23T16:49:18.618126+02:00 phy001 kernel: [<ffffffffa006e5ba>] ? kvm_vcpu_ioctl+0xfd/0x56e [kvm] 2011-10-23T16:49:18.618129+02:00 phy001 kernel: [<ffffffff81011252>] ? __switch_to_xtra+0x121/0x141 2011-10-23T16:49:18.618131+02:00 phy001 kernel: [<ffffffff8111ad5f>] ? vfs_ioctl+0x32/0xa6 2011-10-23T16:49:18.618134+02:00 phy001 kernel: [<ffffffff8111b2d2>] ? do_vfs_ioctl+0x483/0x4c9 2011-10-23T16:49:18.618137+02:00 phy001 kernel: [<ffffffff8111b36e>] ? sys_ioctl+0x56/0x79 2011-10-23T16:49:18.618139+02:00 phy001 kernel: [<ffffffff81009c72>] ? system_call_fastpath+0x16/0x1b 2011-10-23T16:49:18.618142+02:00 phy001 kernel: Code: df ff 90 48 01 00 00 48 8b 55 90 65 48 8b 04 25 90 e8 00 00 f6 04 10 aa 74 05 e8 05 06 f9 e0 f0 41 80 0f 02 fb 66 0f 1f 44 00 00 <ff> 83 b0 00 00 00 48 8b b5 68 ff ff ff 83 66 14 ef 48 8b 3b 48 Can the necessary fix please be pushed upstream? Kind regards, Ruben Kerkhof ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-10-23 18:31 ` Ruben Kerkhof @ 2011-10-23 22:07 ` Greg KH 2011-10-25 22:44 ` john stultz 1 sibling, 0 replies; 58+ messages in thread From: Greg KH @ 2011-10-23 22:07 UTC (permalink / raw) To: Ruben Kerkhof Cc: linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable, Hervé Commowick, john stultz, Rand, Andrew Morton, Willy Tarreau, Faidon Liambotis On Sun, Oct 23, 2011 at 08:31:32PM +0200, Ruben Kerkhof wrote: > On Mon, Sep 5, 2011 at 01:26, Faidon Liambotis <paravoid@debian.org> wrote: > > On Tue, Aug 30, 2011 at 03:38:29PM -0700, Greg KH wrote: > >> On Thu, Aug 25, 2011 at 09:56:16PM +0300, Faidon Liambotis wrote: > >> > On Thu, Jul 21, 2011 at 08:45:25PM +0200, Ingo Molnar wrote: > >> > > * Peter Zijlstra <peterz@infradead.org> wrote: > >> > > > >> > > > On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote: > >> > > > > thanks for the patch! I'll put this on our testing boxes... > >> > > > > >> > > > With a patch that frobs the starting value close to overflowing I hope, > >> > > > otherwise we'll not hear from you in like 7 months ;-) > >> > > > > >> > > > > Are You going to push this upstream so we can ask Greg to push this to > >> > > > > -stable? > >> > > > > >> > > > Yeah, I think we want to commit this with a -stable tag, Ingo? > >> > > > >> > > yeah - and we also want a Reported-by tag and an explanation of how > >> > > it can crash and why it matters in practice. I can then stick it into > >> > > the urgent branch for Linus. (probably will only hit upstream in the > >> > > merge window though.) > >> > > >> > Has this been pushed or has the problem been solved somehow? Time is > >> > against us on this bug as more boxes will crash as they reach 200 days > >> > of uptime... > >> > > >> > In any case, feel free to use me as a Reported-by, my full report of the > >> > problem being <20110430173905.GA25641@tty.gr>. 
> >> > > >> > FWIW and if I understand correctly, my symptoms were caused by *two* > >> > different bugs: > >> > a) the 54 bits wraparound at 208 days that Peter fixed above, > >> > b) a kernel crash at ~215 days related to RT tasks, fixed by > >> > 305e6835e05513406fa12820e40e4a8ecb63743c (already in -stable). > >> > >> So, what do I do here as part of the .32-longterm kernel? Is there a > >> fix that is in Linus's tree that I need to apply here? > >> > >> confused, > > > > Is this even pushed upstream? I checked Linus' tree and the proposed > > patch is *not* merged there. I'm not really sure if it was fixed some > > other way, though. I thought this was intended to be an "urgent" fix or > > something? > > > > Regards, > > Faidon > > I just had two crashes on two different machines, both with an uptime > of 208 days. > Both were 5520's running 2.6.34.8, but with a CONFIG_HZ of 1000 > > 2011-10-23T16:49:18.618029+02:00 phy001 kernel: BUG: soft lockup - > CPU#0 stuck for 17163091968s! [qemu-kvm:16949] > 2011-10-23T16:49:18.618054+02:00 phy001 kernel: Modules linked in: > xt_limit ebt_log ebt_limit ebt_arp ebtable_filter ebtable_nat ebtables > ufs nls_utf8 tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q > garp stp llc bonding xt_comment xt_recent ip6t_REJECT > nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm > ioatdma i2c_i801 igb iTCO_wdt dca iTCO_vendor_support serio_raw > i2c_core 3w_9xxx [last unloaded: scsi_wait_scan] > 2011-10-23T16:49:18.618060+02:00 phy001 kernel: CPU 0 > 2011-10-23T16:49:18.618068+02:00 phy001 kernel: Modules linked in: > xt_limit ebt_log ebt_limit ebt_arp ebtable_filter ebtable_nat ebtables > ufs nls_utf8 tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q > garp stp llc bonding xt_comment xt_recent ip6t_REJECT > nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm > ioatdma i2c_i801 igb iTCO_wdt dca iTCO_vendor_support serio_raw > i2c_core 3w_9xxx [last unloaded: scsi_wait_scan] > 
2011-10-23T16:49:18.618072+02:00 phy001 kernel: > 2011-10-23T16:49:18.618077+02:00 phy001 kernel: Pid: 16949, comm: > qemu-kvm Tainted: G M 2.6.34.8-68.local.fc13.x86_64 #1 > X8DTU/X8DTU > 2011-10-23T16:49:18.618083+02:00 phy001 kernel: RIP: > 0010:[<ffffffffa007f92f>] [<ffffffffa007f92f>] > kvm_arch_vcpu_ioctl_run+0x764/0xa74 [kvm] > 2011-10-23T16:49:18.618086+02:00 phy001 kernel: RSP: > 0018:ffff880bafa29d18 EFLAGS: 00000202 > 2011-10-23T16:49:18.618088+02:00 phy001 kernel: RAX: ffff880002000000 > RBX: ffff880bafa29dc8 RCX: ffff8805e45128a0 > 2011-10-23T16:49:18.618091+02:00 phy001 kernel: RDX: 000000000000cb80 > RSI: 0000000004b2a3a0 RDI: 000000000b630000 > 2011-10-23T16:49:18.618093+02:00 phy001 kernel: RBP: ffffffff8100a60e > R08: 000000000000002b R09: 00000000760d0735 > 2011-10-23T16:49:18.618095+02:00 phy001 kernel: R10: 0000000000000000 > R11: 0000000000000000 R12: 0000000000000001 > 2011-10-23T16:49:18.618097+02:00 phy001 kernel: R13: ffff880bafa29cc8 > R14: ffffffffa007b536 R15: ffff880bafa29ca8 > 2011-10-23T16:49:18.618100+02:00 phy001 kernel: FS: > 00007fe92cd38700(0000) GS:ffff880002000000(0000) > knlGS:fffff880009b8000 > 2011-10-23T16:49:18.618102+02:00 phy001 kernel: CS: 0010 DS: 002b ES: > 002b CR0: 0000000080050033 > 2011-10-23T16:49:18.618104+02:00 phy001 kernel: CR2: 00000000c1a00044 > CR3: 00000006b3f2e000 CR4: 00000000000026e0 > 2011-10-23T16:49:18.618107+02:00 phy001 kernel: DR0: 0000000000000000 > DR1: 0000000000000000 DR2: 0000000000000000 > 2011-10-23T16:49:18.618109+02:00 phy001 kernel: DR3: 0000000000000000 > DR6: 00000000ffff0ff0 DR7: 0000000000000400 > 2011-10-23T16:49:18.618112+02:00 phy001 kernel: Process qemu-kvm (pid: > 16949, threadinfo ffff880bafa28000, task ffff880c242e0000) > 2011-10-23T16:49:18.618114+02:00 phy001 kernel: Stack: > 2011-10-23T16:49:18.618116+02:00 phy001 kernel: ffff88077b1a3ca8 > ffffffff81d3cf38 ffff8805e4513f00 ffff880c242e0000 > 2011-10-23T16:49:18.618119+02:00 phy001 kernel: <0> ffff880c242e0000 > 
ffff880bafa29fd8 ffff8805e4513ef8 0000000000015fd0 > 2011-10-23T16:49:18.618121+02:00 phy001 kernel: <0> 000000000000cb80 > ffff880c242e0000 ffff880bafa28000 ffff880ab43f4038 > 2011-10-23T16:49:18.618123+02:00 phy001 kernel: Call Trace: > 2011-10-23T16:49:18.618126+02:00 phy001 kernel: [<ffffffffa006e5ba>] ? > kvm_vcpu_ioctl+0xfd/0x56e [kvm] > 2011-10-23T16:49:18.618129+02:00 phy001 kernel: [<ffffffff81011252>] ? > __switch_to_xtra+0x121/0x141 > 2011-10-23T16:49:18.618131+02:00 phy001 kernel: [<ffffffff8111ad5f>] ? > vfs_ioctl+0x32/0xa6 > 2011-10-23T16:49:18.618134+02:00 phy001 kernel: [<ffffffff8111b2d2>] ? > do_vfs_ioctl+0x483/0x4c9 > 2011-10-23T16:49:18.618137+02:00 phy001 kernel: [<ffffffff8111b36e>] ? > sys_ioctl+0x56/0x79 > 2011-10-23T16:49:18.618139+02:00 phy001 kernel: [<ffffffff81009c72>] ? > system_call_fastpath+0x16/0x1b > 2011-10-23T16:49:18.618142+02:00 phy001 kernel: Code: df ff 90 48 01 > 00 00 48 8b 55 90 65 48 8b 04 25 90 e8 00 00 f6 04 10 aa 74 05 e8 05 > 06 f9 e0 f0 41 80 0f 02 fb 66 0f 1f 44 00 00 <ff> 83 b0 00 00 00 48 8b > b5 68 ff ff ff 83 66 14 ef 48 8b 3b 48 > > Can the necessary fix please be pushed upstream? I agree, again, can someone please do this? greg k-h ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-10-23 18:31 ` Ruben Kerkhof 2011-10-23 22:07 ` Greg KH @ 2011-10-25 22:44 ` john stultz 2011-10-25 23:25 ` Willy Tarreau 2011-10-26 18:21 ` Ruben Kerkhof 1 sibling, 2 replies; 58+ messages in thread From: john stultz @ 2011-10-25 22:44 UTC (permalink / raw) To: Ruben Kerkhof Cc: Greg KH, linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable, Hervé Commowick, Rand, Andrew Morton, Willy Tarreau, Faidon Liambotis On Sun, 2011-10-23 at 20:31 +0200, Ruben Kerkhof wrote: > On Mon, Sep 5, 2011 at 01:26, Faidon Liambotis <paravoid@debian.org> wrote: > > On Tue, Aug 30, 2011 at 03:38:29PM -0700, Greg KH wrote: > >> On Thu, Aug 25, 2011 at 09:56:16PM +0300, Faidon Liambotis wrote: > >> > On Thu, Jul 21, 2011 at 08:45:25PM +0200, Ingo Molnar wrote: > >> > > * Peter Zijlstra <peterz@infradead.org> wrote: > >> > > > >> > > > On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote: > >> > > > > thanks for the patch! I'll put this on our testing boxes... > >> > > > > >> > > > With a patch that frobs the starting value close to overflowing I hope, > >> > > > otherwise we'll not hear from you in like 7 months ;-) > >> > > > > >> > > > > Are You going to push this upstream so we can ask Greg to push this to > >> > > > > -stable? > >> > > > > >> > > > Yeah, I think we want to commit this with a -stable tag, Ingo? > >> > > > >> > > yeah - and we also want a Reported-by tag and an explanation of how > >> > > it can crash and why it matters in practice. I can then stick it into > >> > > the urgent branch for Linus. (probably will only hit upstream in the > >> > > merge window though.) > >> > > >> > Has this been pushed or has the problem been solved somehow? Time is > >> > against us on this bug as more boxes will crash as they reach 200 days > >> > of uptime... > >> > > >> > In any case, feel free to use me as a Reported-by, my full report of the > >> > problem being <20110430173905.GA25641@tty.gr>. 
> >> > > >> > FWIW and if I understand correctly, my symptoms were caused by *two* > >> > different bugs: > >> > a) the 54 bits wraparound at 208 days that Peter fixed above, > >> > b) a kernel crash at ~215 days related to RT tasks, fixed by > >> > 305e6835e05513406fa12820e40e4a8ecb63743c (already in -stable). > >> > >> So, what do I do here as part of the .32-longterm kernel? Is there a > >> fix that is in Linus's tree that I need to apply here? > >> > >> confused, > > > > Is this even pushed upstream? I checked Linus' tree and the proposed > > patch is *not* merged there. I'm not really sure if it was fixed some > > other way, though. I thought this was intended to be an "urgent" fix or > > something? > > > > Regards, > > Faidon > > I just had two crashes on two different machines, both with an uptime > of 208 days. > Both were 5520's running 2.6.34.8, but with a CONFIG_HZ of 1000 > > 2011-10-23T16:49:18.618029+02:00 phy001 kernel: BUG: soft lockup - > CPU#0 stuck for 17163091968s! [qemu-kvm:16949] So were these actual crashes, or just softlockup false positives? I had thought the earlier crash issue (div by zero) fix from PeterZ had been already pushed upstream, but maybe that was just against 2.6.32 and not 2.6.33? The softlockup false positive issue should have been fixed by Peter's "x86, intel: Don't mark sched_clock() as stable" below. But I'm not seeing it upstream. Peter, is this still the right fix? thanks -john From: Peter Zijlstra <a.p.zijlstra@chello.nl> Subject: x86, intel: Don't mark sched_clock() as stable Because the x86 sched_clock() implementation wraps at 54 bits and the scheduler code assumes it wraps at the full 64bits we can get into trouble after 208 days (~7 months) of uptime. 
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/kernel/cpu/intel.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ed6086e..c8dc48b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -91,8 +91,15 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
+		/*
+		 * Unfortunately our __cycles_2_ns() implementation makes
+		 * the raw sched_clock() interface wrap at 54-bits, which
+		 * makes it unsuitable for direct use, so disable this
+		 * for now.
+		 *
 		if (!check_tsc_unstable())
 			sched_clock_stable = 1;
+		 */
 	}

 	/*

^ permalink raw reply related	[flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-10-25 22:44 ` john stultz @ 2011-10-25 23:25 ` Willy Tarreau 2011-12-02 23:45 ` Greg KH 2011-10-26 18:21 ` Ruben Kerkhof 1 sibling, 1 reply; 58+ messages in thread From: Willy Tarreau @ 2011-10-25 23:25 UTC (permalink / raw) To: john stultz Cc: Ruben Kerkhof, Greg KH, linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable, Hervé Commowick, Rand, Andrew Morton, Faidon Liambotis Hi John, On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote: > The softlockup false positive issue should have been fixed by Peter's > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not > seeing it upstream. Peter, is this still the right fix? I've not seen any other one proposed, and both you and Peter appeared to like it. I understood that Ingo was waiting for the merge window to submit it and I think that it simply got lost. Ingo, can you confirm ? Thanks, Willy ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-10-25 23:25 ` Willy Tarreau @ 2011-12-02 23:45 ` Greg KH 2011-12-03 0:02 ` john stultz 0 siblings, 1 reply; 58+ messages in thread From: Greg KH @ 2011-12-02 23:45 UTC (permalink / raw) To: Willy Tarreau Cc: john stultz, Ruben Kerkhof, linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable, Hervé Commowick, Rand, Andrew Morton, Faidon Liambotis On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote: > Hi John, > > On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote: > > The softlockup false positive issue should have been fixed by Peter's > > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not > > seeing it upstream. Peter, is this still the right fix? > > I've not seen any other one proposed, and both you and Peter appeared > to like it. I understood that Ingo was waiting for the merge window to > submit it and I think that it simply got lost. > > Ingo, can you confirm ? I'm totally confused here, what's the status of this, and what exactly is the patch? greg k-h ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-12-02 23:45 ` Greg KH @ 2011-12-03 0:02 ` john stultz 2011-12-03 1:02 ` Greg KH 0 siblings, 1 reply; 58+ messages in thread From: john stultz @ 2011-12-03 0:02 UTC (permalink / raw) To: Greg KH Cc: Willy Tarreau, Ruben Kerkhof, linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable, Hervé Commowick, Rand, Andrew Morton, Faidon Liambotis On Fri, 2011-12-02 at 15:45 -0800, Greg KH wrote: > On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote: > > Hi John, > > > > On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote: > > > The softlockup false positive issue should have been fixed by Peter's > > > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not > > > seeing it upstream. Peter, is this still the right fix? > > > > I've not seen any other one proposed, and both you and Peter appeared > > to like it. I understood that Ingo was waiting for the merge window to > > submit it and I think that it simply got lost. > > > > Ingo, can you confirm ? > > I'm totally confused here, what's the status of this, and what exactly > is the patch? Ingo has the fix from Salman queued in -tip, but I'm not sure why its not been pushed to Linus yet. http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9 thanks -john ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-12-03 0:02 ` john stultz @ 2011-12-03 1:02 ` Greg KH 2011-12-03 7:00 ` Willy Tarreau 2011-12-05 16:53 ` Ingo Molnar 0 siblings, 2 replies; 58+ messages in thread From: Greg KH @ 2011-12-03 1:02 UTC (permalink / raw) To: john stultz, Ingo Molnar Cc: Willy Tarreau, Ruben Kerkhof, linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, stable, Hervé Commowick, Rand, Andrew Morton, Faidon Liambotis On Fri, Dec 02, 2011 at 04:02:23PM -0800, john stultz wrote: > On Fri, 2011-12-02 at 15:45 -0800, Greg KH wrote: > > On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote: > > > Hi John, > > > > > > On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote: > > > > The softlockup false positive issue should have been fixed by Peter's > > > > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not > > > > seeing it upstream. Peter, is this still the right fix? > > > > > > I've not seen any other one proposed, and both you and Peter appeared > > > to like it. I understood that Ingo was waiting for the merge window to > > > submit it and I think that it simply got lost. > > > > > > Ingo, can you confirm ? > > > > I'm totally confused here, what's the status of this, and what exactly > > is the patch? > > Ingo has the fix from Salman queued in -tip, but I'm not sure why its > not been pushed to Linus yet. > > http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9 Wonderful, thanks for pointing this out to me. Ingo, any idea when this will go to Linus's tree? thanks, greg k-h ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-12-03 1:02 ` Greg KH @ 2011-12-03 7:00 ` Willy Tarreau 2011-12-05 16:53 ` Ingo Molnar 1 sibling, 0 replies; 58+ messages in thread From: Willy Tarreau @ 2011-12-03 7:00 UTC (permalink / raw) To: Ingo Molnar Cc: john stultz, Greg KH, Ruben Kerkhof, linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, stable, Rand, Andrew Morton, Faidon Liambotis On Fri, Dec 02, 2011 at 05:02:32PM -0800, Greg KH wrote: > On Fri, Dec 02, 2011 at 04:02:23PM -0800, john stultz wrote: > > On Fri, 2011-12-02 at 15:45 -0800, Greg KH wrote: > > > On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote: > > > > Hi John, > > > > > > > > On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote: > > > > > The softlockup false positive issue should have been fixed by Peter's > > > > > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not > > > > > seeing it upstream. Peter, is this still the right fix? > > > > > > > > I've not seen any other one proposed, and both you and Peter appeared > > > > to like it. I understood that Ingo was waiting for the merge window to > > > > submit it and I think that it simply got lost. > > > > > > > > Ingo, can you confirm ? > > > > > > I'm totally confused here, what's the status of this, and what exactly > > > is the patch? > > > > Ingo has the fix from Salman queued in -tip, but I'm not sure why its > > not been pushed to Linus yet. > > > > http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9 > > Wonderful, thanks for pointing this out to me. > > Ingo, any idea when this will go to Linus's tree? Yes please Ingo, do not delay it any further, this is becoming a real problem, there are people who monitor their uptime to plan a reboot before 200 days. We shouldn't need to wait for the next merge window, the patch is already 15 days old and is a fix for a real-world stability issue ! 
Thanks, Willy ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-12-03 1:02 ` Greg KH 2011-12-03 7:00 ` Willy Tarreau @ 2011-12-05 16:53 ` Ingo Molnar 1 sibling, 0 replies; 58+ messages in thread From: Ingo Molnar @ 2011-12-05 16:53 UTC (permalink / raw) To: Greg KH Cc: john stultz, Willy Tarreau, Ruben Kerkhof, linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, stable, Hervé Commowick, Rand, Andrew Morton, Faidon Liambotis * Greg KH <greg@kroah.com> wrote: > On Fri, Dec 02, 2011 at 04:02:23PM -0800, john stultz wrote: > > On Fri, 2011-12-02 at 15:45 -0800, Greg KH wrote: > > > On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote: > > > > Hi John, > > > > > > > > On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote: > > > > > The softlockup false positive issue should have been fixed by Peter's > > > > > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not > > > > > seeing it upstream. Peter, is this still the right fix? > > > > > > > > I've not seen any other one proposed, and both you and Peter appeared > > > > to like it. I understood that Ingo was waiting for the merge window to > > > > submit it and I think that it simply got lost. > > > > > > > > Ingo, can you confirm ? > > > > > > I'm totally confused here, what's the status of this, and what exactly > > > is the patch? > > > > Ingo has the fix from Salman queued in -tip, but I'm not sure why its > > not been pushed to Linus yet. > > > > http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9 > > Wonderful, thanks for pointing this out to me. > > Ingo, any idea when this will go to Linus's tree? today if everything goes fine. Thanks, Ingo ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-10-25 22:44 ` john stultz 2011-10-25 23:25 ` Willy Tarreau @ 2011-10-26 18:21 ` Ruben Kerkhof 1 sibling, 0 replies; 58+ messages in thread From: Ruben Kerkhof @ 2011-10-26 18:21 UTC (permalink / raw) To: john stultz Cc: Greg KH, linux-kernel, seto.hidetoshi, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable, Hervé Commowick, Rand, Andrew Morton, Willy Tarreau, Faidon Liambotis On Wed, Oct 26, 2011 at 00:44, john stultz <johnstul@us.ibm.com> wrote: > On Sun, 2011-10-23 at 20:31 +0200, Ruben Kerkhof wrote: >> I just had two crashes on two different machines, both with an uptime >> of 208 days. >> Both were 5520's running 2.6.34.8, but with a CONFIG_HZ of 1000 >> >> 2011-10-23T16:49:18.618029+02:00 phy001 kernel: BUG: soft lockup - >> CPU#0 stuck for 17163091968s! [qemu-kvm:16949] > > So were these actual crashes, or just softlockup false positives? Just softlockups, I haven't seen the divide_by_zero crash. Thanks, Ruben ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-21 12:53 ` Peter Zijlstra 2011-07-21 18:45 ` Ingo Molnar @ 2011-07-21 19:25 ` Nikola Ciprich 2011-07-21 19:37 ` john stultz 1 sibling, 1 reply; 58+ messages in thread From: Nikola Ciprich @ 2011-07-21 19:25 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, john stultz, Willy Tarreau, MINOURA Makoto, Andrew Morton, Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Rand, Nikola Ciprich [-- Attachment #1: Type: text/plain, Size: 709 bytes --] Hello Peter, > With a patch that frobs the starting value close to overflowing I hope, > otherwise we'll not hear from you in like 7 months ;-) Sure. Which is the best patch to use for testing? You mean john's one? (http://www.gossamer-threads.com/lists/linux/kernel/1406779#1406779) or some other? nik > Yeah, I think we want to commit this with a -stable tag, Ingo? > -- ------------------------------------- Ing. Nikola CIPRICH LinuxBox.cz, s.r.o. 28. rijna 168, 709 01 Ostrava tel.: +420 596 603 142 fax: +420 596 621 273 mobil: +420 777 093 799 www.linuxbox.cz mobil servis: +420 737 238 656 email servis: servis@linuxbox.cz ------------------------------------- [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-21 19:25 ` Nikola Ciprich @ 2011-07-21 19:37 ` john stultz 0 siblings, 0 replies; 58+ messages in thread From: john stultz @ 2011-07-21 19:37 UTC (permalink / raw) To: Nikola Ciprich Cc: Peter Zijlstra, Ingo Molnar, Willy Tarreau, MINOURA Makoto, Andrew Morton, Faidon Liambotis, linux-kernel, stable, seto.hidetoshi, Hervé Commowick, Rand On Thu, 2011-07-21 at 21:25 +0200, Nikola Ciprich wrote: > Hello Peter, > > > With a patch that frobs the starting value close to overflowing I hope, > > otherwise we'll not hear from you in like 7 months ;-) > sure. Which is the best patch to use for testing, You mean john's one? > (http://www.gossamer-threads.com/lists/linux/kernel/1406779#1406779) > or some other? http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=commitdiff;h=3eedec0ad10a856cf48ada066e88871d8ab427b3 Should do it. thanks -john ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: 2.6.32.21 - uptime related crashes? 2011-07-21 7:22 ` Ingo Molnar 2011-07-21 12:24 ` Peter Zijlstra @ 2011-07-21 19:53 ` john stultz 1 sibling, 0 replies; 58+ messages in thread From: john stultz @ 2011-07-21 19:53 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Willy Tarreau, MINOURA Makoto / 箕浦 真, Andrew Morton, Faidon Liambotis, linux-kernel, stable, Nikola Ciprich, seto.hidetoshi, Hervé Commowick, Rand On Thu, 2011-07-21 at 09:22 +0200, Ingo Molnar wrote: > * john stultz <johnstul@us.ibm.com> wrote: > > > On Fri, 2011-07-15 at 12:01 +0200, Peter Zijlstra wrote: > > > On Thu, 2011-07-14 at 17:35 -0700, john stultz wrote: > > > > > > > > Peter/Ingo: Can you take a look at the above and let me know if you find > > > > it too disagreeable? > > > > > > +static unsigned long long __cycles_2_ns(unsigned long long cyc) > > > +{ > > > + unsigned long long ns = 0; > > > + struct x86_sched_clock_data *data; > > > + int cpu = smp_processor_id(); > > > + > > > + rcu_read_lock(); > > > + data = rcu_dereference(per_cpu(cpu_sched_clock_data, cpu)); > > > + > > > + if (unlikely(!data)) > > > + goto out; > > > + > > > + ns = ((cyc - data->base_cycles) * data->mult) >> CYC2NS_SCALE_FACTOR; > > > + ns += data->accumulated_ns; > > > +out: > > > + rcu_read_unlock(); > > > + return ns; > > > +} > > > > > > The way I read that we're still not wrapping properly if freq scaling > > > 'never' happens. > > > > Right, this doesn't address the mult overflow behavior. As I mentioned > > in the patch that the rework allows for solving that in the future using > > a (possibly very rare) timer that would accumulate cycles to ns. > > > > This rework just really addresses the multiplication overflow->negative > > roll under that currently occurs with the cyc2ns_offset value. > > > > > Because then we're wrapping on accumulated_ns + 2^54. > > > > > > Something like resetting base, and adding ns to accumulated_ns and > > > returning the latter would make more sense.
> > > > Although we have to update the base_cycles and accumulated_ns > > atomically, so it's probably not something to do in the sched_clock path. > > Ping, what's going on with this bug? Systems are crashing so we need > a quick fix ASAP ... I think Peter's patch disabling sched_clock_stable is a good approach for now. And just to clarify a bit here, while there was a related scheduler division-by-zero issue which to my understanding has already been fixed post-2.6.32.21, I have not actually seen any other crash logs connected to the overflow. There have been posted softlockup watchdog false-positive messages (which I have also reproduced), but I've not seen any details on actual crashes nor have I been able to reproduce them using my forced-overflow patch. This isn't to say that the overflow isn't causing crashes, but that the reports have not been clear that there have been crashes by something other than the div-by-zero issue. thanks -john ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-04-29 10:02 ` Nikola Ciprich 2011-04-30 9:36 ` Willy Tarreau @ 2011-05-06 3:12 ` Hidetoshi Seto 1 sibling, 0 replies; 58+ messages in thread From: Hidetoshi Seto @ 2011-05-06 3:12 UTC (permalink / raw) To: Nikola Ciprich Cc: Willy Tarreau, linux-kernel mlist, linux-stable mlist, Hervé Commowick Hi Nikola, Sorry for not replying sooner. (2011/04/29 19:02), Nikola Ciprich wrote: > (another CC added) > > Hello Willy! > > I made some statistics of our servers regarding kernel version and uptime. > Here are some of my thoughts: > - I'm 100% sure this problem wasn't present in kernels <= 2.6.30.x (we've got a lot of boxes with uptimes >600days) > - I'm 90% sure this problem also wasn't present in 2.6.32.16 (we've got 6 boxes running for 235 to 280days) > > What I'm not sure about is whether this is present in 2.6.32.19; I have: > 2 boxes running 2.6.32.19 for 238days and one 2.6.32.20 for 216days. > I also have a bunch of 2.6.32.23 boxes, which are now getting close to 200days uptime. > But I suspect this really is the first problematic version, more on it later. > First, regarding Your question about CONFIG_HZ - we use the 250HZ setting, which leads me to the following: > 250 * 60 * 60 * 24 * 199 = 4298400000, which is a value a little over 2**32! So maybe some unsigned long variable > might overflow? Does this make sense? > > And as to my suspicion about 2.6.32.19, there is one commit which maybe is related: > > commit 0cf55e1ec08bb5a22e068309e2d8ba1180ab4239 > Author: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> > Date: Wed Dec 2 17:28:07 2009 +0900 > > sched, cputime: Introduce thread_group_times() > > This is a real fix for problem of utime/stime values decreasing > described in the thread: > > http://lkml.org/lkml/2009/11/3/522 > > Now cputime is accounted in the following way: > > - {u,s}time in task_struct are increased every time when the thread > is interrupted by a tick (timer interrupt).
> > - When a thread exits, its {u,s}time are added to signal->{u,s}time, > after adjusted by task_times(). > > - When all threads in a thread_group exits, accumulated {u,s}time > (and also c{u,s}time) in signal struct are added to c{u,s}time > in signal struct of the group's parent. > . > . > . > > I haven't studied this in detail yet, but it seems to me it might really be related. Hidetoshi-san - do You have some opinion about this? > Could this somehow either create or invoke the problem with overflow of some variable which would lead to division by zero or similar problems? No. The commit you pointed to is a change for runtimes (cputime_t) accounted for threads, not for uptime/jiffies/tick. And I suppose any overflow/zero-div cannot be there: if (total) { : do_div(temp, total); : } : p->prev_utime = max(p->prev_utime, utime); > > Any other thoughts? > > best regards > > nik From a glance of diff v2.6.32.16..v2.6.32.23, tick_nohz_* could be another suspect. Humm... Thanks, H.Seto ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [stable] 2.6.32.21 - uptime related crashes? 2011-04-28 18:34 ` [stable] " Willy Tarreau 2011-04-29 10:02 ` Nikola Ciprich @ 2011-05-13 22:08 ` Nicolas Carlier 1 sibling, 0 replies; 58+ messages in thread From: Nicolas Carlier @ 2011-05-13 22:08 UTC (permalink / raw) To: Willy Tarreau Cc: Nikola Ciprich, linux-kernel mlist, linux-stable mlist, Hervé Commowick Hi Willy, On Thu, Apr 28, 2011 at 8:34 PM, Willy Tarreau <w@1wt.eu> wrote: > Hello Nikola, > > On Thu, Apr 28, 2011 at 10:26:25AM +0200, Nikola Ciprich wrote: >> Hello everybody, >> >> I'm trying to solve a strange issue: today, my fourth machine running 2.6.32.21 just crashed. What makes the cases similar, apart from the same kernel version, is that all boxes had very similar uptimes: 214, 216, 216, and 224 days. This might just be a coincidence, but I think this might be important. > > Interestingly, one of our customers just had two machines who crashed > yesterday after 212 days and 212+20h respectively. They were running > debian's 2.6.32-bpo.5-amd64 which is based on 2.6.32.23 AIUI. > > The crash looks very similar to the following bug which we have updated : > > https://bugzilla.kernel.org/show_bug.cgi?id=16991 > > (bugzilla doesn't appear to respond as I'm posting this mail). > > The top of your output is missing. In our case as in the reports on the bug > above, there was a divide by zero error. Did you happen to spot this one > too, or do you just not know ? I observe "divide_error+0x15/0x20" in one > of your reports, so it's possible that it matches the same pattern at least > for one trace. Just in case, it would be nice to feed the bugzilla entry > above. > >> Unfortunately I only have backtraces of two crashes (and those are trimmed, sorry), and they do not look as similar as I'd like, but still maybe there is something in common: >> >> [<ffffffff81120cc7>] pollwake+0x57/0x60 >> [<ffffffff81046720>] ?
default_wake_function+0x0/0x10 >> [<ffffffff8103683a>] __wake_up_common+0x5a/0x90 >> [<ffffffff8103a313>] __wake_up+0x43/0x70 >> [<ffffffffa0321573>] process_masterspan+0x643/0x670 [dahdi] >> [<ffffffffa0326595>] coretimer_func+0x135/0x1d0 [dahdi] >> [<ffffffff8105d74d>] run_timer_softirq+0x15d/0x320 >> [<ffffffffa0326460>] ? coretimer_func+0x0/0x1d0 [dahdi] >> [<ffffffff8105690c>] __do_softirq+0xcc/0x220 >> [<ffffffff8100c40c>] call_softirq+0x1c/0x30 >> [<ffffffff8100e3ba>] do_softirq+0x4a/0x80 >> [<ffffffff810567c7>] irq_exit+0x87/0x90 >> [<ffffffff8100d7b7>] do_IRQ+0x77/0xf0 >> [<ffffffff8100bc53>] ret_from_intr+0x0/0xa >> <EOI> [<ffffffffa019e556>] ? acpi_idle_enter_bm+0x273/0x2a1 [processor] >> [<ffffffffa019e54c>] ? acpi_idle_enter_bm+0x269/0x2a1 [processor] >> [<ffffffff81280095>] ? cpuidle_idle_call+0xa5/0x150 >> [<ffffffff8100a18f>] ? cpu_idle+0x4f/0x90 >> [<ffffffff81323c95>] ? rest_init+0x75/0x80 >> [<ffffffff81582d7f>] ? start_kernel+0x2ef/0x390 >> [<ffffffff81582271>] ? x86_64_start_reservations+0x81/0xc0 >> [<ffffffff81582386>] ? x86_64_start_kernel+0xd6/0x100 >> >> this box (actually two of the crashed ones) is using dahdi_dummy module to generate timing for asterisk SW pbx, so maybe it's related to it. >> >> >> [<ffffffff810a5063>] handle_IRQ_event+0x63/0x1c0 >> [<ffffffff810a71ae>] handle_edge_irq+0xce/0x160 >> [<ffffffff8100e1bf>] handle_irq+0x1f/0x30 >> [<ffffffff8100d7ae>] do_IRQ+0x6e/0xf0 >> [<ffffffff8100bc53>] ret_from_intr+0x0/0xa >> <EOI> [<ffffffff8133?f?f>] ? _spin_unlock_irq+0xf/0x40 >> [<ffffffff81337f79>] ? _spin_unlock_irq+0x9/0x40 >> [<ffffffff81064b9a>] ? exit_signals+0x8a/0x130 >> [<ffffffff8105372e>] ? do_exit+0x7e/0x7d0 >> [<ffffffff8100f8a7>] ? oops_end+0xa7/0xb0 >> [<ffffffff8100faa6>] ? die+0x56/0x90 >> [<ffffffff8100c810>] ? do_trap+0x130/0x150 >> [<ffffffff8100ccca>] ? do_divide_error+0x8a/0xa0 >> [<ffffffff8103d227>] ? find_busiest_group+0x3d7/0xa00 >> [<ffffffff8104400b>] ?
cpuacct_charge+0x6b/0x90 >> [<ffffffff8100c045>] ? divide_error+0x15/0x20 >> [<ffffffff8103d227>] ? find_busiest_group+0x3d7/0xa00 >> [<ffffffff8103cfff>] ? find_busiest_group+0x1af/0xa00 >> [<ffffffff81335483>] ? thread_return+0x4ce/0x7bb >> [<ffffffff8133bec5>] ? do_nanosleep+0x75/0x30 >> [<ffffffff810?1?4e>] ? hrtimer_nanosleep+0x9e/0x120 >> [<ffffffff810?08f0>] ? hrtimer_wakeup+0x0/0x30 >> [<ffffffff810?183f>] ? sys_nanosleep+0x6f/0x80 >> >> another two don't use it. only similarity I see here is that it seems to be IRQ handling related, but both issues don't have anything in common. >> Does anybody have an idea on where I should look? Of course I should update all those boxes to (at least) latest 2.6.32.x, and I'll do it for sure, but still I'd first like to know where the problem was, and if it has been fixed, or how to fix it... >> I'd be grateful for any help... > > There were quite a bunch of scheduler updates recently. We may be lucky and > hope for the bug to have vanished with the changes, but we may as well see > the same crash in 7 months :-/ > > My coworker Hervé (CC'd) who worked on the issue suggests that we might have > something which goes wrong past a certain uptime (eg: 212 days), which needs > a special event to be triggered (I/O, process exiting, etc...). I think this > makes quite some sense. > > Could you check your CONFIG_HZ so that we could convert those uptimes to > jiffies ? Maybe this will ring a bell in someone's head :-/ > We had encountered the same issue on many nodes of our cluster, which ran a 2.6.32.8 Debian kernel. All the servers which had crashed had almost the same uptime, more than 200 days, but those which didn't crash had the same uptime too. Each time, we had the "divide by zero" in "find_busiest_group". One explanation could be the difference in the number of tasks since boot. As the servers fell one by one, and as we were not able to reproduce the problem quickly, we used the patch provided by Andrew Dickinson.
Regards, -- Nicolas Carlier ^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2011-12-05 16:56 UTC | newest] Thread overview: 58+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-04-28 8:26 2.6.32.21 - uptime related crashes? Nikola Ciprich 2011-04-28 18:34 ` [stable] " Willy Tarreau 2011-04-29 10:02 ` Nikola Ciprich 2011-04-30 9:36 ` Willy Tarreau 2011-04-30 11:22 ` Henrique de Moraes Holschuh 2011-04-30 11:54 ` Willy Tarreau 2011-04-30 12:32 ` Henrique de Moraes Holschuh 2011-04-30 12:02 ` Nikola Ciprich 2011-04-30 15:57 ` Greg KH 2011-04-30 16:08 ` Randy Dunlap 2011-04-30 16:49 ` Willy Tarreau 2011-04-30 18:14 ` Henrique de Moraes Holschuh 2011-04-30 17:39 ` Faidon Liambotis 2011-04-30 20:14 ` Willy Tarreau 2011-05-14 19:04 ` Nikola Ciprich 2011-05-14 20:45 ` Willy Tarreau 2011-05-14 20:59 ` Ben Hutchings 2011-05-14 23:13 ` Nicolas Carlier 2011-05-15 22:56 ` Faidon Liambotis 2011-05-16 6:49 ` Apollon Oikonomopoulos 2011-06-28 2:25 ` john stultz 2011-06-28 5:17 ` Willy Tarreau 2011-06-28 6:19 ` Apollon Oikonomopoulos 2011-07-06 6:15 ` Andrew Morton 2011-07-12 1:18 ` MINOURA Makoto / 箕浦 真 2011-07-12 1:40 ` john stultz 2011-07-12 2:49 ` MINOURA Makoto / 箕浦 真 2011-07-12 4:19 ` Willy Tarreau 2011-07-15 0:35 ` john stultz 2011-07-15 8:30 ` Peter Zijlstra 2011-07-15 10:02 ` Peter Zijlstra 2011-07-15 18:03 ` john stultz 2011-07-15 10:01 ` Peter Zijlstra 2011-07-15 17:59 ` john stultz 2011-07-21 7:22 ` Ingo Molnar 2011-07-21 12:24 ` Peter Zijlstra 2011-07-21 12:50 ` Nikola Ciprich 2011-07-21 12:53 ` Peter Zijlstra 2011-07-21 18:45 ` Ingo Molnar 2011-07-21 19:32 ` Nikola Ciprich 2011-08-25 18:56 ` Faidon Liambotis 2011-08-30 22:38 ` [stable] " Greg KH 2011-09-04 23:26 ` Faidon Liambotis 2011-10-23 18:31 ` Ruben Kerkhof 2011-10-23 22:07 ` Greg KH 2011-10-25 22:44 ` john stultz 2011-10-25 23:25 ` Willy Tarreau 2011-12-02 23:45 ` Greg KH 2011-12-03 0:02 ` john stultz 2011-12-03 1:02 ` Greg KH 2011-12-03 7:00 ` Willy Tarreau 2011-12-05 16:53 ` Ingo Molnar 2011-10-26 
18:21 ` Ruben Kerkhof 2011-07-21 19:25 ` Nikola Ciprich 2011-07-21 19:37 ` john stultz 2011-07-21 19:53 ` john stultz 2011-05-06 3:12 ` [stable] " Hidetoshi Seto 2011-05-13 22:08 ` Nicolas Carlier
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox