Subject: Re: Soft lockup issue in Linux 4.1.9
From: Holger Hoffstätte
To: Christoph Biedl, Eric Dumazet
Cc: "Eric W. Biederman", LKML, stable@vger.kernel.org
Date: Thu, 8 Oct 2015 21:27:51 +0200
Message-ID: <5616C3B7.9000305@googlemail.com>
In-Reply-To: <1444322507@msgid.manchmal.in-ulm.de>
References: <1443650383.13282.10.camel@daevel.fr> <560D1223.3070606@googlemail.com>
 <560D1C5A.3050508@googlemail.com> <1444322507@msgid.manchmal.in-ulm.de>

On 10/08/15 18:56, Christoph Biedl wrote:
> Eric Dumazet wrote...
>
> [ commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af ]
>
>> It definitely should help !
>
> Yesterday I experienced issues somewhat similar to this, but I'm not
> entirely sure:
>
> Four of five systems running 4.1.9 stopped working. No reaction on
> network, keyboard, or serial console. In one case the stack trace
> below made it to the loghost.
>
> Two things are quite different, though. First, the systems had a
> reasonable uptime, about a week.
>
> And second, the scary part: all incidents happened within a rather
> short time span of three minutes at most, beginning after 16:41:28
> and before 16:41:54 UTC. So at first I assumed a brownout - until I
> realized the systems faded away at slightly different times, and one
> is at a different location, while other systems running different
> kernel versions continued to operate at both sites.
>
> So I'd be glad for answers to:
>
> - Is this the same issue, or should I be even more afraid?

There's always room for more. :-)

> - What might be the reason for this temporal coincidence? I have no
>   plausible idea.

More bugs?

> Confused,
>     Christoph
>
> INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies g=8932806 c=8932805 q=58491)
> rcu_sched kthread starved for 5999 jiffies!
> Task dump for CPU 3:
> swapper/3       R  running task        0     0      1 0x00000008
>  ffffffff81e396c0 ffff88042dcc3b20 ffffffff810807da 0000000000000003
>  ffffffff81e396c0 ffff88042dcc3b40 ffffffff81083b78 ffff88042dcc3b80
>  0000000000000003 ffff88042dcc3b70 ffffffff810a945c ffff88042dcd5740
> Call Trace:
>  [] sched_show_task+0xaa/0x110
>  [] dump_cpu_task+0x38/0x40
>  [] rcu_dump_cpu_stacks+0x8c/0xc0
>  [] rcu_check_callbacks+0x3b1/0x680
>  [] ? acct_account_cputime+0x17/0x20
>  [] ? account_system_time+0x8e/0x180
>  [] update_process_times+0x33/0x60
>  [] tick_sched_handle.isra.14+0x30/0x40
>  [] tick_sched_timer+0x43/0x80
>  [] __run_hrtimer.isra.32+0x4a/0xd0
>  [] hrtimer_interrupt+0xd5/0x1f0
>  [] local_apic_timer_interrupt+0x34/0x60
>  [] smp_apic_timer_interrupt+0x3c/0x60
>  [] apic_timer_interrupt+0x6b/0x70
>  [] ? _raw_spin_unlock_irqrestore+0x9/0x10
>  [] try_to_del_timer_sync+0x48/0x60
>  [] ? del_timer_sync+0x42/0x60
>  [] del_timer_sync+0x4a/0x60
>  [] inet_csk_reqsk_queue_drop+0x7a/0x1f0
>  [] reqsk_timer_handler+0x12f/0x290
>  [] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
>  [] call_timer_fn.isra.26+0x26/0x80
>  [] run_timer_softirq+0x18e/0x220
>  [] __do_softirq+0xda/0x1f0
>  [] irq_exit+0x76/0xa0
>  [] smp_apic_timer_interrupt+0x45/0x60
>  [] apic_timer_interrupt+0x6b/0x70
>  [] ? sched_clock_cpu+0x9e/0xb0
>  [] ? amd_e400_idle+0x35/0xd0
>  [] ? amd_e400_idle+0x33/0xd0
>  [] arch_cpu_idle+0xa/0x10
>  [] cpu_startup_entry+0x2c3/0x330
>  [] start_secondary+0x17c/0x1a0
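
In case anyone wonders what is actually wedging the CPU here:
reqsk_timer_handler() is itself a timer callback running in softirq
context, and via inet_csk_reqsk_queue_drop() it ends up in
del_timer_sync() on its own rsk_timer. del_timer_sync() spins until the
currently running handler has finished - but the running handler is the
caller, so it spins forever, the CPU never leaves softirq, and the RCU
stall detector eventually complains. Boiled down to a toy module (a
rough sketch with made-up names and the 4.1-era timer API, not the
actual net/ipv4 code):

    #include <linux/module.h>
    #include <linux/timer.h>

    static struct timer_list my_timer;  /* stand-in for req->rsk_timer */

    /* Timer callbacks run in softirq context; the timer has already
     * been dequeued and is marked as running on this CPU. */
    static void my_timer_handler(unsigned long data)
    {
            /*
             * del_timer_sync() waits for the currently running
             * handler to finish. That handler is us, so
             * try_to_del_timer_sync() fails forever and we spin right
             * here - compare the try_to_del_timer_sync /
             * _raw_spin_unlock_irqrestore frames in the trace above.
             */
            del_timer_sync(&my_timer);
    }

    static int __init demo_init(void)
    {
            setup_timer(&my_timer, my_timer_handler, 0);
            mod_timer(&my_timer, jiffies + HZ);  /* fires in ~1s, then spins */
            return 0;
    }
    module_init(demo_init);
    MODULE_LICENSE("GPL");

That is exactly the window that 83fccfc3940 closes.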

The timer fixes were followups to a patch that went into 4.1 called
"tcp/dccp: get rid of central timewait timer", and it seems there were
a few more patches in that area very recently. So after some git
spelunking I am now running the following patches on top of 4.1.10 +
83fccfc3940 (for the lockups), in this order:

fc01538f9fb75572c969ca9988176ffc2a8741d6 simplify timewait refcounting
dbe7faa4045ea83a37b691b12bb02a8f86c2d2e9 inet_twsk_deschedule factorization
29c6852602e259d2c1882f320b29d5c3fec0de04 fix races in reqsk_queue_hash_req()
ed2e923945892a8372ab70d2f61d364b0b6d9054 fix timewait races in timer handling

They may not all be strictly required for the particular problem you
just summoned, but (from what I could tell) they are needed so that
everything applies cleanly. They certainly can't make things worse. :-)

Oh, and while you're at it you can apply these l33t cubic fixes :-)

30927520dbae297182990bb21d08762bcc35ce1d better follow cubic curve after idle period
c2e7204d180f8efc80f27959ca9cf16fa17f67db do not set epoch_start in the future

I've been running these on 3 machines for almost 10 minutes without
issue, so they are totally safe to go into production right away.

-h
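
P.S.: For completeness, the shape of the fix in 83fccfc3940 (going from
memory here, so check the actual commit) is a guard in
reqsk_queue_unlink() that skips the synchronous cancel when the timer
is no longer pending - which it is not while its own handler is running:

    /*
     * A timer is dequeued before its handler runs, so when we get
     * here from reqsk_timer_handler() itself, timer_pending() is
     * false and we never call del_timer_sync() on the timer we are
     * currently running from.
     */
    if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
            reqsk_put(req);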