From: "Holger Hoffstätte" <holger.hoffstaette@googlemail.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: Soft lockup issue in Linux 4.1.9
Date: Thu, 1 Oct 2015 13:43:22 +0200
Message-ID: <560D1C5A.3050508@googlemail.com>
In-Reply-To: <CANn89i+B5T4Rhs8HnrC0+f+GhLvBFfpr4BVDvhkVOveSfy9B8Q@mail.gmail.com>

On 10/01/15 13:29, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 3:59 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
>>
>> On Thu, 01 Oct 2015 06:41:46 +0200, Andre Tomt wrote:
>>
>>> On 01. okt. 2015 00:37, Holger Hoffstätte wrote:
>>>> On Wed, 30 Sep 2015 23:59:43 +0200, Olivier Bonvalet wrote:
>>>>
>>>>> For information, I've just upgraded 6 servers from Linux 4.1.8 to Linux
>>>>> 4.1.9 and am seeing some random soft lockups. In case this helps:
>>>>
>>>> Congratulations! You're not the first one to get hit by this, but
>>>> you are probably the first one to get a meaningful stacktrace! \o/
>>>>
>>>>> [  204.478380] Call Trace:
>>>>> [  204.478381]  <IRQ>
>>>>> [  204.478385]  [<ffffffff81076121>] ? try_to_del_timer_sync+0x43/0x4d
>>>>> [  204.478386]  [<ffffffff810760de>] ? del_timer+0x4d/0x4d
>>>>> [  204.478388]  [<ffffffff8107614b>] ? del_timer_sync+0x20/0x3d
>>>>
>>>> Can you try to revert
>>>>
>>>>     [PATCH 4.1 157/159] inet: fix races with reqsk timers
>>>>
>>>> and see how that works for you? I'll do the same on my end. So far the
>>>> only thing I ever could glean was an rcu stall after cpuidle_enter(),
>>>> but never anything regarding the timer - though it was definitely
>>>> related to NIC activity after idle.
>>>
>>> I'm running with this patch reverted now as well. 2 hours no issues so
>>> far, but I can't conclude anything yet as I've seen it take up to 6+
>>> hours to explode here. As a result the bisect was going veeery slowly.
>>
>> Now 12+ hours going without problems, never got this far with the patch
>> included, as it would usually freeze during idle periods.
>>
>> As far as I'm concerned this is the culprit and should be reverted in
>> 4.1.x, unless Eric can suggest how to fix this. (cc'ed).
>>
> 
> Looks like an old and known problem...
> 
> The following commit should be sent/added to the 4.1 stable tree:
> 
> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu Aug 13 15:44:51 2015 -0700
> 
>     inet: fix potential deadlock in reqsk_queue_unlink()
> 
>     When replacing del_timer() with del_timer_sync(), I introduced
>     a deadlock condition :
> 
>     reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
> 
>     inet_csk_reqsk_queue_drop() can be called from many contexts,
>     one being the timer handler itself (reqsk_timer_handler()).
> 
>     In this case, del_timer_sync() loops forever.
> 
>     Simple fix is to test if timer is pending.
> 
>     Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

Woohoo! It applies and builds cleanly on 4.1.10-rc1 and is running as
we speak. Let's hope that this fixes the lockups.
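
For anyone following along, here is a rough user-space sketch of the
deadlock described in the commit message quoted above, and of why testing
"is the timer pending" avoids it. This is only a model, not the kernel
code: struct model_timer, model_del_timer_sync(), unlink_unguarded() and
unlink_guarded() are hypothetical names made up for illustration, and the
model reports "would spin" instead of actually spinning. It assumes that
while a timer handler runs, its timer is no longer pending (unless the
handler re-arms it, which is not modeled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel timer; not struct timer_list. */
struct model_timer {
    bool pending;          /* armed, handler has not started yet */
    bool handler_running;  /* handler is currently executing */
};

enum del_result {
    DEL_WOULD_SPIN  = -1,  /* caller would wait on itself forever */
    DEL_NOT_PENDING = 0,   /* nothing to remove */
    DEL_REMOVED     = 1    /* a pending timer was deactivated */
};

/*
 * Models the synchronous delete: it must wait for a running handler to
 * finish. If the caller *is* that handler, the wait can never end, so
 * the model returns DEL_WOULD_SPIN instead of looping. Waiting on a
 * handler from some other context is not modeled.
 */
static enum del_result model_del_timer_sync(struct model_timer *t,
                                            bool called_from_handler)
{
    assert(t != NULL);
    if (t->handler_running && called_from_handler)
        return DEL_WOULD_SPIN;
    if (t->pending) {
        t->pending = false;
        return DEL_REMOVED;
    }
    return DEL_NOT_PENDING;
}

/* Unlink path without the guard: always does the synchronous delete. */
static enum del_result unlink_unguarded(struct model_timer *t,
                                        bool called_from_handler)
{
    return model_del_timer_sync(t, called_from_handler);
}

/* Unlink path with the guard: skip the delete if nothing is pending. */
static enum del_result unlink_guarded(struct model_timer *t,
                                      bool called_from_handler)
{
    assert(t != NULL);
    if (!t->pending)
        return DEL_NOT_PENDING;
    return model_del_timer_sync(t, called_from_handler);
}
```

With a timer whose handler is running (pending = false, handler_running =
true), calling unlink_unguarded() from the handler hits the would-spin
case, while unlink_guarded() returns early. A normally armed timer is
still removed by the guarded path when called from another context.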

Thanks for the quick reply!

Holger

