* Re: Soft lockup issue in Linux 4.1.9
From: Holger Hoffstätte @ 2015-10-01 11:43 UTC
To: Eric Dumazet
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, linux-kernel, stable, netdev

On 10/01/15 13:29, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 3:59 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
>>
>> On Thu, 01 Oct 2015 06:41:46 +0200, Andre Tomt wrote:
>>
>>> On 01 Oct 2015 00:37, Holger Hoffstätte wrote:
>>>> On Wed, 30 Sep 2015 23:59:43 +0200, Olivier Bonvalet wrote:
>>>>
>>>>> for information, I've just upgraded 6 servers from Linux 4.1.8 to Linux
>>>>> 4.1.9, and have some random soft lockup. If this can help :
>>>>
>>>> Congratulations! You're not the first one to get hit by this, but
>>>> you are probably the first one to get a meaningful stacktrace! \o/
>>>>
>>>>> [ 204.478380] Call Trace:
>>>>> [ 204.478381] <IRQ>
>>>>> [ 204.478385] [<ffffffff81076121>] ? try_to_del_timer_sync+0x43/0x4d
>>>>> [ 204.478386] [<ffffffff810760de>] ? del_timer+0x4d/0x4d
>>>>> [ 204.478388] [<ffffffff8107614b>] ? del_timer_sync+0x20/0x3d
>>>>
>>>> Can you try to revert
>>>>
>>>> [PATCH 4.1 157/159] inet: fix races with reqsk timers
>>>>
>>>> and see how that works for you? I'll do the same on my end. So far the
>>>> only thing I ever could glean was an rcu stall after cpuidle_enter(),
>>>> but never anything regarding the timer - though it was definitely
>>>> related to NIC activity after idle.
>>>
>>> I'm running with this patch reverted now as well. 2 hours no issues so
>>> far, but I can't conclude anything yet as I've seen it take up to 6+
>>> hours to explode here. As a result the bisect was going veeery slowly.
>>
>> Now 12+ hours going without problems, never got this far with the patch
>> included, as it would usually freeze during idle periods.
>>
>> As far as I'm concerned this is the culprit and should be reverted in
>> 4.1.x, unless Eric can suggest how to fix this. (cc'ed).
>>
>
> Looks like an old and known problem...
>
> Following commit should be sent/added for 4.1 stable tree :
>
> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> Author: Eric Dumazet <edumazet@google.com>
> Date: Thu Aug 13 15:44:51 2015 -0700
>
> inet: fix potential deadlock in reqsk_queue_unlink()
>
> When replacing del_timer() with del_timer_sync(), I introduced
> a deadlock condition :
>
> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
>
> inet_csk_reqsk_queue_drop() can be called from many contexts,
> one being the timer handler itself (reqsk_timer_handler()).
>
> In this case, del_timer_sync() loops forever.
>
> Simple fix is to test if timer is pending.
>
> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>

Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
we speak. Let's hope that this fixes the lockups.

Thanks for the quick reply!

Holger

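The deadlock described in the quoted commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_timer, the toy_* helpers and the del_result enum are hypothetical names invented for illustration, not kernel APIs, and the "wait for the handler" loop of del_timer_sync() is replaced by a DEL_WOULD_SPIN return value so that the example terminates. It also assumes that a timer is no longer pending once its handler has started, unless something re-arms it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model; NOT the kernel's struct timer_list. */
struct toy_timer {
    bool pending;         /* armed, waiting to expire */
    bool handler_running; /* expiry callback is executing */
};

enum del_result { DEL_NOT_PENDING, DEL_REMOVED, DEL_WOULD_SPIN };

/* Expiry: the timer is detached first, then its handler starts. */
static void toy_fire(struct toy_timer *t)
{
    t->pending = false;
    t->handler_running = true;
}

/* Models del_timer(): deactivate without waiting for the handler. */
static bool toy_del_timer(struct toy_timer *t)
{
    bool was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/*
 * Models del_timer_sync(): it has to wait until a running handler has
 * finished. If the caller IS that handler, the wait can never end; the
 * model reports this instead of actually spinning.
 */
static enum del_result toy_del_timer_sync(struct toy_timer *t,
                                          bool caller_is_handler)
{
    if (t->handler_running && caller_is_handler)
        return DEL_WOULD_SPIN;
    return toy_del_timer(t) ? DEL_REMOVED : DEL_NOT_PENDING;
}

/* Hypothetical guarded unlink: use the sync variant only if pending. */
static enum del_result toy_unlink_guarded(struct toy_timer *t,
                                          bool caller_is_handler)
{
    if (!t->pending)
        return DEL_NOT_PENDING;
    return toy_del_timer_sync(t, caller_is_handler);
}

/* Walk through the scenario from the commit message. */
static void toy_demo(void)
{
    struct toy_timer t = { .pending = true, .handler_running = false };

    toy_fire(&t); /* the handler is now running */

    /* Unconditional sync delete from inside the handler: never returns. */
    assert(toy_del_timer_sync(&t, true) == DEL_WOULD_SPIN);

    /* With the pending test, the same call path returns immediately. */
    assert(toy_unlink_guarded(&t, true) == DEL_NOT_PENDING);
}
```

In this model the guard helps only as long as nothing re-arms the timer while its handler is running; with pending and handler_running both set, toy_unlink_guarded() called from the handler reports DEL_WOULD_SPIN again.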
* Re: Soft lockup issue in Linux 4.1.9
From: Eric Dumazet @ 2015-10-01 11:52 UTC
To: Holger Hoffstätte
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
<holger.hoffstaette@googlemail.com> wrote:
> On 10/01/15 13:29, Eric Dumazet wrote:

>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>> Author: Eric Dumazet <edumazet@google.com>
>> Date: Thu Aug 13 15:44:51 2015 -0700
>>
>> inet: fix potential deadlock in reqsk_queue_unlink()
>>
>> When replacing del_timer() with del_timer_sync(), I introduced
>> a deadlock condition :
>>
>> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
>>
>> inet_csk_reqsk_queue_drop() can be called from many contexts,
>> one being the timer handler itself (reqsk_timer_handler()).
>>
>> In this case, del_timer_sync() loops forever.
>>
>> Simple fix is to test if timer is pending.
>>
>> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> we speak. Let's hope that this fixes the lockups.
>

It definitely should help !

David, since patch is not yet seen on
http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
could you please add it to your queue ?

Thanks.

* Re: Soft lockup issue in Linux 4.1.9
From: Andre Tomt @ 2015-10-02 6:52 UTC
To: Eric Dumazet, Holger Hoffstätte
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

On 01 Oct 2015 13:52, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
>> On 10/01/15 13:29, Eric Dumazet wrote:
>
>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>> Author: Eric Dumazet <edumazet@google.com>
>>> Date: Thu Aug 13 15:44:51 2015 -0700
>>>
>>> inet: fix potential deadlock in reqsk_queue_unlink()
<snip>
>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>> we speak. Let's hope that this fixes the lockups.
>>
>
> It definitely should help !
>
> David, since patch is not yet seen on
> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> could you please add it to your queue ?

Seems to fix it for me as well. 3 systems have been running varying
types of production-like loads with it for 14+ hours without hanging.

* Re: Soft lockup issue in Linux 4.1.9
From: Holger Hoffstätte @ 2015-10-02 7:17 UTC
To: Andre Tomt, Eric Dumazet
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

On 10/02/15 08:52, Andre Tomt wrote:
> On 01 Oct 2015 13:52, Eric Dumazet wrote:
>> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
>> <holger.hoffstaette@googlemail.com> wrote:
>>> On 10/01/15 13:29, Eric Dumazet wrote:
>>
>>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>>> Author: Eric Dumazet <edumazet@google.com>
>>>> Date: Thu Aug 13 15:44:51 2015 -0700
>>>>
>>>> inet: fix potential deadlock in reqsk_queue_unlink()
> <snip>
>>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>>> we speak. Let's hope that this fixes the lockups.
>>>
>>
>> It definitely should help !
>>
>> David, since patch is not yet seen on
>> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
>> could you please add it to your queue ?
>
> Seems to fix it for me as well. 3 systems have been running varying
> types of production-like loads with it for 14+ hours without hanging.

Just got up, and yes - my systems survived the night as well, no issues.

Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
will get another broken release.

cheers
Holger

* Re: Soft lockup issue in Linux 4.1.9
From: Wolfgang Walter @ 2015-10-02 19:25 UTC
To: Holger Hoffstätte
Cc: Andre Tomt, Eric Dumazet, David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

On Friday, 2 October 2015, 09:17:16, Holger Hoffstätte wrote:
> On 10/02/15 08:52, Andre Tomt wrote:
> > On 01 Oct 2015 13:52, Eric Dumazet wrote:
> >> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> >> <holger.hoffstaette@googlemail.com> wrote:
> >>> On 10/01/15 13:29, Eric Dumazet wrote:
> >>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >>>> Author: Eric Dumazet <edumazet@google.com>
> >>>> Date: Thu Aug 13 15:44:51 2015 -0700
> >>>>
> >>>> inet: fix potential deadlock in reqsk_queue_unlink()
> >
> > <snip>
> >
> >>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> >>> we speak. Let's hope that this fixes the lockups.
> >>
> >> It definitely should help !
> >>
> >> David, since patch is not yet seen on
> >> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> >> could you please add it to your queue ?
> >
> > Seems to fix it for me as well. 3 systems have been running varying
> > types of production-like loads with it for 14+ hours without hanging.
>
> Just got up, and yes - my systems survived the night as well, no issues.
>
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.
>

Fixes the problem here, too.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

* Re: Soft lockup issue in Linux 4.1.9
From: Thomas D. @ 2015-10-03 19:14 UTC
To: Holger Hoffstätte, Andre Tomt, Eric Dumazet, stable
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML, netdev

Hi,

Holger Hoffstätte wrote:
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.

For me it looks like the request was too late, the patch is not included
in 4.1.10. So don't forget to re-apply the patch when doing the upgrade.

Greg, do you need a dedicated inclusion request for
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
in 4.1.x or is it already on your list?

-Thomas

* Re: Soft lockup issue in Linux 4.1.9
From: Greg Kroah-Hartman @ 2015-10-17 23:41 UTC
To: Thomas D.
Cc: Holger Hoffstätte, Andre Tomt, Eric Dumazet, stable, David S. Miller, Eric W. Biederman, Stephen Hemminger, LKML, netdev

On Sat, Oct 03, 2015 at 09:14:16PM +0200, Thomas D. wrote:
> Hi,
>
> Holger Hoffstätte wrote:
> > Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> > will get another broken release.
>
> For me it looks like the request was too late, the patch is not included
> in 4.1.10. So don't forget to re-apply the patch when doing the upgrade.
>
> Greg, do you need a dedicated inclusion request for
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> in 4.1.x or is it already on your list?

Now applied, thanks.

greg k-h

* Re: Soft lockup issue in Linux 4.1.9
From: Thomas Gleixner @ 2015-10-02 20:04 UTC
To: Eric Dumazet
Cc: Holger Hoffstätte, David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

On Thu, 1 Oct 2015, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
> > On 10/01/15 13:29, Eric Dumazet wrote:
>
> >> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >> Author: Eric Dumazet <edumazet@google.com>
> >> Date: Thu Aug 13 15:44:51 2015 -0700
> >>
> >> inet: fix potential deadlock in reqsk_queue_unlink()
> >>
> >> When replacing del_timer() with del_timer_sync(), I introduced
> >> a deadlock condition :
> >>
> >> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
> >>
> >> inet_csk_reqsk_queue_drop() can be called from many contexts,
> >> one being the timer handler itself (reqsk_timer_handler()).
> >>
> >> In this case, del_timer_sync() loops forever.
> >>
> >> Simple fix is to test if timer is pending.
> >>
> >> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
> >> Signed-off-by: Eric Dumazet <edumazet@google.com>
> >> Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> > we speak. Let's hope that this fixes the lockups.
> >
>
> It definitely should help !

What makes sure that the timer cannot be re-added while that timer
callback is running?

Thanks,

	tglx

* Re: Soft lockup issue in Linux 4.1.9
From: Eric Dumazet @ 2015-10-02 20:59 UTC
To: Thomas Gleixner
Cc: Eric Dumazet, Holger Hoffstätte, David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:

> What makes sure, that the timer cannot be readded while that timer
> callback is running?

What is exactly your question ?

* Re: Soft lockup issue in Linux 4.1.9
From: Thomas Gleixner @ 2015-10-02 21:04 UTC
To: Eric Dumazet
Cc: Eric Dumazet, Holger Hoffstätte, David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

On Fri, 2 Oct 2015, Eric Dumazet wrote:
> On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
>
> > What makes sure, that the timer cannot be readded while that timer
> > callback is running?
>
> What is exactly your question ?

CPU0                            CPU1

timer expires
  callback
                                add timer
    timer_pending() == true
    ===> del_timer_sync()

I was just curious how this is prevented as I got lost in the
networking code as usual :)

Thanks,

	tglx

* Re: Soft lockup issue in Linux 4.1.9
From: Eric Dumazet @ 2015-10-02 21:32 UTC
To: Thomas Gleixner
Cc: Eric Dumazet, Holger Hoffstätte, David S. Miller, Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

On Fri, 2015-10-02 at 23:04 +0200, Thomas Gleixner wrote:
> On Fri, 2 Oct 2015, Eric Dumazet wrote:
> > On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
> >
> > > What makes sure, that the timer cannot be readded while that timer
> > > callback is running?
> >
> > What is exactly your question ?
>
> CPU0                            CPU1
>
> timer expires
>   callback
>                                 add timer
>     timer_pending() == true
>     ===> del_timer_sync()
>
> I was just curious how this is prevented as I got lost in the
> networking code as usual :)

Sure ;)

I believe this cannot happen, for the following reasons:

mod_timer_pinned() is used only when the req is created, at which point
the timer cannot possibly be running on the same req.

The _pinned part is critical because we set req->refcnt _after_
starting the timer, to avoid the req being visible to and caught by RCU
lookups in hash tables.

After that, the timer can only be modified by mod_timer_pending() from
tcp_check_req(). This should not restart the timer if another CPU is in
the timer callback.

Thanks

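The distinction this reply relies on, between arming a timer and merely modifying one that is still pending, can be sketched as a toy userspace model. Everything named sketch_* below is hypothetical and single-threaded, with simplified return types; it is not kernel code. The only behaviour borrowed from the kernel is that mod_timer_pending() does not re-activate a timer that is no longer pending, and that a timer stops being pending when it expires and its callback starts.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a timer; NOT the kernel's struct timer_list. */
struct sketch_timer {
    bool pending;           /* armed, waiting to expire */
    unsigned long expires;  /* expiry time in arbitrary ticks */
};

/* Models mod_timer(): always (re)arms the timer. */
static void sketch_mod_timer(struct sketch_timer *t, unsigned long expires)
{
    t->expires = expires;
    t->pending = true;
}

/*
 * Models mod_timer_pending(): updates the expiry only if the timer is
 * still pending and never re-activates an inactive timer.
 * Returns true if the timer was modified.
 */
static bool sketch_mod_timer_pending(struct sketch_timer *t,
                                     unsigned long expires)
{
    if (!t->pending)
        return false;
    t->expires = expires;
    return true;
}

/* Expiry: the timer is detached before its callback runs. */
static void sketch_expire(struct sketch_timer *t)
{
    t->pending = false;
}

/* Walk through the sequence discussed in the reply. */
static void sketch_demo(void)
{
    struct sketch_timer t = { .pending = false, .expires = 0 };

    sketch_mod_timer(&t, 100);                  /* armed at creation */
    assert(sketch_mod_timer_pending(&t, 200));  /* still pending: moved */

    sketch_expire(&t);                          /* callback is running */
    assert(!sketch_mod_timer_pending(&t, 300)); /* no re-arm */
    assert(!t.pending);
}
```

Under these assumptions, the "add timer" step on the other CPU in the interleaving above could only occur through a re-arming call such as sketch_mod_timer(), which the reply says is used only at creation time.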