From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752766AbbLNR2p (ORCPT ); Mon, 14 Dec 2015 12:28:45 -0500
Received: from mx1.redhat.com ([209.132.183.28]:44798 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752043AbbLNR2o (ORCPT ); Mon, 14 Dec 2015 12:28:44 -0500
Date: Mon, 14 Dec 2015 12:28:40 -0500
From: Don Zickus 
To: Jeff Merkey 
Cc: LKML , akpm@linux-foundation.org, uobergfe@redhat.com,
	atomlin@redhat.com, cmetcalf@ezchip.com, fweisbec@gmail.com
Subject: Re: [PATCH 1/1] Fix HARD Lockup Firing off while in debugger
Message-ID: <20151214172840.GB42652@redhat.com>
References: 
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Dec 12, 2015 at 02:08:13PM -0700, Jeff Merkey wrote:
> The current touch_nmi_watchdog() function in /kernel/watchdog.c does
> not always catch all cases when a processor is spinning in the nmi
> handler inside either KGDB, KDB, or MDB. The hrtimer_interrupts_saved
> count can still end up matching the previous value in some cases,
> resulting in the hard lockup detector tagging processors inside a

Hi Jeff,

I am confused here. touch_nmi_watchdog() is supposed to block the
hrtimer_interrupts check from happening. So if the check is still being
executed _after_ you called touch_nmi_watchdog(), that implies about 10
seconds or so elapsed between the touch and the hrtimer check.

So I am not sure how the patch below would fix this, other than by adding
another 10 second delay (for a total of 20 seconds) to your timeout?

> debugger and executing a panic. The patch below corrects this
> problem. I did not add this to the touch_nmi_function directly
> becuase of possible affects on timing issues.
>
> I have tested this patch and it fixes the problem for kernel debuggers
> stopping errant hard lockup events when processors are spinning inside
> the debugger.

The kernel doesn't normally take patches like this without a
corresponding user, and I didn't see one attached to this patch or in a
patch series.

Cheers,
Don

>
>
> Signed-off-by: Jeff V. Merkey 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 18f34cf..b682aab 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -283,6 +283,13 @@ static bool is_hardlockup(void)
>  	__this_cpu_write(hrtimer_interrupts_saved, hrint);
>  	return false;
>  }
> +
> +void touch_hardlockup_watchdog(void)
> +{
> +	__this_cpu_write(hrtimer_interrupts_saved, 0);
> +}
> +EXPORT_SYMBOL_GPL(touch_hardlockup_watchdog);
> +
>  #endif
>
>  static int is_softlockup(unsigned long touch_ts)
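
A minimal userspace sketch of the timing argument in the reply above,
assuming the check works roughly as the quoted hunk suggests: each
periodic NMI check compares hrtimer_interrupts with
hrtimer_interrupts_saved, and a touch makes the next check return early.
The struct, the watchdog_nmi_touch field name, nmi_check() and
checks_until_lockup() are illustrative assumptions, not kernel code. The
CPU is modeled as stuck in a debugger, so hrtimer_interrupts never
advances.

```c
#include <stdbool.h>

/* Per-CPU watchdog state, modeled here for a single CPU. */
struct cpu_state {
	unsigned long hrtimer_interrupts;       /* bumped by the timer tick */
	unsigned long hrtimer_interrupts_saved; /* value seen at the last check */
	bool watchdog_nmi_touch;                /* assumed "skip one check" flag */
};

/* Same comparison as the context lines of the quoted hunk. */
static bool is_hardlockup(struct cpu_state *s)
{
	unsigned long hrint = s->hrtimer_interrupts;

	if (s->hrtimer_interrupts_saved == hrint)
		return true;

	s->hrtimer_interrupts_saved = hrint;
	return false;
}

/* One periodic NMI check; true means a hard lockup would be reported. */
static bool nmi_check(struct cpu_state *s)
{
	if (s->watchdog_nmi_touch) {
		/* A touch suppresses exactly one check. */
		s->watchdog_nmi_touch = false;
		return false;
	}
	return is_hardlockup(s);
}

/*
 * Count the checks until a lockup is reported while the timer tick is
 * stopped. Worst case start: the saved count already equals the live one.
 * Returns -1 if no lockup is reported within the bound.
 */
static int checks_until_lockup(bool touch, bool zero_saved)
{
	struct cpu_state s = {
		.hrtimer_interrupts = 5,
		.hrtimer_interrupts_saved = 5,
		.watchdog_nmi_touch = false,
	};
	int n;

	if (touch)
		s.watchdog_nmi_touch = true;      /* models touch_nmi_watchdog() */
	if (zero_saved)
		s.hrtimer_interrupts_saved = 0;   /* models the proposed helper */

	for (n = 1; n <= 16; n++)
		if (nmi_check(&s))
			return n;
	return -1;
}
```

In this model, checks_until_lockup(true, false) and
checks_until_lockup(false, true) both return 2, and
checks_until_lockup(true, true) returns 3. Under these assumptions,
zeroing the saved count buys one extra check period and does not stop the
detector from firing while the tick stays stopped, which is consistent
with the "another 10 second delay" reading above.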