From: Don Zickus <dzickus@redhat.com>
To: Jeff Merkey <linux.mdb@gmail.com>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
uobergfe@redhat.com, atomlin@redhat.com, cmetcalf@ezchip.com,
fweisbec <fweisbec@gmail.com>,
tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
x86@kernel.org, peterz@infradead.org, luto@kernel.org
Subject: Re: [PATCH] Fix spurious hard lockup events while in debugger
Date: Wed, 16 Dec 2015 12:50:00 -0500
Message-ID: <20151216175000.GM60826@redhat.com>
In-Reply-To: <CAO6TR8WWq103=Kz+tKv_0PGCbWFiPF_p+EZcD3ZnsABT_Z2iwQ@mail.gmail.com>
On Wed, Dec 16, 2015 at 10:22:18AM -0700, Jeff Merkey wrote:
> On 12/14/15, Jeff Merkey <linux.mdb@gmail.com> wrote:
> > The current touch_nmi_watchdog() function in /kernel/watchdog.c does
> > not always catch all cases when a processor is spinning in the nmi
> > handler inside either KGDB, KDB, or MDB, in particular, the case where
> > a processor is being held by a debugger inside an int1 handler.
> >
> > The hrtimer_interrupts_saved count can still end up matching the
> > hrtime value in some cases, resulting in the hard lockup detector
> > tagging processors inside a debugger and executing a panic.
> >
> > The patch below corrects this problem. I did not add this to
> > the touch_nmi_function directly because of possible effects on
> > timing, since the function is widely used by drivers and
> > modules.
> >
> > I have tested this patch and it fixes the problem for kernel debuggers
> > stopping errant hard lockup events when processors are spinning inside
> > the debugger.
> >
> > Signed-off-by: Jeff Merkey <linux.mdb@gmail.com>
> > ---
> > kernel/watchdog.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 18f34cf..b682aab 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -283,6 +283,13 @@ static bool is_hardlockup(void)
> >  	__this_cpu_write(hrtimer_interrupts_saved, hrint);
> >  	return false;
> >  }
> > +
> > +void touch_hardlockup_watchdog(void)
> > +{
> > +	__this_cpu_write(hrtimer_interrupts_saved, 0);
> > +}
> > +EXPORT_SYMBOL_GPL(touch_hardlockup_watchdog);
> > +
> > #endif
> >
> > static int is_softlockup(unsigned long touch_ts)
> > --
> > 1.8.3.1
> >
> >
>
> I got to the bottom of it. It's related to the hardware I am using.
> One of the processors is faulting and hanging due to an existing bug
> in the hw_breakpoint handler not setting the resume flag (I have
> previously reported it and submitted a patch). This breaks your code,
> but there's nothing you can do about it.
>
> There is a severe bug in hw_breakpoint.c that causes int1 recursion
> and this whole "lazy debug register switching" nonsense does not work
> properly. I am probably the first person to actually test this code
> path robustly. I applied the patch that fixes this bug in
> hw_breakpoint.c and the problem with your code firing off and ignoring
> the touch flag went away.
Ah, good to know. Thanks! I'll drop this patch then.
Cheers,
Don
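
For anyone reading the thread later, here is a rough userspace sketch of the counter comparison the quoted patch was poking at. It is only a model under stated assumptions: struct cpu_watchdog and timer_tick() are made up for illustration and stand in for the per-CPU variables and the periodic hrtimer, and the equality check is paraphrased from the is_hardlockup() context in the diff rather than copied from any particular tree.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one CPU's per-CPU watchdog variables. */
struct cpu_watchdog {
	unsigned long hrtimer_interrupts;       /* bumped by the periodic timer */
	unsigned long hrtimer_interrupts_saved; /* snapshot from the last check */
};

/* Hypothetical model of the periodic hrtimer firing on this CPU. */
static void timer_tick(struct cpu_watchdog *wd)
{
	wd->hrtimer_interrupts++;
}

/*
 * Model of the check made from the NMI path: if the timer count has not
 * moved since the previous check, the CPU is treated as hard locked up.
 */
static bool is_hardlockup(struct cpu_watchdog *wd)
{
	unsigned long hrint = wd->hrtimer_interrupts;

	if (wd->hrtimer_interrupts_saved == hrint)
		return true;

	wd->hrtimer_interrupts_saved = hrint;
	return false;
}

/*
 * Model of the proposed touch_hardlockup_watchdog(): clear the snapshot so
 * the next check sees a mismatch. Note this only helps once the timer count
 * is non-zero; with both values at 0 the check would still report a lockup.
 */
static void touch_hardlockup_watchdog(struct cpu_watchdog *wd)
{
	wd->hrtimer_interrupts_saved = 0;
}
```

In this model, a CPU held in a debugger gets no timer ticks, so two checks in a row see the same count and the second one reports a lockup. Calling the touch function between them makes the next check pass, which is the behavior the patch was after before the hw_breakpoint.c fix made it unnecessary.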