From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752399AbbLJTUd (ORCPT );
        Thu, 10 Dec 2015 14:20:33 -0500
Received: from mx1.redhat.com ([209.132.183.28]:39207 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750865AbbLJTUc (ORCPT );
        Thu, 10 Dec 2015 14:20:32 -0500
Date: Thu, 10 Dec 2015 20:20:24 +0100
From: Jiri Olsa
To: Andy Lutomirski
Cc: Thomas Gleixner, Jeff Merkey, LKML, Ingo Molnar, "H. Peter Anvin",
        X86 ML, Peter Zijlstra, Andy Lutomirski, Masami Hiramatsu,
        Steven Rostedt, Borislav Petkov, Jiri Olsa
Subject: Re: [PATCH 1/1] Fix int1 recursion when no perf_bp_event is registered
Message-ID: <20151210192024.GA5365@krava.redhat.com>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 10, 2015 at 11:09:21AM -0800, Andy Lutomirski wrote:
> On Thu, Dec 10, 2015 at 10:55 AM, Thomas Gleixner wrote:
> > Jeff,
> >
> > On Thu, 10 Dec 2015, Jeff Merkey wrote:
> >
> >> If an int1 hardware breakpoint exception is triggered, but no perf bp
> >> pevent block was registered from arch_install_hw_breakpoint, the
> >> system will hard hang with the CPU stuck constantly re-interrupting at
> >> the same execution address because the resume flag never gets set, and
> >> the NOTIFY_DONE state prevents other int1 handlers, including the
> >> default handler in do_debug, from running to handle the condition.
> >> Can be reproduced by writing a program that sets an execute breakpoint
> >> at schedule() without calling arch_install_hw_breakpoint.
> >>
> >> The proposed fix checks the dr7 register and sets the resume flag in
> >> pt->regs if it determines an executed breakpoint was triggered just in
> >> case the check lower down fails. I have seen this bug and its a bug.
> >
> >> Signed-off-by: jeffmerkey@gmail.com
> >> diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
> >> index 50a3fad..6effcae 100644
> >> --- a/arch/x86/kernel/hw_breakpoint.c
> >> +++ b/arch/x86/kernel/hw_breakpoint.c
> >> @@ -475,6 +475,14 @@ static int hw_breakpoint_handler(struct die_args *args)
> >>         for (i = 0; i < HBP_NUM; ++i) {
> >>                 if (likely(!(dr6 & (DR_TRAP0 << i))))
> >>                         continue;
> >> +               /*
> >> +                * Set up resume flag to avoid breakpoint recursion when
> >> +                * returning back to origin in the event an int1
> >> +                * exception is triggered and no event handler
> >> +                * is present.
> >> +                */
> >> +               if ((dr7 & (3 << ((i * 4) + 16))) == 0)
> >
> > We have proper defines for all of this. See __encode_dr7().
> >
> >> +                       args->regs->flags |= X86_EFLAGS_RF;
> >
> > If there is a break point installed, then we do the same thing after
> > calling perf_bp_event() again.
>
> On brief inspection, this smells like a microcode bug. Can you send
> /proc/cpuinfo output?
>
> For example, this CPU and microcode combination is known bad:
>
> processor : 7
> vendor_id : AuthenticAMD
> cpu family : 21
> model : 2
> model name : AMD Opteron(tm) Processor 3380
> stepping : 0
> microcode : 0x6000832
>
> If this is the issue, I'm not sure we want to be in the business of
> working around localized microcode bugs and, if we do, then I think we
> should explicitly detect the bug and log about it.

seems like the issue we hit some time ago:
  http://marc.info/?l=linux-kernel&m=143976421117070&w=2

jirka
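
For anyone following the dr7 test in the quoted patch, here is a minimal userspace sketch of what that open-coded check computes. It is not the kernel code and not a proposed fix. The SK_* names and sk_* helpers are hypothetical stand-ins made up for this illustration; the values assume the standard x86 debug-register layout (per-slot R/W fields starting at DR7 bit 16, four control bits per slot, R/W == 0 meaning an execute breakpoint, RF being EFLAGS bit 16).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local names for this sketch; values assume the x86 layout. */
#define SK_HBP_NUM        4          /* breakpoint slots DR0..DR3 */
#define SK_CONTROL_SHIFT  16         /* first R/W field in DR7 */
#define SK_CONTROL_SIZE   4          /* R/W (2 bits) + LEN (2 bits) per slot */
#define SK_RW_MASK        0x3u
#define SK_RW_EXECUTE     0x0u       /* R/W == 00: break on instruction fetch */
#define SK_TRAP0          0x1u       /* DR6 status bit for slot 0 */
#define SK_EFLAGS_RF      (1u << 16) /* resume flag */

/* Extract the two R/W bits for one breakpoint slot from a DR7 value. */
static unsigned int sk_dr7_rw(uint64_t dr7, int slot)
{
	return (dr7 >> (SK_CONTROL_SHIFT + slot * SK_CONTROL_SIZE)) & SK_RW_MASK;
}

/* Same condition as (dr7 & (3 << ((i * 4) + 16))) == 0 in the patch. */
static bool sk_is_exec_bp(uint64_t dr7, int slot)
{
	return sk_dr7_rw(dr7, slot) == SK_RW_EXECUTE;
}

/*
 * For each slot reported in dr6, set RF in the returned flags when that
 * slot is configured as an execute breakpoint.
 */
static uint64_t sk_flags_after_debug_trap(uint64_t dr6, uint64_t dr7,
					  uint64_t flags)
{
	for (int i = 0; i < SK_HBP_NUM; ++i) {
		if (!(dr6 & ((uint64_t)SK_TRAP0 << i)))
			continue;
		if (sk_is_exec_bp(dr7, i))
			flags |= SK_EFLAGS_RF;
	}
	return flags;
}

/* Small usage example: slot 1 configured as a write breakpoint (R/W == 01). */
static void sk_selfcheck(void)
{
	uint64_t dr7_write_slot1 =
		(uint64_t)0x1 << (SK_CONTROL_SHIFT + 1 * SK_CONTROL_SIZE);

	assert(sk_is_exec_bp(0, 0));
	assert(!sk_is_exec_bp(dr7_write_slot1, 1));
	assert(sk_flags_after_debug_trap(0x1, 0, 0) == SK_EFLAGS_RF);
	assert(sk_flags_after_debug_trap(0x2, dr7_write_slot1, 0) == 0);
}
```

The sketch only spells out the bit arithmetic with named constants, in the spirit of the "we have proper defines" comment above. It says nothing about whether setting RF at that point in the handler is the right fix.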