From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932712AbbHDI47 (ORCPT ); Tue, 4 Aug 2015 04:56:59 -0400
Received: from mail-wi0-f181.google.com ([209.85.212.181]:36162 "EHLO mail-wi0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932255AbbHDI4y (ORCPT ); Tue, 4 Aug 2015 04:56:54 -0400
Date: Tue, 4 Aug 2015 10:56:51 +0200
From: Michal Hocko
To: =?utf-8?B?5rKz5ZCI6Iux5a6PIC8gS0FXQUnvvIxISURFSElSTw==?=
Cc: Jonathan Corbet , Peter Zijlstra , Ingo Molnar , "Eric W. Biederman" , "H. Peter Anvin" , Andrew Morton , Thomas Gleixner , Vivek Goyal , "linux-doc@vger.kernel.org" , "x86@kernel.org" , "kexec@lists.infradead.org" , "linux-kernel@vger.kernel.org" , Ingo Molnar , =?utf-8?B?5bmz5p2+6ZuF5bezIC8gSElSQU1BVFXvvIxNQVNBTUk=?=
Subject: Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
Message-ID: <20150804085651.GC18509@dhcp22.suse.cz>
References: <55B6E2A3.8070004@hitachi.com> <04EAB7311EE43145B2D3536183D1A8445491D5E8@GSjpTKYDCembx31.service.hitachi.net> <20150729082329.GA15801@dhcp22.suse.cz> <04EAB7311EE43145B2D3536183D1A8445491DB5E@GSjpTKYDCembx31.service.hitachi.net> <20150729092157.GC15801@dhcp22.suse.cz> <04EAB7311EE43145B2D3536183D1A8445491F23A@GSjpTKYDCembx31.service.hitachi.net> <20150730074812.GA9387@dhcp22.suse.cz> <04EAB7311EE43145B2D3536183D1A8445491FC55@GSjpTKYDCembx31.service.hitachi.net> <20150730122747.GA3954@dhcp22.suse.cz> <04EAB7311EE43145B2D3536183D1A844549220E7@GSjpTKYDCembx31.service.hitachi.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <04EAB7311EE43145B2D3536183D1A844549220E7@GSjpTKYDCembx31.service.hitachi.net>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 31-07-15 11:23:00, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > From: Michal Hocko
[mailto:mhocko@kernel.org]
[...]
> > I am saying that watchdog_overflow_callback might trigger on more CPUs
> > and panic from NMI context as well. So this is not reduced to the NMI
> > button sends NMI to more CPUs.
>
> I understand. So, I have to also modify watchdog_overflow_callback
> to call nmi_panic().

yes.

[...]
> > > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
> > > why CPU 130 waits so long. I'll try to consider for a while.
> >
> > Yes, I do not understand the timing here either and the fact that the
> > log is a complete mess in the important parts doesn't help a wee bit.
>
> I'm interested in where "kernel panic -not syncing: " is.
> It may give us a clue.

This one is lost in the mangled text:
[ 167.843771] U<0>[ 167.843771] hhuh. NMI received for unkn<0><0>[ 167.843765] Uh[ 16NM843774I own rea reived for unknow<0 r 16n 2d 765] Uhhuh. CPU recei11. <0known reason 7. on770] Ker<[ - not rn NMI:nic - not contt sing <0 >[ : Not con.inu437azed and confused, b] Dtryingaed annue fu 167.8ut trying>[ to 7.<0377 167.843775] U<0>[ 167.843776] ]hhu.ived for u3nknown rMason 3 re oived for [nk167.843781] 1. <. N0>[ 167.843781] Uh. NMI recen 3d on CPU 0.i< >[ nowon 3d on] Chhuh.MI eceived[ or7.843nknoUhhuh.wn rMason e3d ceCPivUd 120. <0nk>no 167.wn843ason 3na s p120. o<0er savi d6 e843ab88] Do yeu have a [ er saving mode e nabl1d?7<4][ 167 84hu94]MIuh. NceIived for unknown reas vdfor 1no3was0>[ 2d 67.84380on CI rUe 12e. ive7d8u3800wn rveaseo f2d on CPo3.r< u>k[o 1 rea6s.o2d8 oo you hn aPve <0st>a e power 1s7.843816] Do yoauv ng moade enbslra?ng[ e 167.8438p41o]er shhuhavi.ngIroenived fbled?nknow < reaso0> 2d on [PU1626.41]0> Uh67.h. NM387I] receihed for .nknown reason 2Nn MC U ceived for . [son 2d on CPU 6. < 160>7.8467.84873] Uhhuh. 3MI received 908 o knstra [ n167.843908] Do ygo pave westrangesa pvnv mode enableng mode ed? n