Date: Fri, 25 Sep 2015 09:29:39 -0700
From: "Raj, Ashok"
To: Borislav Petkov
Cc: x86-ml, "Luck, Tony", linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, ashok.raj@intel.com
Subject: Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts
Message-ID: <20150925162939.GB15216@linux.intel.com>
In-Reply-To: <20150925082901.GA3568@pd.tnic>
References: <1443073720-3940-1-git-send-email-ashok.raj@intel.com>
 <20150924154722.GD3774@pd.tnic>
 <3908561D78D1C84285E8C5FCA982C28F32B014AA@ORSMSX114.amr.corp.intel.com>
 <20150924185243.GJ3774@pd.tnic>
 <3908561D78D1C84285E8C5FCA982C28F32B014D3@ORSMSX114.amr.corp.intel.com>
 <20150924192224.GL3774@pd.tnic>
 <20150924202212.GA13075@linux.intel.com>
 <20150924210732.GM3774@pd.tnic>
 <20150924212541.GA13483@linux.intel.com>
 <20150925082901.GA3568@pd.tnic>

On Fri, Sep 25, 2015 at 10:29:01AM +0200, Borislav Petkov wrote:
> > The last patch of that series had 2 changes:
> >
> > 1. Allow offline cpus to participate in the rendezvous, so that in
> > the odd chance the offline cpus have collected any errors, we can
> > still report them. (We changed mce_start/mce_end to use
> > cpu_present_mask instead of just the online map.)
>
> This is not necessarily wrong - it is just unusual.

Correct!

> > Without this change today, if I were to inject a broadcast MCE, it
> > ends up hanging, since the offline cpu is also incrementing
> > mce_callin. mce_callin will always exceed cpu_online_mask by the
> > number of cpus logically offlined.
>
> Yeah, I'd like to have a bit somewhere which says "don't report MCEs
> on this core." But we talked about this already.
>
> > Consider, for example, that 2 threads of a core are offline and the
> > MLC picks up
>
> What is MLC?

Mid Level Cache. It is shared between the 2 threads in that core, as
opposed to the Last Level Cache (LLC), which is shared between all the
threads in the socket.

> > an error. Other cpus in the socket can't access those banks. The
> > only way is to let those CPUs read and report their own banks, as
> > they are core scoped. In upcoming CPUs we have some banks that can
> > be thread scoped as well.
> >
> > It's understood the OS doesn't execute any code on those CPUs. But
> > SMI can still run on them, and could collect errors that can be
> > logged.
>
> Well, that is not our problem, is it?
>
> I mean, SMM wants to stay undetected. When all of a sudden offlined
> cores start reporting MCEs, that's going to raise some brows.

You are right. I was simply trying to state how a CPU that is offline
from the OS perspective could still be collecting errors - only trying
to highlight what happens at the platform level.

> Regardless, there are other reasons why offlined cores might report
> MCEs - the fact that logical cores share functional units and data
> flow goes through them might trip the reporting on those cores. Yadda
> yadda...

Yep!
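To make the mce_callin accounting problem above concrete, here is a
minimal sketch of the rendezvous entry in mce_start(), simplified from
arch/x86/kernel/cpu/mcheck/mce.c - not the verbatim kernel code; the
monarch timeout and global_nwo handling are elided:

/*
 * Sketch only. Every CPU that receives the broadcast #MC increments
 * mce_callin, including logically offlined ones. If the expected
 * count is taken from the online mask, the wait loop below can never
 * terminate once any CPU has been offlined.
 */
static atomic_t mce_callin;

static int mce_start(int *no_way_out)
{
	int cpus = num_present_cpus();	/* was: num_online_cpus() */
	int order = atomic_inc_return(&mce_callin);

	/* Wait until every present CPU has checked in. */
	while (atomic_read(&mce_callin) != cpus) {
		/* the real code bails out here via mce_timed_out() */
		cpu_relax();
	}

	return order;	/* order == 1: this CPU is the rendezvous master */
}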
> > 2. If the cpu is offline, we copy the errors to the mce_log buffer,
> > and then copy those out from the rendezvous master during
> > mce_reign().
> >
> > If we were to replace this mce_log_add() with gen_pool_add(), then I
> > would have to call mce_gen_pool_add() from the offline CPU. This
> > will end up calling RCU functions.
> >
> > We don't want to lose any errors reported by the offline CPU for the
> > purpose of logging. It is rare, but we are still interested in
> > capturing those errors if they were to happen.
> >
> > Does this help?
>
> So first of all, we need to hold this down somewhere, maybe in
> Documentation/ to explain why we're running on offlined cores. This is
> certainly unusual code and people will ask WTF is going on there.

Good idea to document these weird cases. I can do it in either
cpu-hotplug.txt or the x86/x86_64/machinecheck file, as appropriate.
Will add that in my next update.

> Then, I really really don't like a static buffer which we will have
> to increase with each new bigger machine configuration. This is just
> clumsy.
>
> It'd be probably much better to make that MCE buffer per CPU. We can
> say, we're allowed to log 2-3, hell, 5 errors in it and when we're
> done with the rendezvous, an online core goes and flushes out the
> error records to gen_pool.

That makes good sense. I will make these per-cpu and also look at
draining them during mce_panic() to make sure we flush them in fatal
cases. Per-cpu certainly scales better than a static number.

> This scales much better than any artificial MCE_LOG_LEN size.
>
> Oh, and we either overwrite old errors when we fill up the percpu
> buffer or we return. That's something we can discuss later. Or we
> come up with a bit smarter strategy of selecting which ones to
> overwrite.
>
> Just artificially increasing a static buffer is not good design IMO.

Agreed. Thanks, I will get another rev rolling.

Cheers,
Ashok
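P.S.: For the archives, a rough sketch of the per-cpu scheme discussed
above - illustration only, not the actual patch. MCE_PERCPU_NLOG,
struct mce_percpu_log and the two helpers are made-up names;
mce_gen_pool_add() is the real genpool helper from mce-genpool.c, and
the code assumes the usual mce.c context for struct mce and the percpu
APIs:

#define MCE_PERCPU_NLOG	5		/* "2-3, hell, 5 errors" */

struct mce_percpu_log {
	unsigned int	nr;
	struct mce	entry[MCE_PERCPU_NLOG];
};

static DEFINE_PER_CPU(struct mce_percpu_log, mce_percpu_log);

/*
 * Called from #MC context on any CPU, online or offlined. Each CPU
 * touches only its own buffer: no locks, no RCU, so this is safe on
 * a logically offlined CPU where the RCU-using paths are not.
 */
static void mce_percpu_log_add(struct mce *m)
{
	struct mce_percpu_log *log = this_cpu_ptr(&mce_percpu_log);

	if (log->nr < MCE_PERCPU_NLOG)
		log->entry[log->nr++] = *m;
	/* else: overwrite-vs-drop policy still to be discussed */
}

/*
 * Run on an online CPU (e.g. the rendezvous master, or from
 * mce_panic()) after the rendezvous: drain every present CPU's buffer
 * into the lockless gen_pool, where RCU is safe to use again.
 */
static void mce_flush_percpu_buffers(void)
{
	int cpu, i;

	for_each_present_cpu(cpu) {
		struct mce_percpu_log *log = &per_cpu(mce_percpu_log, cpu);

		for (i = 0; i < log->nr; i++)
			mce_gen_pool_add(&log->entry[i]);
		log->nr = 0;
	}
}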