From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: [PATCH] 3/3: MCA/MCE correctable error handling
Date: Wed, 22 Aug 2007 11:09:41 +0100
Message-ID: <46CC2785.76E4.0078.0@novell.com>
References: <200708211531.44997.Christoph.Egger@amd.com> <46CB28CE.76E4.0078.0@novell.com> <200708221100.34795.Christoph.Egger@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <200708221100.34795.Christoph.Egger@amd.com>
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Christoph Egger, xen-devel@lists.xensource.com
Cc: Gavin.Maltby@sun.com, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

>>> "Christoph Egger" 22.08.07 11:00 >>>
>On Tuesday 21 August 2007 18:02:54 Jan Beulich wrote:
>> >+    if (mc_global->mc_flags & MC_FLAG_UNCORRECTABLE)
>> >+        printk(KERN_EMERG);
>> >+    else
>> >+        printk(KERN_INFO);
>>
>> KERN_INFO seems gross understatement here - generally, correctable MCs are
>> considered indicators that within not too distant future uncorrectable MCs
>> might result, so this generally is a call for action (and hence shouldn't
>> be hidden with default log level settings).
>
>Well, that is what the "old" code did. It used KERN_EMERG for fatal errors
>and KERN_INFO in the polling service routine. What do you want me to suggest?

This should be at least KERN_WARNING, probably even KERN_ERR (note though
that KERN_ERR and KERN_EMERG both resolve to XENLOG_ERR).

>> Also, I'm not sure adjusting the polling frequency makes much sense - 30s
>> seems an awful lot of time to me.
>
>It's not clear to me what you are trying to tell me. Please explain/elaborate.

What I'm trying to say is that I'd think this should be polled at a much
higher frequency (I'd suggest 1Hz), without adjustments.
Typically, a healthy system will not encounter problems soon after boot, but
after running for perhaps a very long time (and a system in bad condition is
likely to encounter problems right away, so wouldn't be affected by changing
the polling rate). Thus, in the general case, you'd have a comparatively long
latency, during which some kind of (automated) action could already be taken
to preserve data consistency.

Jan
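
For illustration only, the two suggestions above (log correctable errors at
no less than warning level, and poll at a fixed 1Hz rather than adjusting the
interval) could be sketched roughly as below. This is not code from the patch:
the enum, the helper names, the interval constant and the value given to
MC_FLAG_UNCORRECTABLE are all hypothetical stand-ins, and the real code would
pass the corresponding KERN_* prefix to printk().

```c
#include <stdint.h>

/* Assumed value for illustration; the real flag is defined by the patch. */
#define MC_FLAG_UNCORRECTABLE 0x1u

/* Hypothetical fixed polling period: 1Hz, never adjusted at run time. */
#define MC_POLL_INTERVAL_MS 1000u

/* Hypothetical stand-ins for the KERN_* log levels discussed above. */
enum mc_log_level {
    MC_LOG_EMERG,
    MC_LOG_ERR,
    MC_LOG_WARNING,
    MC_LOG_INFO
};

/*
 * Pick the log level for a machine check event. Uncorrectable errors stay
 * at the highest level; correctable ones are reported as warnings so that
 * default log level settings do not hide them.
 */
static enum mc_log_level mc_pick_log_level(unsigned int mc_flags)
{
    if (mc_flags & MC_FLAG_UNCORRECTABLE)
        return MC_LOG_EMERG;
    return MC_LOG_WARNING;
}

/* Compute the next poll time from the current time, using the fixed period. */
static uint64_t mc_next_poll_ms(uint64_t now_ms)
{
    return now_ms + MC_POLL_INTERVAL_MS;
}
```

With a fixed period the worst-case delay before a correctable error is noticed
is about one second, regardless of how long the system has been up.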