From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Christoph Egger"
Subject: Re: [PATCH] 3/3: MCA/MCE correctable error handling
Date: Wed, 22 Aug 2007 17:56:00 +0200
Message-ID: <200708221756.00902.Christoph.Egger@amd.com>
References: <200708211531.44997.Christoph.Egger@amd.com> <200708221100.34795.Christoph.Egger@amd.com> <46CC2785.76E4.0078.0@novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <46CC2785.76E4.0078.0@novell.com>
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
Cc: Gavin.Maltby@sun.com, Keir Fraser, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On Wednesday 22 August 2007 12:09:41 Jan Beulich wrote:
> >>> "Christoph Egger" 22.08.07 11:00 >>>
> >
> >On Tuesday 21 August 2007 18:02:54 Jan Beulich wrote:
> >> >+ if (mc_global->mc_flags & MC_FLAG_UNCORRECTABLE)
> >> >+         printk(KERN_EMERG);
> >> >+ else
> >> >+         printk(KERN_INFO);
> >>
> >> KERN_INFO seems gross understatement here - generally, correctable MCs
> >> are considered indicators that within not too distant future
> >> uncorrectable MCs might result, so this generally is a call for action
> >> (and hence shouldn't be hidden with default log level settings).
> >
> >Well, that is what the "old" code did. It used KERN_EMERG for fatal
> >errors and KERN_INFO in the polling service routine. What do you want
> >me to suggest?
>
> This should be at least KERN_WARNING, probably even KERN_ERR (note
> though that KERN_ERR and KERN_EMERG both resolve to XENLOG_ERR).

I changed to KERN_WARNING. This made the above if block superfluous.
Tnx. I will re-submit this patch as well.

> >> Also, I'm not sure adjusting the polling frequency makes much sense -
> >> 30s seems an awful lot of time to me.
> >
> >It's not clear to me what you are trying to tell me. Please
> >explain/elaborate.
>
> What I'm trying to say is that I'd think this should be polled at a much
> higher frequency (I'd suggest 1Hz), without adjustments. Typically, a
> healthy system will not encounter problems soon after boot, but after
> running for perhaps a very long time (and a system in bad condition is
> likely to encounter problems right away, so wouldn't be affected by
> changing the polling rate). Thus, in the general case, you'd have a
> comparably long latency, during which some kind of (automated) action
> could already be taken to preserve data consistency.

The polling routine that is in the -unstable tree (the version taken from
Linux) runs every 15 seconds without adjustments.
1Hz causes too much system load for a healthy system IMO.
That's why I introduced the adjustments with use of hw threshold registers
to come to a compromise solution.

-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address):
   Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent the company:
   AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy
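
For readers following the polling discussion: the compromise described above (poll rarely while the system looks healthy, poll more often once correctable errors show up) could be sketched roughly as below. This is only an illustration and not the code from the patch. The function name, the halve/double policy and the bounds are all assumptions; the 1 s and 30 s limits are simply the figures mentioned in this thread. A real implementation would presumably take its input from the hardware threshold registers rather than a plain counter.

```c
/* Hypothetical bounds in seconds; chosen from the figures in the thread,
 * not taken from the patch. */
#define POLL_MIN_SEC  1U
#define POLL_MAX_SEC 30U

/*
 * Return the next polling interval in seconds.
 *
 * Assumed policy: halve the interval when the last poll found correctable
 * errors, double it when the poll was clean, then clamp the result to
 * [POLL_MIN_SEC, POLL_MAX_SEC].
 */
static unsigned int next_poll_interval(unsigned int cur_sec,
                                       unsigned int errors_found)
{
    unsigned int next = errors_found ? cur_sec / 2 : cur_sec * 2;

    if (next < POLL_MIN_SEC)
        next = POLL_MIN_SEC;
    if (next > POLL_MAX_SEC)
        next = POLL_MAX_SEC;
    return next;
}
```

With this policy a healthy system drifts towards the long interval and stays there, while a system that keeps reporting correctable errors converges on the 1 s interval, which is roughly the 1Hz rate suggested in the quoted text.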