From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Christoph Egger" <Christoph.Egger@amd.com>
Subject: Re: [PATCH] 3/3: MCA/MCE correctable error handling
Date: Wed, 22 Aug 2007 17:56:00 +0200
Message-ID: <200708221756.00902.Christoph.Egger@amd.com>
References: <200708211531.44997.Christoph.Egger@amd.com>
	<200708221100.34795.Christoph.Egger@amd.com>
	<46CC2785.76E4.0078.0@novell.com>
Mime-Version: 1.0
Content-Type: text/plain;
 charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <46CC2785.76E4.0078.0@novell.com>
Content-Disposition: inline
List-Unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
Cc: Gavin.Maltby@sun.com, Keir Fraser <keir@xensource.com>, Jan Beulich <jbeulich@novell.com>
List-Id: xen-devel@lists.xenproject.org

On Wednesday 22 August 2007 12:09:41 Jan Beulich wrote:
> >>> "Christoph Egger" <Christoph.Egger@amd.com> 22.08.07 11:00 >>>
> >
> >On Tuesday 21 August 2007 18:02:54 Jan Beulich wrote:
> >> >+		if (mc_global->mc_flags & MC_FLAG_UNCORRECTABLE)
> >> >+			printk(KERN_EMERG);
> >> >+		else
> >> >+			printk(KERN_INFO);
> >>
> >> KERN_INFO seems gross understatement here - generally, correctable MCs
> >> are considered indicators that within not too distant future
> >> uncorrectable MCs might result, so this generally is a call for action
> >> (and hence shouldn't be hidden with default log level settings).
> >
> >Well, that is what the "old" code did. It used KERN_EMERG for fatal errors
> >and KERN_INFO in the polling service routine. What do you want me to
> > suggest?
>
> This should be at least KERN_WARNING, probably even KERN_ERR (note
> though that KERN_ERR and KERN_EMERG both resolve to XENLOG_ERR).

I changed it to KERN_WARNING, which makes the if block above
superfluous. Thanks.
I will re-submit this patch as well.

> >> Also, I'm not sure adjusting the polling frequency makes much sense -
> >> 30s seems an awful lot of time to me.
> >
> >It's not clear to me what you are trying to tell me. Please
> > explain/elaborate.
>
> What I'm trying to say is that I'd think this should be polled at a much
> higher frequency (I'd suggest 1Hz), without adjustments. Typically, a
> healthy system will not encounter problems soon after boot, but after
> running for perhaps a very long time (and a system in bad condition is
> likely to encounter problems right away, so wouldn't be affected by
> changing the polling rate). Thus, in the general case, you'd have a
> comparably long latency, during which some kind of (automated) action could
> already be taken to preserve data consistency.

The polling routine that is in the -unstable tree (the version taken from
Linux) runs every 15 seconds without adjustments.
1Hz causes too much system load on a healthy system IMO.
That's why I introduced the adjustments, using the hw threshold registers,
as a compromise.
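
As an illustration only, the kind of adjustment discussed here can be
sketched in C as below. This is not the code from the patch: the function
name, the bounds and the halve/double policy are assumptions made up for
the example, and a plain "errors seen" flag stands in for whatever the hw
threshold registers would report.

```c
#include <stdint.h>

/* Hypothetical bounds in milliseconds; not taken from the patch. */
#define MCE_POLL_MIN_MS   1000u   /* fastest rate: the 1Hz suggested in this thread */
#define MCE_POLL_MAX_MS  30000u   /* slowest rate: the 30s mentioned in this thread */

/*
 * Return the next polling interval in milliseconds.
 *
 * cur_ms      - interval used for the poll that just finished
 * errors_seen - nonzero if that poll found correctable errors
 *
 * Policy (an assumption for this sketch): poll twice as often after
 * errors were found, back off by a factor of two when the poll was
 * clean, and always stay within [MCE_POLL_MIN_MS, MCE_POLL_MAX_MS].
 */
static uint32_t mce_next_poll_interval(uint32_t cur_ms, int errors_seen)
{
    uint32_t next;

    if (errors_seen)
        next = cur_ms / 2;
    else if (cur_ms > MCE_POLL_MAX_MS / 2)
        next = MCE_POLL_MAX_MS;      /* doubling would exceed the maximum */
    else
        next = cur_ms * 2;

    if (next < MCE_POLL_MIN_MS)
        next = MCE_POLL_MIN_MS;
    if (next > MCE_POLL_MAX_MS)
        next = MCE_POLL_MAX_MS;

    return next;
}
```

With these example bounds, a system polling every 30000 ms drops to
7500 ms after two consecutive polls that find errors, and climbs back to
30000 ms once polls come back clean again.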


-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Sitz (Geschäftsanschrift):
   Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
Registergericht Dresden: HRA 4896
vertretungsberechtigter Komplementär:
   AMD Saxony LLC (Sitz Wilmington, Delaware, USA)
Geschäftsführer der AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy