From: "Christoph Egger"
Subject: Re: [PATCH] 3/3: MCA/MCE correctable error handling
Date: Thu, 23 Aug 2007 08:57:28 +0200
Message-ID: <200708230857.29134.Christoph.Egger@amd.com>
To: xen-devel@lists.xensource.com
Cc: Gavin.Maltby@sun.com, Keir Fraser, Jan Beulich

On Wednesday 22 August 2007 18:10:24 Keir Fraser wrote:
> On 22/8/07 17:05, "Keir Fraser" wrote:
> >> The polling routine that is in the -unstable tree (the version taken
> >> from Linux) runs every 15 seconds without adjustments.
> >> 1Hz causes too much system load for a healthy system IMO.
> >> That's why I introduced the adjustments with use of hw threshold
> >> registers to come to a compromise solution.
> >
> > What's the deal here? Do correctable errors not cause an MCE, yet are
> > still detected via the machine-check architecture (albeit by a polling
> > method)?

The deal here is: correctable errors are detected via polling and
uncorrectable errors via MCE. This patchset is about correctable errors.

> > Are there going to be patches on the Linux side to pick up this MCA info?
> > What is Linux going to do with it, apart from log it (which Xen can
> > already do itself)? Or is this all Solaris-specific?

The general idea is that Dom0 picks up this MCA info and
a) uses the error-handling infrastructure provided for the
   non-virtualized case and
b) uses hypercalls to tell Xen to also report the MCA to a DomU
   and/or to kill a DomU.
Some hw features for self-healing can only be used from Dom0 (because
the registers sit in the PCI extended config space, which Xen doesn't
have access to) and some can be used by Xen itself.

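
As an aside on the polling question quoted at the top of this mail: below
is a minimal sketch, not taken from the patch, of what a compromise between
a fixed 15 second poll and a 1Hz poll could look like. The function name,
the bounds and the halve/double policy are all assumptions made up for
illustration, and the sketch leaves out the hw threshold registers entirely.

```c
#include <assert.h>

/* Hypothetical bounds, not taken from the patch: never poll faster than
 * once per second and never slower than once every 15 seconds. */
#define POLL_MIN_SECS 1u
#define POLL_MAX_SECS 15u

/*
 * Return the next polling interval in seconds (illustrative policy only).
 *
 * cur_secs:    interval used for the poll that just ran
 * errors_seen: number of correctable errors that poll found
 *
 * Halve the interval while errors keep showing up, double it while the
 * machine looks healthy, and clamp to [POLL_MIN_SECS, POLL_MAX_SECS].
 */
unsigned int next_poll_interval(unsigned int cur_secs,
                                unsigned int errors_seen)
{
    unsigned int next;

    assert(cur_secs >= POLL_MIN_SECS && cur_secs <= POLL_MAX_SECS);

    if (errors_seen > 0)
        next = cur_secs / 2;
    else
        next = cur_secs * 2;

    if (next < POLL_MIN_SECS)
        next = POLL_MIN_SECS;
    if (next > POLL_MAX_SECS)
        next = POLL_MAX_SECS;

    return next;
}
```

With a policy like this a healthy system settles at the slow 15 second
rate, and only a machine that keeps reporting correctable errors gets
polled more often.
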
I wrote a demo driver for NetBSD/Xen that mainly tests that Dom0 actually
receives the MCA info (Sun prefers to look into BSD-licensed code).
It should be easy to port it to Linux.

> Oh, and is AMD-specific code really needed in non-fatal.c? I thought the MCA
> stuff was architectural now rather than vendor specific? If there are
> vendor-specific extensions then they belong in the vendor's .c file.

What is AMD-specific is the code that uses the hw registers. Intel has some
additional machine check MSRs containing the register set.
Intel may add a structure to patch 2/3 that makes use of them.

Should I move the AMD polling handler to amd.c?

-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address):
   Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent:
   AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy