From: "Christoph Egger"
Subject: Re: [PATCH] x86: machine check exception handling
Date: Thu, 21 Jun 2007 16:38:47 +0200
Message-ID: <200706211638.47367.Christoph.Egger@amd.com>
To: xen-devel@lists.xensource.com
Cc: Gavin Maltby, Keir Fraser, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On Thursday 21 June 2007 16:15:36 Keir Fraser wrote:
> On 19/6/07 11:06, "Jan Beulich" wrote:
> > Properly handle MCE (connecting the existing, but so far unused vendor
> > specific handlers). HVM guests don't own CR4.MCE (and hence can't
> > suppress the exception) anymore, preventing silent machine shutdown.
> >
> > This patch won't apply or work without the patch removing i386's NMI
> > deferral.
>
> Applied with the following changes:
> 1. Pulled out the common parts of the NMI/MCE asm handlers into a common
> subroutine (like all other exception handlers jump at handle_exception to
> do the hard work).
> 2. Kept do_machine_check() as analog of do_nmi(), which can hide
> machine_check_vector definition (and hence I removed all changes inside
> arch/x86/cpu/mcheck). I'd like to keep do_machine_check(), even if it
> remains no more than a direct call at machine_check_vector(). We could
> clean up machine_check_vector() as a separate patch -- not sure if it's
> worth it right now, and maybe we're better off keeping close to original
> Linux files?

That's not possible. The #MC handler and the polling handler (in
non-fatal.c) do, or are going to do, something completely different from
what any OS will ever do. See the discussion with the subject
"MCA/MCE concept" for more information.

> 3. 
Most contentious, I'm sure: removed VMX changes that would
> keep interrupts disabled across NMI/MCE. The reason is simply that SVM does
> not bother with this. If there is a requirement that NMI/MCE be called with
> particular constraints on EFLAGS, then we should make that clear and fix up
> both VMX and SVM in a separate patch. The pain of this is that it would
> probably require extra checks on critical vmexit paths. Is it *really* that
> bad for #MC to get interrupted?

Unlike the polling handler, the #MC handler being interrupted is *very* bad.
A #MC always means that the hardware has detected an uncorrectable ECC error.
First you have to figure out who is impacted: is it Xen, Dom0 or DomU?
For Xen and Dom0, all you can do is use hardware correction features
or crash. For DomU, in the worst case you can kill the DomU and keep
the rest running.
Again, see the discussion with the subject "MCA/MCE concept" for more
information.

Christoph

-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address):
   Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
Authorized general partner:
   AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy
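
As a rough illustration of the triage described in the reply above (first work out who is hit, then pick the least destructive response), here is a minimal sketch in C. All names (mc_owner, mc_action, mc_decide, hw_can_recover) are hypothetical and do not exist in the Xen tree. The single hw_can_recover flag is an assumption standing in for "a hardware correction feature applies to this error", and applying it to DomU as well is also an assumption, since the mail only spells out the worst case for DomU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; none of these are Xen identifiers. */
enum mc_owner  { MC_OWNER_XEN, MC_OWNER_DOM0, MC_OWNER_DOMU };
enum mc_action { MC_ACTION_HW_RECOVER, MC_ACTION_CRASH_HOST, MC_ACTION_KILL_DOMAIN };

/*
 * Pick a response to an uncorrectable error once the impacted party
 * is known. hw_can_recover is an assumed input: true if a hardware
 * correction feature can deal with this particular error.
 */
static enum mc_action mc_decide(enum mc_owner owner, bool hw_can_recover)
{
    assert(owner == MC_OWNER_XEN ||
           owner == MC_OWNER_DOM0 ||
           owner == MC_OWNER_DOMU);

    /* Least destructive option first. */
    if (hw_can_recover)
        return MC_ACTION_HW_RECOVER;

    /* Only a DomU can be sacrificed while the rest keeps running. */
    if (owner == MC_OWNER_DOMU)
        return MC_ACTION_KILL_DOMAIN;

    /* Xen or Dom0 hit and nothing left to try: the host goes down. */
    return MC_ACTION_CRASH_HOST;
}
```

A decision like this only holds if the owner is determined from a consistent view of the error state, which is one way to read the concern about #MC being interrupted.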