From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19])
 by kanga.kvack.org (Postfix) with ESMTP id 4DC2A6B004F
 for ; Mon, 15 Jun 2009 11:15:49 -0400 (EDT)
Date: Mon, 15 Jun 2009 17:24:28 +0200
From: Andi Kleen
Subject: Re: [PATCH 00/22] HWPOISON: Intro (v5)
Message-ID: <20090615152427.GF31969@one.firstfloor.org>
References: <20090615024520.786814520@intel.com>
 <4A35BD7A.9070208@linux.vnet.ibm.com>
 <20090615042753.GA20788@localhost>
 <20090615140019.4e405d37@lxorguk.ukuu.org.uk>
 <20090615132934.GE31969@one.firstfloor.org>
 <20090615154832.73c89733@lxorguk.ukuu.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090615154832.73c89733@lxorguk.ukuu.org.uk>
Sender: owner-linux-mm@kvack.org
To: Alan Cox
Cc: Andi Kleen , Hugh Dickins , Wu Fengguang , Balbir Singh , Andrew Morton , LKML , Ingo Molnar , Mel Gorman , Thomas Gleixner , "H. Peter Anvin" , Peter Zijlstra , Nick Piggin , "riel@redhat.com" , "chris.mason@oracle.com" , "linux-mm@kvack.org"
List-ID:

> Everyone I knew in the business end of deploying Linux turned on panics
> for I/O errors, reboot on panic and all the rest of those.

oops=panic already implies panic on all machine check exceptions, so they will
be fine then (assuming this is the best strategy for availability for them,
which I personally find quite doubtful, but we can discuss this some other time)

> Really - so if your design is wrong for the way PPC wants to work what
> are we going to do ? It's not a requirement that PPC64 support is there

Then we change the code. Or if it's too difficult don't support their stuff.
After all it's not cast in stone. That said I doubt the PPC requirements will
be much different than what we have.

> I'd guess that zSeries has some rather different views on how ECC
> failures propagate through the hypervisors for example, including the
> fact that a failed page can be unfailed which you don't seem to allow for.

That's correct. That's because unpoisoning is quite hard -- you need some
kind of synchronization point for all the error handling, and that's the
poisoned page. If it unpoisons itself then you need some very heavyweight
synchronization to avoid handling errors multiple times. I looked at it,
but it's quite messy. Also it's of somewhat dubious value.

>
> (You can unfail pages on x86 as well it appears by scrubbing them via DMA
> - yes ?)

Not architecturally. Also the other problem is not just unpoisoning them,
but finding out if the page is permanently bad or just temporarily.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
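
The point about the poisoned page acting as the synchronization point for
error handling can be illustrated with a small userspace sketch. This is not
the kernel code: `struct page_desc`, `report_memory_error` and the
`handled_count` field are hypothetical names chosen for illustration, and C11
atomics stand in for whatever per-page flag operation the real implementation
uses. The sketch only shows why a flag that is set once and never cleared is
enough to make sure each error is handled a single time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page descriptor (not the kernel's struct page). */
struct page_desc {
    atomic_bool poisoned;  /* set exactly once, when the first error is reported */
    int handled_count;     /* number of times recovery actually ran */
};

/*
 * Report a memory error on a page. Returns true if this caller ran the
 * recovery path, false if the page was already poisoned.
 */
static bool report_memory_error(struct page_desc *p)
{
    assert(p != NULL);

    /*
     * atomic_exchange returns the previous value, so only the first
     * reporter sees "false" here. Every later or concurrent reporter
     * sees "true" and backs off. The flag itself is the synchronization
     * point.
     */
    if (atomic_exchange(&p->poisoned, true))
        return false;

    /*
     * Only the winner reaches this point, so no further locking is
     * needed as long as the flag is never cleared. Allowing an
     * "unpoison" operation to clear it would let a late reporter win
     * the exchange again while the first recovery is still running,
     * which is why that case needs heavier synchronization.
     */
    p->handled_count++;
    return true;
}
```

With a freshly initialized descriptor, the first call to
`report_memory_error` runs recovery and any later call on the same page
returns without doing anything.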