From: Andi Kleen
Subject: Re: Hardware Error Kernel Mini-Summit
Date: Mon, 24 May 2010 20:26:03 +0200
Message-ID: <20100524182603.GA3429@gargoyle.fritz.box>
References: <4BF2C3D1.10009@redhat.com>
 <1274204560.17703.82.camel@Joe-Laptop.home>
 <20100518185305.GA23921@elte.hu>
 <987664A83D2D224EAE907B061CE93D53C61D1C57@orsmsx505.amr.corp.intel.com>
 <20100518191802.GG25224@aftab>
 <20100518222832.GJ22675@basil.fritz.box>
 <20100519090323.GA18073@basil.fritz.box>
 <20100524162124.GB7145@sgi.com>
In-Reply-To: <20100524162124.GB7145@sgi.com>
To: Russ Anderson
Cc: Hidetoshi Seto, "Luck, Tony", Mauro Carvalho Chehab, "Young, Brent",
 Linux Kernel Mailing List, "bluesmoke-devel@lists.sourceforge.net",
 "Eric W. Biederman", Doug Thompson, Joe Perches, Thomas Gleixner,
 Linux Edac Mailing List, Ingo Molnar, Matt Domsch
List-Id: edac.vger.kernel.org

> Having the infrastructure to automatically off-line pages
> is a good thing.  The details of where to set the predictive

It's already there with a modern mcelog in daemon mode and a recent
kernel that supports soft offlining.

> threshold likely will be hardware specific (different DIMM
> types failing at different rates).  It needs to be adjustable.

The current default in mcelog is 10 corrected errors per 24h per 4k page
or 1 uncorrected error on the page (if your CPU supports recovering
from that). It is on by default. You can configure it to be different
if you want.

-Andi
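
To make the default policy above concrete (10 corrected errors per 24h per 4k page, or 1 uncorrected error), here is a rough sketch of such a per-page threshold. It is only an illustration, not mcelog's actual code: it uses a plain sliding window, which is a simplification chosen here, and the names PageErrorTracker and record() are made up for this example. It also assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque

PAGE_SHIFT = 12                    # 4k pages, as in the mail
CE_THRESHOLD = 10                  # corrected errors ...
CE_WINDOW_SECONDS = 24 * 60 * 60   # ... per 24 hours


class PageErrorTracker:
    """Hypothetical per-page error accounting (not mcelog's implementation)."""

    def __init__(self, threshold=CE_THRESHOLD, window=CE_WINDOW_SECONDS):
        self.threshold = threshold
        self.window = window
        self._events = defaultdict(deque)  # page frame number -> timestamps
        self.offlined = set()              # pages already flagged

    def record(self, phys_addr, timestamp, uncorrected=False):
        """Record one error; return True if the page should now be offlined."""
        pfn = phys_addr >> PAGE_SHIFT
        if pfn in self.offlined:
            return False                   # already handled, nothing more to do

        if uncorrected:
            # A single uncorrected error is enough to give up on the page.
            self.offlined.add(pfn)
            self._events.pop(pfn, None)
            return True

        events = self._events[pfn]
        events.append(timestamp)
        # Forget corrected errors that have aged out of the window.
        while events and timestamp - events[0] >= self.window:
            events.popleft()

        if len(events) >= self.threshold:
            self.offlined.add(pfn)
            del self._events[pfn]
            return True
        return False
```

Passing a different threshold or window to the constructor corresponds to the "needs to be adjustable" point in the quoted text; what the caller does when record() returns True (for example, asking the kernel to soft-offline the page) is left out of the sketch.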