From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hidetoshi Seto
Subject: Re: Hardware Error Kernel Mini-Summit
Date: Tue, 18 May 2010 15:52:26 +0900
Message-ID: <4BF2392A.9040409@jp.fujitsu.com>
References: <4BF18995.6070008@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <4BF18995.6070008@redhat.com>
Errors-To: bluesmoke-devel-bounces@lists.sourceforge.net
To: Mauro Carvalho Chehab
Cc: Tony Luck, Brent Young, Linux Kernel Mailing List, Borislav Petkov, Ingo Molnar, Matt Domsch, Doug Thompson, Thomas Gleixner, bluesmoke-devel@lists.sourceforge.net, Linux Edac Mailing List
List-Id: edac.vger.kernel.org

(2010/05/18 3:23), Mauro Carvalho Chehab wrote:
> During the last LF Collaboration Summit, we've done a mini-summit [1],
> intended to improve the hardware error detection in kernel, currently
> provided by MCE and EDAC subsystems.
>
> The idea of this mini-summit came up after Thomas Gleixner and Ingo
> Molnar suggestions that edac and mce should converge into an error
> subsystem.
>
> I'm enclosing the minutes of the meeting, in order to allow it to be
> reviewed by other kernel hackers that are interested on the theme but
> unfortunately couldn't come to the meeting.
>
> Btw, during the meeting, it were decided that EDAC ML could better work
> if moved to vger, so I'm copying here both the old and the new edac
> mailing lists.
>
> [1] http://events.linuxfoundation.org/lfcs2010/edac
>
> ---

Thank you very much for providing this report.

I agree that we should have a well-organized error subsystem that
covers all error sources in the system and that provides a sufficiently
simple and powerful API for users. As one of the interested absentees,
I think I could be of some help to you (e.g. x86 low level).

It might be off-topic here, but I'd like to point out that you missed
the PCIe AER subsystem, which handles hardware errors on PCIe devices
nowadays (it works well on ppc, x86 and so on).

Given that APEI also covers PCIe errors and that some systems can map
MC registers to PCI configuration space, I think there is no way for
the new error subsystem to ignore I/O device errors while it handles
errors on CPU/memory and cooperates with APEI.

Thanks,
H.Seto