public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Andi Kleen <andi@firstfloor.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>,
	akpm@linux-foundation.org, greg@kroah.com, tglx@linutronix.de,
	hpa@zytor.com, dougthompson@xmission.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 00/21 v2] amd64_edac: EDAC module for AMD64
Date: Fri, 1 May 2009 14:39:56 +0200	[thread overview]
Message-ID: <20090501123956.GA4225@elte.hu> (raw)
In-Reply-To: <20090430124716.GW23223@one.firstfloor.org>


* Andi Kleen <andi@firstfloor.org> wrote:

> > Kconfig, mce code delivers needed error info to edac which, in 
> > turn, goes and decodes the error/does the mapping to DIMM 
> > blocks/supplies DRAM error injection facility for testing 
> > purposes and similar things. That way you have both and they 
> > don't overlap in functionality.
> 
> You can do that, but it's redundant because mcelog can do this 
> this already. [...]

The thing is, when we took up x86 maintenance i had a good look at 
the MCE situation, i checked both the kernel and the user-space 
side.

The kernel side MCE code was in pretty bad shape to begin with, but 
mcelog (the user-space tool) is a big stinking pile of poo on every 
level.

It's one of the worst piece of kernel related code i ever saw. I 
think you wrote all of it, and you should be ashamed of that code, 
and you should be ashamed of the design and you should be ashamed of 
the concept.

It even came with its own 'database' code: mcelog*/db.[ch] is 600+ 
lines of needless code instead of obvious library use. It's NIH and 
self-serving complexity all over.

And the thing is, mcelog/mcedecode never really _did_ anything real 
an useful, other than to:

 1) Confuse kernel users who see a fatal MCE panic, with cryptic, 
    quirky codes, who write that down on paper, then run it through 
    the user-space tool - just to see a piece of information the 
    kernel could have provided already. (if they didnt make any 
    mistakes while writing down the codes)

 2) Decode a quirky, binary MCE record and combine it with DMI data.
    (which the kernel can and should do just fine.)

Yes, i know about tolerant=3 and certain people/companies opting to 
ignore MCE fatality levels and live dangerously (and i also know 
about non-fatal reporting and correction extensions in hw) - but for 
99.999% of the Linux users the whole thing is just needless 
complexity today, that does not offer anything valuable.

And that is really what happens when code is misdesigned and the 
wrong pieces of code are pushed to user-space: a crappy, limited ABI 
and an under-maintained, big pile of junk user-space kit.

The obvious truth is that hardware faults have to be caught, decoded 
and optionally handled by the kernel.

The EDAC code at least has a sane design: it realizes that hardware 
faults _must_ be fully known, decoded and potentially handled in the 
kernel.

Piggyback-ing to user-space is plain idiotic and not defensible. So 
if a piece of hardware capability is handled by the EDAC code, the 
x86 MCE code will step aside and will stay the heck out of that 
business. At least until the two concepts are merged into some sane 
kernel hardware fault logging and handling framework.

And Andi, until you dont grasp such _basic_ design concepts, you 
have no business writing such code really. You should stay the heck 
away from it and you should stop 'advising' people who made the 
right calls while you messed up. It is mcelog that is crap, not the 
EDAC code.

	Ingo

  parent reply	other threads:[~2009-05-01 12:40 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-29 16:54 [RFC PATCH 00/21 v2] amd64_edac: EDAC module for AMD64 Borislav Petkov
2009-04-29 16:54 ` [PATCH 01/21] x86: add methods for writing of an MSR on several CPUs Borislav Petkov
2009-04-29 17:39   ` H. Peter Anvin
2009-05-04 16:46     ` Borislav Petkov
2009-05-04 17:25       ` H. Peter Anvin
2009-05-04 17:53         ` Borislav Petkov
2009-05-04 20:51           ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 02/21] amd64_edac: add PCI config register defines Borislav Petkov
2009-05-04 20:54   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 03/21] amd64_edac: add driver structs Borislav Petkov
2009-05-04 20:38   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 04/21] amd64_edac: add memory scrubber interface Borislav Petkov
2009-05-04 21:02   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 05/21] amd64_edac: add sys addr to memory controller mapping helpers Borislav Petkov
2009-05-04 21:08   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 06/21] amd64_edac: add functionality to compute the DRAM hole Borislav Petkov
2009-05-04 21:22   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 07/21] amd64_edac: add DRAM address type conversion facilities Borislav Petkov
2009-05-04 21:39   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 08/21] amd64_edac: add helper to dump relevant registers Borislav Petkov
2009-05-04 21:43   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 09/21] amd64_edac: assign DRAM chip select base and mask in a family-specific way Borislav Petkov
2009-05-04 21:59   ` Mauro Carvalho Chehab
2009-05-05 10:25     ` Borislav Petkov
2009-04-29 16:54 ` [PATCH 10/21] amd64_edac: add k8-specific methods Borislav Petkov
2009-05-04 22:06   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 11/21] amd64_edac: add f10-and-later methods-p1 Borislav Petkov
2009-05-04 22:10   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 12/21] amd64_edac: add f10-and-later methods-p2 Borislav Petkov
2009-05-04 23:25   ` Mauro Carvalho Chehab
2009-04-29 16:54 ` [PATCH 13/21] amd64_edac: add f10-and-later methods-p3 Borislav Petkov
2009-04-29 18:22   ` Ingo Molnar
2009-04-29 18:24     ` Ingo Molnar
2009-04-29 19:05     ` Andrew Morton
2009-04-29 19:23       ` Ingo Molnar
2009-04-29 19:42         ` Andrew Morton
2009-04-29 19:53           ` Ingo Molnar
2009-04-29 20:47             ` Ingo Molnar
2009-04-30 10:01               ` Borislav Petkov
2009-04-30 10:42                 ` Ingo Molnar
2009-05-04 23:36   ` Mauro Carvalho Chehab
2009-04-29 16:55 ` [PATCH 14/21] amd64_edac: add per-family descriptors Borislav Petkov
2009-05-04 23:39   ` Mauro Carvalho Chehab
2009-04-29 16:55 ` [PATCH 15/21] amd64_edac: add ECC chipkill syndrome mapping table Borislav Petkov
2009-05-04 23:42   ` Mauro Carvalho Chehab
2009-04-29 16:55 ` [PATCH 16/21] amd64_edac: add error decoding logic Borislav Petkov
2009-04-29 18:19   ` Ingo Molnar
2009-05-04 23:48   ` Mauro Carvalho Chehab
2009-04-29 16:55 ` [PATCH 17/21] amd64_edac: add EDAC core-related initializers Borislav Petkov
2009-05-04 23:53   ` Mauro Carvalho Chehab
2009-04-29 16:55 ` [PATCH 18/21] amd64_edac: add ECC reporting initializers Borislav Petkov
2009-05-04 23:59   ` Mauro Carvalho Chehab
2009-04-29 16:55 ` [PATCH 19/21] amd64_edac: add debugging/testing code Borislav Petkov
2009-04-29 18:18   ` Ingo Molnar
2009-04-29 16:55 ` [PATCH 20/21] amd64_edac: add DRAM error injection logic using sysfs Borislav Petkov
2009-04-29 18:17   ` Ingo Molnar
2009-05-05  0:06   ` Mauro Carvalho Chehab
2009-04-29 16:55 ` [PATCH 21/21] amd64_edac: add module registration routines Borislav Petkov
2009-05-05  0:10   ` Mauro Carvalho Chehab
2009-04-29 19:30 ` [RFC PATCH 00/21 v2] amd64_edac: EDAC module for AMD64 Andi Kleen
2009-04-30 11:57   ` Borislav Petkov
2009-04-30 12:21     ` Ingo Molnar
2009-04-30 12:47     ` Andi Kleen
2009-04-30 14:48       ` Aristeu Rozanski
2009-05-01  7:53         ` Borislav Petkov
2009-05-03  0:32           ` Aristeu Rozanski
2009-04-30 18:37       ` Mauro Carvalho Chehab
2009-05-01 12:39       ` Ingo Molnar [this message]
  -- strict thread matches above, loose matches on Subject: below --
2009-04-30 14:23 Doug Thompson
2009-04-30 14:39 Doug Thompson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090501123956.GA4225@elte.hu \
    --to=mingo@elte.hu \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=borislav.petkov@amd.com \
    --cc=dougthompson@xmission.com \
    --cc=greg@kroah.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox