Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
       [not found] <211660974.2010861266660371958.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2010-02-20 10:14 ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2010-02-20 10:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, Borislav Petkov, Thomas Gleixner, Ingo Molnar, mingo,
	hpa, linux-kernel, linux-tip-commits, Doug Thompson

----- "Andi Kleen" <andi@firstfloor.org> escreveu:

> On Fri, Feb 19, 2010 at 10:14:17PM -0200, Mauro Carvalho Chehab
> wrote:
> 
> Mauro,
> 
> > I was thinking on a way where we could work with the EDAC/MCE
> issues, and
> > a way for us to move ahead. My proposal is to organize an EDAC/MCE
> BoF session
> > or a mini-summit during the Collaboration Summit, in San Francisco,
> with 
> > the interested parties. I suspect that some of us will be there
> already. 
> 
> I didn't plan to be there so far. A BoF is probably a good idea,
> and also looking closely at EDAC together,
> but it would be better at some more kernel focussed conference.
> 
> If everyone else is at that summit I can try to come,
> but it would be likely difficult.

I'm proposing the Collaboration Summit due to its date: it will
happen in April. The next Kernel conf will be on a longer time.

> We could probably do some kind of online BoF shorter time
> (e.g. using some chat setup or on the phone)

We may try to do some discussions via chat before it, but I still
think that having we all at the same room with some whiteboard
will better work.

> > It shouldn't be hard to find some place there for us to take a look
> at the
> > EDAC architecture and come with some proposal.
> 
> Proposal for what exactly?
> 
> Is this for a event interface or for a topology interface or both
> or something else entirely?

We should define the topics. I think we should discuss both topics.
I agree that the better is to represent the hardware per FRU. So,
maybe we can find a better topology representation.

> My personal plan so far was to work on the APEI interface
> and then possibly look at migrating MCE to that infrastructure too,
> while updating mcelog to talk to it. This would be mostly
> addressing events so far.

The better is to have some discussions before using APEI or any other
interface for it. It should be considered that the hardware errors
should be presented on a consistent way not only for newer processors
and memory controllers, but also for the already supported ones.

> > As today is the last day for CFP, I've also submitted there a
> proposal for a
> > panel. If approved, we can use it to collect data from hardware
> error users
> > (sysops and other users that require high availability on their
> services),
> > for us to discuss some strategies to address the issue or to
> summarize what
> > will be discussed on the event.
> 
> You want to collect error rates or want to collect use cases?
> 
> For a serious collection doing it online would probably
> give better coverage.
> 
> I do both to some degree already.

Both data collect are important (and I also do it to some degree),
and we can do it via other means, but a face-to-face meeting may
help to vallidate any ideas we may have about improvements/changes
at the interfaces and features.

If we succeed on such discussions at the planning phase, this will
save us a lot of development time and will help to have an easier
upstream adoption.

So, I think it is a worthy try.

Cheers,
Mauro

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Rename cpu_specific_poll to mce_cpu_specific_poll
@ 2010-01-23 11:33 Andi Kleen
  2010-02-05 23:31 ` [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI tip-bot for Andi Kleen
  0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2010-01-23 11:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mingo, hpa, linux-kernel, andi, tglx, Borislav Petkov,
	Andreas Herrmann, Hidetoshi Seto, linux-tip-commits,
	Peter Zijlstra, Fr??d??ric Weisbecker

> FYI, this commit triggered a -tip test failure:
> 
>  arch/x86/kernel/cpu/mcheck/mce-xeon75xx.c: In function 'xeon75xx_mce_init':
>  arch/x86/kernel/cpu/mcheck/mce-xeon75xx.c:340: error: implicit declaration of function 'pci_match_id'

Does this patch fix it? (I assume your config is missing CONFIG_PCI)

-Andi

---

commit 748e9c041da32f173f20289c71ef0462b8e97234
Author: Andi Kleen <ak@linux.intel.com>
Date:   Sat Jan 23 12:34:10 2010 +0100

    Make xeon75xx memory driver dependent on PCI
    
    Found by Ingo Molnar's automated tester
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f62db24..ab2d0a8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -834,7 +834,7 @@ config X86_MCE_INTEL
 
 config X86_MCE_XEON75XX
 	tristate "Intel Xeon 7500 series corrected memory error driver"
-	depends on X86_MCE_INTEL
+	depends on X86_MCE_INTEL && PCI
 	---help---
 	   Add support for a Intel Xeon 7500 series specific memory error driver.
 	   This allows to report the DIMM and physical address on a corrected

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-01-23 11:33 [tip:x86/mce] x86, mce: Rename cpu_specific_poll to mce_cpu_specific_poll Andi Kleen
@ 2010-02-05 23:31 ` tip-bot for Andi Kleen
  2010-02-16 20:47   ` Ingo Molnar
  0 siblings, 1 reply; 14+ messages in thread
From: tip-bot for Andi Kleen @ 2010-02-05 23:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, andi, ak, tglx, mingo

Commit-ID:  757fd770c649b0dfa6eeefc2d5e2ea3119b6be9c
Gitweb:     http://git.kernel.org/tip/757fd770c649b0dfa6eeefc2d5e2ea3119b6be9c
Author:     Andi Kleen <andi@firstfloor.org>
AuthorDate: Sat, 23 Jan 2010 12:33:59 +0100
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Fri, 5 Feb 2010 14:51:53 -0800

x86, mce: Make xeon75xx memory driver dependent on PCI

Found by Ingo Molnar's automated tester.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20100123113359.GA29555@one.firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f62db24..ab2d0a8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -834,7 +834,7 @@ config X86_MCE_INTEL
 
 config X86_MCE_XEON75XX
 	tristate "Intel Xeon 7500 series corrected memory error driver"
-	depends on X86_MCE_INTEL
+	depends on X86_MCE_INTEL && PCI
 	---help---
 	   Add support for a Intel Xeon 7500 series specific memory error driver.
 	   This allows to report the DIMM and physical address on a corrected

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-05 23:31 ` [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI tip-bot for Andi Kleen
@ 2010-02-16 20:47   ` Ingo Molnar
  2010-02-16 22:29     ` Andi Kleen
  0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2010-02-16 20:47 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, andi, ak, tglx
  Cc: linux-tip-commits, Thomas Gleixner, Doug Thompson,
	Mauro Carvalho Chehab, Borislav Petkov, H. Peter Anvin

* tip-bot for Andi Kleen <andi@firstfloor.org> wrote:

> Commit-ID:  757fd770c649b0dfa6eeefc2d5e2ea3119b6be9c
> Gitweb:     http://git.kernel.org/tip/757fd770c649b0dfa6eeefc2d5e2ea3119b6be9c
> Author:     Andi Kleen <andi@firstfloor.org>
> AuthorDate: Sat, 23 Jan 2010 12:33:59 +0100
> Committer:  H. Peter Anvin <hpa@zytor.com>
> CommitDate: Fri, 5 Feb 2010 14:51:53 -0800
> 
> x86, mce: Make xeon75xx memory driver dependent on PCI
> 
> Found by Ingo Molnar's automated tester.
> 
> Reported-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> LKML-Reference: <20100123113359.GA29555@one.firstfloor.org>
> Signed-off-by: H. Peter Anvin <hpa@zytor.com>

As an x86 maintainer i'm NAK-ing your Nehalem MCE patches. I really dont want 
to see this code in v2.6.34, please work on it some more - this should be 
implemented _properly_ and _cleanly_.

Andi, you've ignored my repeated complaints about the cleanliness of the MCE 
code:

  http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-01/msg08454.html

  ( There's been earlier remarks from me on this topic, months ago, the first 
    one more than a year ago, so it's not like you didnt have any advance 
    warning. I suspect i should have NAK-ed earlier. )

and you've ignored what the EDAC developers such as Mauro tried to explain to 
you in that same thread. Your design to expose Intel hardware features sucks 
(because it's essentially non-existent, and because it exposes so little and 
gives users so little benefits) and you should know it.

Please work with Mauro on the Nehalem EDAC bits, they seem rather advanced to 
me for v2.6.34, and _far_ cleaner and more capable as well. See those Intel 
support bits at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core.git master

  Documentation/edac.txt              |  151 +++
  arch/x86/include/asm/pci_x86.h      |    2 +
  arch/x86/kernel/cpu/mcheck/mce_64.c | 1200 +++++++++++++++++++++
  arch/x86/pci/legacy.c               |   43 +-
  drivers/edac/Kconfig                |   19 +
  drivers/edac/Makefile               |    7 +
  drivers/edac/edac_core.h            |    5 +
  drivers/edac/edac_mc_sysfs.c        |    4 +
  drivers/edac/edac_mce.c             |   61 ++
  drivers/edac/i7core_edac.c          | 1966 +++++++++++++++++++++++++++++++++++
  include/linux/edac_mce.h            |   31 +
  include/linux/pci.h                 |    1 +
  include/linux/pci_ids.h             |   19 +
  13 files changed, 3492 insertions(+), 17 deletions(-)

It's a big step forward for Intel CPU features in my opinion, as it is a 
proper, clean driver interface, and i find it highly curious why you are not 
coding for that feature instead.

Once that is done the EDAC code can be pushed by all distros as a common 
interface, as recent Intel CPUs would be supported by it.

I really do not understand why you are trying to keep Intel CPUs in the dark 
ages while most other CPUs are properly supported by the EDAC code ... It 
seems hugely counter-intuitive to me.

Note, any missing functionality on the EDAC side needs a proper driver 
interface to arch/x86 - not this kind of ugly butchered-in 700 lines piece of 
ugly crap that you are trying to push here ... I'd welcome patches for such 
interfaces as they'd further simplify (and also enhance) arch/x86.

( And as i suggested you might also want to work on representing machine
  events as part of the perf events framework. )

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-16 20:47   ` Ingo Molnar
@ 2010-02-16 22:29     ` Andi Kleen
  2010-02-19 10:50       ` Thomas Gleixner
  2010-02-22  7:38       ` Hidetoshi Seto
  0 siblings, 2 replies; 14+ messages in thread
From: Andi Kleen @ 2010-02-16 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mingo, hpa, linux-kernel, andi, tglx, linux-tip-commits,
	Doug Thompson, Mauro Carvalho Chehab, Borislav Petkov

>
> Please work with Mauro on the Nehalem EDAC bits, they seem rather advanced to
> me for v2.6.34, and _far_ cleaner and more capable as well. See those Intel
> support bits at:

Hi Ingo,

core_i7 and EDAC has nothing to do with this code and
it has nothing to do with the problem this patch is
solving.

This is for a different chip (xeon75xx)
which has a completely different memory subsystem
and reports memory errors in a completely different way
than xeon75xx/core_i7.

For core_i7/xeon55xx there is no additional event interface needed;
it's all supplied by the hardware on the existing interfaces.

The point of this code is to annotate the CE events on Xeon 75xx
and to implement specific backend actions (page offlining, triggers)
based on specific events. These backend actions are already implemented
on 55xx without additional changes (no need for EDAC)

EDAC does not provide an event interface that can
be polled, just counts, so this cannot be done with EDAC.
It's simply a topology enumeration with error counts.
mcelog is not a topology interface, it's a event
notification mechanism.

EDAC and mcelog are orthogonal, they don't solve the same
problem.

So your nack is based on incorrect assumptions and doesn't make
sense. What you're asking for cannot be done with current
EDAC as far as I know.

-Andi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-16 22:29     ` Andi Kleen
@ 2010-02-19 10:50       ` Thomas Gleixner
  2010-02-19 12:17         ` Andi Kleen
  2010-02-22  7:38       ` Hidetoshi Seto
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2010-02-19 10:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, mingo, hpa, linux-kernel, andi, linux-tip-commits,
	Doug Thompson, Mauro Carvalho Chehab, Borislav Petkov

On Tue, 16 Feb 2010, Andi Kleen wrote:
> > 
> > Please work with Mauro on the Nehalem EDAC bits, they seem rather advanced
> > to
> > me for v2.6.34, and _far_ cleaner and more capable as well. See those Intel
> > support bits at:
> 
> Hi Ingo,
> 
> core_i7 and EDAC has nothing to do with this code and
> it has nothing to do with the problem this patch is
> solving.
> 
> This is for a different chip (xeon75xx)
> which has a completely different memory subsystem
> and reports memory errors in a completely different way
> than xeon75xx/core_i7.
> 
> For core_i7/xeon55xx there is no additional event interface needed;
> it's all supplied by the hardware on the existing interfaces.
> 
> The point of this code is to annotate the CE events on Xeon 75xx
> and to implement specific backend actions (page offlining, triggers)
> based on specific events. These backend actions are already implemented
> on 55xx without additional changes (no need for EDAC)
> 
> EDAC does not provide an event interface that can
> be polled, just counts, so this cannot be done with EDAC.
> It's simply a topology enumeration with error counts.
> mcelog is not a topology interface, it's a event
> notification mechanism.
> 
> EDAC and mcelog are orthogonal, they don't solve the same
> problem.
> 
> So your nack is based on incorrect assumptions and doesn't make
> sense. What you're asking for cannot be done with current
> EDAC as far as I know.

It does not matter at all that current EDAC cannot do that right
now. Fact is that you are stubbornly ignoring any request from the x86
maintainers to rework MCE, consolidate it with EDAC and integrate it
into perf as the suitable event logging mechanism.

MCE has no design at all, it's a specialized hack which is limited to
a specific subset of the overall machine health monitoring and
reporting facilities.

You refuse to even think about consolidating the handling of all
health monitoring and reporting facilities into a well designed and
integrated framework. 

Your sole argument is that mce can do it and EDAC or whatever can
not. That's not a technical argument at all. MCE does not become a
better design just because you hacked another feature into it.

Ingo's NAK is completely correct and he has my full support for it.

We do not want new crap in the already horrible MCE code. We simply
request a consolidation of machine health monitoring/reporting
facilities before adding new stuff.

You have been ignoring our technical requests for more than a
year. You are refusing to work with other people on a well designed
solution. You just follow your own agenda and try to squeeze more
stuff into MCE.

	tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-19 10:50       ` Thomas Gleixner
@ 2010-02-19 12:17         ` Andi Kleen
  2010-02-19 12:45           ` Borislav Petkov
  2010-02-19 15:46           ` Thomas Gleixner
  0 siblings, 2 replies; 14+ messages in thread
From: Andi Kleen @ 2010-02-19 12:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Ingo Molnar, mingo, hpa, linux-kernel, andi,
	linux-tip-commits, Doug Thompson, Mauro Carvalho Chehab,
	Borislav Petkov

Hi Thomas,

I would appreciate if you could read the whole email
and ideally the references too before replying. I apologize for the length,
but this is a complicated topic.

> and integrate it
> into perf as the suitable event logging mechanism.

The main reason I didn't react to that proposal is
I don't see a clear path to make perf a good error mechanism.

I know there's a tendency that if you're working on something
that you think is cool, to try to force everything else
you're seeing into that model too (I occasionally have such tendencies
too :-) 

But if you take a step back and look at the requirements with a sceptical
eye that's not always the best thing to do.

Requirements for error handling are very different from performance monitoring.

Let me walk you through some of these differences:

USER TOOLS:

The current perf user tools are not suitable for errors: they are not
"always on running in the background" like you need for errors.

They are aimed at a interactive user model which is fine
for performance monitoring (well at least some forms of performance
monitoring), but not for errors.

Yes they could be probably reworked for a "always on" daemon model, but the 
result would be

a) completely different than what you have today in terms of interface
(it would be a lot more like you have with oprofile, and as I understand
one of the main motivations for perf was wide spread dislike of the oprofile
daemon model)
b) likely worse for performance monitoring (unless you fork them into two)
The requirements are simply very different.
c) a lot like what mcelog is today. mcelog today is a always on 
error daemon optimized for error handling, nothing else.

There's no associated error oriented infrastructure like triggers etc.
in perf Yes that could be all implemented, but (b) and (c) above apply.

So yes it could be probably done, but I suspect the result would
not make you happy for performance monitoring.

EVENT INTERFACE I:

The perf interface is aimed at a specific way of filtering events, which 
is not the right interface for errors, because you need usually 
all errors in most (not all) cases. Basically in performance
monitoring typically most events are off and you sometimes
turn them on, in error handling it's exactly the other way around.

Also errors tend to have different behavior from performance
counters, for example a model for a error on a object
is more the "leaky bucket", which is not a good fit for performance.

(I have more details on this in http://halobates.de/plumbers-error.pdf)

OVERHEAD:

The perf subsystem has relatively high overhead today (in terms
of memory size and code size overhead) and is IMHO not suitable
to be always active because of this. 

Errors are very fundamental and error reporting
mechanisms have to be always active, so it's extremely important
that they have very low overhead by default. That's not what
perf's model is: it trades memory size and code size for more
performance. That is fine for optional monitoring (if you
can afford it), but not the right model for an fundamental
"always on" mechanism. For "always on" infrastructure it's better
to be slim.

That said I suspect perf events could be likely put on a serious diet, but it's
unclear if the result would work as well as it does today 
for performance monitoring. You would likely lose some features
optimized for it at least.

EVENT INTERFACE II:

Partly that's because it has a lot of functionality that are not needed 
for errors anyways.  For example error just needs some very simple error 
buffers that can be straight forwardly implemented using kfifos (I did
that already in fact). That's just a few lines, all the functionality
in kernel/perf/* is not really needed.

There's no good way to throttle events per source, like it's needed
for errors.

EVENT INTERFACE III:

Then one of the current issues with mcelog is that it's not straight forward
to add variable length record types with typing etc. This isn't too
big a problem for MCEs (although the DIMM error reporting would have been
slightly nicer with it) but for some other types of errors it's a bigger
issue.

Now the funny thing is (and I keep waiting for Ingo to figure that out :-): 
the perf record format has exactly the same problem as mcelog in this regard. 
It's a untyped binary format where it's only possible to add something
at the end, not a fully extensible typed format with sub records etc. 

A better match would be either netlink with its sub record
(although for various reasons other I don't think it's the best model either) 
or the ASCII based udev sysfs interfaces.

In fact that is what Ingo asked for some time ago (before he
moved to the "everything must be perf" model). He wanted an ASCII
interface (so more like the udev model).  I'm not completely happy with 
that either, but it's probably still one of the better models and could be made
to work. 

It's definitely not perf though.

> year. You are refusing to work with other people on a well designed

First I work with a lot of people on error handling, even if you're
not always in Cc.

We would need to agree to disagree on EDAC being a "well designed
solution) IMHO it has a lot of problems (not just in my opinion
if you read some of the mails e.g. from Borislav he's stating the same)
and it's definitely not the general frame work you're asking for 
In fact in many ways EDAC far more specialized to some specific subset of 
errors than mcelog.

A generic error frame work (that would be neither EDAC nor perf nor
mcelog on the interface level) could be probably done and I have 
some ideas on how to do that properly (e.g. see the link below), 
but it's not a short term project. It needs a lot of design
work to be done properly and also would likely need to evolve
for some time. It would also need a suitable user level infrastructure,
which is actually a larger project than the kernel interfaces.

The patch above was simply intended to solve a specific problem on a specific 
chip.  I don't claim the interface is the best I ever did (definitely not), 
but at least it solves an existing problem in a relatively straight forward
way and I claim there's no clear better solution with today's infrastructure.

How are you suggesting to solve the DIMM error reporting in the short
term (let's say 2.6.34/35 time frame, without major redesigns) ?

-Andi

References:
- Thoughts on future error handling model:
http://halobates.de/plumbers-error.pdf
- mcelog kernel and userland design today:
http://halobates.de/mce-lc09-2.pdf

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-19 12:17         ` Andi Kleen
@ 2010-02-19 12:45           ` Borislav Petkov
  2010-02-19 13:21             ` Andi Kleen
  2010-02-19 15:46           ` Thomas Gleixner
  1 sibling, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2010-02-19 12:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Andi Kleen, Ingo Molnar, mingo, hpa,
	linux-kernel, linux-tip-commits, Doug Thompson,
	Mauro Carvalho Chehab

On Fri, Feb 19, 2010 at 01:17:34PM +0100, Andi Kleen wrote:

<snip all the perf analysis>

I think you're missing the point - it doesn't have to be
perf. It could just as well be some other tool which _shares_
functionality with perf. See this last mail from Ingo:
http://marc.info/?l=linux-kernel&m=126635420607236 on which you were
also CCed, by the way, where he suggests that we could have a tool
called 'hw' which reuses functionality with perf but concentrates on
error handling and all that RAS functionality in modern CPUs. It should
also have a daemon component etc...

> > year. You are refusing to work with other people on a well designed

Sorry, but from our last discussion on attempting to work towards such
an infrastructure solution I got the same impression as Thomas and Ingo
that you're simply not willing to work together on getting a real thing
done. That's why I stopped bothering - it simply made no sense to me to
waste time in fruitless discussions.

-- 
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-19 12:45           ` Borislav Petkov
@ 2010-02-19 13:21             ` Andi Kleen
  2010-02-19 15:17               ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2010-02-19 13:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andi Kleen, Thomas Gleixner, Ingo Molnar, mingo, hpa,
	linux-kernel, linux-tip-commits, Doug Thompson,
	Mauro Carvalho Chehab

Borislav,

> <snip all the perf analysis>
>
> I think you're missing the point - it doesn't have to be
> perf. It could just as well be some other tool which _shares_

That was one of my points, but there were others too about
the suitability of the kernel infra structure and the interfaces.

I don't think the perf syscall interface (nor the internal
implement) in its current form is good for errors and also see no clear
path to make it so (without basically adding another different interface)

> functionality with perf. See this last mail from Ingo:
> http://marc.info/?l=linux-kernel&m=126635420607236 on which you were
> also CCed, by the way, where he suggests that we could have a tool
> called 'hw' which reuses functionality with perf but concentrates on
> error handling and all that RAS functionality in modern CPUs. It should
> also have a daemon component etc...

So you would have different interfaces: you don't really
need new syscalls for this (read/write is fine) and perf_counter_open()
in its current form is completely unsuitable for errors
(unless your error reporting registers look like a x86 PMU --
mine doesn't at least) And different userland. And the kernel internal
requirements are very different too (see previous email)

Where is the commonality with perf?

On the tool side:

I'm working on such a tool already for quite some time. It's
called mcelog. Now it uses an older interface today, but at some
point I would expect it to move to other interfaces too
(e.g. next step for that would be APEI errors)

If you only knew mcelog from a few years ago: it's quite
different today than it was and please look again.

That is the end result will be likely called different
(it doesn't make much sense to call something that handles
all kinds of errors "mcelog") and also some stuff needs
to be more generic, but I suspect it'll share quite some
concepts.

If the only problem is the naming we can probably work something
out? In principle it could be called "hw", but the name
seems awfully generic, especially for a daemon. I was more
tending something like "errord" or so.

On the topology: I was not trying to replace existing
topology tools (like lscpu, lspci etc.). I don't see any
major problems (apart from some details that don't
deserve a redesign) with them.

>
>>> year. You are refusing to work with other people on a well designed
>
> Sorry, but from our last discussion on attempting to work towards such
> an infrastructure solution I got the same impression as Thomas and Ingo
> that you're simply not willing to work together on getting a real thing
> done. That's why I stopped bothering - it simply made no sense to me to
> waste time in fruitless discussions.

Well I keep ignoring suggestions to put more stuff into EDAC,
mostly because I think the EDAC design needs to be thrown out
instead of extended. Are you referring to that?

My impression was that you  got to the same conclusion (at least
for parts of current EDAC like the events) based on your earlier emails.

The current issue is less in enumeration/topology anyways but more
in event handling I would say. In the end topology/enumeration is
the easier part, and most of it is already working quite well.

I'm trying to do things step by step, including for short term
problems extending current interfaces if possible and then longer
term moving to new better interfaces.

-Andi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-19 13:21             ` Andi Kleen
@ 2010-02-19 15:17               ` Mauro Carvalho Chehab
  2010-02-19 15:37                 ` Andi Kleen
  0 siblings, 1 reply; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2010-02-19 15:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Borislav Petkov, Andi Kleen, Thomas Gleixner, Ingo Molnar, mingo,
	hpa, linux-kernel, linux-tip-commits, Doug Thompson

Andi, 

Andi Kleen wrote:
> Borislav,
>>>> year. You are refusing to work with other people on a well designed
>>
>> Sorry, but from our last discussion on attempting to work towards such
>> an infrastructure solution I got the same impression as Thomas and Ingo
>> that you're simply not willing to work together on getting a real thing
>> done. That's why I stopped bothering - it simply made no sense to me to
>> waste time in fruitless discussions.
> 
> Well I keep ignoring suggestions to put more stuff into EDAC,
> mostly because I think the EDAC design needs to be thrown out
> instead of extended. Are you referring to that?
> 
> My impression was that you  got to the same conclusion (at least
> for parts of current EDAC like the events) based on your earlier emails.
> 
> The current issue is less in enumeration/topology anyways but more
> in event handling I would say. In the end topology/enumeration is
> the easier part, and most of it is already working quite well.
> 
> I'm trying to do things step by step, including for short term
> problems extending current interfaces if possible and then longer
> term moving to new better interfaces.

Ingo and Thomas are right: we need to work together to improve the
error report, and this means that we shouldn't ignore putting more stuff
in EDAC.

EDAC is generic enough to work with different type of memory and memory
controllers, and to provide a consistent interface to describe it on a way
that userspace doesn't need to know what are the error registers at
the hardware, nor how to decode a "magic" error number into something
that has a meaning.

As Boris properly pointed, EDAC has space for improvements, and part of
the perf logic can be used as a start point to give some flash new ideas.

The main issue I see with MCE is at the interface level. I think if we
all cope together, we can converge into a proper interface that will
be accepted upstream.

-- 

Cheers,
Mauro

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-19 15:17               ` Mauro Carvalho Chehab
@ 2010-02-19 15:37                 ` Andi Kleen
  2010-02-20  0:14                   ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2010-02-19 15:37 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Andi Kleen, Thomas Gleixner, Ingo Molnar, mingo,
	hpa, linux-kernel, linux-tip-commits, Doug Thompson

>
> EDAC is generic enough to work with different type of memory and memory
> controllers, and to provide a consistent interface to describe it on a way
> that userspace doesn't need to know what are the error registers at
> the hardware, nor how to decode a "magic" error number into something
> that has a meaning.

Well the main problem I have with EDAC is that it has far too much information
(e.g. down to ranks/banks and also too much information on
the internal topology of the memory controller, and it can't even
express some current designs).

For me it looks like it was designed by someone starring at a motherboard/DIMM
semantics plan, and I don't think that's the right level to think about
these things.

Going that deep typically
requires very hardware specific information and in some cases
it's not even possible. I also don't think it's useful information
to present (and it's really the opposite of "abstraction")

I also have yet to see a useful use case where you need to look "inside" a DIMM
on the reporting level. The useful level is typically the "FRU" (something
you can replace), with only some very specific extensions for special
use cases.

There's also no generic way to do the necessary enumeration down to
the level EDAC needs. For some cases hardware specific drivers can be written,
but it's always better if the generic case works in a architectural way.

Then it does all the enumeration on the kernel, but there
are no useful facilities to sync that with a user level representation.
And most of the useful advannced & interesting RAS features I'm interested in
need user level support.

I prefer at least for MCE to stay on the architectural level
with only minor extensions for specific use cases.

Now to address these problems you could throw large parts of EDAC
out (do you mean that with 'flexible enough'?) and then add a actual
event interface (working on the later is my plan)

> As Boris properly pointed, EDAC has space for improvements, and part of
> the perf logic can be used as a start point to give some flash new ideas.

See my analysis several mails up. Which parts of perf do you want
to actually use? I don't see any that's actually directly usable
without major changes.

> The main issue I see with MCE is at the interface level. I think if we
> all cope together, we can converge into a proper interface that will
> be accepted upstream.

Just that we're on the same level, could you spell out in detail
what problems you're seeing with it?

[I'm not claiming there are none, I'm just curious what you think they are]

-Andi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-19 15:37                 ` Andi Kleen
@ 2010-02-20  0:14                   ` Mauro Carvalho Chehab
  2010-02-20  9:01                     ` Andi Kleen
  0 siblings, 1 reply; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2010-02-20  0:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Borislav Petkov, Andi Kleen, Thomas Gleixner, Ingo Molnar, mingo,
	hpa, linux-kernel, linux-tip-commits, Doug Thompson

I was thinking on a way where we could work with the EDAC/MCE issues, and
a way for us to move ahead. My proposal is to organize an EDAC/MCE BoF session
or a mini-summit during the Collaboration Summit, in San Francisco, with 
the interested parties. I suspect that some of us will be there already. 

It shouldn't be hard to find some place there for us to take a look at the
EDAC architecture and come with some proposal.

As today is the last day for CFP, I've also submitted there a proposal for a
panel. If approved, we can use it to collect data from hardware error users
(sysops and other users that require high availability on their services),
for us to discuss some strategies to address the issue or to summarize what
will be discussed on the event.

Comments?

Cheers,
Mauro

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-20  0:14                   ` Mauro Carvalho Chehab
@ 2010-02-20  9:01                     ` Andi Kleen
  0 siblings, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2010-02-20  9:01 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Andi Kleen, Borislav Petkov, Andi Kleen, Thomas Gleixner,
	Ingo Molnar, mingo, hpa, linux-kernel, linux-tip-commits,
	Doug Thompson

On Fri, Feb 19, 2010 at 10:14:17PM -0200, Mauro Carvalho Chehab wrote:

Mauro,

> I was thinking on a way where we could work with the EDAC/MCE issues, and
> a way for us to move ahead. My proposal is to organize an EDAC/MCE BoF session
> or a mini-summit during the Collaboration Summit, in San Francisco, with 
> the interested parties. I suspect that some of us will be there already. 

I didn't plan to be there so far. A BoF is probably a good idea,
and also looking closely at EDAC together,
but it would be better at some more kernel focussed conference.

If everyone else is at that summit I can try to come,
but it would be likely difficult.

We could probably do some kind of online BoF shorter time
(e.g. using some chat setup or on the phone)

> It shouldn't be hard to find some place there for us to take a look at the
> EDAC architecture and come with some proposal.

Proposal for what exactly?

Is this for a event interface or for a topology interface or both
or something else entirely?

My personal plan so far was to work on the APEI interface
and then possibly look at migrating MCE to that infrastructure too,
while updating mcelog to talk to it. This would be mostly
addressing events so far.

> As today is the last day for CFP, I've also submitted there a proposal for a
> panel. If approved, we can use it to collect data from hardware error users
> (sysops and other users that require high availability on their services),
> for us to discuss some strategies to address the issue or to summarize what
> will be discussed on the event.

You want to collect error rates or want to collect use cases?

For a serious collection doing it online would probably
give better coverage.

I do both to some degree already.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-19 12:17         ` Andi Kleen
  2010-02-19 12:45           ` Borislav Petkov
@ 2010-02-19 15:46           ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2010-02-19 15:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, Ingo Molnar, mingo, hpa, linux-kernel,
	linux-tip-commits, Doug Thompson, Mauro Carvalho Chehab,
	Borislav Petkov

Andi,

On Fri, 19 Feb 2010, Andi Kleen wrote:
> > and integrate it
> > into perf as the suitable event logging mechanism.
> 
> The main reason I didn't react to that proposal is
> I don't see a clear path to make perf a good error mechanism.

Thanks for confirming the main problem here:

  _You_ do not react, because _you_ do not see a clear path.

So _you_ ignore any request for a major cleanup of that whole MCE mess
and just keep on tinkering with the existing crap.

And you really expect us to tolerate your ignorance forever and just
apply happily more crap to something which should have never been
merged in the first place.

<snip>

> The patch above was simply intended to solve a specific problem on a specific 
> chip.  I don't claim the interface is the best I ever did (definitely not), 
> but at least it solves an existing problem in a relatively straight forward
> way and I claim there's no clear better solution with today's infrastructure.

The patch does not solve a specific problem. It merily adds extra info
retrieved from this specific chip. This is simply a new feature which
you hacked into the existing mce code.

There will always be a next generation chip with a specific new
feature and it does _NOT_ matter at all whether support for such a new
feature is straight forward in your opinion or not. Straight forward
crap is still crap.

The whole point is that we are not longer accepting whacky addons to
MCE w/o seing any collaborative reaction from your side on our
technical requests to fix up the whole thing for real.

You can point me to as many PDFs as you want. I'm not interested in
stuff you are breeding on your own w/o talking to EDAC folks and
trying to address the concerns which were raised by Ingo, myself and
others.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI
  2010-02-16 22:29     ` Andi Kleen
  2010-02-19 10:50       ` Thomas Gleixner
@ 2010-02-22  7:38       ` Hidetoshi Seto
  1 sibling, 0 replies; 14+ messages in thread
From: Hidetoshi Seto @ 2010-02-22  7:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, mingo, hpa, linux-kernel, andi, tglx,
	linux-tip-commits, Doug Thompson, Mauro Carvalho Chehab,
	Borislav Petkov

(2010/02/17 7:29), Andi Kleen wrote:
> This is for a different chip (xeon75xx)
> which has a completely different memory subsystem
> and reports memory errors in a completely different way
> than xeon75xx/core_i7.
> 
> For core_i7/xeon55xx there is no additional event interface needed;
> it's all supplied by the hardware on the existing interfaces.
> 
> The point of this code is to annotate the CE events on Xeon 75xx
> and to implement specific backend actions (page offlining, triggers)
> based on specific events. These backend actions are already implemented
> on 55xx without additional changes (no need for EDAC)

If my understanding is correct, your patch doesn't interact with xeon75xx
processor itself, but with the associated chip (I/O hub, aka Boxboro,
you abbreviated it to BXB) at least at first of all.

Then since MCE codes contain rather "generic, processor oriented" stuffs
while EDAC codes contain relatively "specific, chipset oriented" stuffs,
people can suppose that this "arch/x86/kernel/cpu/mcheck/mce-xeon75xx.c"
could be implemented in different way and the file could have a different
name like "drivers/edac/i7500_edac.c" or so.

However the real issue is that I couldn't figure out why this code
requires new hook in the polling handler, what is PFA, and what kind of
restriction let you to do so.  You noted that "The act of retrieving
the DIMM/PA information can take some time" but I'm not sure it is safe
even if poll handler is called via CMCI.

Anyway, Andi, could you point proper specification or datasheet to
know/check what you are going to do here?
Otherwise I could not distinguish your work from black magic...

Thanks,
H.Seto

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-02-22  7:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <211660974.2010861266660371958.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2010-02-20 10:14 ` [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI Mauro Carvalho Chehab
2010-01-23 11:33 [tip:x86/mce] x86, mce: Rename cpu_specific_poll to mce_cpu_specific_poll Andi Kleen
2010-02-05 23:31 ` [tip:x86/mce] x86, mce: Make xeon75xx memory driver dependent on PCI tip-bot for Andi Kleen
2010-02-16 20:47   ` Ingo Molnar
2010-02-16 22:29     ` Andi Kleen
2010-02-19 10:50       ` Thomas Gleixner
2010-02-19 12:17         ` Andi Kleen
2010-02-19 12:45           ` Borislav Petkov
2010-02-19 13:21             ` Andi Kleen
2010-02-19 15:17               ` Mauro Carvalho Chehab
2010-02-19 15:37                 ` Andi Kleen
2010-02-20  0:14                   ` Mauro Carvalho Chehab
2010-02-20  9:01                     ` Andi Kleen
2010-02-19 15:46           ` Thomas Gleixner
2010-02-22  7:38       ` Hidetoshi Seto

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).