From: Ingo Molnar <mingo@elte.hu>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Yu, Fenghua" <fenghua.yu@intel.com>,
	Borislav Petkov <bp@amd64.org>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Thomas Gleixner <tglx@linutronix.de>,
	H Peter Anvin <hpa@zytor.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Van De Ven, Arjan" <arjan.van.de.ven@intel.com>,
	"Siddha, Suresh B" <suresh.b.siddha@intel.com>,
	"Brown, Len" <len.brown@intel.com>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-pm <linux-pm@vger.kernel.org>, x86 <x86@kernel.org>,
	Tejun Heo <tj@kernel.org>,
	"Herrmann3, Andreas" <Andreas.Herrmann3@amd.com>
Subject: Re: [PATCH v4 0/7] x86: BSP or CPU0 online/offline
Date: Thu, 8 Dec 2011 05:43:03 +0100
Message-ID: <20111208044303.GA9485@elte.hu>
In-Reply-To: <0207C53569FE594381A4F2EB66570B2A018EF3B51C@orsmsx508.amr.corp.intel.com>


* Luck, Tony <tony.luck@intel.com> wrote:

> > The question is, how realistically does this report true CPU 
> > troubles, statistically? The on-die cache might have the 
> > highest transistor count, but it's not under nearly the same 
> > thermal stress as functional units.
> >
> > If 90% of all hard CPU failures can be predicted that way 
> > then it's probably useful. If it's only 20%, then not so 
> > much.
> 
> Intel doesn't release error rates - so I can't help with data 
> here.

Well, precise data won't be needed - but we need *something* 
indicative to justify the feature - faith alone won't be enough.

Is there any third-party research on this? I remember that 
Google released hard drive failure statistics a few years ago; 
maybe there's some approximate data about CPU "soft" failure 
rates as well. Even anecdotal data and speculation/estimation 
would be a start - it could be contradicted later by more 
precise data, once people start using the "generic CPU 
hot-unplug" feature (which is what this feature should really 
be named, instead of 'BSP unplug').

> > Also, it's still all theoretical until there are systems out 
> > there where the CPU socket is physically hotpluggable. If 
> > there are such plans in the works then sure, theory becomes 
> > reality and then it's all useful - and then we can do these 
> > patches (and more).
> 
> No - physical removal of the cpu is not a requirement for this 
> to be useful. [...]

Indeed, you are right; I stand corrected there.

Okay, I'm convinced; I guess we can do this.

> [...]
>
> Physical removal of the cpu has been a problem for Linux since 
> Nehalem (when the memory controller moved on-die). Take away the 
> cpu, and you lose access to the memory connected to that 
> socket - and we don't have general solutions for memory 
> removal.

It's technically possible, but not the easiest of features - 
also, I suspect Linus would object to naively breaking the 
semi-linear kernel mapping we use today ;-)

But if someone implements that in a sane way, using at least 2MB 
granular mappings [or maybe MAX_ORDER granular mappings] so that 
we keep 2MB TLB entries, and using a quick hash table for __pa() 
and __va(), I would definitely take a look at how ugly it ends 
up being. Our hibernation code already gives us a generic way to 
quiesce all DMA activity on the system, so most of the building 
blocks are in place.
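The hash-table translation described above might look roughly like the sketch below. It is a minimal userspace illustration in C, not kernel code. It assumes a fixed-size, open-addressed hash table keyed by 2MB chunk index, with one table per direction. Every name in it (map_chunk(), sketch_pa(), sketch_va(), TABLE_BITS) is hypothetical. On x86-64 today, __pa()/__va() are essentially offset arithmetic over the semi-linear kernel mapping, and that arithmetic is what such a scheme would replace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2MB chunk granularity, matching the 2MB mappings suggested above. */
#define CHUNK_SHIFT 21
#define CHUNK_SIZE  (UINT64_C(1) << CHUNK_SHIFT)
#define CHUNK_MASK  (CHUNK_SIZE - 1)

/* Hypothetical table size: 64 slots per direction. */
#define TABLE_BITS 6
#define TABLE_SIZE (1u << TABLE_BITS)
static_assert((TABLE_SIZE & (TABLE_SIZE - 1)) == 0, "TABLE_SIZE must be a power of two");

struct slot {
	bool     used;
	uint64_t key;	/* chunk index (address >> CHUNK_SHIFT) */
	uint64_t val;	/* corresponding chunk index on the other side */
};

struct chunk_table {
	struct slot slots[TABLE_SIZE];
};

static struct chunk_table va_to_pa;	/* consulted by sketch_pa() */
static struct chunk_table pa_to_va;	/* consulted by sketch_va() */

/* Fibonacci hashing: multiply, keep the top TABLE_BITS bits. */
static unsigned hash_chunk(uint64_t key)
{
	return (unsigned)((key * UINT64_C(0x9E3779B97F4A7C15)) >> (64 - TABLE_BITS));
}

/* Open addressing with linear probing; returns false if the table is full. */
static bool table_insert(struct chunk_table *t, uint64_t key, uint64_t val)
{
	unsigned i = hash_chunk(key);

	for (unsigned n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
		struct slot *s = &t->slots[i];

		if (!s->used || s->key == key) {
			s->used = true;
			s->key = key;
			s->val = val;
			return true;
		}
	}
	return false;
}

static bool table_lookup(const struct chunk_table *t, uint64_t key, uint64_t *val)
{
	unsigned i = hash_chunk(key);

	for (unsigned n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
		const struct slot *s = &t->slots[i];

		if (!s->used)
			return false;	/* an empty slot ends the probe chain */
		if (s->key == key) {
			*val = s->val;
			return true;
		}
	}
	return false;
}

/* Record one 2MB-aligned virtual <-> physical pair (no rollback on failure). */
static bool map_chunk(uint64_t vaddr, uint64_t paddr)
{
	if ((vaddr | paddr) & CHUNK_MASK)
		return false;
	return table_insert(&va_to_pa, vaddr >> CHUNK_SHIFT, paddr >> CHUNK_SHIFT) &&
	       table_insert(&pa_to_va, paddr >> CHUNK_SHIFT, vaddr >> CHUNK_SHIFT);
}

/* Hypothetical __pa() stand-in: chunk lookup plus offset; UINT64_MAX if unmapped. */
static uint64_t sketch_pa(uint64_t vaddr)
{
	uint64_t pchunk;

	if (!table_lookup(&va_to_pa, vaddr >> CHUNK_SHIFT, &pchunk))
		return UINT64_MAX;
	return (pchunk << CHUNK_SHIFT) | (vaddr & CHUNK_MASK);
}

/* Hypothetical __va() stand-in: the same lookup in the reverse table. */
static uint64_t sketch_va(uint64_t paddr)
{
	uint64_t vchunk;

	if (!table_lookup(&pa_to_va, paddr >> CHUNK_SHIFT, &vchunk))
		return UINT64_MAX;
	return (vchunk << CHUNK_SHIFT) | (paddr & CHUNK_MASK);
}
```

Consider mapping a 2MB-aligned virtual chunk V to physical 0x40000000. After that, sketch_pa(V + 0x1234) returns 0x40001234, and sketch_va(0x40001234) returns V + 0x1234. Keeping one table entry per 2MB chunk means each translation costs a hash plus a short probe, while the mappings themselves can stay 2MB pages.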

Thanks,

	Ingo
