From: Igor Mammedov <imammedo@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: rob@landley.net, tglx@linutronix.de, mingo@redhat.com,
hpa@zytor.com, x86@kernel.org, luto@mit.edu,
suresh.b.siddha@intel.com, avi@redhat.com, imammedo@redhat.com,
a.p.zijlstra@chello.nl, johnstul@us.ibm.com,
arjan@linux.intel.com, linux-doc@vger.kernel.org
Subject: [PATCH 5/5] Do not mark cpu as not present if we failed to boot it
Date: Wed, 9 May 2012 12:25:02 +0200
Message-ID: <1336559102-28103-6-git-send-email-imammedo@redhat.com>
In-Reply-To: <1336559102-28103-1-git-send-email-imammedo@redhat.com>
This allows the CPU to be booted later, if possible.
v2:
Introduce the failed_cpu_boots_limit command-line parameter.
At startup, udev might try to online a CPU even if it failed to boot
the first time, and udev will then loop on a CPU that refuses to boot.
So disable the CPU once failed_cpu_boots_limit is reached, to prevent
udev from spinning on onlining a persistently faulty CPU.
A guest kernel on an overcommitted host could use this parameter to set
the limit to an acceptable number of CPU online failures.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
Documentation/kernel-parameters.txt | 6 +++++
arch/x86/kernel/smpboot.c | 36 +++++++++++++++++++++++++++++++++-
2 files changed, 40 insertions(+), 2 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c1601e5..6b9bbbc 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -825,6 +825,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Format: <interval>,<probability>,<space>,<times>
See also Documentation/fault-injection/.
+ failed_cpu_boots_limit=[SMP,X86]
+ Number of attempts the kernel may make to boot an
+ unresponsive or stuck CPU. Once the limit is reached, the
+ kernel disables the failed CPU and marks it as not present.
+ Default: 0
+
floppy= [HW]
See Documentation/blockdev/floppy.txt.
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index af63cab..2d72a8a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -136,6 +136,28 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
atomic_t init_deasserted;
+static int failed_cpu_boots_limit = 0;
+static int cpu_boot_error_nr[NR_CPUS];
+
+static int parse_failed_cpu_boots(char *str)
+{
+ unsigned long val;
+ int err;
+
+ if (!str)
+ return -EINVAL;
+
+ err = kstrtoul(str, 0, &val);
+ if (err)
+ return -EINVAL;
+ failed_cpu_boots_limit = val;
+ printk(KERN_NOTICE "Limit CPU failed boot attempts: %d\n",
+ failed_cpu_boots_limit);
+
+ return 0;
+}
+__setup("failed_cpu_boots_limit=", parse_failed_cpu_boots);
+
/*
* Report back to the Boot Processor.
* Running on AP.
@@ -810,8 +832,18 @@ do_rest:
/* was set by cpu_init() */
cpumask_clear_cpu(cpu, cpu_initialized_mask);
- set_cpu_present(cpu, false);
- per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
+ /* was set by smp_callin() */
+ cpumask_clear_cpu(cpu, cpu_callin_mask);
+
+ /* disable CPU if it's failed to boot N times in a row */
+ if (cpu_boot_error_nr[cpu]++ > failed_cpu_boots_limit) {
+ set_cpu_present(cpu, false);
+ per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
+ pr_err("CPU%d: repeatedly fails to boot, disabling.\n",
+ cpu);
+ }
+ } else {
+ cpu_boot_error_nr[cpu] = 0;
}
/* mark "stuck" area as not stuck */
--
1.7.1