From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
To: Petr Tesarik <ptesarik@suse.cz>
Cc: Fenghua Yu <fenghua.yu@intel.com>, Jingbai Ma <jingbai.ma@hp.com>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Mitchell, Lisa (MCLinux in Fort Collins)" <lisa.mitchell@hp.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	bhelgaas@google.com, Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [Help Test] kdump, x86, acpi: Reproduce CPU0 SMI corruption issue after unsetting BSP flag
Date: Tue, 20 Aug 2013 12:13:12 +0900
Message-ID: <5212DEC8.9000609@jp.fujitsu.com>
In-Reply-To: <20130819154626.39403f5b@hananiah.suse.cz>

(2013/08/19 22:46), Petr Tesarik wrote:
> On Sun, 18 Aug 2013 19:59:53 -0700
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>
>>
>>>
>>> Sorry Eric, I'm not clear to what you mean by ``short one core''...
>>> Which are you suggesting? Disabling BSP if crash happens on AP is
>>> reasonable?
>>> Or restricting cpus to a single one only just as the current kdump
>>> configuration is reasonable?
>>
>> I am suggesting we start every cpu except the BSP from the AP we started on.
>>
>> N-1 cpus seems like a good tradeoff between performance and reliability for those who need it.
>
> FWIW a large customer of ours is fine with such a limitation. And I
> have already tested this approach manually (starting the kdump kernel
> with maxcpus=1 and hot-plugging the remaining APs from user-space).
>

This is the workaround I suggested previously on this mailing list, isn't it?
The additional merits of disabling the BSP on the kernel side in the 2nd kernel are:

- We can give the memory that would be allocated for the BSP to another CPU
   that is available, which is more efficient in memory consumption. This is the
   same reason why distros use nr_cpus=1 instead of maxcpus=1. If we don't disable
   the BSP, the kernel allocates some memory for it even though we never use it.
   The 2nd kernel should use as little memory as possible.

- It removes the BSP from the set of CPUs that can be hot-plugged. Keeping the BSP
   available after a crash on an AP leaves a potential risk of triggering a system
   hang from user space. It also removes the need for an awkward configuration that
   hot-adds APs from user space while avoiding the BSP.
   This seems less important than the point above.
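
To illustrate the user-space workaround mentioned above (boot with maxcpus=1,
then hot-add APs while avoiding the BSP), here is a minimal sketch. It assumes
the usual sysfs CPU hotplug files (/sys/devices/system/cpu/cpuN/online). The
function name online_aps and its parameters are hypothetical, and how to find
which logical CPU number is the BSP in the 2nd kernel is not covered here; the
caller has to supply it.

```python
import os

def online_aps(sysfs_cpu_dir, cpu_ids, bsp_cpu):
    """Bring CPUs online through sysfs, skipping the BSP.

    sysfs_cpu_dir: normally "/sys/devices/system/cpu"
    cpu_ids:       logical CPU numbers to consider
    bsp_cpu:       logical CPU number of the BSP (assumed known to the caller)
    Returns the list of CPU numbers that were written to.
    """
    onlined = []
    for cpu in cpu_ids:
        if cpu == bsp_cpu:
            continue  # never wake up the BSP in the 2nd kernel
        path = os.path.join(sysfs_cpu_dir, "cpu%d" % cpu, "online")
        if not os.path.exists(path):
            continue  # e.g. the boot CPU may have no "online" file
        with open(path, "w") as f:
            f.write("1")  # ask the kernel to bring this CPU online
        onlined.append(cpu)
    return onlined
```

On a real system this would be called with "/sys/devices/system/cpu" as root.
Disabling the BSP on the kernel side makes this kind of script unnecessary.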

In a practical configuration, we need to decide how many CPUs to use in the
2nd kernel as a trade-off between performance and the amount of memory we can
accept spending on additional CPUs. I think in most cases it would simply be
the number of disks and the number of makedumpfile threads.
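
As a rough sketch of that sizing rule only: the helper below is hypothetical,
and taking the larger of the two numbers and capping it at the CPUs that are
usable (every CPU except the BSP) is one assumed reading of the rule, not a
tested policy. The memory cost per additional CPU is not modelled.

```python
def pick_nr_cpus(usable_cpus, n_disks, n_dump_threads):
    """Guess how many CPUs the 2nd kernel should bring up.

    usable_cpus:    CPUs that may be used at all (e.g. all CPUs except the BSP)
    n_disks:        number of disks the dump is written to
    n_dump_threads: number of makedumpfile threads
    """
    # Assumption: more CPUs than disks or dump threads would only cost memory.
    wanted = max(n_disks, n_dump_threads, 1)
    # Never ask for more than what is usable, and always keep at least one CPU.
    return min(wanted, max(usable_cpus, 1))
```

For example, with 8 usable CPUs, 2 disks and 4 makedumpfile threads, this
picks 4 CPUs.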

> Now that this approach is in line with upstream efforts, I'm going to
> test it on some more machines and see if there are any troubles.
>
> @Hatayama-san:
>> BTW, I have question that does normal kdump work well if crash happens
>> on some AP? I wonder the same issue could happen on the 2nd kernel.
>
> I'm not sure what you mean. Normal kdump starts with "maxcpus=1", and
> yes, that works even if the secondary kernel is booted from an AP. OTOH
> I suspect that not having any BSP in the system may be the cause of some
> mysterious random reboots and/or hangs experienced by some customers.
>
> I'll try setting the BSP flag on the boot CPU unconditionally and see
> if it makes any difference.
>
> Petr Tesarik
>

Ma saw a hang when he tried to reboot his HP systems in the 1st kernel
with the BSP flag unset on every existing CPU. I thought this condition
is similar to the 2nd kernel after a crash on an AP, in the sense that
there is no BSP among the online CPUs. If the hang he saw was caused by
running reboot on a CPU without the BSP flag, I guess the same situation
could already happen in the current single-CPU configuration.

-- 
Thanks.
HATAYAMA, Daisuke


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
