From: Xunlei Pang <xpang@redhat.com>
To: Borislav Petkov <bp@alien8.de>, xlpang@redhat.com
Cc: Prarit Bhargava <prarit@redhat.com>,
	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>,
	Tony Luck <tony.luck@intel.com>,
	x86@kernel.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Junichi Nomura <j-nomura@ce.jp.nec.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Dave Young <dyoung@redhat.com>
Subject: Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
Date: Thu, 16 Feb 2017 19:52:09 +0800
Message-ID: <58A59269.3050706@redhat.com>
In-Reply-To: <20170216101845.vkmnde4v6v72dgzx@pd.tnic>

On 02/16/2017 at 06:18 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 01:36:37PM +0800, Xunlei Pang wrote:
>> I tried to use qemu to inject SRAO("mce -b 0 0 0xb100000000000000 0x5 0x0 0x0"),
>> it works well in 1st kernel, but it doesn't work for 1st kernel after kdump boots(seems
>> the cpus remain in 1st kernel don't respond to the simulated broadcasting mce).
>>
>> But in theory, we know cpus belong to kdump kernel can't respond to the
>> old mce handler, so a single SRAO injection in 1st kernel should be similar.
>> For example, I used "... -smp 2 -cpu Haswell" to launch a simulation with broadcast
>> mce supported, and inject SRAO to cpu0 only through qemu monitor
>> "mce 0 0 0xb100000000000000 0x5 0x0 0x0", cpu0 will timeout/panic and reboot
>> the machine as follows(running on linux-4.9):
>>   Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
> Sounds to me like you're trying hard to prove some point of yours which
> doesn't make much sense to me. And when you say "in theory", that makes
> it even less believable. So I remember asking you for exact steps. That
> above doesn't read like steps but like some babbling and I've actually
> tried to make sense of it for a couple of minutes but failed.
>
> So lemme spell it out for ya. I'd like for you to give me this:
>
> 1. Build kernel with this config
> 2. Boot it in kvm with this settings
> 3. Do this in the guest
> 4. Do that in the guest
> 5. ...
> 6. ...
>
>
> And all should be exact commands so that I can do them here on my machine.
>

Sorry, I missed your point.

The steps are as follows:
1. Prepare a multi-core Intel machine that supports broadcast MCE.
    Enable kdump (crashkernel=256M) and configure the kdump kernel to boot with "nr_cpus=1".
2. Activate kdump and crash the first kernel on some cpu, say cpu1
    (taskset -c 1 echo c > /proc/sysrq-trigger); kdump will then boot on cpu1.
3. After kdump boots up (let it reach a shell), trigger an SRAO on cpu1
    (QEMU monitor cmd: mce -b 1 0 0xb100000000000000 0x5 0x0 0x0).
    The MCE will then be broadcast to the other cpus, which are still running
    in the first kernel (i.e. looping in crash_nmi_callback).
    If you have hardware that can inject an MCE, that would be great, as QEMU does not work correctly for me.
4. Something like the following is then expected to happen:

[    1.468556] tsc: Refined TSC clocksource calibration: 2933.437 MHz
         Starting Kdump Vmcore Save Service...
kdump: saving to /sysroot//var/crash/127.0.0.1-2015-09-01-05:07:03/
kdump: saving vmcore-dmesg.txt
[   39.000010] mce: [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 2: bd0000000000017a
[   39.000010] mce: [Hardware Error]: TSC 0 ADDR 61600000 MISC 8c 
[   39.000010] mce: [Hardware Error]: PROCESSOR 0:106a3 TIME 1441083980 SOCKET 0 APIC 0 microcode 1
[   39.000010] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[   39.000010] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
[   39.000010] Shutting down cpus with NMI
[    1.758463] Uhhuh. NMI received for unknown reason 20 on CPU 0.
[    1.758463] Do you have a strange power saving mode enabled?
[    1.758463] Dazed and confused, but trying to continue
[   39.000010] Rebooting in 30 seconds..
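
For reference, the status values in the step 3 injection command can be checked with a small decoder. This is only a sketch: it assumes the IA32_MCi_STATUS and IA32_MCG_STATUS bit layout from the Intel SDM, and that the S and AR bits are meaningful (i.e. the CPU reports software error recovery support). The names decode() and classify() are made up for illustration and are not kernel or QEMU interfaces.

```python
# Bit positions assumed from the Intel SDM layout of the two status registers.
MCI_STATUS_BITS = {
    "VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57, "S": 56, "AR": 55,
}
MCG_STATUS_BITS = {"RIPV": 0, "EIPV": 1, "MCIP": 2}


def decode(value, layout):
    """Return the set of flag names whose bit is set in value."""
    return {name for name, bit in layout.items() if (value >> bit) & 1}


def classify(status):
    """Rough classification of an uncorrected error from its flag bits."""
    flags = decode(status, MCI_STATUS_BITS)
    if not {"VAL", "UC", "EN"} <= flags:
        return "not a valid, enabled uncorrected error"
    if "PCC" in flags:
        return "fatal (PCC set)"
    if "S" not in flags:
        return "UCNA"
    return "SRAR" if "AR" in flags else "SRAO"


if __name__ == "__main__":
    # Values taken from the QEMU monitor command in step 3.
    print(sorted(decode(0xb100000000000000, MCI_STATUS_BITS)))
    print(classify(0xb100000000000000))
    print(sorted(decode(0x5, MCG_STATUS_BITS)))
```

With the injected values, the status word has VAL, UC, EN and S set with AR and PCC clear, which is the SRAO pattern, and mcgstatus 0x5 sets RIPV and MCIP. The status word in the log above (bd0000000000017a) decodes the same way, with ADDRV and MISCV additionally set.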

Regards,
Xunlei
