public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
	mingo@redhat.com, hpa@zytor.com, x86@kernel.org,
	fenghua.yu@intel.com
Subject: Re: [PATCH 1/2] x86/apic: shutdown local APIC before I/O APIC during crash
Date: Fri, 1 Jul 2016 12:36:20 +0200	[thread overview]
Message-ID: <20160701103620.GA12394@gmail.com> (raw)
In-Reply-To: <1467175910-2966-1-git-send-email-weijg.fnst@cn.fujitsu.com>


* Wei Jiangang <weijg.fnst@cn.fujitsu.com> wrote:

> commit <522e66464467> disables I/O APIC before shutdown of
> the local APIC for both reboot and crash path.
> and commit <2885432aaf15> declares that 'it still makes sense to
> quiet IO APIC before disabling Local APIC'.

That's not how we refer to commits in changelogs.

> However, the former introduced a bug for crashdown.

What is 'crashdown'? It's not referred to in the kernel source even once.

> If specify 'notsc' for capture-kernel, and then trigger crashdown.
> The capture-kernel will be blocked at calibrate_delay_converge().

This is a more readable way of saying the same:

  If we specify the 'notsc' boot parameter for the dump-capture kernel,
  and then trigger a crash-down, then the dump-capture kernel will hang
  in calibrate_delay_converge():

(Assuming the changelog first explains what a 'crash-down' is.)

> /* wait for "start of" clock tick */
> ticks = jiffies;
> while (ticks == jiffies)
>     ; /* nothing */

Plase align quoted code to the right with at least a single tab.

> serial console log as following,

  serial log of the hang is as follows:

> ............
> [    0.000000] Linux version 4.7.0-rc2+ (root@localhost.localdomain)
> (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #2 SMP Wed Jun
> 156
> [    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.7.0-rc2+
> root=/dev/mapper/centos-root ro rd.lvm.lv=centos/swap
> vconsole.font=latarcyrheb-sun16 rd.lvm.lv=centos/root crashkernel=256M
> vconsole.keymap=us console=tty0 console=ttyS0,115200n8 LANG=en_US.UTF-8
> irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off
> panic=10 rootflags=nofail acpi_no_memhotplug notsc
> ............
> [    0.000000] tsc: Kernel compiled with CONFIG_X86_TSC, cannot disable
> TSC completely
> ............
> [    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles:
> 0xffffffff, max_idle_ns: 133484882848 ns
> [    0.000000] tsc: Fast TSC calibration using PIT
> [    0.000000] tsc: Detected 3192.714 MHz processor
> [    0.000000] Calibrating delay loop...

Just quote the last few lines and skip the useless timestamp column. Also, please 
right-align this too.

> The bug remains and unsolved for a long time, since 2013.
> I find the arch-criminal by bisect.

What is an arch-criminal? Did you want to say:

  The bug has been introduced in 2013. I found the buggy commit via bisection.

?

> The commit <522e66464467> used to fix erratum AVR31 for "Intel Atom
> Processor C2000 Product Family Specification Update".
> You can find the doc at http://www.intel.com/content/dam/www/public/us
> /en/documents/specification-updates/atom-c2000-family-spec-update.pdf.
>
> IMO,
> It doesn't make sense that change the order of disabling between
> I/O APIC and local APIC just for a certain model C2000.
> And I couldn't find any related descriptions for Intel 64 and IA-32 Arch.
> 
> so, I want to revert the crash part of commit <522e66464467>.

So why does the crashdump kernel hang in calibrate_delay_converge()?

To me it appears this is a weakness in the crashdump kernel: it is unable to boot 
if we crash the original host system in a particular hardware state, right?

By reverting this change we'll just paper over the bug and re-introduce the bug 
that can result in certain CPUs hanging if the IO-APIC sends an APIC message if 
the lapic is disabled prematurely.

Thanks,

	Ingo

  parent reply	other threads:[~2016-07-01 10:36 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-29  4:51 [PATCH 1/2] x86/apic: shutdown local APIC before I/O APIC during crash Wei Jiangang
2016-06-29  4:51 ` [PATCH 2/2] time/tick-schede: fix typos Wei Jiangang
2016-07-01 10:49   ` [tip:timers/core] timers/nohz: Fix several typos tip-bot for Wei Jiangang
2016-07-01 10:36 ` Ingo Molnar [this message]
2016-07-04  9:44   ` [PATCH 1/2] x86/apic: shutdown local APIC before I/O APIC during crash Wei, Jiangang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160701103620.GA12394@gmail.com \
    --to=mingo@kernel.org \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=weijg.fnst@cn.fujitsu.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox