All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
	mingo@redhat.com, hpa@zytor.com, x86@kernel.org,
	fenghua.yu@intel.com
Subject: Re: [PATCH 1/2] x86/apic: shutdown local APIC before I/O APIC during crash
Date: Fri, 1 Jul 2016 12:36:20 +0200	[thread overview]
Message-ID: <20160701103620.GA12394@gmail.com> (raw)
In-Reply-To: <1467175910-2966-1-git-send-email-weijg.fnst@cn.fujitsu.com>


* Wei Jiangang <weijg.fnst@cn.fujitsu.com> wrote:

> commit <522e66464467> disables I/O APIC before shutdown of
> the local APIC for both reboot and crash path.
> and commit <2885432aaf15> declares that 'it still makes sense to
> quiet IO APIC before disabling Local APIC'.

That's not how we refer to commits in changelogs.

> However, the former introduced a bug for crashdown.

What is 'crashdown'? It's not referred to in the kernel source even once.

> If specify 'notsc' for capture-kernel, and then trigger crashdown.
> The capture-kernel will be blocked at calibrate_delay_converge().

This is a more readable way of saying the same:

  If we specify the 'notsc' boot parameter for the dump-capture kernel,
  and then trigger a crash-down, then the dump-capture kernel will hang
  in calibrate_delay_converge():

(Assuming the changelog first explains what a 'crash-down' is.)

> /* wait for "start of" clock tick */
> ticks = jiffies;
> while (ticks == jiffies)
>     ; /* nothing */

Plase align quoted code to the right with at least a single tab.

> serial console log as following,

  serial log of the hang is as follows:

> ............
> [    0.000000] Linux version 4.7.0-rc2+ (root@localhost.localdomain)
> (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #2 SMP Wed Jun
> 156
> [    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.7.0-rc2+
> root=/dev/mapper/centos-root ro rd.lvm.lv=centos/swap
> vconsole.font=latarcyrheb-sun16 rd.lvm.lv=centos/root crashkernel=256M
> vconsole.keymap=us console=tty0 console=ttyS0,115200n8 LANG=en_US.UTF-8
> irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off
> panic=10 rootflags=nofail acpi_no_memhotplug notsc
> ............
> [    0.000000] tsc: Kernel compiled with CONFIG_X86_TSC, cannot disable
> TSC completely
> ............
> [    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles:
> 0xffffffff, max_idle_ns: 133484882848 ns
> [    0.000000] tsc: Fast TSC calibration using PIT
> [    0.000000] tsc: Detected 3192.714 MHz processor
> [    0.000000] Calibrating delay loop...

Just quote the last few lines and skip the useless timestamp column. Also, please 
right-align this too.

> The bug remains and unsolved for a long time, since 2013.
> I find the arch-criminal by bisect.

What is an arch-criminal? Did you want to say:

  The bug has been introduced in 2013. I found the buggy commit via bisection.

?

> The commit <522e66464467> used to fix erratum AVR31 for "Intel Atom
> Processor C2000 Product Family Specification Update".
> You can find the doc at http://www.intel.com/content/dam/www/public/us
> /en/documents/specification-updates/atom-c2000-family-spec-update.pdf.
>
> IMO,
> It doesn't make sense that change the order of disabling between
> I/O APIC and local APIC just for a certain model C2000.
> And I couldn't find any related descriptions for Intel 64 and IA-32 Arch.
> 
> so, I want to revert the crash part of commit <522e66464467>.

So why does the crashdump kernel hang in calibrate_delay_converge()?

To me it appears this is a weakness in the crashdump kernel: it is unable to boot 
if we crash the original host system in a particular hardware state, right?

By reverting this change we'll just paper over the bug and re-introduce the bug 
that can result in certain CPUs hanging if the IO-APIC sends an APIC message if 
the lapic is disabled prematurely.

Thanks,

	Ingo

  parent reply	other threads:[~2016-07-01 10:36 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-29  4:51 [PATCH 1/2] x86/apic: shutdown local APIC before I/O APIC during crash Wei Jiangang
2016-06-29  4:51 ` [PATCH 2/2] time/tick-schede: fix typos Wei Jiangang
2016-07-01 10:49   ` [tip:timers/core] timers/nohz: Fix several typos tip-bot for Wei Jiangang
2016-07-01 10:36 ` Ingo Molnar [this message]
2016-07-04  9:44   ` [PATCH 1/2] x86/apic: shutdown local APIC before I/O APIC during crash Wei, Jiangang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160701103620.GA12394@gmail.com \
    --to=mingo@kernel.org \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=weijg.fnst@cn.fujitsu.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.