From: Yinghai Lu <yinghai@kernel.org>
To: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@elte.hu>,
Don Zickus <dzickus@redhat.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: tip/master broken with x2apic and kexec
Date: Wed, 14 Jul 2010 19:01:20 -0700
Message-ID: <4C3E6BF0.4040506@kernel.org>
In-Reply-To: <1279152189.2849.260.camel@sbs-t61.sc.intel.com>
On 07/14/2010 05:03 PM, Suresh Siddha wrote:
> On Wed, 2010-07-14 at 15:57 -0700, Yinghai Lu wrote:
>> On 07/14/2010 02:23 PM, Yinghai Lu wrote:
>>> On 07/14/2010 01:35 PM, Yinghai Lu wrote:
>>>> On 07/13/2010 04:27 PM, Yinghai Lu wrote:
>>>>> On 07/13/2010 03:00 PM, H. Peter Anvin wrote:
>>>>>> On 07/12/2010 07:59 PM, Yinghai Lu wrote:
>>>>>>> tip/master:
>>>>>>> system1: BIOS enabled x2apic; the first kernel boots fine, but kexec'ing a second kernel causes an instant system reboot.
>>>>>>>
>>>>>>> system2: BIOS did not enable x2apic; the first kernel boots fine and enables x2apic, and kexec'ing a second kernel works, but kexec'ing a third kernel causes an instant system reboot.
>>>>>>>
>>>>>>> linus' tree is ok.
>>>>>>>
>>>>>>> but for system2, if booted with nox2apic, intr-remapping off, and iommu off, the kexec loop test passes.
>>>>>>>
>>>>>>> the problem looks to have started in the last two or three weeks.
>>>>>>>
>>>>>>> Any idea?
>>>>>>>
>>>>>>> bisecting will take a while, because the system POST takes a while every time.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Yinghai Lu
>>>>>>
>>>>>> OK, I found the bug... if you could test out the patch which will be
>>>>>> sent out shortly I would very much appreciate it.
>>>>>
>>>>> not sure if your patch is the offending one now.
>>>>>
>>>>> kL: kernel from linus tree
>>>>> kT1: kernel from tip
>>>>> kT2: kernel from tip with your patch reverted
>>>>>
>>>>> BIOS-->kL ---> kL ---> kL....always working
>>>>> BIOS-->kT1 ---> kT1 ---> kT1 : instant system reset between the second and third kernel...
>>>>> BIOS-->kT2 ---> kT2 ---> kT2 : instant system reset between the second and third kernel...
>>>>>
>>>>> BIOS-->kL ---> kL ---> kL ---> then kT1 ---> kT1 .... always working
>>>>> BIOS-->kL ---> kL ---> kL ---> then kT2 ---> kT2 .... always working
>>>>>
>>>>
>>>> bisecting said:
>>>>
>>>>> git bisect good
>>>> 58687acba59266735adb8ccd9b5b9aa2c7cd205b is the first bad commit
>>>> commit 58687acba59266735adb8ccd9b5b9aa2c7cd205b
>>>> Author: Don Zickus <dzickus@redhat.com>
>>>> Date: Fri May 7 17:11:44 2010 -0400
>>>>
>>>> lockup_detector: Combine nmi_watchdog and softlockup detector
>>>>
>>>> The new nmi_watchdog (which uses the perf event subsystem) is very
>>>> similar in structure to the softlockup detector. Using Ingo's
>>>> suggestion, I combined the two functionalities into one file:
>>>> kernel/watchdog.c.
>>>>
>>>> Now both the nmi_watchdog (or hardlockup detector) and softlockup
>>>> detector sit on top of the perf event subsystem, which is run every
>>>> 60 seconds or so to see if there are any lockups.
>>>>
>>>> To detect hardlockups, cpus not responding to interrupts, I
>>>> implemented an hrtimer that runs 5 times for every perf event
>>>> overflow event. If that stops counting on a cpu, then the cpu is
>>>> most likely in trouble.
>>>>
>>>> To detect softlockups, tasks not yielding to the scheduler, I used the
>>>> previous kthread idea that now gets kicked every time the hrtimer fires.
>>>> If the kthread isn't being scheduled neither is anyone else and the
>>>> warning is printed to the console.
>>>>
>>>> I tested this on x86_64 and both the softlockup and hardlockup paths
>>>> work.
>>>>
>>>
>>> with
>>> # CONFIG_LOCKUP_DETECTOR is not set
>>> # CONFIG_HARDLOCKUP_DETECTOR is not set
>>>
>>> the kexec loop test passes.
>>>
>>> also that patch will break kexec/kdump on systems with x2apic pre-enabled.
>>
>> before the combining patch
>>
>> CONFIG_DETECT_SOFTLOCKUP=y
>> CONFIG_NMI_WATCHDOG=y
>>
>> will have the same problem.
>>
>> so the problem should come from NMI_WATCHDOG.
>
> Yinghai, It looks like some timing issue wrt nmi handling/kexec and
> perhaps not directly related to x2apic? Perhaps we should try with
> x2apic disabled but with intr-remapping enabled etc to see if it changes
> anything.
With only "nox2apic" (without "nointremap intel_iommu=off"),
the kexec loop test works well.

So it is x2apic and nmi_watchdog related...

> Also do we know (like serial console log etc) how far ahead we
> went in the kexec before we rebooted?

I will add more printk calls after "Starting new kernel" to check it.

Thanks
Yinghai
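
For readers following the bisect result, the detection scheme described in the quoted commit message (an hrtimer that fires 5 times per perf overflow event, plus a kthread kicked by that hrtimer) can be sketched as a toy model. This is only an illustration under stated assumptions: it is not the code in kernel/watchdog.c, and the class name, field names, and threshold below are hypothetical.

```python
from dataclasses import dataclass

# From the commit message: the hrtimer runs 5 times per perf overflow event.
HRTIMER_TICKS_PER_NMI = 5


@dataclass
class CpuWatchdog:
    """Toy per-CPU watchdog state (names are illustrative, not the kernel's)."""
    hrtimer_interrupts: int = 0        # incremented on every hrtimer tick
    hrtimer_interrupts_saved: int = 0  # tick count seen at the previous NMI
    touch_timestamp: int = 0           # last time the watchdog kthread ran

    def hrtimer_tick(self, now, kthread_ran, softlockup_thresh):
        """Timer interrupt: count the tick and report a suspected softlockup.

        kthread_ran says whether the watchdog kthread got scheduled since
        the previous tick; if it never does, no task is being scheduled.
        """
        self.hrtimer_interrupts += 1
        if kthread_ran:
            self.touch_timestamp = now
        return now - self.touch_timestamp > softlockup_thresh

    def nmi_overflow(self):
        """Perf NMI: report a suspected hardlockup.

        If the tick counter has not moved since the last NMI, the CPU is
        no longer servicing timer interrupts.
        """
        stuck = self.hrtimer_interrupts == self.hrtimer_interrupts_saved
        self.hrtimer_interrupts_saved = self.hrtimer_interrupts
        return stuck


if __name__ == "__main__":
    wd = CpuWatchdog()
    for t in range(1, HRTIMER_TICKS_PER_NMI + 1):
        wd.hrtimer_tick(t, kthread_ran=True, softlockup_thresh=3)
    print("hardlockup after healthy period:", wd.nmi_overflow())
    print("hardlockup after ticks stop:", wd.nmi_overflow())
```

The model only shows why the hardlockup path depends on NMIs being delivered while timer interrupts may be stalled; it says nothing about how x2apic state or kexec interacts with that path, which is what the printk instrumentation above is meant to find out.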
Thread overview: 19+ messages
2010-07-13 2:59 tip/master broken with x2apic and kexec Yinghai Lu
2010-07-13 3:29 ` Yinghai Lu
2010-07-13 6:40 ` H. Peter Anvin
2010-07-14 0:54 ` [tip:x86/alternatives] x86, alternatives: Fix one more open-coded 8-bit alternative number tip-bot for H. Peter Anvin
2010-07-14 0:54 ` [tip:x86/alternatives] x86, alternatives: BUG on encountering an invalid CPU feature number tip-bot for H. Peter Anvin
2010-07-13 22:00 ` tip/master broken with x2apic and kexec H. Peter Anvin
2010-07-13 23:27 ` Yinghai Lu
2010-07-14 20:35 ` Yinghai Lu
2010-07-14 21:05 ` Don Zickus
2010-07-14 22:07 ` Yinghai Lu
2010-07-14 21:23 ` Yinghai Lu
2010-07-14 22:57 ` Yinghai Lu
2010-07-15 0:03 ` Suresh Siddha
2010-07-15 2:01 ` Yinghai Lu [this message]
2010-07-15 7:00 ` [PATCH] x86: fix x2apic preenabled system with kexec Yinghai Lu
2010-07-15 18:16 ` Suresh Siddha
2010-07-15 20:10 ` Yinghai Lu
2010-07-15 20:40 ` Yinghai Lu
2010-07-17 0:48 ` [tip:x86/urgent] x86: Fix " tip-bot for Yinghai Lu