Message-ID: <4C3E40EC.5060607@kernel.org>
Date: Wed, 14 Jul 2010 15:57:48 -0700
From: Yinghai Lu
To: "H. Peter Anvin", Ingo Molnar, Don Zickus, Frederic Weisbecker
CC: Thomas Gleixner, Suresh Siddha, "linux-kernel@vger.kernel.org"
Subject: Re: tip/master broken with x2apic and kexec
References: <4C3BD6AA.3070908@kernel.org> <4C3CE210.2030902@zytor.com> <4C3CF650.30905@kernel.org> <4C3E1FA0.9000107@kernel.org> <4C3E2AE6.30406@kernel.org>
In-Reply-To: <4C3E2AE6.30406@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/14/2010 02:23 PM, Yinghai Lu wrote:
> On 07/14/2010 01:35 PM, Yinghai Lu wrote:
>> On 07/13/2010 04:27 PM, Yinghai Lu wrote:
>>> On 07/13/2010 03:00 PM, H. Peter Anvin wrote:
>>>> On 07/12/2010 07:59 PM, Yinghai Lu wrote:
>>>>> tip/master:
>>>>> system1: BIOS enabled x2apic; the first kernel boots fine, but kexec into the second kernel causes an instant system reboot.
>>>>>
>>>>> system2: BIOS does not enable x2apic; the first kernel boots fine and enables x2apic, and kexec into the second kernel works, but kexec into the third kernel causes an instant system reboot.
>>>>>
>>>>> linus' tree is ok.
>>>>>
>>>>> but for system2, if booted with nox2apic, intr-remaping off, iommu off, the kexec loop test will pass.
>>>>>
>>>>> the problem looks to have started in the last two or three weeks.
>>>>>
>>>>> Any idea?
>>>>>
>>>>> bisecting will take a while, because the system POST takes a while every time.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Yinghai Lu
>>>>
>>>> OK, I found the bug... if you could test out the patch which will be
>>>> sent out shortly I would very much appreciate it.
>>>
>>> not sure if your patch is the offending one now.
>>>
>>> kL: kernel from linus tree
>>> kT1: kernel from tip
>>> kT2: kernel from tip with your patch reverted
>>>
>>> BIOS-->kL ---> kL ---> kL....always working
>>> BIOS-->kT1 ---> kT1 ---> kT1 : between second one and third one system reset instant...
>>> BIOS-->kT2 ---> kT2 ---> kT2 : between second one and third one system reset instant...
>>>
>>> BIOS-->kL ---> kL ---> kL ---> then kT1 ---> kT1 .... always working
>>> BIOS-->kL ---> kL ---> kL ---> then kT2 ---> kT2 .... always working
>>>
>>
>> bisecting said:
>>
>>> git bisect good
>> 58687acba59266735adb8ccd9b5b9aa2c7cd205b is the first bad commit
>> commit 58687acba59266735adb8ccd9b5b9aa2c7cd205b
>> Author: Don Zickus
>> Date: Fri May 7 17:11:44 2010 -0400
>>
>> lockup_detector: Combine nmi_watchdog and softlockup detector
>>
>> The new nmi_watchdog (which uses the perf event subsystem) is very
>> similar in structure to the softlockup detector. Using Ingo's
>> suggestion, I combined the two functionalities into one file:
>> kernel/watchdog.c.
>>
>> Now both the nmi_watchdog (or hardlockup detector) and softlockup
>> detector sit on top of the perf event subsystem, which is run every
>> 60 seconds or so to see if there are any lockups.
>>
>> To detect hardlockups, cpus not responding to interrupts, I
>> implemented an hrtimer that runs 5 times for every perf event
>> overflow event. If that stops counting on a cpu, then the cpu is
>> most likely in trouble.
>>
>> To detect softlockups, tasks not yielding to the scheduler, I used the
>> previous kthread idea that now gets kicked every time the hrtimer fires.
>> If the kthread isn't being scheduled neither is anyone else and the
>> warning is printed to the console.
>>
>> I tested this on x86_64 and both the softlockup and hardlockup paths
>> work.
>>
>
> with
> # CONFIG_LOCKUP_DETECTOR is not set
> # CONFIG_HARDLOCKUP_DETECTOR is not set
>
> the kexec loop test passes.
>
> also that patch will break kexec/kdump on systems with x2apic pre-enabled.

before the combining patch,
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_NMI_WATCHDOG=y
will have the same problem.

so the problem should come from NMI_WATCHDOG.

Yinghai
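
For reference, the hardlockup check described in the quoted commit message (an hrtimer that keeps counting, and a perf overflow event that checks whether the count moved) can be modeled in user space roughly as below. This is only a sketch of the idea, not the code in kernel/watchdog.c; the struct and function names (cpu_watch, hrtimer_fired, overflow_check) are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; not the kernel's actual data structures. */
struct cpu_watch {
    unsigned long hrtimer_ticks;   /* bumped each time the hrtimer fires */
    unsigned long ticks_last_seen; /* count recorded at the previous overflow */
};

/* Models the hrtimer callback. It only runs while the CPU still
 * responds to interrupts, so a stalled counter hints at a hard lockup. */
void hrtimer_fired(struct cpu_watch *w)
{
    w->hrtimer_ticks++;
}

/* Models the perf overflow handler. Returns true when the hrtimer
 * count has not moved since the previous overflow event. */
bool overflow_check(struct cpu_watch *w)
{
    bool stuck = (w->hrtimer_ticks == w->ticks_last_seen);

    w->ticks_last_seen = w->hrtimer_ticks;
    return stuck;
}
```

In this model, a period with five hrtimer_fired() calls followed by overflow_check() reports no lockup, while a second overflow_check() with no ticks in between reports one.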