From: Alok Kataria
To: "weijg.fnst@cn.fujitsu.com"
CC: "tglx@linutronix.de", "hpa@zytor.com", "linux-kernel@vger.kernel.org", "mingo@redhat.com", "x86@kernel.org"
Subject: Re: RFC: Fix kdump failed with 'notsc'
Date: Fri, 24 Jun 2016 10:41:07 +0000
Message-ID: <1466765165.24676.22.camel@vmware.com>
References: <1465898098.16116.52.camel@localhost>
In-Reply-To: <1465898098.16116.52.camel@localhost>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Wei,

On Tue, 2016-06-14 at 09:56 +0000, Wei, Jiangang wrote:
> Hi,
>
> When I trigger kernel crash and specify 'notsc' for capture-kernel,
> The process of kdump will be blocked at calibrate_delay_converge().
>
>     /* wait for "start of" clock tick */
>     ticks = jiffies;
>     while (ticks == jiffies)
>         ; /* nothing */
>
> The reason is that the jiffies remains the same, no changed.
>
> serial console log as following,
> ............
> [ 0.000000] Linux version 4.7.0-rc2+ (root@localhost.localdomain)
> (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #2 SMP Wed Jun
> 156
> [ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.7.0-rc2+
> root=/dev/mapper/centos-root ro rd.lvm.lv=centos/swap
> vconsole.font=latarcyrheb-sun16 rd.lvm.lv=centos/root crashkernel=256M
> vconsole.keymap=us console=tty0 console=ttyS0,115200n8 LANG=en_US.UTF-8
> irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off
> panic=10 rootflags=nofail acpi_no_memhotplug notsc
> ............
> [ 0.000000] tsc: Kernel compiled with CONFIG_X86_TSC, cannot disable
> TSC completely
> ............
> [ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles:
> 0xffffffff, max_idle_ns: 133484882848 ns
> [ 0.000000] tsc: Fast TSC calibration using PIT
> [ 0.000000] tsc: Detected 3192.714 MHz processor
> [ 0.000000] Calibrating delay loop...
>
> # The last log is raised by calibrate_delay(), which calls
> calibrate_delay_converge() to compute the lpj value.
>
> # So far, I don't know why the jiffies stays the same.
> # But I found two methods can avoid this problem.
>
> 1) specify the 'lpj=' with 'notsc' together.
>
> 2) revert the 70de9a9.
>
> commit 70de9a97049e0ba79dc040868564408d5ce697f9
> Author: Alok Kataria
> Date: Mon Nov 3 11:18:47 2008 -0800
>
> x86: don't use tsc_khz to calculate lpj if notsc is passed
>
> Impact: fix udelay when "notsc" boot parameter is passed
>
> With notsc passed on commandline, tsc may not be used for
> udelays, make sure that we do not use tsc_khz to calculate
> the lpj value in such cases.
>
> IMO,
> The flow of getting tsc_khz as following,
> tsc_init()->x86_platform.calibrate_tsc()->native_calibrate_tsc()->quick_pit_calibrate().
> No codes use or call 'rdtsc'.

The intent of that change was to skip calculating the lpj value based on
the tsc_khz value if notsc is specified.
Note that it has nothing to do with using rdtsc for tsc frequency
calibration; instead, we use the lpj value derived from the tsc frequency
(tsc_khz) for udelay (see delay_tsc).

If notsc is passed, we skip assigning a value to lpj_fine, since tsc is no
longer used for implementing delay. Instead, we now calibrate the lpj value
in calibrate_delay, which calls calibrate_delay_converge.

Now looking at calibrate_delay_converge, it expects jiffies to advance.
Otherwise you will wait endlessly there:

static unsigned long calibrate_delay_converge(void)
{
	...
	/* wait for "start of" clock tick */
	ticks = jiffies;
	while (ticks == jiffies)
		; /* nothing */

You should really look at why jiffies is not incrementing.

>
> Even if ‘notsc’ is passed, the tsc_khz is credible.
> and we can get lpj by it.
>
> So I want to push a patch to revert the 70de9a9.
> Any comments or suggestions is appreciated.

As mentioned above, reverting change 70de9a9 is wrong and would just be
papering over the actual issue.

Thanks,
Alok

>
> Thanks,
> wei
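
For illustration only, here is a minimal userspace sketch of the two points discussed above: why the "wait for start of clock tick" loop never returns when jiffies does not advance, and how an lpj value can be derived from tsc_khz. This is not kernel code. The names struct tick_source, read_jiffies, wait_for_tick and lpj_from_tsc_khz are hypothetical, the spin limit is added only so the model terminates, and the formula tsc_khz * 1000 / HZ is my understanding of the TSC-derived lpj, with HZ passed in as an assumed parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a tick source; this is not a kernel structure. */
struct tick_source {
    unsigned long jiffies; /* stand-in for the kernel's jiffies counter */
    unsigned long period;  /* reads per simulated tick; 0 means the tick never fires */
    unsigned long polls;   /* number of reads so far */
};

/* Read the simulated counter, advancing it once every `period` reads. */
unsigned long read_jiffies(struct tick_source *src)
{
    assert(src != NULL);
    src->polls++;
    if (src->period != 0 && src->polls % src->period == 0)
        src->jiffies++;
    return src->jiffies;
}

/*
 * Bounded variant of the "wait for start of clock tick" loop.
 * Returns 0 once the counter changes, or -1 if it stayed the same
 * for max_spins reads (the case where the real loop would spin forever).
 */
int wait_for_tick(struct tick_source *src, unsigned long max_spins)
{
    unsigned long ticks = read_jiffies(src);
    unsigned long spins = 0;

    while (ticks == read_jiffies(src)) {
        if (++spins >= max_spins)
            return -1;
    }
    return 0;
}

/* Assumed relation: loops per jiffy = TSC cycles per second / ticks per second. */
uint64_t lpj_from_tsc_khz(uint64_t tsc_khz, unsigned int hz)
{
    assert(hz != 0);
    return tsc_khz * 1000u / hz;
}
```

With a source whose period is non-zero, wait_for_tick returns 0 after a bounded number of reads; with period set to 0 it reports -1, which corresponds to the hang seen at "Calibrating delay loop...". For the 3192.714 MHz processor in the log and an assumed HZ of 1000, lpj_from_tsc_khz(3192714, 1000) gives 3192714.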