From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org)
by vger.kernel.org via listexpand id S1752188AbaBKIYM (ORCPT ); Tue, 11 Feb 2014 03:24:12 -0500
Received: from mx1.redhat.com ([209.132.183.28]:49095 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750758AbaBKIYJ (ORCPT ); Tue, 11 Feb 2014 03:24:09 -0500
Date: Tue, 11 Feb 2014 09:23:08 +0100
From: Stanislaw Gruszka
To: poma
Cc: Thomas Gleixner, Linux Kernel list, linux-pm@vger.kernel.org, Olaf Hering, Dave Jones, "Justin M. Forbes", Josh Boyer, Mailing-List fedora-kernel
Subject: Re: WARNING: CPU: 1 PID: 0 at kernel/time/tick-broadcast.c:668 tick_broadcast_oneshot_control+0x17d/0x190()
Message-ID: <20140211082306.GA1528@redhat.com>
References: <52F84A9B.5020008@gmail.com> <52F9219B.5020003@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <52F9219B.5020003@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 10, 2014 at 07:59:39PM +0100, poma wrote:
> On 10.02.2014 11:06, Thomas Gleixner wrote:
> > On Mon, 10 Feb 2014, poma wrote:
> >
> >> [   83.558551]  [<ffffffff81025b17>] amd_e400_idle+0x87/0x130
> >
> > So this seems to happen only on AMD machines which use that e400 idle
> > mode. I have no idea at the moment whats wrong there. I'll find one of
> > those machines and try to reproduce.

I tried to debug that warning as well. Although I found a machine with
the proper family and model number, the HW C1E bug does not happen
there, so I just hacked the kernel to always use amd_e400_idle (and
removed the AMD-specific rdmsr instructions so that it does not crash).
That makes the issue 100% reproducible on suspend/resume.

It happens when a CPU becomes idle and calls
CLOCK_EVT_NOTIFY_BROADCAST_ENTER, but an interrupt triggers on that CPU
before CLOCK_EVT_NOTIFY_BROADCAST_EXIT. The IRQ is handled by the
hrtimer code, which wants to switch to hres and calls:

tick_switch_to_oneshot() -> ... -> tick_broadcast_setup_oneshot()

Since we already have the proper handler there, the last function clears
tick_broadcast_oneshot_mask, but tick_broadcast_pending_mask stays set.
The next time amd_e400_idle calls CLOCK_EVT_NOTIFY_BROADCAST_ENTER, the
warning triggers.

I came up with the patch below, which also clears the pending mask, but
perhaps oneshot_mask should not be cleared in
tick_broadcast_setup_oneshot(), or should be cleared only conditionally,
or some other solution is needed. Anyway, the patch makes the warning go
away on my hacked setup; I was waiting for test results on real C1E
hardware.

Thanks
Stanislaw

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 43780ab..98977a5 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -756,6 +756,7 @@ out:
 static void tick_broadcast_clear_oneshot(int cpu)
 {
 	cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
+	cpumask_clear_cpu(cpu, tick_broadcast_pending_mask);
 }
 
 static void tick_broadcast_init_next_event(struct cpumask *mask,
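
For anyone who wants to follow the sequence without booting a hacked kernel, here is a minimal user-space sketch of the two masks. It is a toy model, not kernel code: every function name in it is hypothetical, the masks are plain bitfields, and two things are assumptions that the mail above does not spell out, namely that the pending bit is set by the broadcast handler while the CPU is idle, and that the EXIT path clears the pending bit only if the oneshot bit was still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for tick_broadcast_oneshot_mask / tick_broadcast_pending_mask. */
static unsigned long oneshot_mask;
static unsigned long pending_mask;

static unsigned long cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	return 1UL << cpu;
}

static bool test_and_set(unsigned long *mask, int cpu)
{
	bool was_set = *mask & cpu_bit(cpu);

	*mask |= cpu_bit(cpu);
	return was_set;
}

static bool test_and_clear(unsigned long *mask, int cpu)
{
	bool was_set = *mask & cpu_bit(cpu);

	*mask &= ~cpu_bit(cpu);
	return was_set;
}

/* Models BROADCAST_ENTER; returns true if the warning condition is hit. */
static bool model_enter(int cpu)
{
	if (!test_and_set(&oneshot_mask, cpu))
		return pending_mask & cpu_bit(cpu);
	return false;
}

/* Models BROADCAST_EXIT (assumed: pending is cleared only via oneshot). */
static void model_exit(int cpu)
{
	if (test_and_clear(&oneshot_mask, cpu))
		test_and_clear(&pending_mask, cpu);
}

/* Assumed: the broadcast handler marks the idle CPU as pending. */
static void model_mark_pending(int cpu)
{
	pending_mask |= cpu_bit(cpu);
}

/* Models tick_broadcast_clear_oneshot(), without or with the patch. */
static void model_clear_oneshot(int cpu, bool patched)
{
	oneshot_mask &= ~cpu_bit(cpu);
	if (patched)
		pending_mask &= ~cpu_bit(cpu);
}

/* Replays the sequence from the mail; returns true if the warning fires. */
bool run_sequence(bool patched)
{
	int cpu = 1;

	oneshot_mask = 0;
	pending_mask = 0;

	model_enter(cpu);                /* CPU goes idle */
	model_mark_pending(cpu);         /* pending bit gets set */
	model_clear_oneshot(cpu, patched); /* IRQ: switch to hres */
	model_exit(cpu);                 /* oneshot bit is already clear */
	return model_enter(cpu);         /* next idle entry */
}
```

In this model run_sequence(false) returns true, i.e. the second ENTER sees a stale pending bit, and run_sequence(true) returns false. It only illustrates the bookkeeping; it says nothing about whether clearing the pending bit there is the right fix in the real code.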