From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754984AbbAIDPR (ORCPT ); Thu, 8 Jan 2015 22:15:17 -0500
Received: from mail-pa0-f43.google.com ([209.85.220.43]:43411 "EHLO
	mail-pa0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754444AbbAIDPP (ORCPT );
	Thu, 8 Jan 2015 22:15:15 -0500
Message-ID: <1420773300.2801.13.camel@cyril>
Subject: Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
From: Cyril Bur
To: Don Zickus
Cc: linux-kernel@vger.kernel.org, mpe@ellerman.id.au, drjones@redhat.com,
	akpm@linux-foundation.org, mingo@kernel.org, uobergfe@redhat.com,
	chaiw.fnst@cn.fujitsu.com, cl@linu.com, fabf@skynet.be,
	atomlin@redhat.com, benzh@chromium.org, mtosatti@redhat.com
Date: Fri, 09 Jan 2015 14:15:00 +1100
In-Reply-To: <20150106150157.GF116159@redhat.com>
References: <1419224764-11384-1-git-send-email-cyrilbur@gmail.com>
	<20150105165057.GU116159@redhat.com>
	<1420502015.2910.6.camel@cyril>
	<20150106150157.GF116159@redhat.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.10.4-0ubuntu2
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2015-01-06 at 10:01 -0500, Don Zickus wrote:
> On Tue, Jan 06, 2015 at 10:53:35AM +1100, Cyril Bur wrote:
> > On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> > > cc'ing Marcelo
> > >
> > > On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > > > When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> > > > in timebase, this can cause spurious messages from the softlockup detector.
> > > >
> > > > Whilst these messages are harmless, they are accompanied with a stack trace
> > > > which causes undue concern and more problematically the stack trace in the
> > > > guest has nothing to do with the observed problem and can only be misleading.
> > > >
> > > > Furthermore, on POWER8 this is completely avoidable with the introduction of
> > > > the Virtual Time Base (VTB) register.
> > >
> > > Hi Cyril,
> > >
> > > Your solution seems simple and doesn't disturb the softlockup code as much
> > > as the x86 solution does.  The only small issue I had was the use of
> > > sched_clock instead of local_clock.  I keep forgetting the difference
> > > (unstable clock is the biggest reason I think).
> > My apologies there it appears I stuffed up, local_clock was used
> > initially in the softlockup code, I'll send a v2.
>
> Thanks!
> > >
> > > Other than that, I am not the biggest fan of putting multiple virtual
> > > guest solutions for the same problem into the watchdog code.  I would
> > > prefer a common solution/framework to leverage.
> > Agreed.
> > >
> > > I have the x86 folks focusing on the steal_time stuff.  It started with
> > > KVM and I believe VMWare is working on utilizing it too (and maybe Xen).
> > I'm not sure I've ever seen this, could you please point me towards
> > something I can look at?
>
> I am not too familiar with it, but the kernel/watchdog.c code has calls to
> kvm_check_and_clear_guest_paused(), which is probably a good place to
> start.
>
Ah yes, that. I did have a look at what it does when I undertook to
solve the problem on power, and I suppose the two solutions are similar
in that they both just use a virtualised time source.

The similarities stop there though: the paravirtualised clock that x86
uses provides (as the name of the function implies) a 'was paused' flag.
Obviously that flag isn't something the VTB register on POWER8 can
provide, and since we have a VTB, it's preferable to use that.

Perhaps x86 can do something with running_clock?

Regards,

Cyril
> Cheers,
> Don
> > >
> > > Not sure if that is useful or could be incorporated into the power8 code.
> > > Though to be honest I am curious if the steal_time code could be ported to
> > > your solution as it seems the watchdog code could remove all the
> > > steal_time warts.
> > Happy to help suss out the situation here, again, if you could pass on
> > what the x86 guys are working on, thanks.
> >
> >
> > Thanks,
> >
> > Cyril
> > > I have cc'd Marcelo into this discussion as he was the last person I
> > > remember talking with about this problem.
> > >
> > > Cheers,
> > > Don
> >
> >