Date: Wed, 9 Sep 2015 08:43:05 -0700
From: Shaohua Li
To: Thomas Gleixner
CC: Mathieu Desnoyers, LKML, Daniel Lezcano, John Stultz, Peter Zijlstra, Ingo Molnar, Gleb Natapov, Paolo Bonzini
Subject: Re: [RFC PATCH v3] Fix: clocksource watchdog marks TSC unstable on guest VM
Message-ID: <20150909154249.GA3563967@devbig257.prn2.facebook.com>
References: <1441721953-12108-1-git-send-email-mathieu.desnoyers@efficios.com> <20150909010339.GA1565763@devbig257.prn2.facebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 09, 2015 at 11:51:43AM +0200, Thomas Gleixner wrote:
> On Tue, 8 Sep 2015, Shaohua Li wrote:
> > On Tue, Sep 08, 2015 at 05:08:03PM +0200, Thomas Gleixner wrote:
> > > For non paravirt kernels which can read the TSC directly, we'd need a
> > > way to transport that information. A simple mechanism would be to
> > > query an emulated MSR from the watchdog which tells the guest the
> > > state of affairs on the host side. That would be a sensible and
> > > minimal invasive change on both host and guests.
> >
> > This will require every hypervisor supports the MSR, so not a solution
> > we can expect immediately.
>
> I know.
>
> > I'm wondering why we can't just make the watchdog better to detect this
> > watchdog wrap.
>
> Again, I'm not opposed to make it better. I'm just trying to prevent
> making the watchdog a total mess for no reason.
>
> > It can happen in physical machine as I said before, but I
> > can't find a simple way to trigger it, so it's not very convincing. But
> > the watchdog doesn't work for specific environment (for example, a bogus
> > hardware doesn't respond for some time) for sure, we shouldn't assume
> > the world is perfect.
>
> Sigh. If the damned hardware blocks long enough to wreckage the
> watchdog then we have more serious problems than that.

There is a difference. If hardware blocks, we can choose to reset the
hardware, or we can just ignore it if it is, for example, a serial
console or netconsole (these are what happened on our side). These
impact the system very little. But if the system ends up with HPET as
the clocksource, performance will be quite poor, which makes the whole
system useless. There is no way to reset the clocksource back to TSC.
If there were a reset mechanism, that would be fine too.

> Can you please stop this handwaving and provide some proper proof for
> your arguments? I'm really tired of this.

I'm sorry I can't provide a simple way to trigger it on real hardware,
but it's not hard to trigger this issue in KVM. Just make your host busy
and keep rebooting your virtual machine, and you will hit it.

Thanks,
Shaohua