Message-ID: <1471356104.32433.38.camel@redhat.com>
Subject: Re: [PATCH] time,virt: resync steal time when guest & host lose sync
From: Rik van Riel
To: Wanpeng Li
Cc: Frederic Weisbecker, Ingo Molnar, LKML, Paolo Bonzini, Peter Zijlstra, Wanpeng Li, Thomas Gleixner, Radim Krcmar, Mike Galbraith
Date: Tue, 16 Aug 2016 10:01:44 -0400

On Tue, 2016-08-16 at 14:54 +0800, Wanpeng Li wrote:
> 2016-08-16 10:11 GMT+08:00 Rik van Riel:
> > On Tue, 2016-08-16 at 09:31 +0800, Wanpeng Li wrote:
> > > 2016-08-15 23:00 GMT+08:00 Rik van Riel:
> > > > On Mon, 2016-08-15 at 16:53 +0800, Wanpeng Li wrote:
> > > > > 2016-08-12 23:58 GMT+08:00 Rik van Riel:
> > > > > [...]
> > > > > > Wanpeng, does the patch below work for you?
> > > > >
> > > > > It will break steal time for full dynticks guest, and there is a
> > > > > calltrace of thread_group_cputime_adjusted call stack, RIP is
> > > > > cputime_adjust+0xff/0x130.
> > > >
> > > > How?  This patch is equivalent to passing ULONG_MAX to
> > > > steal_account_process_time, which you tried to no ill
> > > > effect before.
> > >
> > > https://lkml.org/lkml/2016/6/8/404/ Paolo originally suggested adding
> > > the max cputime limit to vtime. When the cpu is running in nohz
> > > full mode and stops the tick, jiffies are updated from the clock
> > > source instead of the clock event device in the guest
> > > (tick_nohz_update_jiffies() callsite, ktime_get()), so they are
> > > not affected by lost clock ticks. My patch keeps the limit for
> > > vtime and removes the limit for non-vtime. However, your patch
> > > removes the limit for both scenarios and results in the below
> > > calltrace for vtime.
> >
> > I understand what it does.
> >
> > What I would like to understand is WHY enforcing the limit
> > is the right thing when using vtime, and the wrong thing
> > in all other scenarios.
>
> I observed that get_vtime_delta() underflows, which means that
> delta < other, when debugging your bugfix patch. I believe that is why
> Paolo suggested adding the max cputime limit to vtime; he also pointed
> out the potential underflow before:
> https://lkml.org/lkml/2016/6/8/404/

Looking at get_vtime_delta() I can see exactly how
the underflow can happen.  The interval returned by
account_other_time() is NOT rounded down to the nearest
jiffy, while the base interval it is subtracted from is.

Furthermore, even if we did not have that rounding issue,
a guest could get preempted in between determining delta
and calling account_other_time(), which could also cause
the issue.
Could you re-send your patch with a comment in
get_vtime_delta(), as well as in the changelog, explaining
exactly why account_other_time() should be limited when
called from get_vtime_delta(), but not from the other three
call sites?

Documentation could save future developers a bunch of
debugging time on this code.

thanks,

Rik
-- 
All Rights Reversed.