Date: Wed, 24 Jul 2024 15:24:39 -0700
References: <20210916181538.968978-1-oupton@google.com>
 <20210916181538.968978-5-oupton@google.com>
Subject: Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
From: Sean Christopherson
To: David Woodhouse
Cc: Oliver Upton, kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
 Paolo Bonzini, Marc Zyngier, Peter Shier, Jim Mattson, David Matlack,
 Ricardo Koller, Jing Zhang, Raghavendra Rao Anata, James Morse,
 Alexandru Elisei, Suzuki K Poulose, linux-arm-kernel@lists.infradead.org,
 Andrew Jones, Will Deacon, Catalin Marinas

/cast

On Wed, Jan 17, 2024, David Woodhouse wrote:
> On Thu, 2021-09-16 at 18:15 +0000, Oliver Upton wrote:
> >
> > @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> >          * is slightly ahead) here we risk going negative on unsigned
> >          * 'system_time' when 'data.clock' is very small.
> >          */
> > -       if (kvm->arch.use_master_clock)
> > -               now_ns = ka->master_kernel_ns;
> > +       if (data.flags & KVM_CLOCK_REALTIME) {
> > +               u64 now_real_ns = ktime_get_real_ns();
> > +
> > +               /*
> > +                * Avoid stepping the kvmclock backwards.
> > +                */
> > +               if (now_real_ns > data.realtime)
> > +                       data.clock += now_real_ns - data.realtime;
> > +       }
> > +
> > +       if (ka->use_master_clock)
> > +               now_raw_ns = ka->master_kernel_ns;
>
> This looks wrong to me.
>
> >         else
> > -               now_ns = get_kvmclock_base_ns();
> > -       ka->kvmclock_offset = data.clock - now_ns;
> > +               now_raw_ns = get_kvmclock_base_ns();
> > +       ka->kvmclock_offset = data.clock - now_raw_ns;
> >         kvm_end_pvclock_update(kvm);
> >         return 0;
> >  }
>
> We use the host CLOCK_MONOTONIC_RAW plus the boot offset, as a
> 'kvmclock base clock', and get_kvmclock_base_ns() returns that. The KVM
> clocks for each VM are based on this 'kvmclock base clock', each
> offset by a ka->kvmclock_offset which represents the time at which that
> VM was started -- so each VM's clock starts from zero.
>
> The values of ka->master_kernel_ns and ka->master_cycle_now represent a
> single point in time, the former being the value of
> get_kvmclock_base_ns() at that moment and the latter being the host TSC
> value. In pvclock_update_vm_gtod_copy(), kvm_get_time_and_clockread()
> is used to return both values at precisely the same moment, from the
> *same* rdtsc().
>
> This allows the current 'kvmclock base clock' to be calculated at any
> moment by reading the TSC, calculating a delta to that reading from
> ka->master_cycle_now to determine how much time has elapsed since
> ka->master_kernel_ns. We can then add ka->kvmclock_offset to get the
> kvmclock for this particular VM.
>
> Now, looking at the code quoted above. It's given a kvm_clock_data
> struct which contains a value of the KVM clock which is to be set as
> the time "now", and all it does is adjust ka->kvmclock_offset
> accordingly.
> Which is really simple:
>
>         now_raw_ns = get_kvmclock_base_ns();
>         ka->kvmclock_offset = data.clock - now_raw_ns;
>
> Et voilà, now get_kvmclock_base_ns() + ka->kvmclock_offset at any given
> moment in time will result in a kvmclock value according to what was
> just set. Yay!
>
> Except... in the case where the TSC is constant, we actually set
> 'now_raw_ns' to a value that doesn't represent *now*. Instead, we set
> it to ka->master_kernel_ns which represents some point in the *past*.
> We should add the number of TSC ticks since ka->master_cycle_now if
> we're going to use that, surely?

Somewhat ironically, without the KVM_CLOCK_REALTIME goo, there's no need to
re-read TSC, because the rdtsc() in pvclock_update_vm_gtod_copy() *just*
happened. But the call to ktime_get_real_ns() could theoretically spin for a
non-trivial amount of time if the clock is being refreshed.
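
To make the arithmetic in this exchange concrete, here is a rough userspace
sketch of the master-clock calculation described above. It is not kernel code:
struct clock_model, cycles_to_ns(), base_ns_at(), set_clock_stale() and
set_clock_fresh() are hypothetical names for illustration, the tsc_khz field
assumes a constant TSC frequency, and only the three field names
master_kernel_ns, master_cycle_now and kvmclock_offset come from the thread.
The "stale" setter mirrors using ka->master_kernel_ns as "now"; the "fresh"
one adds the TSC ticks elapsed since the snapshot, as David suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-VM clock state discussed in the thread. */
struct clock_model {
	uint64_t master_kernel_ns;  /* base clock value at the snapshot */
	uint64_t master_cycle_now;  /* host TSC value at the same snapshot */
	int64_t  kvmclock_offset;   /* per-VM offset added to the base clock */
	uint64_t tsc_khz;           /* assumed constant TSC frequency, in kHz */
};

/* Naive conversion; overflows for very large deltas, fine for a sketch. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
	assert(tsc_khz != 0);
	return cycles * 1000000ULL / tsc_khz;
}

/* Base clock at TSC reading 'tsc': snapshot plus time elapsed since it. */
static uint64_t base_ns_at(const struct clock_model *m, uint64_t tsc)
{
	return m->master_kernel_ns +
	       cycles_to_ns(tsc - m->master_cycle_now, m->tsc_khz);
}

/* Guest-visible clock at TSC reading 'tsc'. */
static uint64_t kvmclock_at(const struct clock_model *m, uint64_t tsc)
{
	return base_ns_at(m, tsc) + (uint64_t)m->kvmclock_offset;
}

/* Treats the (possibly old) snapshot as "now", like the quoted hunk. */
static void set_clock_stale(struct clock_model *m, uint64_t clock)
{
	m->kvmclock_offset = (int64_t)(clock - m->master_kernel_ns);
}

/* Accounts for the TSC ticks that elapsed since the snapshot. */
static void set_clock_fresh(struct clock_model *m, uint64_t clock,
			    uint64_t tsc_now)
{
	m->kvmclock_offset = (int64_t)(clock - base_ns_at(m, tsc_now));
}
```

In this model, with a 1 GHz TSC and a snapshot taken 2000 cycles before the
set, the stale variant leaves the guest clock 2000 ns ahead of the requested
value, while the fresh variant reads back exactly what was set. How large the
gap can get in practice depends on how long ago the snapshot was taken.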