From mboxrd@z Thu Jan 1 00:00:00 1970 From: Paolo Bonzini Subject: Re: [Bug 59521] KVM linux guest reads uninitialized pvclock values before executing rdmsr MSR_KVM_WALL_CLOCK Date: Mon, 17 Jun 2013 17:50:21 +0200 Message-ID: <51BF303D.4000001@redhat.com> References: <20130615171719.98CD711FB38@bugzilla.kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: kvm@vger.kernel.org To: bugzilla-daemon@bugzilla.kernel.org Return-path: Received: from mail-we0-f179.google.com ([74.125.82.179]:51558 "EHLO mail-we0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751304Ab3FQPu0 (ORCPT ); Mon, 17 Jun 2013 11:50:26 -0400 Received: by mail-we0-f179.google.com with SMTP id w59so2535497wes.10 for ; Mon, 17 Jun 2013 08:50:25 -0700 (PDT) In-Reply-To: <20130615171719.98CD711FB38@bugzilla.kernel.org> Sender: kvm-owner@vger.kernel.org List-ID: Il 15/06/2013 19:17, bugzilla-daemon@bugzilla.kernel.org ha scritto: > The problem is in cpu_init() which is called earlier. > cpu_init() calls printk and possibly other stuff which can use timestamps. > printk calls local_clock() to obtain a timestamp of a log message. On KVM > guests call sequence usually ends up in kvm_clock_read but needed rdmsr is > executed only in x86_cpuinit.early_percpu_clock_init(). > > I consider two approaches to fix the problem: > 1. Swap cpu_init(); and x86_cpuinit.early_percpu_clock_init(); > + Simple > - We will get excessive restrictions on operations which allowed to be > performed in early_percpu_clock_init() because percpu specific data is > initialized only in cpu_init(). Considering how simple kvm_register_clock is, I think this is preferrable if it works. Ironically, commit 7069ed6 (x86: kvmclock: allocate pvclock shared memory area, 2012-11-27), which introduced the regression, is what should make this simpler fix possible. Paolo > 2. Return 0ULL from kvm_clock_read untill it is initialized. > + Simple too > - Additional if statement inside kvm_clock_read (not serious even for > performance paranoiacs) > - Returning 0ULL looks ok because it is the same thing which kernel bootstrap > CPU does on early boot stages. But I am not quite sure. Better to ask guys who > maintain the needed relevant subsystem. > > I prefer the second way. It doesn't add complex restrictions to CPU bootup > code. I'll send a patch soon which fixes the problem in the second way. > > I don't propagate such a logic to levels higher then KVM clocksource > (pv_time_ops level for example) because of the following code: > void __init kvmclock_init(void) > ... > 263 pv_time_ops.sched_clock = kvm_clock_read; > 264 x86_platform.calibrate_tsc = kvm_get_tsc_khz; > 265 x86_platform.get_wallclock = kvm_get_wallclock; > 266 x86_platform.set_wallclock = kvm_set_wallclock; > 267 #ifdef CONFIG_X86_LOCAL_APIC > 268 x86_cpuinit.early_percpu_clock_init = > 269 kvm_setup_secondary_clock; > 270 #endif > 271 x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; > 272 x86_platform.restore_sched_clock_state = > kvm_restore_sched_clock_state; > > To propagate the logic I need to make changes both in x86_platform and > pv_time_ops also I should make a similar fix for ia64 arch. It needs some > subsystems refactoring to make the changes clean. Dont' think that its worth to > fix the bug. Better to make a simple fix. >