From mboxrd@z Thu Jan 1 00:00:00 1970 From: Anthony Liguori Subject: Re: [Qemu-devel] [RFC][PATCH] KVM: Introduce modification context for cpu_synchronize_state Date: Fri, 29 Jan 2010 09:44:33 -0600 Message-ID: <4B630261.40307@codemonkey.ws> References: <4B605390.9020709@siemens.com> <4B62F5CD.2040709@codemonkey.ws> <4B62FDD0.4050904@siemens.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , Avi Kivity , Alexander Graf , Anthony Liguori , kvm , Gleb Natapov , qemu-devel To: Jan Kiszka Return-path: Received: from mail-iw0-f186.google.com ([209.85.223.186]:38670 "EHLO mail-iw0-f186.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751059Ab0A2Poh (ORCPT ); Fri, 29 Jan 2010 10:44:37 -0500 Received: by iwn16 with SMTP id 16so2082505iwn.5 for ; Fri, 29 Jan 2010 07:44:36 -0800 (PST) In-Reply-To: <4B62FDD0.4050904@siemens.com> Sender: kvm-owner@vger.kernel.org List-ID: On 01/29/2010 09:25 AM, Jan Kiszka wrote: > Anthony Liguori wrote: > >> On 01/27/2010 08:54 AM, Jan Kiszka wrote: >> >>> This patch originates in the mp_state writeback issue: During runtime >>> and even on reset, we must not write the previously saved VCPU state >>> back into the kernel in an uncontrolled fashion. E.g mp_state should >>> only written on reset or on VCPU setup. Certain clocks (e.g. the TSC) >>> may only be written on setup or after vmload. >>> >>> By introducing additional information about the context of the planned >>> vcpu state manipulation, we can simply skip sensitive states like >>> mp_state when updating the in-kernel state. The planned modifications >>> are defined when calling cpu_synchronize_state. They accumulate, ie. >>> once a full writeback was requested, it will stick until it was >>> performed. >>> >>> This patch already fixes existing writeback issues in upstream KVM by >>> only selectively writing MSR_IA32_TSC, MSR_KVM_SYSTEM_TIME, >>> MSR_KVM_WALL_CLOCK, the mp_state and the vcpu_events. >>> >>> Signed-off-by: Jan Kiszka >>> >>> >> I think the context argument makes the function very difficult to call >> correctly. >> >> I'd suggest making CPU_MODIFY_RUNTIME the behaviour of >> cpu_synchronize_state. I'd suggest adding another function to handle >> init like cpu_init_state(). Likewise, if an explicit reset state is >> needed, I think a cpu_init_state_after_reset() makes sense. >> >> I don't quite understand the context that NONE should be used in. >> > 'context' was the wrong term, it should rather be 'scheduled vcpu state > modifications'. > > >> I think the key point though is to handle RUNTIME mostly transparently >> since it's the most common case. >> > This whole topic is complex and requires at least some cooperation from > the users of this API. Previous attempts to make this transparent caused > way too many bugs. E.g. the idea that writeback could simply be handled > on vcpu exec didn't fly, and qemu-kvm demonstrates the result (lots of > kvm hooks, fragile code). > > So I'm about to carefully remove some transparency. The key to this is > proper announcement of planned and/or performed changes (abstracted to > those three levels "runtime", "reset", and "init"). > > I will think about your suggestions. Maybe it makes sense to > (re-)introduce explicit writeback points as generic services, and we > should keep the common case as is (dropping my optimization > CPU_MODIFY_NONE). > I can understand the argument you're making but if you do decide to keep the context argument, then I'd strongly suggest adding a bucket full of documentation explaining when each state should be used. At the moment, it's not clear at first glance. Regards, Anthony Liguori > Jan > >