From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [RFC][PATCH] qemu-kvm: Introduce writeback scope for cpu_synchronize_state
Date: Tue, 17 Nov 2009 10:05:00 +0200
Message-ID: <4B02592C.6060004@redhat.com>
References: <4B018542.3020602@siemens.com> <4B01A487.3020808@redhat.com> <4B01C2B0.3000205@web.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti, kvm, Gleb Natapov
To: Jan Kiszka
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:49667 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753309AbZKQIE4 (ORCPT ); Tue, 17 Nov 2009 03:04:56 -0500
In-Reply-To: <4B01C2B0.3000205@web.de>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 11/16/2009 11:22 PM, Jan Kiszka wrote:
> Avi Kivity wrote:
>
>> On 11/16/2009 07:00 PM, Jan Kiszka wrote:
>>
>>> This patch aims at addressing the mp_state writeback issue in a cleaner
>>> fashion.
>>>
>> What's the issue? the fact that mp_state is updated whenever state is
>> synchronized, while it could be simultaneously updated from other vcpus
>> (which latter updates are then lost)?
>>
> Right, the issue b8a7857071 addressed. But that approach spreads more
> kvm_* fragments in unrelated qemu code, e.g. the monitor, and fails to
> update other parts (gdbstub). And it doesn't care about what happens if
> kvm is off at build or runtime. Such things are better addressed in
> upstream by encapsulating kvm calls in synchronization points.
>

Note we have the same issue with nmi and the sipi vector - any vcpu state that is updated outside the vcpu thread. These are particularly bad since we can't exclude them from updates without excluding other state as well.

The whole issue is tricky. I'm inclined to pretend we never meant any vcpu state (outside lapic) to be asynchronous and declare the whole thing a bug.

We could fix it by modeling external changes to state (INIT, SIPI, NMI) as messages queued to the vcpu, to be processed in the vcpu thread. The queue would be drained before running the vcpu or before reading state from userspace, so the message queue contents can never be observed and never lost.

Of course, we can't really implement this as a queue (SIGSTOP vcpu thread -> overflow), but a word is sufficient. INIT writes the word; everything else uses compare-and-swap or set_bit to raise events, e.g. for SIPI:

    do {
        oldq = vcpu->queue;
        newq = (oldq & ~SIPI_MASK) | sipi_vector | RUNNING;
    } while (!cas(&vcpu->queue, oldq, newq));

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
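
For illustration, here is a minimal userspace sketch of the single-word scheme described above, written with C11 atomics. It is not qemu-kvm or kernel code: the bit layout, the EV_* constants, struct vcpu_events and all function names are assumptions made up for this example. Only the idea comes from the message: INIT stores the word, other events use set_bit or compare-and-swap, and the vcpu thread drains the word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of the event word; not taken from KVM. */
#define EV_SIPI_MASK 0x00ffu /* bits 0-7: most recent SIPI vector */
#define EV_RUNNING   0x0100u /* set together with a SIPI vector */
#define EV_NMI       0x0200u /* NMI pending */
#define EV_INIT      0x0400u /* INIT pending */

static_assert((EV_SIPI_MASK & (EV_RUNNING | EV_NMI | EV_INIT)) == 0,
              "flag bits must not overlap the vector field");

struct vcpu_events {
    _Atomic uint32_t queue; /* the whole "queue" is this one word */
};

/* INIT writes the word; in this sketch that drops earlier events. */
static void raise_init(struct vcpu_events *v)
{
    atomic_store(&v->queue, EV_INIT);
}

/* NMI only sets one bit, so an atomic OR (set_bit) is enough. */
static void raise_nmi(struct vcpu_events *v)
{
    atomic_fetch_or(&v->queue, EV_NMI);
}

/* SIPI replaces the vector field and sets RUNNING: needs a CAS loop. */
static void raise_sipi(struct vcpu_events *v, uint8_t vector)
{
    uint32_t oldq = atomic_load(&v->queue);
    uint32_t newq;

    do {
        newq = (oldq & ~EV_SIPI_MASK) | vector | EV_RUNNING;
        /* On failure, oldq is reloaded with the current value. */
    } while (!atomic_compare_exchange_weak(&v->queue, &oldq, newq));
}

/*
 * Meant to be called from the vcpu thread before running the vcpu or
 * before state is read from userspace: fetch and clear in one step.
 */
static uint32_t drain_events(struct vcpu_events *v)
{
    return atomic_exchange(&v->queue, 0);
}
```

A caller would raise events from any thread and have the vcpu thread apply the result of drain_events() to its own state. Because the drain is a single atomic exchange, an event raised concurrently lands either in the drained value or in the next drain, never in neither.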