From: Jan Kiszka
Date: Mon, 10 Jan 2011 21:34:30 +0100
Subject: Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
To: Anthony Liguori
Cc: Anthony Liguori, Marcelo Tosatti, qemu-devel@nongnu.org,
    kvm@vger.kernel.org, Alexander Graf
Message-ID: <4D2B6D56.1090205@web.de>
In-Reply-To: <4D2B6ADD.4090505@codemonkey.ws>
References: <4D2616D6.4080309@linux.vnet.ibm.com> <4D26D6CF.5070405@web.de>
    <4D27A16F.9030809@linux.vnet.ibm.com> <4D282489.90506@web.de>
    <4D2B6506.6070907@linux.vnet.ibm.com> <4D2B6845.7050809@web.de>
    <4D2B6ADD.4090505@codemonkey.ws>

On 10.01.2011 21:23, Anthony Liguori wrote:
> On 01/10/2011 02:12 PM, Jan Kiszka wrote:
>> On 10.01.2011 20:59, Anthony Liguori wrote:
>>> On 01/08/2011 02:47 AM, Jan Kiszka wrote:
>>>> On 08.01.2011 00:27, Anthony Liguori wrote:
>>>>> On 01/07/2011 03:03 AM, Jan Kiszka wrote:
>>>>>> On 06.01.2011 20:24, Anthony Liguori wrote:
>>>>>>> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>>>>>>>> From: Jan Kiszka
>>>>>>>>
>>>>>>>> QEMU supports only one VM, so there is only one kvm_state per
>>>>>>>> process, and we gain nothing passing a reference to it around.
>>>>>>>> Eliminate any need to refer to it outside of kvm-all.c.
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Kiszka
>>>>>>>> CC: Alexander Graf
>>>>>>>> Signed-off-by: Marcelo Tosatti
>>>>>>>>
>>>>>>> I think this is a big mistake.
>>>>>>>
>>>>>> Obviously, I don't share your concerns. :)
>>>>>>
>>>>>>> Having to manage kvm_state keeps the abstraction lines well
>>>>>>> defined.
>>>>>>>
>>>>>> How does it help?
>>>>>>
>>>>>>> Otherwise, it's far too easy for portions of code to call into
>>>>>>> KVM functions that really shouldn't.
>>>>>>>
>>>>>> I can't imagine we gain anything from requiring
>>>>>> kvm_check_extension callers to hold a kvm_state "capability".
>>>>>> Yes, it's now much easier to call kvm_[vm_]ioctl, but that's the
>>>>>> key point of this change:
>>>>>>
>>>>>> So far we primarily complicated the internal interface between
>>>>>> generic and arch-dependent kvm parts by requiring kvm_state
>>>>>> juggling. But external users already find interfaces without
>>>>>> this restriction (kvm_log_*, kvm_ioeventfd_*, ...). That's
>>>>>> because it's at least complicated to _cleanly_ pass kvm_state
>>>>>> references to all users that need it - e.g. sysbus devices like
>>>>>> kvmclock or upcoming in-kernel irqchips.
>>>>>>
>>>>> I think you're basically making my point for me.
>>>>>
>>>>> ioeventfd is a broken interface.  It shouldn't be a VM ioctl but
>>>>> rather a VCPU ioctl because PIO events are dispatched on a
>>>>> per-VCPU basis.
>>>>>
>>>> OK, but I don't want to argue about the ioeventfd API. So let's
>>>> put this case aside. :)
>>>>
>>>>> kvm_state is available as part of CPU state so it's quite easy to
>>>>> get at if these interfaces just took a CPUState argument (and
>>>>> they should).
>>>>>
>>>> My point is definitely NOT about cpu-bound devices. That case is
>>>> clear and is not touched at all by this patch.
>>>>
>>>> My point is about devices that have clear system scope like
>>>> kvmclock, ioapic, pit, pic,
>>>>
>>> I don't see how ioapic, pit, or pic have a system scope.
>>>
>> They are not bound to any CPU like the APIC which you may have in
>> mind.
>>
>
> And none of the above interact with KVM.
>
> They may be replaced by KVM but if you look at the PIT, this is done by
> having two distinct devices.  The KVM specific device can (and should)
> be instantiated with kvm_state.
>
> The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel
> devices are separate devices and that should be reflected in the device
> tree.

Whether it is a separate device or a hack to an existing one, both need
to sync their user space state with the kernel when QEMU asks them to.
That is how they have to interact with KVM all the time. The same holds
for kvmclock, if you want to look at a really trivial example.

>
>>> I don't know enough about kvmclock.
>>>
>> It's just the same.
>>
>>>
>>>> whatever-the-future-will-bring. And about KVM services that have
>>>> global scope like capability checks and other feature explorations
>>>> or VM configurations done by the KVM arch code. You still didn't
>>>> explain what we gain in these concrete scenarios by handing the
>>>> technically redundant abstraction kvm_state around, especially
>>>> _inside_ the KVM core.
>>>>
>>> If you have to pass around a KVMState pointer, you establish an
>>> explicit relationship and communication between subsystems.  Any
>>> place where the global KVMState is used is a red flag that
>>> something is wrong.
>>>
>> It is and will be _only_ used inside kvm-all.c. Again: What is the
>> benefit of restricting access to kvm_check_extension this way?
>>
>
> The more places that need to deal with KVM compatibility code, the worse
> we will be because it's more opportunities to get it wrong.

That code belongs where the related logic is. IMHO, it would be a
needless abstraction to push in-kernel access services and workaround
definitions into the KVM core instead of the KVM device model code,
provided there is only one user.

But this discussion is a bit abstract right now, as we do not yet have
anything more complex than kvmclock on the table for QEMU.

>
>>> I don't see what the advantage to making all of the KVMState global
>>> and implicit.  It seems like a big step backwards to me.  Can you
>>> give a very concrete example of where you think it results in
>>> easier to understand code as I don't see how making relationships
>>> implicit ever makes code easier to understand?
>>>
>> The best example does not yet exist (fortunately): Just look at
>> patch 28 and then try to pass some kvm_state reference to the
>> kvmclock device. Is this handle worth changing the sysbus API?
>>
>
> Let me look at that patch and reply there.
>

OK, great.

Jan
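
For readers following the thread, the two interface styles under
discussion can be sketched roughly as below. This is only an
illustration, not QEMU code: DemoKVMState, DEMO_CAP_CLOCK and every
demo_* function are hypothetical names made up for this sketch, and the
capability bitmask merely stands in for whatever the real
kvm_check_extension queries from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for KVMState; NOT the real QEMU structure. */
typedef struct DemoKVMState {
    unsigned long extensions;   /* bitmask of supported capabilities */
} DemoKVMState;

/* Hypothetical capability number, used only in this sketch. */
enum { DEMO_CAP_CLOCK = 3 };

/*
 * Style A: explicit handle. Every caller must be given a state pointer,
 * which makes the dependency visible in each signature.
 */
bool demo_check_extension_explicit(const DemoKVMState *s, unsigned int cap)
{
    if (s == NULL || cap >= sizeof(s->extensions) * 8) {
        return false;
    }
    return ((s->extensions >> cap) & 1UL) != 0;
}

/*
 * Style B: one state per process, private to this translation unit.
 * Callers outside this file never see or pass the state.
 */
static DemoKVMState demo_kvm_state;

void demo_kvm_init(unsigned long extensions)
{
    demo_kvm_state.extensions = extensions;
}

bool demo_check_extension(unsigned int cap)
{
    return demo_check_extension_explicit(&demo_kvm_state, cap);
}

/* A system-scope device (think of a clock device) under each style. */
bool demo_clock_usable_explicit(const DemoKVMState *s)
{
    /* The device, and whoever creates it, must be handed the state. */
    return demo_check_extension_explicit(s, DEMO_CAP_CLOCK);
}

bool demo_clock_usable(void)
{
    /* No handle has to travel through the device creation path. */
    return demo_check_extension(DEMO_CAP_CLOCK);
}
```

With style A, the code that instantiates the device has to obtain a
state pointer from somewhere and hand it down, which is the sysbus
question raised above. With style B the dependency no longer shows up
in any signature, which is the loss of explicitness that Anthony
objects to.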