From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:41757) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RvWv5-0006jP-L2 for qemu-devel@nongnu.org; Thu, 09 Feb 2012 11:32:32 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RvWux-0004wY-3l for qemu-devel@nongnu.org; Thu, 09 Feb 2012 11:32:27 -0500 Received: from fmmailgate03.web.de ([217.72.192.234]:35199) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RvWuw-0004vf-Ri for qemu-devel@nongnu.org; Thu, 09 Feb 2012 11:32:19 -0500 Received: from moweb002.kundenserver.de (moweb002.kundenserver.de [172.19.20.108]) by fmmailgate03.web.de (Postfix) with ESMTP id 4DFA91B0BFDF4 for ; Thu, 9 Feb 2012 17:32:15 +0100 (CET) Message-ID: <4F33F50C.2050104@web.de> Date: Thu, 09 Feb 2012 17:32:12 +0100 From: Jan Kiszka MIME-Version: 1.0 References: <5c058df627b83bf0c35c2e1dcd92b0a3fd301181.1328445531.git.jan.kiszka@web.de> <4F33E3B3.8000205@redhat.com> <4F33E8C1.3070906@web.de> <4F33EDAD.9020000@redhat.com> In-Reply-To: <4F33EDAD.9020000@redhat.com> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigD13F8E2C74B7C39324288ED4" Subject: Re: [Qemu-devel] [PATCH 3/6] kvmvapic: Introduce TPR access optimization for Windows guests List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Avi Kivity Cc: Anthony Liguori , Marcelo Tosatti , qemu-devel , kvm@vger.kernel.org, Gleb Natapov This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigD13F8E2C74B7C39324288ED4 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 2012-02-09 17:00, Avi Kivity wrote: > On 02/09/2012 05:39 PM, Jan Kiszka wrote: >> On 2012-02-09 16:18, Avi Kivity wrote: >>> On 02/05/2012 02:39 PM, Jan Kiszka wrote: >>>> From: Jan Kiszka >>>> >>>> This enables acceleration for MMIO-based TPR registers accesses of >>>> 32-bit Windows guest systems. It is mostly useful with KVM enabled, >>>> either on older Intel CPUs (without flexpriority feature, can also b= e >>>> manually disabled for testing) or any current AMD processor. >>>> >>>> The approach introduced here is derived from the original version of= >>>> qemu-kvm. It was refactored, documented, and extended by support for= >>>> user space APIC emulation, both with and without KVM acceleration.=20 >>> >>> However, it's presented as a rewrite instead of a series of changes, = so >>> we can't see what the changes are. >> >> Yes, but the original code also depends on interfaces we don't have in= >> upstream. >=20 > The usual rant: patch qemu-kvm until it's suitable for upsteam, then > submit the end result for upstream. But you don't deserve this today. >=20 >>>> + >>>> + if (access =3D=3D TPR_ACCESS_WRITE && kvm_enabled() && >>>> + !kvm_irqchip_in_kernel()) { >>>> + /* >>>> + * KVM without TPR access reporting calls into the user spa= ce APIC on >>>> + * write with IP pointing after the accessing instruction. = So we need >>>> + * to look backward to find the reason. >>>> + */ >>> >>> Why do we need to do anything at all? >> >> We need to patch the causing instruction, so we have to know where it >> starts. Or what do you mean? >=20 > Just don't deal with this at all, no one runs on kernels without kernel= > irqchip. Not true for upstream, and not a design goal of this approach, specifically when considering that it also works with TCG. Would be a pity to lose this generality. >=20 >>> >>> I'm not sure if the ABI guarantees anything about %rip. >> >> That's indeed a point. It's likely coupled to the emulator's internals= >> and when it calls out to user space for MMIO write. How to deal with i= t? >=20 > One way is to verify that it worked this way at least N versions back, > and then retro-doc it. The downside is that it reduces our flexibility= > in the future, but I think that's a small downside. It need not reduce our flexibility, we just need to signal to user space when our behaviour changes again. >=20 > Another way is not to do it at all. >=20 >>>> +static int get_kpcr_number(CPUState *env) >>>> +{ >>>> + struct kpcr { >>>> + uint8_t fill1[0x1c]; >>>> + uint32_t self; >>>> + uint8_t fill2[0x31]; >>>> + uint8_t number; >>>> + } QEMU_PACKED kpcr; >>>> + >>>> + if (smp_cpus =3D=3D 1) { >>>> + return 0; >>>> + } >>>> + if (cpu_memory_rw_debug(env, env->segs[R_FS].base, >>>> + (void *)&kpcr, sizeof(kpcr), 0) < 0 || >>>> + kpcr.self !=3D env->segs[R_FS].base) { >>> >>> Ah, so it works. We may want to do it for UP as well, as a way of >>> verifying that the guest is compatible with these hacks. >> >> I'm not sure if Windows has this properly set up for the UP HAL. I >> rather think this was a bug in the original implementation. The ROM us= es >> 0 as CPU index in UP mode unconditionally, so should we in QEMU. >=20 > I mean just check kpcr.self. Yes, clear, but that means that Windows must have initialized FS.base to point to the KPCR also in UP mode. Is that really the case? E.g. when ACPI is off?! I wonder if that explains the reported bug of qemu-kvm with -no-acpi and in-kernel irqchip... >=20 > The reason up and smp are so different is that for a long while smp > didn't work at all, and for some time it used even uglier hacks than we= > have today (like stashing the cpu id in TR.sel), so I didn't want to > pollute th up code with the smp ugliness. It's also marginally faster > (less locked ops), though I doubt it matters on today's processors. >=20 >>> >>>> + >>>> + memory_region_init_alias(&s->rom, "kvmvapic-rom", section.mr, r= om_paddr, >>>> + rom_size); >>>> + memory_region_add_subregion_overlap(as, rom_paddr, &s->rom, 100= 0); >>> >>> This is incredibly hacky, so at least the spirit of the original code= is >>> preserved. >> >> I know, and it caused some pain to write it (not only to find out how = to >> solve it technically). We would need to pass the RAM memory region dow= n >> to this freaky device, like we do to the i440fx for PAM purposes. But,= >> well, that is not straightforward right now. Better ideas welcome. >=20 > Could we make it a child<> of i440FX, and have i440FX pass it the > MemoryRegion directly? >=20 > It means we'll need to redo the glue for new chipsets, but it should be= > just a few lines. Hmm... not really nice. It is rather a child of the APIC than of the chipset IMHO. Moving it over would also mean establishing logical link to the APIC from there. It is really an ugly beast... Jan --------------enigD13F8E2C74B7C39324288ED4 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8z9QwACgkQitSsb3rl5xQEVACbBUlFVjRwoyBbFMVcs7urs2Am XIcAoOOyN+Z22ehdUChJJdbXxX++t4I3 =xrSl -----END PGP SIGNATURE----- --------------enigD13F8E2C74B7C39324288ED4--