From: Jan Kiszka <jan.kiszka@siemens.com>
To: Blue Swirl <blauwirbel@gmail.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Gleb Natapov <gleb@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
Date: Mon, 13 Feb 2012 20:22:21 +0100 [thread overview]
Message-ID: <4F3962ED.1070109@siemens.com> (raw)
In-Reply-To: <CAAu8pHuqpSMnuO+3O-yBmXniNs5+Qze=5X8xj0-yt2Xk7QDMNQ@mail.gmail.com>
On 2012-02-13 19:50, Blue Swirl wrote:
> On Mon, Feb 13, 2012 at 10:16, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-02-11 16:25, Blue Swirl wrote:
>>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> This enables acceleration for MMIO-based TPR registers accesses of
>>>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>>>> either on older Intel CPUs (without flexpriority feature, can also be
>>>> manually disabled for testing) or any current AMD processor.
>>>>
>>>> The approach introduced here is derived from the original version of
>>>> qemu-kvm. It was refactored, documented, and extended by support for
>>>> user space APIC emulation, both with and without KVM acceleration. The
>>>> VMState format was kept compatible, so was the ABI to the option ROM
>>>> that implements the guest-side para-virtualized driver service. This
>>>> enables seamless migration from qemu-kvm to upstream or, one day,
>>>> between KVM and TCG mode.
>>>>
>>>> The basic concept goes like this:
>>>> - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>>>> irqchip) a vmcall hypercall is registered
>>>> - VAPIC option ROM is loaded into guest
>>>> - option ROM activates TPR MMIO access reporting via port 0x7e
>>>> - TPR accesses are trapped and patched in the guest to call into option
>>>> ROM instead, VAPIC support is enabled
>>>> - option ROM TPR helpers track state in memory and invoke hypercall to
>>>> poll for pending IRQs if required
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> I must say that I find the approach horrible, patching guests and ROMs
>>> and looking up Windows internals. Taking the same approach to extreme,
>>> we could for example patch Xen guest to become a KVM guest. Not that I
>>> object merging.
>>
>> Yes, this is horrible. But there is no real better way in the absence of
>> hardware assisted virtualization of the TPR. I think MS is recommending
>> this patching approach as well.
>
> Maybe instead of routing via ROM and the hypercall, the TPR accesses
> could be handled directly with guest invisible breakpoints (like GDB
> breakpoints, but for QEMU internal use), much like other
> instrumentation could be handled.
Gleb answered it already.
>>>> @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
>>>> {
>>>> APICCommonState *s = APIC_COMMON(dev);
>>>> APICCommonClass *info;
>>>> + static DeviceState *vapic;
>>>> static int apic_no;
>>>>
>>>> if (apic_no >= MAX_APICS) {
>>>> @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
>>>> info = APIC_COMMON_GET_CLASS(s);
>>>> info->init(s);
>>>>
>>>> - sysbus_init_mmio(&s->busdev, &s->io_memory);
>>>> + sysbus_init_mmio(dev, &s->io_memory);
>>>> +
>>>> + if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
>>>> + vapic = sysbus_create_simple("kvmvapic", -1, NULL);
>>>> + }
>>>> + s->vapic = vapic;
>>>> + if (apic_report_tpr_access && info->enable_tpr_reporting) {
>>>
>>> I think you should not rely on apic_report_tpr_access being in sane
>>> condition during class init.
>>
>> It is mandatory, e.g. for CPU hotplug, as reporting needs to be
>> consistent accross all VCPUs. Therefore it is a static global, set to
>> false initially. However, you are right, we lack proper clearing of the
>> access report feature on reset, not only in this variable.
>
> I'd also set it to false initially.
It's a global variable, thus initialized to false by definition.
>>>> +
>>>> +#define VAPIC_CPU_SHIFT 7
>>>> +
>>>> +#define ROM_BLOCK_SIZE 512
>>>> +#define ROM_BLOCK_MASK (~(ROM_BLOCK_SIZE - 1))
>>>> +
>>>> +typedef struct VAPICHandlers {
>>>> + uint32_t set_tpr;
>>>> + uint32_t set_tpr_eax;
>>>> + uint32_t get_tpr[8];
>>>> + uint32_t get_tpr_stack;
>>>> +} QEMU_PACKED VAPICHandlers;
>>>> +
>>>> +typedef struct GuestROMState {
>>>> + char signature[8];
>>>> + uint32_t vaddr;
>>>
>>> This does not look 64 bit clean.
>>
>> It's packed.
>
> I meant "virtual address could be 64 bits on a 64 bit host", not
> structure packing.
This is for 32-bit guests only. 64-bit Windows doesn't access the TPR
via MMIO, thus is not activating the VAPIC.
>>>> + uint32_t state;
>>>> + uint32_t rom_state_paddr;
>>>> + uint32_t rom_state_vaddr;
>>>> + uint32_t vapic_paddr;
>>>> + uint32_t real_tpr_addr;
>>>> + GuestROMState rom_state;
>>>> + size_t rom_size;
>>>> +} VAPICROMState;
>>>> +
>>>> +#define TPR_INSTR_IS_WRITE 0x1
>>>> +#define TPR_INSTR_ABS_MODRM 0x2
>>>> +#define TPR_INSTR_MATCH_MODRM_REG 0x4
>>>> +
>>>> +typedef struct TPRInstruction {
>>>> + uint8_t opcode;
>>>> + uint8_t modrm_reg;
>>>> + unsigned int flags;
>>>> + size_t length;
>>>> + off_t addr_offset;
>>>> +} TPRInstruction;
>>>
>>> Also here the order is pessimized.
>>
>> Don't see the gain here, though.
>
> There are two bytes' hole between modrm_reg and flags, maybe also 4
> bytes between length and addr_offset (if size_t is 32 bits but off_t
> 64 bits). I'd reverse the order so that members with largest alignment
> needs come first.
Well, but this won't make the struct smaller. I prefer to keep the
ordering in which we also initialize it.
>
>>>> +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
>>>> +{
>>>> + target_phys_addr_t paddr;
>>>> + target_ulong addr;
>>>> +
>>>> + if (s->state == VAPIC_ACTIVE) {
>>>> + return 0;
>>>> + }
>>>> + for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) {
>>>> + paddr = cpu_get_phys_page_debug(env, addr);
>>>> + if (paddr != APIC_DEFAULT_ADDRESS) {
>>>> + continue;
>>>> + }
>>>> + s->real_tpr_addr = addr + 0x80;
>>>> + update_guest_rom_state(s);
>>>> + return 0;
>>>> + }
>>>
>>> This loop looks odd, what should it do, probe for unused address?
>>
>> Seems to deserve a comment: We have to scan for the guest's mapping of
>> the APIC as we enter here without a hint from an TPR accessing
>> instruction. So we probe the potential range, trying to find the page
>> that maps to that known physical address (known in the sense that
>> Windows does not remap the APIC physically - nor does QEMU support that
>> so far).
>
> Yes, more comments would be nice, especially on theory of operation.
>
>>>> +static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env,
>>>> + target_ulong *pip, int access)
>>>> +{
>>>> + const TPRInstruction *instr;
>>>> + target_ulong ip = *pip;
>>>> + uint8_t opcode[2];
>>>> + uint32_t real_tpr_addr;
>>>> + int i;
>>>> +
>>>> + if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 0xe0000000) {
>>>
>>> The constants should be using ULL suffix because target_ulong could be
>>> 64 bit, though maybe this is more optimal.
>>
>> target_ulong is 64-bit unconditionally on x86. I'll add this.
>
> No, target_phys_addr_t is now 64 bits, but target_ulong (register
> size) is 32 bits for i386-softmmu.
Ah, right.
>
>>>> +
>>>> +/*
>>>> + * Tries to read the unique processor number from the Kernel Processor Control
>>>> + * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be accessed
>>>> + * or is considered invalid.
>>>> + */
>>>
>>> Horrible hack. Is guest OS type or version checked somewhere?
>>
>> This is all about hacking Windows 32-bit. And this check encodes that
>> even stronger. The other important binding is the expected virtual
>> address of the ROM mapping under Windows. I would have preferred
>> checking the version directly, but no one has a complete list of
>> supported guests and their codes.
>
> Then it would be nice to only enable this with a command line switch,
> so that some random poor non-Windows OS is not patched incorrectly.
I had the same concern, but there is no need to worry, we have
sufficient checks that no other guest is affected. And we do have a
switch as well, the APIC property. But this is better left on by default
to please our guests with optimal performance.
>
>>>
>>>> +static int get_kpcr_number(CPUState *env)
>>>> +{
>>>> + struct kpcr {
>>>> + uint8_t fill1[0x1c];
>>>> + uint32_t self;
>>>> + uint8_t fill2[0x31];
>>>> + uint8_t number;
>>>> + } QEMU_PACKED kpcr;
>>>
>>> KPCR. Pointers to Windows documentation would be nice.
>>
>> Oops, yes.
>>
>> Unfortunately, this is only an internal structure, not officially
>> documented by MS. However, all supported OS versions a legacy by now, no
>> longer changing its structure.
>
> This and a note about the supported OS versions could be added as comment.
OK.
For the folks that developed it in qemu-kvm: This targets Windows XP,
Vista and Server 2003, all 32-bit, right?
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
next prev parent reply other threads:[~2012-02-13 19:22 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 18:31 [Qemu-devel] [PATCH v2 0/8] uq/master: TPR access optimization for Windows guests Jan Kiszka
2012-02-10 18:31 ` [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once Jan Kiszka
2012-02-11 10:02 ` Blue Swirl
2012-02-11 10:06 ` Jan Kiszka
2012-02-11 11:25 ` Blue Swirl
2012-02-11 11:49 ` Andreas Färber
2012-02-11 12:43 ` Jan Kiszka
2012-02-11 13:06 ` Andreas Färber
2012-02-11 13:07 ` Jan Kiszka
2012-02-11 13:21 ` Andreas Färber
2012-02-11 13:35 ` Jan Kiszka
2012-02-11 13:59 ` Andreas Färber
2012-02-11 14:02 ` Jan Kiszka
2012-02-11 14:12 ` Andreas Färber
2012-02-11 14:24 ` Jan Kiszka
2012-02-11 14:49 ` Andreas Färber
2012-02-13 8:17 ` Paolo Bonzini
2012-02-11 13:54 ` Blue Swirl
2012-02-11 14:00 ` Jan Kiszka
2012-02-11 14:11 ` Blue Swirl
2012-02-11 14:18 ` Jan Kiszka
2012-02-11 14:23 ` Blue Swirl
2012-02-11 12:41 ` Jan Kiszka
2012-02-10 18:31 ` [Qemu-devel] [PATCH v2 2/8] Allow to use pause_all_vcpus from VCPU context Jan Kiszka
2012-02-11 14:16 ` Blue Swirl
2012-02-11 14:31 ` Jan Kiszka
2012-02-10 18:31 ` [Qemu-devel] [PATCH v2 3/8] target-i386: Add infrastructure for reporting TPR MMIO accesses Jan Kiszka
2012-02-11 14:32 ` Blue Swirl
2012-02-10 18:31 ` [Qemu-devel] [PATCH v2 4/8] kvmvapic: Add option ROM Jan Kiszka
2012-02-10 18:31 ` [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests Jan Kiszka
2012-02-11 15:25 ` Blue Swirl
2012-02-13 10:16 ` Jan Kiszka
2012-02-13 18:50 ` Blue Swirl
2012-02-13 19:11 ` Gleb Natapov
2012-02-13 19:22 ` Jan Kiszka [this message]
2012-02-14 7:54 ` Gleb Natapov
2012-02-14 8:55 ` Jan Kiszka
2012-02-14 8:59 ` Gleb Natapov
2012-02-10 18:31 ` [Qemu-devel] [PATCH v2 6/8] kvmvapic: Simplify mp/up_set_tpr Jan Kiszka
2012-02-10 18:31 ` [Qemu-devel] [PATCH v2 7/8] optionsrom: Reserve space for checksum Jan Kiszka
2012-02-11 11:46 ` Andreas Färber
2012-02-11 12:45 ` Jan Kiszka
2012-02-11 12:51 ` Andreas Färber
2012-02-11 12:57 ` Jan Kiszka
2012-02-10 18:31 ` [Qemu-devel] [PATCH v2 8/8] kvmvapic: Use optionrom helpers Jan Kiszka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F3962ED.1070109@siemens.com \
--to=jan.kiszka@siemens.com \
--cc=aliguori@us.ibm.com \
--cc=avi@redhat.com \
--cc=blauwirbel@gmail.com \
--cc=gleb@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).