From: Paolo Bonzini
Subject: Re: [PATCH v3] x86: svm: use kvm_fast_pio_in()
Date: Tue, 07 Apr 2015 14:55:45 +0200
Message-ID: <5523D3D1.7090909@redhat.com>
In-Reply-To: <20150303204206.GH25123@potion.brq.redhat.com>
References: <20150302210202.2951.56810.stgit@joelvmguard2.amd.com>
 <20150303164235.GB2494@potion.brq.redhat.com>
 <54F61029.3060101@amd.com>
 <20150303204206.GH25123@potion.brq.redhat.com>
To: Radim Krčmář, Joel Schopp
Cc: Gleb Natapov, kvm@vger.kernel.org, David Kaplan, Joerg Roedel,
 linux-kernel@vger.kernel.org, Borislav Petkov

On 03/03/2015 21:42, Radim Krčmář wrote:
> 2015-03-03 13:48-0600, Joel Schopp:
>>>> +	unsigned long new_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
>>> Shouldn't we handle writes in EAX differently than in AX and AL, because
>>> of implicit zero extension.
>> I don't think the implicit zero extension hurts us here, but maybe there
>> is something I'm missing that I need understand.  Could you explain this
>> further?
>
> According to APM vol.2, 2.5.3 Operands and Results, when using EAX,
> we should zero upper 32 bits of RAX:
>
>   Zero Extension of Results. In 64-bit mode, when performing 32-bit
>   operations with a GPR destination, the processor zero-extends the 32-bit
>   result into the full 64-bit destination. Both 8-bit and 16-bit
>   operations on GPRs preserve all unwritten upper bits of the destination
>   GPR. This is consistent with legacy 16-bit and 32-bit semantics for
>   partial-width results.
>
> Is IN not covered?

It is.  You need to zero the upper 32 bits.
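To make the rule concrete, here is a minimal userspace sketch of the
size-dependent merge in 64-bit mode.  It is not the kernel code:
merge_in_result() is a hypothetical helper invented for illustration,
and it assumes the value read from the port is already available as a
plain integer.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: compute the new RAX
 * after an IN of `size` bytes, given the previous RAX and the value
 * read from the port, following the 64-bit mode rule quoted above.
 */
static uint64_t merge_in_result(uint64_t old_rax, uint32_t val,
				unsigned int size)
{
	switch (size) {
	case 1:	/* IN AL: bits 63:8 of RAX are preserved */
		return (old_rax & ~UINT64_C(0xff)) | (val & 0xff);
	case 2:	/* IN AX: bits 63:16 of RAX are preserved */
		return (old_rax & ~UINT64_C(0xffff)) | (val & 0xffff);
	case 4:	/* IN EAX: zero-extended, bits 63:32 become 0 */
		return (uint64_t)val;
	default: /* IN has no other operand size; leave RAX alone */
		return old_rax;
	}
}
```

Only the 4-byte case discards the old upper half; the 1- and 2-byte
cases must start from the previous RAX value.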

>>>> +	BUG_ON(!vcpu->arch.pio.count);
>>>> +	BUG_ON(vcpu->arch.pio.count * vcpu->arch.pio.size > sizeof(new_rax));
>>> (Looking at it again, a check for 'vcpu->arch.pio.count == 1' would be
>>> sufficient.)
>> I prefer the checks that are there now after your last review,
>> especially since surrounded by BUG_ON they only run on debug kernels.
>
> BUG_ON is checked on essentially all kernels that run KVM.
> (All distribution-based configs should have it.)

Correct.

> If we wanted to validate the size, then this is strictly better:
>   BUG_ON(vcpu->arch.pio.count != 1 || vcpu->arch.pio.size > sizeof(new_rax))

That would be a very weird assertion considering that
vcpu->arch.pio.size will architecturally be at most 4.

The first arm of the || is sufficient.

>>>> +	memcpy(&new_rax, vcpu, sizeof(new_rax));
>>>> +	trace_kvm_pio(KVM_PIO_IN, vcpu->arch.pio.port, vcpu->arch.pio.size,
>>>> +		      vcpu->arch.pio.count, vcpu->arch.pio_data);
>>>> +	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>>>> +	vcpu->arch.pio.count = 0;
>>> I think it is better to call emulator_pio_in_emulated directly, like
>>>
>>>   emulator_pio_in_out(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
>>>   		vcpu->arch.pio.port, &new_rax, 1);
>>>   kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>>>
>>> because we know that vcpu->arch.pio.count != 0.
>
> Pasting the same code creates bug opportunities when we forget to modify
> all places.  This class of problems can be harder to deal with, that (c)
> and (d), because we can't simply print all callers.

I agree with this and prefer calling emulator_pio_in_emulated in
complete_fast_pio_in, indeed.

>>> Refactoring could avoid the weird vcpu->ctxt->vcpu conversion.
>>> (A better name is always welcome.)

No need for that.

>> The pointer chasing is making me dizzy.
>> I'm not sure why
>> emulator_pio_in_emulated takes a x86_emulate_ctxt when all it does it
>> immediately translate that to a vcpu and never use the x86_emulate_ctxt,
>> why not pass the vcpu in the first place?

Because the emulator is written to be usable outside the Linux kernel as
well.

Also, the fast path (used if kernel_pio returns 0) doesn't read
VCPU_REGS_RAX, thus using an uninitialized variable here:

>>> +	unsigned long val;
>>> +	int ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size,
>>> +					   port, &val, 1);
>>> +
>>> +	if (ret)
>>> +		kvm_register_write(vcpu, VCPU_REGS_RAX, val);

Thanks,

Paolo