From: Paolo Bonzini
Subject: Re: [PATCH 3/7] KVM: x86: Allow the guest to run with dirty debug registers
Date: Sun, 09 Mar 2014 21:07:08 +0100
Message-ID: <531CC9EC.1000804@redhat.com>
References: <1394192571-11056-1-git-send-email-pbonzini@redhat.com> <1394192571-11056-4-git-send-email-pbonzini@redhat.com> <20140309182825.GA11078@potion.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, alex.williamson@redhat.com, mtosatti@redhat.com, gleb@kernel.org, jan.kiszka@siemens.com
To: Radim Krčmář
Return-path: 
In-Reply-To: <20140309182825.GA11078@potion.redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 09/03/2014 19:28, Radim Krčmář wrote:
>> >  /*
>> > + * Do this here before restoring debug registers on the host. And
>> > + * since we do this before handling the vmexit, a DR access vmexit
>> > + * can (a) read the correct value of the debug registers, (b) set
>> > + * KVM_DEBUGREG_WONT_EXIT again.
> We re-enable intercepts on the next exit for the sake of simplicity?
> (Batched accesses make perfect sense, but it seems we don't have to care
> about DRs at all, without guest-debug.)

It's not just for the sake of simplicity.  Synchronizing debug registers on entry does have some cost, and it is required for running without debug register intercepts.  You would incur this cost always, even though no-guest-debug is actually the common case.

We're well into diminishing returns anyway; Alex timed this patch against one that ignores MOV DR exits completely, and (to the surprise of both of us) this patch fully restored performance despite still taking a few tens of thousands of MOV DR exits per second.
In the problematic case, each context switch does a save and restore of the debug registers; that's 13 debug register accesses in total (read 6 registers, write DR7 to disable all breakpoints, write 6 registers including the DR7 write that re-enables the breakpoints), all very close together in time.  It's quite likely that the final DR7 write is very expensive (e.g. it might have to flush the pipeline).  Also, a test case like this must be very heavy on context switches, and each context switch already incurs a few exits (for example to inject the interrupt that causes it, or to read the current time).

A different optimization could be to skip the LOAD_DEBUG_CONTROLS vm-entry control when DR7 == host DR7 == 0x400 and MOV DR exits are enabled.  I'm not sure it's worthwhile, though, and there's also the DEBUGCTL MSR to take into account.  Better to do these kinds of tweaks only when they actually show up in a profile: Alex's testcase shows that when they do, the impact is massive.

>> +		kvm_x86_ops->sync_dirty_debug_regs(vcpu);
>
> Sneaky functionality ... it would be nicer to split 'enable DR
> intercepts' into a new kvm op.

Why?  "Disable DR intercepts" is already folded into the handler for MOV DR exits.

> And we don't have to modify DR intercepts then, which is probably the
> main reason why sync_dirty_debug_regs() does two things.

Another reason is that indirect calls are relatively expensive and add complexity; in this case the two ops would always be called back to back.

Paolo