From: Paolo Bonzini
Subject: Re: [PATCH 3/7] KVM: x86: Allow the guest to run with dirty debug registers
Date: Sun, 09 Mar 2014 21:07:08 +0100
Message-ID: <531CC9EC.1000804@redhat.com>
References: <1394192571-11056-1-git-send-email-pbonzini@redhat.com> <1394192571-11056-4-git-send-email-pbonzini@redhat.com> <20140309182825.GA11078@potion.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, alex.williamson@redhat.com, mtosatti@redhat.com, gleb@kernel.org, jan.kiszka@siemens.com
To: Radim Krčmář
Return-path: 
In-Reply-To: <20140309182825.GA11078@potion.redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 09/03/2014 19:28, Radim Krčmář wrote:
>> >  /*
>> > + * Do this here before restoring debug registers on the host. And
>> > + * since we do this before handling the vmexit, a DR access vmexit
>> > + * can (a) read the correct value of the debug registers, (b) set
>> > + * KVM_DEBUGREG_WONT_EXIT again.
> We re-enable intercepts on the next exit for the sake of simplicity?
> (Batched accesses make perfect sense, but it seems we don't have to care
> about DRs at all, without guest-debug.)

It's not just for the sake of simplicity.  Synchronizing debug registers on entry does have some cost, and it is required for running without debug register intercepts.  You would incur this cost always, even though no-guest-debug is actually the common case.

We're well into diminishing returns anyway; Alex timed this patch against one that ignores MOV DR exits completely, and (to the surprise of both of us) this patch fully restored performance despite still taking a few tens of thousands of MOV DR exits per second.
In the problematic case, each context switch does a save and restore of the debug registers; that's 13 debug register accesses in total (read 6 registers, write DR7 to disable all breakpoints, write 6 registers including the DR7 write that re-enables the breakpoints), all very close together in time.  It's quite likely that the final DR7 write is very expensive (e.g. it might have to flush the pipeline).  Also, a test case like this must be very heavy on context switches, and each context switch already incurs a few exits (for example to inject the interrupt that causes it, or to read the current time).

A different optimization could be to skip the LOAD_DEBUG_CONTROLS vm-entry control when DR7 == host DR7 == 0x400 and MOV DR exits are enabled.  I'm not sure it's worthwhile, though, and there's also the DEBUGCTL MSR to take into account.  Better to do these kinds of tweaks only when they actually show up in a profile: Alex's testcase shows that when they do, the impact is massive.

>> +		kvm_x86_ops->sync_dirty_debug_regs(vcpu);
>
> Sneaky functionality ... it would be nicer to split 'enable DR
> intercepts' into a new kvm op.

Why?  "Disable DR intercepts" is already folded into the handler for MOV DR exits.

> And we don't have to modify DR intercepts then, which is probably the
> main reason why sync_dirty_debug_regs() does two things.

Another reason is that indirect calls are relatively expensive and add complexity; in this case the two ops would always be called back to back.

Paolo