From mboxrd@z Thu Jan 1 00:00:00 1970 From: Radim =?utf-8?B?S3LEjW3DocWZ?= Subject: Re: KVM: x86: fix kvmclock write race (v2) Date: Mon, 20 Apr 2015 14:54:22 +0200 Message-ID: <20150420125422.GA26491@potion.brq.redhat.com> References: <20150417233800.GA18714@amt.cnet> <20150418002010.GA23227@amt.cnet> <20150418002310.GA23898@amt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Andy Lutomirski , kvm list , Paolo Bonzini To: Marcelo Tosatti Return-path: Received: from mx1.redhat.com ([209.132.183.28]:36477 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755011AbbDTMy2 (ORCPT ); Mon, 20 Apr 2015 08:54:28 -0400 Content-Disposition: inline In-Reply-To: <20150418002310.GA23898@amt.cnet> Sender: kvm-owner@vger.kernel.org List-ID: 2015-04-17 21:23-0300, Marcelo Tosatti: >=20 > From: Radim Kr=C4=8Dm=C3=A1=C5=99 >=20 > As noted by Andy Lutomirski, kvm does not follow the documented versi= on > protocol. Fix it. >=20 > Note: this bug results in a race which can occur if the following thr= ee > conditions are met: >=20 > 1) There is KVM guest time update (there is one every 5 minutes). >=20 > 2) Which races with a thread in the guest in the following way: > The execution of these 29 instructions has to take at _least_ > 2 seconds (rebalance interval is 1 second). >=20 > lsl %r9w,%esi > mov %esi,%r8d > and $0x3f,%esi > and $0xfff,%r8d > test $0xfc0,%r8d > jne 0xa12 > shl $0x6,%rsi > mov -0xa01000(%rsi),%r10d > data32 xchg %ax,%ax > data32 xchg %ax,%ax > rdtsc =20 > shl $0x20,%rdx > mov %eax,%eax > movsbl -0xa00fe4(%rsi),%ecx > or %rax,%rdx > sub -0xa00ff8(%rsi),%rdx > mov -0xa00fe8(%rsi),%r11d > mov %rdx,%rax > shl %cl,%rax > test %ecx,%ecx > js 0xa08 > mov %r11d,%edx > movzbl -0xa00fe3(%rsi),%ecx > mov -0xa00ff0(%rsi),%r11 > mul %rdx > shrd $0x20,%rdx,%rax > data32 xchg %ax,%ax > data32 xchg %ax,%ax > lsl %r9w,%edx >=20 > 3) Scheduler moves the task, while executing these 29 instructions, t= o a > destination processor, then back to the source processor. >=20 > 4) Source processor, after has been moved back from destination, > perceives data out of order as written by processor performing guest > time update (item 1), with string mov. >=20 > Given the rarity of this condition, and the fact it was never observe= d > or reported, reverting pvclock vsyscall on systems whose host is > susceptible to the race, seems an excessive measure. >=20 > Signed-off-by: Marcelo Tosatti > Cc: stable@kernel.org Thanks. Reviewed-or-Signed-off-by: Radim Kr=C4=8Dm=C3=A1=C5=99 Like most code, I would have written it differently now :) > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > + &guest_hv_clock, > + sizeof(guest_hv_clock)); The easiest optimization is replacing sizeof(guest_hv_clock) with offsetof(typeof(guest_hv_clock), version) + sizeof(guest_hv_clock.ver= sion) because kvm_write_guest_cached() allows writing of prefixes. This still won't get optimized to a simple MOV at compile time, but saves few mov bytes. (Offset of version is 0 now, so using 'sizeof guest_hv_clock.version' i= s just a minor offence sand saves some hard to read code.)