From: Paolo Bonzini
Subject: Re: XP machine freeze
Date: Mon, 13 Apr 2015 16:01:07 +0200
Message-ID: <552BCC23.9030409@redhat.com>
References: <009701d05ffb$5e37a740$1aa6f5c0$@astim.si> <550EE047.3030605@fnarfbargle.com> <5519BBF4.7080600@redhat.com> <552B40F7.5080107@fnarfbargle.com> <552BB8D5.7060200@redhat.com> <838B5A50-4E6F-4006-8A49-A464C43BF97F@gmail.com>
In-Reply-To: <838B5A50-4E6F-4006-8A49-A464C43BF97F@gmail.com>
To: Nadav Amit
Cc: Brad Campbell, Saso Slavicic, kvm@vger.kernel.org, Radim Krčmář

On 13/04/2015 15:34, Nadav Amit wrote:
> Paolo,
>
> I hope I am not misleading or interrupting, and I am obviously very biased,
> but couldn't it be related to the issue that patch f210f7572bed ("KVM: x86:
> Fix lost interrupt on irr_pending race") deals with?
>
> I got this issue first when I upgraded to 3.17 in my testing environment,
> since apparently a race got worse due to patch 56cc2406d68c. Did anyone try
> 3.19 that has this fix?

That's a much better guess than mine.  Especially because it would also
explain how Saso is reproducing it on CentOS 6 (but still less easily
than Brad, who has 56cc2406d68c0f09505c389e276f27a99f495cbd).

Paolo

> Regards,
> Nadav
>
> Paolo Bonzini wrote:
>
>>
>>
>> On 13/04/2015 06:07, Brad Campbell wrote:
>>> On 31/03/15 05:11, Paolo Bonzini wrote:
>>>> On 22/03/2015 16:31, Brad Campbell wrote:
>>>>> No help I'm afraid, but at least I can conclusively say that 3.16 is
>>>>> good, and 3.17 is bad.
>>>> Can you try more specifically around the first KVM pull request?  That
>>>> would be between c9b88e958182 (presumed good) and 8533ce727188 (presumed
>>>> bad)?
>>>
>>>
>>> G'day Paolo.
>>>
>>> I can confirm that the fault appears to lie between good and bad as
>>> specified above.
>>> Bad failed before 48 hours, good ran for 143 hours. I'm bisecting now.
>>
>> Thanks!  Remember to bisect only with arch/x86/kvm.
>>
>> Also:
>>
>> 1) Brad, I see you are on AMD.  Have you ever reproduced it on Intel?
>> Saso, are you on AMD as well?
>>
>> If so, the most likely culprit is this:
>>
>> commit 6addfc42992be4b073c39137ecfdf4b2aa2d487f
>> Author: Paolo Bonzini
>> Date:   Thu Mar 27 11:29:28 2014 +0100
>>
>>     KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
>>
>>     Despite the provisions to emulate up to 130 consecutive instructions, in
>>     practice KVM will emulate just one before exiting handle_invalid_guest_state,
>>     because x86_emulate_instruction always sets KVM_REQ_EVENT.
>>
>>     However, we only need to do this if an interrupt could be injected,
>>     which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
>>     away; b) if the interrupt flag has just been set (other instructions
>>     than STI can set it without enabling an interrupt shadow).
>>
>>     This cuts another 700-900 cycles from the cost of emulating an
>>     instruction (measured on a Sandy Bridge Xeon: 1650-2600 cycles
>>     before the patch on kvm-unit-tests, 925-1700 afterwards).
>>
>>     Signed-off-by: Paolo Bonzini
>>
>> I would first try this one, and see if it is bad.
>>
>> Radim, do you think this could cause a missed interrupt injection
>> after Windows does a TPR write?
>>
>> 2) For bisection feel free to "git bisect skip" the following:
>>
>> 03916db9348c079d8d214f971cc114bb51c6b869 Replace NR_VMX_MSR with its definition
>> 9a2a05b9ed618b1bb6d4cbec0c2e1f80d6636609 KVM: nVMX: clean up nested_release_vmcs12 and code around it
>> 4fa7734c62cdd8c07edd54fa5a5e91482273071a KVM: nVMX: fix lifetime issues for vmcs02
>> c9cdd085bb75226879fd468b88e2e7eb467325b7 KVM: x86: Defining missing x86 vectors
>> 0123be429fef40f067e5b1811576c3994229f59e KVM: x86: Assertions to check no overrun in MSR lists
>> 296f047502f1b3ddfd63adbc192624ce80740081 KVM: vmx: remove duplicate vmx_mpx_supported() prototype
>> 963fee1656603ce2e91ebb988cd5a92f2af41369 KVM: nVMX: Fix virtual interrupt delivery injection
>> 6cbc5f5a80a9ae5a80bc81efc574b5a85bfd4a84 KVM: nSVM: Set correct port for IOIO interception evaluation
>> 6493f1574e898b46370e2b2315836d76a1980f2c KVM: nSVM: Fix IOIO size reported on emulation
>> 9bf418335e24da995ea682a028926d7e1036be6f KVM: nSVM: Fix IOIO bitmap evaluation
>> 62baf44cad3bc6b37115cc21e4228fe53d4f3474 KVM: nSVM: Do not report CLTS via SVM_EXIT_WRITE_CR0 to L1
>> 5381417f6a51293e7b8af1eb18aefa5d47976a71 KVM: nVMX: Fix returned value of MSR_IA32_VMX_VMCS_ENUM
>> 2996fca0690f03a5220203588f4a0d8c5acba2b0 KVM: nVMX: Allow to disable VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS
>> 560b7ee12ca5e1ebc1675d7eb4008bb22708277a KVM: nVMX: Fix returned value of MSR_IA32_VMX_PROCBASED_CTLS
>> 3dcdf3ec6e48d918741ea11349d4436d0c5aac93 KVM: nVMX: Allow to disable CR3 access interception
>> 3dbcd8da7b564194f93271b003a1c46ef404cbdb KVM: nVMX: Advertise support for MSR_IA32_VMX_TRUE_*_CTLS
>> bc39c4db7110f88f338cbbabe53d3e43c7400a59 arch/x86/kvm/vmx.c: use PAGE_ALIGNED instead of IS_ALIGNED(PAGE_SIZE
>> e4aa5288ff07766d101751de9a8420d666c61735 KVM: x86: Fix constant value of VM_{EXIT_SAVE,ENTRY_LOAD}_DEBUG_CONTROLS
>> 42cbc04fd3b5e3f9b011bf9fa3ce0b3d1e10b58b x86/kvm: Resolve shadow warnings in macro expansion
>> b55a8144d1807f9e74c51cb584f0dd198483d86c x86/kvm: Resolve shadow warning from min macro
>> 98eff52ab5c0ff5cb96940a93e99a1aeb2f11c89 KVM: x86: Fix lapic.c debug prints
>> 9f6226a762c7ae02f6a23a3d4fc552dafa57ea23 arch: x86: kvm: x86.c: Cleaning up variable is set more than once
>> 80112c89ed872c725e7dc39ccf6c37d1a585e161 KVM: Synthesize G bit for all segments.
>> 27e6fb5dae2819d17f38dc9224692b771e989981 KVM: vmx: vmx instructions handling does not consider cs.l
>> bdc907222c5e4edd848da0c031deb55b59f1cf9a KVM: emulate: fix harmless typo in MMX decoding
>> 10e38fc7cabc668738e6a7b7b57cbcddb2234440 KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode
>> 68efa764f3429f2bd71f431e91c04b0bcb7d34f1 KVM: x86: Emulator support for #UD on CPL>0
>>
>> The following can be skipped assuming you are on 32-bit XP:
>>
>> 1e32c07955b43e7f827174bf320ed35971117275 KVM: vmx: handle_cr ignores 32/64-bit mode
>> a449c7aa51e10c9bde0ea9bee4e682d6d067ebab KVM: x86: Hypercall handling does not considers opsize correctly
>> 5777392e83c96e3a0799dd2985598e0fc76cf4aa KVM: x86: check DR6/7 high-bits are clear only on long-mode
>> a825f5cc4a8455663562809748240169cb9bc2c0 KVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX
>> 140bad89fd25db1aab60f80ed7874e9a9bdbae3b KVM: x86: emulation of dword cmov on long-mode should clear [63:32]
>> 7dec5603b6b8dc4c3e1c65d318bd2a5a8c62a424 KVM: x86: bit-ops emulation ignores offset on 64-bit
>> 2eedcac8a97cef43c9c5236398fc8c9d0fd9cc0c KVM: x86: Loading segments on 64-bit mode may be wrong
>> e37a75a13cdae5deaa2ea2cbf8d55b5dd08638b6 KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR
>>
>> And I think the following can be skipped safely too:
>>
>> 9e8919ae793f4edfaa29694a70f71a515ae9942a KVM: x86: Inter-privilege level ret emulation is not implemeneted
>> 3b32004a66e96e17d2a031c08d3304245c506dfc KVM: x86: movnti minimum op size of 32-bit is not kept
>> 606b1c3e87597c2d6c9f3eb833a7251262390295 KVM: x86: sgdt and sidt are not privilaged
>> 7fe864dc942c041cc4f56e287c4025d54a8e6c1e KVM: x86: Mark VEX-prefix instructions emulation as unimplemented
>> 22d48b2d2aa0b078816eaa1e15e485811a2d03fa KVM: svm: writes to MSR_K7_HWCR generates GPE in guest
>>
>> and if on AMD:
>>
>> 98eb2f8b145cee711984d42eff5d6f19b6b1df69 KVM: vmx: speed up emulation of invalid guest state
>>
>>
>>
>> This is the remaining set of commits.  Unfortunately I couldn't get it
>> down to 32 or less, but at least it cleans up the picture a bit.  And
>> I do not see anything except the commit I mentioned above:
>>
>> d6e8c8545651b05a86c5b9d29d2fe11ad4cbb9aa KVM: x86: set rflags.rf during fault injection
>> b9a1ecb909e8f772934cc4bf1f164124c9fbb0d0 KVM: x86: Setting rflags.rf during rep-string emulation
>> 6f43ed01e87c8a8dbd8c826eaf0f714c1342c039 KVM: x86: DR6/7.RTM cannot be written
>> 4161a569065b17954848069d5209182083ce876b KVM: x86: emulator injects #DB when RFLAGS.RF is set
>> 6c6cb69b8e974049cca2cc4480052fb9e7df767b KVM: x86: Cleanup of rflags.rf cleaning
>> 4467c3f1ad16e3640e2b61e1a5e0bd55281a925d KVM: x86: Clear rflags.rf on emulated instructions
>> 163b135e7b09e9158f7eb0aa74e716865e3005d2 KVM: x86: popf emulation should not change RF
>> bb663c7ada380f3c89c2f83fdbe2b3626621385d KVM: x86: Clearing rflags.rf upon skipped emulated instruction
>> 44583cba9188b29b20ceeefe8ae23ad19e26d9a4 KVM: x86: use kvm_read_guest_page for emulator accesses
>> 719d5a9b2487e0562f178f61e323c3dc18a8b200 KVM: x86: ensure emulator fetches do not span multiple pages
>> 17052f16a51af6d8f4b7eee0631af675ac204f65 KVM: emulate: put pointers in the fetch_cache
>> 9506d57de3bc8277a4e306e0d439976862f68c6d KVM: emulate: avoid per-byte copying in instruction fetches
>> 5cfc7e0f5e5e1adf998df94f8e36edaf5d30d38e KVM: emulate: avoid repeated calls to do_insn_fetch_bytes
>> 285ca9e948fa047e51fe47082528034de5369e8d KVM: emulate: speed up do_insn_fetch
>> 41061cdb98a0bec464278b4db8e894a3121671f5 KVM: emulate: do not initialize memopp
>> 573e80fe04db1aa44e8303037f65716ba5c3a343 KVM: emulate: rework seg_override
>> c44b4c6ab80eef3a9c52c7b3f0c632942e6489aa KVM: emulate: clean up initializations in init_decode_cache
>> 02357bdc8c30a60cd33dd438f851c1306c34f435 KVM: emulate: cleanup decode_modrm
>> 685bbf4ac406364a84a1d4237b4970dc570fd4cb KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks
>> 1498507a47867596de158d4db8728e92385a4919 KVM: emulate: move init_decode_cache to emulate.c
>> f5f87dfbc777f89148c3c66438741139845d3ac6 KVM: emulate: simplify writeback
>> 54cfdb3e95d4f70409a7d3432a42cffc9a232be7 KVM: emulate: speed up emulated moves
>> d40a6898e50c2589ca3d345ef5ca6671e2b35b1a KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())"
>> e24186e097b80c5995ff75e1bbcd541d09c9e42b KVM: emulate: move around some checks
>> 6addfc42992be4b073c39137ecfdf4b2aa2d487f KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
>> 37ccdcbe0757196ec98c0dcf9754bec8423807a5 KVM: x86: return all bits from get_interrupt_shadow
>> 5f7552d4a56c21a882c9854ac63c6eb73ca7d7c8 KVM: x86: Pending interrupt may be delivered after INIT
>> 0d3da0d26e3c3515997c99451ce3b0ad1a69a36c KVM: x86: fix TSC matching
>> ee212297cd425620867d4398d55d068c4203768c KVM: x86: Wrong emulation on 'xadd X, X'
>> 968889771749d8e730d794deed2bd2e363a98a54 KVM: emulate: simplify BitOp handling
>> a5457e7bcf9a76ec5c2de5d311d9b0d3b724edc6 KVM: emulate: POP SS triggers a MOV SS shadow too
>> 32e94d0696c26c6ba4f3ff53e70f6e0e825979bc KVM: x86: smsw emulation is incorrect in 64-bit mode
>> aaa05f2437b9450f30b301db962ec4d45ec90fbb KVM: x86: Return error on cmpxchg16b emulation
>> 67f4d4288c353734d29c45f6725971c71af96791 KVM: x86: rdpmc emulation checks the counter incorrectly
>> 37c564f2854bf75969d0ac26e03f5cf2bb7d639f KVM: x86: cmpxchg emulation should compare in reverse order
>>
>> Thanks,
>>
>> Paolo
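The bisection advice quoted in this thread (limit the bisect to arch/x86/kvm, and "git bisect skip" commits judged irrelevant) can be tried out on a throwaway repository. The script below is only a sketch under stated assumptions: the toy repository, the tags c1..c8, the "BUG" marker and the grep-based check are all hypothetical stand-ins invented for illustration. In the thread the real test is running the XP guest for days, which cannot be scripted this way.

```shell
#!/bin/sh
# Toy illustration only: build a throwaway repository, then bisect it
# with a pathspec and an up-front skip, as suggested in the thread.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"
mkdir -p arch/x86/kvm docs

# Eight commits, tagged c1..c8 (hypothetical names). Odd ones touch
# arch/x86/kvm, even ones touch docs only. The made-up regression
# arrives with c5.
i=1
while [ "$i" -le 8 ]; do
    if [ $((i % 2)) -eq 1 ]; then
        echo "change $i" >> arch/x86/kvm/x86.c
    else
        echo "change $i" >> docs/notes.txt
    fi
    if [ "$i" -eq 5 ]; then
        echo "BUG" >> arch/x86/kvm/x86.c
    fi
    git add -A
    git commit -q -m "commit $i"
    git tag "c$i"
    i=$((i + 1))
done

# bad = c8, good = c1. The pathspec after "--" restricts the candidates
# to commits that touch arch/x86/kvm.
git bisect start c8 c1 -- arch/x86/kvm >/dev/null

# Commits judged irrelevant can be excluded before testing starts.
git bisect skip c7 >/dev/null

# Stand-in for the real test: exit 0 means good, exit 1 means bad.
git bisect run sh -c '! grep -q BUG arch/x86/kvm/x86.c' >/dev/null

# refs/bisect/bad points at the first bad commit once bisection ends.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"

git bisect reset >/dev/null 2>&1
```

In a kernel tree, the same shape with the endpoints named in the thread would be `git bisect start 8533ce727188 c9b88e958182 -- arch/x86/kvm`, followed by `git bisect skip` with the hashes listed above. Each step would then be marked by hand with `git bisect good` or `git bisect bad` after the guest has run long enough.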