From: Mukesh Rathor
Subject: Re: HYBRID: PV in HVM container
Date: Wed, 27 Jul 2011 18:58:28 -0700
To: Xen-devel@lists.xensource.com
Cc: Keir Fraser, Ian Campbell

Hi folks,

Well, I did some benchmarking and found interesting results. The following
runs are on a Westmere with 2 sockets and 10GB RAM. Xen was booted with
maxcpus=2 and the entire RAM. All guests were started with 1 vcpu and 2GB
RAM. dom0 was started with 1 vcpu and 704MB. Baremetal was booted with 2GB
and 1 CPU. The HVM guest has EPT enabled. HT is on.

So, unless the NUMA'ness interfered with the results (some memory coming
from the remote socket), it appears HVM does very well, to the point that
a hybrid does not seem worth it. I am rerunning the tests on a
single-socket system just to be sure.

I am attaching my diffs in case anyone wants to see what I did. I used
Xen 4.0.2 and Linux 2.6.39.

thanks,
Mukesh
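For a quick sanity check of the context-switch numbers below without a
full lmbench install, a crude pipe ping-pong in the spirit of lmbench's
lat_ctx exercises the same path. This is a minimal sketch with coarse
gettimeofday() timing, not lmbench's methodology, so the absolute numbers
will differ:

    /* pingpong.c: two processes bounce a byte over a pair of pipes,
     * so each round trip costs roughly two context switches. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/wait.h>

    #define ROUNDS 100000

    int main(void)
    {
        int p2c[2], c2p[2];
        char b = 0;
        struct timeval t0, t1;

        if (pipe(p2c) || pipe(c2p))
            return 1;

        if (fork() == 0) {                  /* child: echo the byte back */
            for (;;) {
                if (read(p2c[0], &b, 1) != 1)
                    _exit(0);
                if (write(c2p[1], &b, 1) != 1)
                    _exit(0);
            }
        }

        gettimeofday(&t0, NULL);
        for (int i = 0; i < ROUNDS; i++) {
            write(p2c[1], &b, 1);
            read(c2p[0], &b, 1);
        }
        gettimeofday(&t1, NULL);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("%.2f us per switch\n", us / ROUNDS / 2);

        close(p2c[1]);                      /* child sees EOF and exits */
        wait(NULL);
        return 0;
    }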

                 L M B E N C H  3 . 0   S U M M A R Y

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
PV        Linux 2.6.39f 2639 0.65 0.88 2.14 4.59 3.77 0.79 3.62 535. 1294 3308
Hybrid    Linux 2.6.39f 2639 0.13 0.21 0.89 1.96 3.08 0.24 1.10 529. 1294 3246
HVM       Linux 2.6.39f 2639 0.12 0.21 0.64 1.76 3.04 0.24 3.37 113. 354. 1324
Baremetal Linux 2.6.39+ 2649 0.13 0.23 0.74 1.93 3.46 0.28 1.58 127. 386. 1434

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr  intgr  intgr  intgr  intgr
                          bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
PV        Linux 2.6.39f 0.3800 0.0100 0.1700 9.1000 9.0400
Hybrid    Linux 2.6.39f 0.3800 0.0100 0.1700 9.1100 9.0300
HVM       Linux 2.6.39f 0.3800 0.0100 0.1700 9.1100 9.0600
Baremetal Linux 2.6.39+ 0.3800 0.0100 0.1700 9.0600 8.9800

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                         add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
PV        Linux 2.6.39f 1.1300 1.5200 5.6200 5.2900
Hybrid    Linux 2.6.39f 1.1300 1.5200 5.6300 5.2900
HVM       Linux 2.6.39f 1.1400 1.5200 5.6300 5.3000
Baremetal Linux 2.6.39+ 1.1300 1.5100 5.6000 5.2700

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS  double double double double
                         add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
PV        Linux 2.6.39f 1.1300 1.9000 8.6400 8.3200
Hybrid    Linux 2.6.39f 1.1400 1.9000 8.6600 8.3200
HVM       Linux 2.6.39f 1.1400 1.9000 8.6600 8.3300
Baremetal Linux 2.6.39+ 1.1300 1.8900 8.6100 8.2800

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
PV        Linux 2.6.39f 5.2800 5.7600 6.3600 6.3200 7.3600 6.69000 7.46000
Hybrid    Linux 2.6.39f 4.9200 4.9300 5.2200 5.7600 6.9600 6.12000 7.31000
HVM       Linux 2.6.39f 1.3100 1.2200 1.6200 1.9200 3.2600 2.23000 3.48000
Baremetal Linux 2.6.39+ 1.5500 1.4100 2.0600 2.2500 3.3900 2.44000 3.38000

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
PV        Linux 2.6.39f 5.280  16.6 21.3  25.9  33.7  34.7  41.8  87.
Hybrid    Linux 2.6.39f 4.920  11.2 14.4  19.6  26.1  27.5  32.9  71.
HVM       Linux 2.6.39f 1.310 4.416 6.15 9.386  14.8  15.8  20.1  45.
Baremetal Linux 2.6.39+ 1.550 4.625 7.34  14.3  19.8  21.4  26.4  66.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
PV        Linux 2.6.39f                              24.0K  0.746 3.55870 2.184
Hybrid    Linux 2.6.39f                              24.6K  0.238 4.00100 1.480
HVM       Linux 2.6.39f                              4716.0 0.202 0.96600 1.468
Baremetal Linux 2.6.39+                              6898.0 0.325 0.93610 1.620
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
PV        Linux 2.6.39f 1661 2081 1041 3293.3 5528.3 3106.6 2800.0 4472 5633.
Hybrid    Linux 2.6.39f 1974 2450 1183 3481.5 5529.6 3114.9 2786.6 4470 5672.
HVM       Linux 2.6.39f 3232 2929 1622 3541.3 5527.5 3077.1 2765.6 4453 5634.
Baremetal Linux 2.6.39+ 3320 2800 1666 3523.6 5578.9 3147.0 2841.6 4541 5752.

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
PV        Linux 2.6.39f  2639 1.5160 5.9170        29.7        97.5
Hybrid    Linux 2.6.39f  2639 1.5170 7.5000        29.7        97.4
HVM       Linux 2.6.39f  2639 1.5190 4.0210        29.8       105.4
Baremetal Linux 2.6.39+  2649 1.5090 3.8370        29.2        78.0

[attachment: hyb.xen.diff]
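For orientation before the diff itself: most of the hypervisor side
threads the new is_hybrid_domain()/is_hybrid_vcpu() predicates through
existing HVM/PV branch points, and the most interesting piece is the
hypercall gating, where PV-only hypercalls are admitted through the HVM
hypercall path for hybrid vcpus. Condensed into a sketch (not a literal
hunk from the diff; hcall_allowed() is a made-up name standing in for
the hcall_valid() added below):

    /* Sketch: PV pagetable/descriptor hypercalls stay blocked for plain
     * HVM guests but are let through for hybrid ones. */
    static int hcall_allowed(uint32_t eax, int hybrid)
    {
        int pv_only = (eax == __HYPERVISOR_mmuext_op) ||
                      (eax == __HYPERVISOR_update_va_mapping) ||
                      (eax == __HYPERVISOR_update_descriptor) ||
                      (eax == __HYPERVISOR_set_trap_table);

        return hybrid || !pv_only;   /* plus the usual table lookup */
    }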
diff -r f2cf898c7ff8 xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c     Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/domain.c     Wed Jul 27 18:45:58 2011 -0700
@@ -354,7 +354,12 @@ int vcpu_initialise(struct vcpu *v)
 
     paging_vcpu_init(v);
 
-    if ( is_hvm_domain(d) )
+    if ( is_hybrid_domain(d) )
+    {
+        if ( (rc = hybrid_vcpu_initialise(v)) != 0 )
+            return rc;
+    }
+    else if ( is_hvm_domain(d) )
     {
         if ( (rc = hvm_vcpu_initialise(v)) != 0 )
             return rc;
@@ -515,9 +520,17 @@ int arch_domain_create(struct domain *d,
             iommu_domain_destroy(d);
             goto fail;
         }
-    }
-    else
-    {
+
+    } else if ( is_hybrid_domain(d) ) {
+
+        if ( (rc = hybrid_domain_initialise(d)) != 0 )
+        {
+            iommu_domain_destroy(d);
+            goto fail;
+        }
+
+    } else {
+
         /* 32-bit PV guest by default only if Xen is not 64-bit. */
         d->arch.is_32bit_pv = d->arch.has_32bit_shinfo =
             (CONFIG_PAGING_LEVELS != 4);
@@ -608,6 +621,95 @@ unsigned long pv_guest_cr4_fixup(unsigne
     return (hv_cr4 & hv_cr4_mask) | (guest_cr4 & ~hv_cr4_mask);
 }
 
+extern void hybrid_update_cr3(struct vcpu *);
+int hybrid_arch_set_info_guest( struct vcpu *v, vcpu_guest_context_u c)
+{
+    struct domain *d = v->domain;
+    unsigned long cr3_pfn = INVALID_MFN;
+    unsigned long flags;
+    int i;
+
+#define c(fld) (c.nat->fld)
+
+    flags = c(flags);
+    v->fpu_initialised = !!(flags & VGCF_I387_VALID);
+    v->arch.flags |= TF_kernel_mode;
+
+    memcpy(&v->arch.guest_context, c.nat, sizeof(*c.nat));
+
+    /* IOPL privileges are virtualised. */
+    v->arch.iopl = (v->arch.guest_context.user_regs.eflags >> 12) & 3;
+
+    v->arch.guest_context.user_regs.eflags |= 2;
+
+    memset(v->arch.guest_context.debugreg, 0,
+           sizeof(v->arch.guest_context.debugreg));
+    for ( i = 0; i < 8; i++ )
+        (void)set_debugreg(v, i, c(debugreg[i]));
+
+    if ( v->is_initialised )
+        goto out;
+
+    if ( v->vcpu_id == 0 )
+        d->vm_assist = c.nat->vm_assist;
+
+    cr3_pfn = gmfn_to_mfn(d, xen_cr3_to_pfn(c.nat->ctrlreg[3]));
+
+    if ( !mfn_valid(cr3_pfn) ||
+         (paging_mode_refcounts(d)
+          ? !get_page(mfn_to_page(cr3_pfn), d)
+          : !get_page_and_type(mfn_to_page(cr3_pfn), d,
+                               PGT_base_page_table)) )
+    {
+        destroy_gdt(v);
+        return -EINVAL;
+    }
+
+    v->arch.guest_table = pagetable_from_pfn(cr3_pfn);
+
+    if ( c.nat->ctrlreg[1] )
+    {
+        cr3_pfn = gmfn_to_mfn(d, xen_cr3_to_pfn(c.nat->ctrlreg[1]));
+
+        if ( !mfn_valid(cr3_pfn) ||
+             (paging_mode_refcounts(d)
+              ? !get_page(mfn_to_page(cr3_pfn), d)
+              : !get_page_and_type(mfn_to_page(cr3_pfn), d,
+                                   PGT_base_page_table)) )
+        {
+            cr3_pfn = pagetable_get_pfn(v->arch.guest_table);
+            v->arch.guest_table = pagetable_null();
+            if ( paging_mode_refcounts(d) )
+                put_page(mfn_to_page(cr3_pfn));
+            else
+                put_page_and_type(mfn_to_page(cr3_pfn));
+            destroy_gdt(v);
+            return -EINVAL;
+        }
+        v->arch.guest_table_user = pagetable_from_pfn(cr3_pfn);
+    }
+
+    if ( v->vcpu_id == 0 )
+        update_domain_wallclock_time(d);
+
+    /* Don't redo final setup */
+    v->is_initialised = 1;
+
+    if ( paging_mode_enabled(d) )
+        paging_update_paging_modes(v);
+    update_cr3(v);
+
+    hybrid_update_cr3(v);
+
+ out:
+    if ( flags & VGCF_online )
+        clear_bit(_VPF_down, &v->pause_flags);
+    else
+        set_bit(_VPF_down, &v->pause_flags);
+    return 0;
+#undef c
+}
+
 /* This is called by arch_final_setup_guest and do_boot_vcpu */
 int arch_set_info_guest(
     struct vcpu *v, vcpu_guest_context_u c)
@@ -628,6 +730,9 @@ int arch_set_info_guest(
 #endif
     flags = c(flags);
 
+    if (is_hybrid_vcpu(v))
+        return hybrid_arch_set_info_guest(v, c);
+
     if ( !is_hvm_vcpu(v) )
     {
         if ( !compat )
@@ -1347,7 +1452,7 @@ static void update_runstate_area(struct 
 
 static inline int need_full_gdt(struct vcpu *v)
 {
-    return (!is_hvm_vcpu(v) && !is_idle_vcpu(v));
+    return (!is_hvm_vcpu(v) && !is_idle_vcpu(v) && !is_hybrid_vcpu(v));
 }
 
 static void __context_switch(void)
@@ -2115,7 +2220,7 @@ void vcpu_mark_events_pending(struct vcp
     if ( already_pending )
         return;
 
-    if ( is_hvm_vcpu(v) )
+    if ( is_hvm_vcpu(v) || is_hybrid_vcpu(v))
         hvm_assert_evtchn_irq(v);
     else
         vcpu_kick(v);
diff -r f2cf898c7ff8 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c    Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/hvm/hvm.c    Wed Jul 27 18:45:58 2011 -0700
@@ -227,6 +227,9 @@ void hvm_do_resume(struct vcpu *v)
 {
     ioreq_t *p;
 
+    if (is_hybrid_vcpu(v))
+        return;
+
     pt_restore_timer(v);
 
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
@@ -414,6 +417,42 @@ int hvm_domain_initialise(struct domain 
     return rc;
 }
 
+int hybrid_domain_initialise(struct domain *d)
+{
+    if ( !hvm_enabled ) {
+        gdprintk(XENLOG_WARNING, "Attempt to create a Hybrid guest "
+                 "on a non-VT/AMDV platform.\n");
+        return -EINVAL;
+    }
+    spin_lock_init(&d->arch.hvm_domain.irq_lock);
+    hvm_init_guest_time(d);
+    return 0;
+}
+
+extern int hybrid_vcpu_initialize(struct vcpu *v);
+int hybrid_vcpu_initialise(struct vcpu *v)
+{
+    int rc;
+
+    if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
+        if ( (rc = hybrid_vcpu_initialize(v)) != 0)
+            return rc;
+    } else {
+        printk("MUK: fixme on AMD\n");
+        return -EINVAL;
+    }
+    tasklet_init(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
+                 (void(*)(unsigned long))hvm_assert_evtchn_irq,
+                 (unsigned long)v);
+
+    if ( v->vcpu_id == 0 )
+        hvm_set_guest_tsc(v, 0);
+
+    /* PV guests by default have a 100Hz ticker. */
+    v->periodic_period = MILLISECS(10);   /* ???? */
+
+    return 0;
+}
 
 extern void msixtbl_pt_cleanup(struct domain *d);
 void hvm_domain_relinquish_resources(struct domain *d)
@@ -2222,7 +2261,17 @@ static long hvm_vcpu_op(
     case VCPUOP_stop_singleshot_timer:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
+
+    case VCPUOP_is_up:
+    case VCPUOP_up:
+        if (is_hybrid_domain(current->domain)) {
+            rc = do_vcpu_op(cmd, vcpuid, arg);
+            break;
+        }
+
     default:
+        kdbp("MUK: unhandled vcpu op:%d\n", cmd);
+        kdb_trap_immed(KDB_TRAP_NONFATAL);
         rc = -ENOSYS;
         break;
     }
@@ -2292,14 +2341,23 @@ static long hvm_vcpu_op_compat32(
     return rc;
 }
 
-static hvm_hypercall_t *hvm_hypercall64_table[NR_hypercalls] = {
+hvm_hypercall_t *hvm_hypercall64_table[NR_hypercalls] = {
     [ __HYPERVISOR_memory_op ] = (hvm_hypercall_t *)hvm_memory_op,
     [ __HYPERVISOR_grant_table_op ] = (hvm_hypercall_t *)hvm_grant_table_op,
     [ __HYPERVISOR_vcpu_op ] = (hvm_hypercall_t *)hvm_vcpu_op,
+    HYPERCALL(set_trap_table),
+    HYPERCALL(set_debugreg),
+    HYPERCALL(update_descriptor),
+    HYPERCALL(multicall),
+    HYPERCALL(update_va_mapping),
     HYPERCALL(xen_version),
+    HYPERCALL(console_io),
+    HYPERCALL(mmuext_op),
     HYPERCALL(event_channel_op),
+    HYPERCALL(set_segment_base),
     HYPERCALL(sched_op),
     HYPERCALL(set_timer_op),
+    HYPERCALL(vm_assist),
     HYPERCALL(hvm_op),
     HYPERCALL(tmem_op)
 };
@@ -2321,6 +2379,29 @@ static hvm_hypercall_t *hvm_hypercall32_
 
 #endif /* defined(__x86_64__) */
 
+/* returns: 1 hcall is valid */
+static int hcall_valid(uint32_t eax)
+{
+#ifndef __x86_64__
+    if ( unlikely(eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] )
+#else
+    if ( unlikely(eax >= NR_hypercalls) || !hvm_hypercall64_table[eax] ||
+         (!is_hybrid_vcpu(current) &&
+          ( (eax==__HYPERVISOR_set_trap_table) ||
+            (eax==__HYPERVISOR_set_debugreg) ||
+            (eax==__HYPERVISOR_update_descriptor) ||
+            (eax==__HYPERVISOR_multicall) ||
+            (eax==__HYPERVISOR_update_va_mapping) ||
+            (eax==__HYPERVISOR_console_io) ||
+            (eax==__HYPERVISOR_set_segment_base) ||
+            (eax==__HYPERVISOR_vm_assist) ||
+            (eax==__HYPERVISOR_mmuext_op) ) ) )
+#endif
+        return 0;
+
+    return 1;
+}
+
 int hvm_do_hypercall(struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
@@ -2349,8 +2430,7 @@ int hvm_do_hypercall(struct cpu_user_reg
     if ( (eax & 0x80000000) && is_viridian_domain(curr->domain) )
         return viridian_hypercall(regs);
 
-    if ( (eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] )
-    {
+    if ( !hcall_valid(eax)) {
         regs->eax = -ENOSYS;
         return HVM_HCALL_completed;
     }
@@ -2734,12 +2814,54 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
+static long _do_hybrid_op(unsigned long op, XEN_GUEST_HANDLE(void) arg)
+{
+    long rc = -EINVAL;
+    struct xen_hvm_param a;
+    struct domain *d;
+
+    if (op == HVMOP_set_param) {
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        rc = rcu_lock_target_domain_by_id(a.domid, &d);
+        if ( rc != 0 )
+            return rc;
+
+        if (a.index == HVM_PARAM_CALLBACK_IRQ) {
+            struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+            uint64_t via = a.value;
+            uint8_t via_type = (uint8_t)(via >> 56) + 1;
+
+            if (via_type == HVMIRQ_callback_vector) {
+                hvm_irq->callback_via_type = HVMIRQ_callback_vector;
+                hvm_irq->callback_via.vector = (uint8_t)via;
+                kdbp("MUK: callback vector:%d\n", (uint8_t)via);
+                rc = 0;
+            }
+        }
+    }
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg)
 {
     struct domain *curr_d = current->domain;
     long rc = 0;
 
+    if (is_hybrid_domain(curr_d)) {
+#if 0
+        struct xen_hvm_param a;
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        if (op != HVMOP_set_param && a.index != HVM_PARAM_CALLBACK_IRQ)
+            return -EINVAL;
+#endif
+        return (_do_hybrid_op(op, arg));
+    }
+
     switch ( op )
     {
     case HVMOP_set_param:
diff -r f2cf898c7ff8 xen/arch/x86/hvm/irq.c
--- a/xen/arch/x86/hvm/irq.c    Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/hvm/irq.c    Wed Jul 27 18:45:58 2011 -0700
@@ -333,6 +333,9 @@ struct hvm_intack hvm_vcpu_has_pending_i
          && vcpu_info(v, evtchn_upcall_pending) )
         return hvm_intack_vector(plat->irq.callback_via.vector);
 
+    if (is_hybrid_vcpu(v))
+        return hvm_intack_none;
+
     if ( unlikely(v->nmi_pending) )
         return hvm_intack_nmi;
 
diff -r f2cf898c7ff8 xen/arch/x86/hvm/vmx/Makefile
--- a/xen/arch/x86/hvm/vmx/Makefile     Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/hvm/vmx/Makefile     Wed Jul 27 18:45:58 2011 -0700
@@ -5,3 +5,4 @@ obj-y += vmcs.o
 obj-y += vmx.o
 obj-y += vpmu.o
 obj-y += vpmu_core2.o
+obj-y += hybrid.o
diff -r f2cf898c7ff8 xen/arch/x86/hvm/vmx/intr.c
--- a/xen/arch/x86/hvm/vmx/intr.c       Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/hvm/vmx/intr.c       Wed Jul 27 18:45:58 2011 -0700
@@ -125,8 +125,9 @@ asmlinkage void vmx_intr_assist(void)
         return;
     }
 
-    /* Crank the handle on interrupt state. */
-    pt_update_irq(v);
+    if (!is_hybrid_vcpu(v))
+        /* Crank the handle on interrupt state. */
+        pt_update_irq(v);
 
     do {
         intack = hvm_vcpu_has_pending_irq(v);
diff -r f2cf898c7ff8 xen/arch/x86/hvm/vmx/vmcs.c
--- a/xen/arch/x86/hvm/vmx/vmcs.c       Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/hvm/vmx/vmcs.c       Wed Jul 27 18:45:58 2011 -0700
@@ -593,6 +593,265 @@ void vmx_disable_intercept_for_msr(struc
     }
 }
 
+
+void hybrid_update_cr3(struct vcpu *v)
+{
+    vmx_vmcs_enter(v);
+    __vmwrite(GUEST_CR3, v->arch.cr3);
+    __vmwrite(HOST_CR3, v->arch.cr3);
+
+    vpid_sync_all();
+    /* hvm_asid_flush_vcpu(v); */
+    vmx_vmcs_exit(v);
+
+}
+
+static int hybrid_construct_vmcs(struct vcpu *v)
+{
+    uint16_t sysenter_cs;
+    unsigned long sysenter_eip;
+    u32 vmexit_ctl = vmx_vmexit_control;
+    u32 vmentry_ctl = vmx_vmentry_control;
+    u64 u64val;
+
+    vmx_vmcs_enter(v);
+
+    /* VMCS controls. */
+    vmx_pin_based_exec_control &= ~PIN_BASED_VIRTUAL_NMIS;
+    __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control);
+
+    v->arch.hvm_vmx.exec_control = vmx_cpu_based_exec_control;
+
+    if ( v->domain->arch.vtsc )
+        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_RDTSC_EXITING;
+
+    v->arch.hvm_vmx.exec_control |= CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_ACTIVATE_IO_BITMAP;  /* ??? */
+    v->arch.hvm_vmx.exec_control |= CPU_BASED_ACTIVATE_MSR_BITMAP;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_TPR_SHADOW;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
+
+    kdbp("MUK: writing proc based exec controls:%x\n",
+         v->arch.hvm_vmx.exec_control);
+    __vmwrite(CPU_BASED_VM_EXEC_CONTROL, v->arch.hvm_vmx.exec_control);
+
+    /* MSR access bitmap. */
+    if ( cpu_has_vmx_msr_bitmap )
+    {
+        unsigned long *msr_bitmap = alloc_xenheap_page();
+
+        if ( msr_bitmap == NULL )
+            return -ENOMEM;
+
+        memset(msr_bitmap, ~0, PAGE_SIZE);
+        v->arch.hvm_vmx.msr_bitmap = msr_bitmap;
+        __vmwrite(MSR_BITMAP, virt_to_maddr(msr_bitmap));
+
+        vmx_disable_intercept_for_msr(v, MSR_FS_BASE);
+        vmx_disable_intercept_for_msr(v, MSR_GS_BASE);
+        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS);
+        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP);
+        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP);
+        /* pure hvm doesn't do this. safe? see: long_mode_do_msr_write() */
+        vmx_disable_intercept_for_msr(v, MSR_STAR);
+        vmx_disable_intercept_for_msr(v, MSR_LSTAR);
+        vmx_disable_intercept_for_msr(v, MSR_CSTAR);
+        vmx_disable_intercept_for_msr(v, MSR_SYSCALL_MASK);
+        vmx_disable_intercept_for_msr(v, MSR_SHADOW_GS_BASE);
+
+        kdbp("MUK: disabled intercepts for few msrs\n");
+
+    } else {
+        kdbp("MUK: CPU does NOT have msr bitmap\n");
+        for (;;) cpu_relax();
+    }
+
+    if ( !cpu_has_vmx_vpid ) {
+        printk("ERROR: no vpid support. perf will suck without it\n");
+        return -ESRCH;
+    }
+
+    v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
+
+    if ( cpu_has_vmx_secondary_exec_control ) {
+        v->arch.hvm_vmx.secondary_exec_control &= ~0x4FF;  /* turn off all */
+#if 0
+        v->arch.hvm_vmx.secondary_exec_control &=
+            ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+        v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
+        v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_RDTSCP;
+
+        v->arch.hvm_vmx.secondary_exec_control &=
+            ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+#endif
+        v->arch.hvm_vmx.secondary_exec_control |=
+            SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+        v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_VPID;
+
+        kdbp("MUK: muk_construct_vmcs: sec exec:0x%x\n",
+             v->arch.hvm_vmx.secondary_exec_control);
+        __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+                  v->arch.hvm_vmx.secondary_exec_control);
+    } else {
+        printk("ERROR: NO Secondary Exec control\n");
+        return -ESRCH;
+    }
+
+    __vmwrite(VIRTUAL_PROCESSOR_ID, v->arch.hvm_vcpu.asid);
+
+    vmexit_ctl &= ~(VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT);
+    __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl);
+
+    #define VM_ENTRY_LOAD_DEBUG_CTLS 0x4
+    #define VM_ENTRY_LOAD_EFER 0x8000
+    #define GUEST_EFER 0x2806       /* see page 23-20 */
+    #define GUEST_EFER_HIGH 0x2807  /* see page 23-20 */
+    vmentry_ctl &= ~VM_ENTRY_LOAD_DEBUG_CTLS;
+    vmentry_ctl &= ~VM_ENTRY_LOAD_EFER;
+    vmentry_ctl &= ~VM_ENTRY_SMM;
+    vmentry_ctl &= ~VM_ENTRY_DEACT_DUAL_MONITOR;
+    vmentry_ctl |= VM_ENTRY_IA32E_MODE;
+    vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_PAT;
+    kdbp("MUK:muk_construct_vmcs(). vmentry_ctl:0x%x\n", vmentry_ctl);
+    __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl);
+
+    /* MSR intercepts. */
+    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, 0);
+    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, 0);
+    __vmwrite(VM_EXIT_MSR_STORE_COUNT, 0);
+
+    /* Host data selectors. */
+    __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
+    __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);
+    __vmwrite(HOST_ES_SELECTOR, __HYPERVISOR_DS);
+    __vmwrite(HOST_FS_SELECTOR, 0);
+    __vmwrite(HOST_GS_SELECTOR, 0);
+    __vmwrite(HOST_FS_BASE, 0);
+    __vmwrite(HOST_GS_BASE, 0);
+
+    vmx_set_host_env(v);
+
+    /* Host control registers. */
+    v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS;
+    __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
+    __vmwrite(HOST_CR4, mmu_cr4_features|(cpu_has_xsave ? X86_CR4_OSXSAVE : 0));
+
+    /* Host CS:RIP. */
+    __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS);
+    __vmwrite(HOST_RIP, (unsigned long)vmx_asm_vmexit_handler);
+
+    /* Host SYSENTER CS:RIP. */
+    rdmsrl(MSR_IA32_SYSENTER_CS, sysenter_cs);
+    __vmwrite(HOST_SYSENTER_CS, sysenter_cs);
+    rdmsrl(MSR_IA32_SYSENTER_EIP, sysenter_eip);
+    __vmwrite(HOST_SYSENTER_EIP, sysenter_eip);
+
+    __vmwrite(VM_ENTRY_INTR_INFO, 0);
+
+    __vmwrite(CR3_TARGET_COUNT, 0);
+
+    __vmwrite(GUEST_ACTIVITY_STATE, 0);
+
+    __vmwrite(GUEST_CS_BASE, 0);
+    __vmwrite(GUEST_CS_LIMIT, ~0u);
+    __vmwrite(GUEST_CS_AR_BYTES, 0xa09b); /* CS.L == 1 */
+    __vmwrite(GUEST_CS_SELECTOR, 0x10);
+
+    __vmwrite(GUEST_DS_BASE, 0);
+    __vmwrite(GUEST_DS_LIMIT, ~0u);
+    __vmwrite(GUEST_DS_AR_BYTES, 0xc093);
+    __vmwrite(GUEST_DS_SELECTOR, 0x18);
+
+    __vmwrite(GUEST_SS_BASE, 0);         /* use same seg as DS */
+    __vmwrite(GUEST_SS_LIMIT, ~0u);
+    __vmwrite(GUEST_SS_AR_BYTES, 0xc093);
+    __vmwrite(GUEST_SS_SELECTOR, 0x18);
+
+    __vmwrite(GUEST_ES_SELECTOR, 0);
+    __vmwrite(GUEST_FS_SELECTOR, 0);
+    __vmwrite(GUEST_GS_SELECTOR, 0);
+
+    /* Guest segment bases. */
+    __vmwrite(GUEST_ES_BASE, 0);
+    __vmwrite(GUEST_FS_BASE, 0);
+    __vmwrite(GUEST_GS_BASE, 0);
+
+    /* Guest segment limits. */
+    __vmwrite(GUEST_ES_LIMIT, ~0u);
+    __vmwrite(GUEST_FS_LIMIT, ~0u);
+    __vmwrite(GUEST_GS_LIMIT, ~0u);
+
+    /* Guest segment AR bytes. */
+    __vmwrite(GUEST_ES_AR_BYTES, 0xc093); /* read/write, accessed */
+    __vmwrite(GUEST_FS_AR_BYTES, 0xc093);
+    __vmwrite(GUEST_GS_AR_BYTES, 0xc093);
+
+    /* Guest IDT. */
+    __vmwrite(GUEST_GDTR_BASE, 0);
+    __vmwrite(GUEST_GDTR_LIMIT, 0);
+
+    /* Guest LDT. */
+    __vmwrite(GUEST_LDTR_AR_BYTES, 0x82); /* LDT */
+    __vmwrite(GUEST_LDTR_SELECTOR, 0);
+    __vmwrite(GUEST_LDTR_BASE, 0);
+    __vmwrite(GUEST_LDTR_LIMIT, 0);
+
+    /* Guest TSS. */
+    __vmwrite(GUEST_TR_AR_BYTES, 0x8b); /* 32-bit TSS (busy) */
+    __vmwrite(GUEST_TR_BASE, 0);
+    __vmwrite(GUEST_TR_LIMIT, 0xff);
+
+    __vmwrite(GUEST_INTERRUPTIBILITY_INFO, 0);
+    __vmwrite(GUEST_DR7, 0);
+    __vmwrite(VMCS_LINK_POINTER, ~0UL);
+
+    /* vmexit only on write to protected page, err code: 0x3 */
+    __vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0xffffffff);
+    __vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, 0x3);
+
+    /* __vmwrite(EXCEPTION_BITMAP, 0xffffffff); */
+    __vmwrite(EXCEPTION_BITMAP,
+              HVM_TRAP_MASK | TRAP_debug | TRAP_gp_fault | (1U << ...
+
+#if 0
+    v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PG | X86_CR0_PE | X86_CR0_ET;
+    hvm_update_guest_cr(v, 0);
+
+    v->arch.hvm_vcpu.guest_cr[4] = 0;
+    hvm_update_guest_cr(v, 4);
+#endif
+
+#if 0
+    u64val = X86_CR0_PG | X86_CR0_PE | X86_CR0_ET | X86_CR0_TS |
+             X86_CR0_NE | X86_CR0_WP;
+#endif
+    /* make sure to set WP bit so rdonly pages are not written from CPL 0 */
+    u64val = X86_CR0_PG | X86_CR0_NE | X86_CR0_PE | X86_CR0_WP;
+    __vmwrite(GUEST_CR0, u64val);
+    __vmwrite(CR0_READ_SHADOW, u64val);
+    v->arch.hvm_vcpu.guest_cr[0] = u64val;
+
+    u64val = X86_CR4_PAE | X86_CR4_VMXE;
+    __vmwrite(GUEST_CR4, u64val);
+    __vmwrite(CR4_READ_SHADOW, u64val);
+    v->arch.hvm_vcpu.guest_cr[4] = u64val;
+
+    __vmwrite(CR0_GUEST_HOST_MASK, ~0UL);
+    __vmwrite(CR4_GUEST_HOST_MASK, ~0UL);
+
+    v->arch.hvm_vmx.vmx_realmode = 0;
+
+    vmx_vmcs_exit(v);
+#if 0
+    paging_update_paging_modes(v);  /* will update HOST & GUEST_CR3 as reqd */
+#endif
+    return 0;
+}
+
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -601,6 +860,9 @@ static int construct_vmcs(struct vcpu *v
     u32 vmexit_ctl = vmx_vmexit_control;
     u32 vmentry_ctl = vmx_vmentry_control;
 
+    if (is_hybrid_vcpu(v))
+        return hybrid_construct_vmcs(v);
+
     vmx_vmcs_enter(v);
 
     /* VMCS controls. */
@@ -1001,8 +1263,10 @@ void vmx_do_resume(struct vcpu *v)
         vmx_clear_vmcs(v);
         vmx_load_vmcs(v);
-        hvm_migrate_timers(v);
-        hvm_migrate_pirqs(v);
+        if (!is_hybrid_vcpu(v)) {
+            hvm_migrate_timers(v);
+            hvm_migrate_pirqs(v);
+        }
         vmx_set_host_env(v);
         hvm_asid_flush_vcpu(v);
     }
@@ -1026,6 +1290,11 @@ static unsigned long vmr(unsigned long f
     return rc ? 0 : val;
 }
 
+ulong hybrid_vmr(unsigned long field)
+{
+    return ((ulong)vmr(field));
+}
+
 static void vmx_dump_sel(char *name, uint32_t selector)
 {
     uint32_t sel, attr, limit;
@@ -1263,6 +1532,8 @@ static void noinline kdb_print_vmcs(stru
     vmx_dump_sel("LDTR", GUEST_LDTR_SELECTOR);
     vmx_dump_sel2("IDTR", GUEST_IDTR_LIMIT);
     vmx_dump_sel("TR", GUEST_TR_SELECTOR);
+    kdbp("Guest EFER = 0x%08x%08x\n",
+         (uint32_t)vmr(GUEST_EFER_HIGH), (uint32_t)vmr(GUEST_EFER));
     kdbp("Guest PAT = 0x%08x%08x\n",
          (uint32_t)vmr(GUEST_PAT_HIGH), (uint32_t)vmr(GUEST_PAT));
     x  = (unsigned long long)vmr(TSC_OFFSET_HIGH) << 32;
@@ -1276,6 +1547,10 @@ static void noinline kdb_print_vmcs(stru
          (int)vmr(GUEST_INTERRUPTIBILITY_INFO),
          (int)vmr(GUEST_ACTIVITY_STATE));
 
+    kdbp("MSRs: entry_load:$%d exit_load:$%d exit_store:$%d\n",
+         vmr(VM_ENTRY_MSR_LOAD_COUNT), vmr(VM_EXIT_MSR_LOAD_COUNT),
+         vmr(VM_EXIT_MSR_STORE_COUNT));
+
     kdbp("\n*** Host State ***\n");
     kdbp("RSP = 0x%016llx  RIP = 0x%016llx\n",
          (unsigned long long)vmr(HOST_RSP),
@@ -1316,6 +1591,9 @@ static void noinline kdb_print_vmcs(stru
          (uint32_t)vmr(VM_EXIT_CONTROLS));
     kdbp("ExceptionBitmap=%08x\n", (uint32_t)vmr(EXCEPTION_BITMAP));
+    kdbp("PAGE_FAULT_ERROR_CODE MASK:0x%lx MATCH:0x%lx\n",
+         (unsigned long)vmr(PAGE_FAULT_ERROR_CODE_MASK),
+         (unsigned long)vmr(PAGE_FAULT_ERROR_CODE_MATCH));
     kdbp("VMEntry: intr_info=%08x errcode=%08x ilen=%08x\n",
          (uint32_t)vmr(VM_ENTRY_INTR_INFO),
          (uint32_t)vmr(VM_ENTRY_EXCEPTION_ERROR_CODE),
@@ -1344,8 +1622,7 @@ static void noinline kdb_print_vmcs(stru
  *   do __vmreads. So, the VMCS pointer can't be left cleared.
  * - Doing __vmpclear will set the vmx state to 'clear', so to resume a
  *   vmlaunch must be done and not vmresume. This means, we must clear
- *   arch_vmx->launched. Just call __vmx_clear_vmcs(), hopefully it won't keep
- *   changing...
+ *   arch_vmx->launched.
  */
 void kdb_curr_cpu_flush_vmcs(void)
 {
@@ -1358,12 +1635,14 @@ void kdb_curr_cpu_flush_vmcs(void)
 
     /* looks like we got one. unfortunately, current_vmcs points to vmcs
      * and not VCPU, so we gotta search the entire list... */
     for_each_domain (dp) {
-        if ( !is_hvm_domain(dp) || dp->is_dying)
+        if ( (!is_hybrid_domain(dp) && !is_hvm_domain(dp)) || dp->is_dying )
             continue;
         for_each_vcpu (dp, vp) {
             if (vp->arch.hvm_vmx.active_cpu == smp_processor_id()) {
-                __vmx_clear_vmcs(vp);
+                __vmpclear(virt_to_maddr(vp->arch.hvm_vmx.vmcs));
                 __vmptrld(virt_to_maddr(vp->arch.hvm_vmx.vmcs));
+                vp->arch.hvm_vmx.launched = 0;
+                kdbp("KDB:[%d] vmcs flushed\n", smp_processor_id());
             }
         }
     }
@@ -1382,7 +1661,7 @@ void kdb_dump_vmcs(domid_t did, int vid)
     ASSERT(!local_irq_is_enabled());  /* kdb should always run disabled */
 
     for_each_domain (dp) {
-        if ( !is_hvm_domain(dp) || dp->is_dying)
+        if ( (!is_hybrid_domain(dp) && !is_hvm_domain(dp)) || dp->is_dying )
             continue;
         if (did != 0 && did != dp->domain_id)
             continue;
@@ -1400,7 +1679,7 @@ void kdb_dump_vmcs(domid_t did, int vid)
         kdbp("\n");
     }
     /* restore orig vmcs pointer for __vmreads in vmx_vmexit_handler() */
-    if (is_hvm_vcpu(current))
+    if (is_hvm_vcpu(current) || is_hybrid_vcpu(current))
         __vmptrld(virt_to_maddr(current->arch.hvm_vmx.vmcs));
 }
 #endif
diff -r f2cf898c7ff8 xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c        Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/hvm/vmx/vmx.c        Wed Jul 27 18:45:58 2011 -0700
@@ -101,6 +101,28 @@ static void vmx_domain_destroy(struct do
     vmx_free_vlapic_mapping(d);
 }
 
+int hybrid_vcpu_initialize(struct vcpu *v)
+{
+    int rc;
+
+    spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
+
+    v->arch.schedule_tail    = vmx_do_resume;
+    v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
+    v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
+
+    if ( (rc = vmx_create_vmcs(v)) != 0 )
+    {
+        dprintk(XENLOG_WARNING, "Failed to create VMCS for vcpu %d: err=%d.\n",
+                v->vcpu_id, rc);
+        return rc;
+    }
+
+    /* for hvm_long_mode_enabled(v) */
+    v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME;
+    return 0;
+}
+
 static int vmx_vcpu_initialise(struct vcpu *v)
 {
     int rc;
@@ -1299,6 +1321,11 @@ void vmx_inject_hw_exception(int trap, i
     if ( unlikely(intr_info & INTR_INFO_VALID_MASK) &&
          (((intr_info >> 8) & 7) == X86_EVENTTYPE_HW_EXCEPTION) )
     {
+        if (is_hybrid_vcpu(curr)) {
+            kdbp("MUK: vmx_inject_hw_exception() unexpected\n");
+            kdb_trap_immed(KDB_TRAP_NONFATAL);
+        }
+
         trap = hvm_combine_hw_exceptions((uint8_t)intr_info, trap);
         if ( trap == TRAP_double_fault )
             error_code = 0;
@@ -1463,7 +1490,7 @@ void start_vmx(void)
  * Not all cases receive valid value in the VM-exit instruction length field.
 * Callers must know what they're doing!
 */
-static int __get_instruction_length(void)
+int __get_instruction_length(void)
 {
     int len;
     len = __vmread(VM_EXIT_INSTRUCTION_LEN); /* Safe: callers audited */
@@ -1471,7 +1498,7 @@ static int __get_instruction_length(void
     return len;
 }
 
-static void __update_guest_eip(unsigned long inst_len)
+void __update_guest_eip(unsigned long inst_len)
 {
     struct cpu_user_regs *regs = guest_cpu_user_regs();
     unsigned long x;
@@ -1531,7 +1558,7 @@ static void vmx_cpuid_intercept(
     HVMTRACE_5D (CPUID, input, *eax, *ebx, *ecx, *edx);
 }
 
-static void vmx_do_cpuid(struct cpu_user_regs *regs)
+void vmx_do_cpuid(struct cpu_user_regs *regs)
 {
     unsigned int eax, ebx, ecx, edx;
 
@@ -2037,7 +2064,7 @@ gp_fault:
     return X86EMUL_EXCEPTION;
 }
 
-static void vmx_do_extint(struct cpu_user_regs *regs)
+void vmx_do_extint(struct cpu_user_regs *regs)
 {
     unsigned int vector;
 
@@ -2182,9 +2209,16 @@ static void vmx_failed_vmentry(unsigned 
         break;
     }
 
+#if defined(XEN_KDB_CONFIG)
+    { extern void kdb_dump_vmcs(domid_t did, int vid);
+      printk("\n************* VMCS Area **************\n");
+      kdb_dump_vmcs(curr->domain->domain_id, (curr)->vcpu_id);
+    }
+#else
     printk("************* VMCS Area **************\n");
     vmcs_dump_vcpu(curr);
     printk("**************************************\n");
+#endif
 
     domain_crash(curr->domain);
 }
@@ -2268,12 +2302,19 @@ err:
     return -1;
 }
 
+extern void hybrid_vmx_vmexit_handler(struct cpu_user_regs *regs);
+
 asmlinkage void vmx_vmexit_handler(struct cpu_user_regs *regs)
 {
     unsigned int exit_reason, idtv_info, intr_info = 0, vector = 0;
     unsigned long exit_qualification, inst_len = 0;
     struct vcpu *v = current;
 
+    if (is_hybrid_vcpu(v)) {
+        hybrid_vmx_vmexit_handler(regs);
+        return;
+    }
+
     if ( paging_mode_hap(v->domain) && hvm_paging_enabled(v) )
         v->arch.hvm_vcpu.guest_cr[3] = v->arch.hvm_vcpu.hw_cr[3] =
             __vmread(GUEST_CR3);
diff -r f2cf898c7ff8 xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c    Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/hvm/vpt.c    Wed Jul 27 18:45:58 2011 -0700
@@ -289,6 +289,11 @@ void pt_intr_post(struct vcpu *v, struct
     if ( intack.source == hvm_intsrc_vector )
         return;
 
+    if (is_hybrid_vcpu(v)) {
+        kdbp("MUK: unexpected intack src:%d vec:%d\n", intack.source, intack.vector);
+        kdb_trap_immed(KDB_TRAP_NONFATAL);
+    }
+
     spin_lock(&v->arch.hvm_vcpu.tm_lock);
 
     pt = is_pt_irq(v, intack);
diff -r f2cf898c7ff8 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/mm.c Wed Jul 27 18:45:58 2011 -0700
@@ -490,9 +490,12 @@ void make_cr3(struct vcpu *v, unsigned l
 
 #endif /* !defined(__i386__) */
 
+/* calling hybrid_update_cr3 doesnt work because during context switch
+ * vmcs is not completely setup? */
 void write_ptbase(struct vcpu *v)
 {
-    write_cr3(v->arch.cr3);
+    if (!is_hybrid_vcpu(v))
+        write_cr3(v->arch.cr3);
 }
 
 /*
@@ -2482,6 +2485,7 @@ int get_page_type_preemptible(struct pag
 }
 
+extern void hybrid_update_cr3(struct vcpu *v);
 int new_guest_cr3(unsigned long mfn)
 {
     struct vcpu *curr = current;
@@ -2530,6 +2534,9 @@ int new_guest_cr3(unsigned long mfn)
 
     write_ptbase(curr);
 
+    if (is_hybrid_vcpu(curr))
+        hybrid_update_cr3(curr);
+
     if ( likely(old_base_mfn != 0) )
     {
         if ( paging_mode_refcounts(d) )
@@ -2863,10 +2870,23 @@ int do_mmuext_op(
 #endif
 
     case MMUEXT_TLB_FLUSH_LOCAL:
+        /* do this for both, flush_tlb_user and flush_tlb_kernel, for now.
+         * To debug: hvm_asid_flush_vcpu for flush_tlb_user, and
+         * vpid_sync_all for flush_tlb_kernel */
+        if (is_hybrid_vcpu(current)) {
+            extern void hybrid_flush_tlb(void);
+            hybrid_flush_tlb();
+            break;
+        }
         flush_tlb_local();
         break;
 
     case MMUEXT_INVLPG_LOCAL:
+        if (is_hybrid_vcpu(current)) {
+            extern void hybrid_do_invlpg(ulong);
+            hybrid_do_invlpg(op.arg1.linear_addr);
+            break;
+        }
         if ( !paging_mode_enabled(d)
              || paging_invlpg(curr, op.arg1.linear_addr) != 0 )
             flush_tlb_one_local(op.arg1.linear_addr);
@@ -2877,6 +2897,11 @@ int do_mmuext_op(
     {
         cpumask_t pmask;
 
+        if (is_hybrid_vcpu(current)) {
+            kdbp("MUK:FIX: MMUEXT_TLB_FLUSH_MULTI/MMUEXT_INVLPG_MULTI\n");
+            kdb_trap_immed(KDB_TRAP_NONFATAL);
+            break;
+        }
         if ( unlikely(vcpumask_to_pcpumask(d, op.arg2.vcpumask, &pmask)) )
         {
             okay = 0;
@@ -4181,7 +4206,7 @@ long do_update_descriptor(u64 pa, u64 de
     mfn = gmfn_to_mfn(dom, gmfn);
     if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
          !mfn_valid(mfn) ||
-         !check_descriptor(dom, &d) )
+         (!is_hybrid_domain(dom) && !check_descriptor(dom, &d)) )
         return -EINVAL;
 
     page = mfn_to_page(mfn);
diff -r f2cf898c7ff8 xen/arch/x86/time.c
--- a/xen/arch/x86/time.c       Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/time.c       Wed Jul 27 18:45:58 2011 -0700
@@ -879,7 +879,7 @@ static void __update_vcpu_system_time(st
         _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
         _u.tsc_shift         = (s8)t->tsc_scale.shift;
     }
-    if ( is_hvm_domain(d) )
+    if ( is_hvm_domain(d) || is_hybrid_domain(d))
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
 
     /* Don't bother unless timestamp record has changed or we are forced. */
@@ -947,7 +947,7 @@ static void update_domain_rtc(void)
     rcu_read_lock(&domlist_read_lock);
 
     for_each_domain ( d )
-        if ( is_hvm_domain(d) )
+        if ( is_hvm_domain(d) || is_hybrid_domain(d))
             rtc_update_clock(d);
 
     rcu_read_unlock(&domlist_read_lock);
@@ -956,7 +956,7 @@ static void update_domain_rtc(void)
 void domain_set_time_offset(struct domain *d, int32_t time_offset_seconds)
 {
     d->time_offset_seconds = time_offset_seconds;
-    if ( is_hvm_domain(d) )
+    if ( is_hvm_domain(d) || is_hybrid_domain(d))
         rtc_update_clock(d);
 }
 
@@ -1708,7 +1708,7 @@ struct tm wallclock_time(void)
 
 u64 gtime_to_gtsc(struct domain *d, u64 tsc)
 {
-    if ( !is_hvm_domain(d) )
+    if ( !is_hvm_domain(d) || is_hybrid_domain(d))
         tsc = max_t(s64, tsc - d->arch.vtsc_offset, 0);
     return scale_delta(tsc, &d->arch.ns_to_vtsc);
 }
@@ -1856,6 +1856,8 @@ void tsc_set_info(struct domain *d,
         d->arch.vtsc = 0;
         return;
     }
+    if ( is_hybrid_domain(d) )
+        kdbp("INFO: setting vtsc for hybrid tsc_mode:%d \n", tsc_mode);
 
     switch ( d->arch.tsc_mode = tsc_mode )
     {
@@ -1901,7 +1903,7 @@ void tsc_set_info(struct domain *d,
         break;
     }
     d->arch.incarnation = incarnation + 1;
-    if ( is_hvm_domain(d) )
+    if ( is_hvm_domain(d) || is_hybrid_domain(d) )
         hvm_set_rdtsc_exiting(d, d->arch.vtsc);
 }
 
diff -r f2cf898c7ff8 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c      Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/traps.c      Wed Jul 27 18:45:58 2011 -0700
@@ -1217,13 +1217,14 @@ static int spurious_page_fault(
     return is_spurious;
 }
 
-static int fixup_page_fault(unsigned long addr, struct cpu_user_regs *regs)
+int fixup_page_fault(unsigned long addr, struct cpu_user_regs *regs)
 {
     struct vcpu   *v = current;
     struct domain *d = v->domain;
 
     /* No fixups in interrupt context or when interrupts are disabled. */
-    if ( in_irq() || !(regs->eflags & X86_EFLAGS_IF) )
+    if ( in_irq() ||
+         (!is_hybrid_vcpu(v) && !(regs->eflags & X86_EFLAGS_IF)) )
         return 0;
 
     /* Faults from external-mode guests are handled by shadow/hap */
diff -r f2cf898c7ff8 xen/arch/x86/x86_64/traps.c
--- a/xen/arch/x86/x86_64/traps.c       Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/arch/x86/x86_64/traps.c       Wed Jul 27 18:45:58 2011 -0700
@@ -617,7 +617,7 @@ static void hypercall_page_initialise_ri
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
 {
     memset(hypercall_page, 0xCC, PAGE_SIZE);
-    if ( is_hvm_domain(d) )
+    if ( is_hvm_domain(d) || is_hybrid_domain(d))
         hvm_hypercall_page_initialise(d, hypercall_page);
     else if ( !is_pv_32bit_domain(d) )
         hypercall_page_initialise_ring3_kernel(hypercall_page);
diff -r f2cf898c7ff8 xen/common/domain.c
--- a/xen/common/domain.c       Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/common/domain.c       Wed Jul 27 18:45:58 2011 -0700
@@ -208,6 +208,7 @@ static void __init parse_extra_guest_irq
 }
 custom_param("extra_guest_irqs", parse_extra_guest_irqs);
 
+volatile int mukhybrid = 0;
 struct domain *domain_create(
     domid_t domid, unsigned int domcr_flags, ssidref_t ssidref)
 {
@@ -238,7 +239,9 @@ struct domain *domain_create(
     spin_lock_init(&d->shutdown_lock);
     d->shutdown_code = -1;
 
-    if ( domcr_flags & DOMCRF_hvm )
+    if (domid != 0 && mukhybrid)
+        d->is_hybrid = 1;
+    else if ( domcr_flags & DOMCRF_hvm )
         d->is_hvm = 1;
 
     if ( domid == 0 )
diff -r f2cf898c7ff8 xen/common/kernel.c
--- a/xen/common/kernel.c       Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/common/kernel.c       Wed Jul 27 18:45:58 2011 -0700
@@ -246,6 +246,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDL
             else
                 fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
                              (1U << XENFEAT_hvm_callback_vector);
+            if ( is_hybrid_vcpu(current) )
+                fi.submap |= (1U << XENFEAT_hvm_callback_vector);
 #endif
             break;
         default:
diff -r f2cf898c7ff8 xen/common/timer.c
--- a/xen/common/timer.c        Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/common/timer.c        Wed Jul 27 18:45:58 2011 -0700
@@ -546,13 +546,20 @@ void kdb_dump_timer_queues(void)
     struct timers *ts;
     unsigned long sz, offs;
     char buf[KSYM_NAME_LEN+1];
-    int cpu, j;
-    s_time_t now = NOW();
+    int cpu, j;
+    u64 tsc;
 
     for_each_online_cpu( cpu )
     {
         ts = &per_cpu(timers, cpu);
-        kdbp("CPU[%02d]: NOW:0x%08x%08x\n", cpu, (u32)(now>>32), (u32)now);
+        kdbp("CPU[%02d]:", cpu);
+
+        if (cpu == smp_processor_id()) {
+            s_time_t now = NOW();
+            rdtscll(tsc);
+            kdbp("NOW:0x%08x%08x TSC:0x%016lx\n", (u32)(now>>32), (u32)now, tsc);
+        } else
+            kdbp("\n");
 
         /* timers in the heap */
         for ( j = 1; j <= GET_HEAP_SIZE(ts->heap); j++ )
         {
diff -r f2cf898c7ff8 xen/include/asm-x86/desc.h
--- a/xen/include/asm-x86/desc.h        Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/include/asm-x86/desc.h        Wed Jul 27 18:45:58 2011 -0700
@@ -58,7 +58,8 @@
 #ifndef __ASSEMBLY__
 
 #if defined(__x86_64__)
-#define GUEST_KERNEL_RPL(d) (is_pv_32bit_domain(d) ? 1 : 3)
+#define GUEST_KERNEL_RPL(d) (is_hybrid_domain(d) ? 0 : \
+                             is_pv_32bit_domain(d) ? 1 : 3)
 #elif defined(__i386__)
 #define GUEST_KERNEL_RPL(d) ((void)(d), 1)
 #endif
@@ -67,6 +68,10 @@
 #define __fixup_guest_selector(d, sel)                             \
 ({                                                                 \
     uint16_t _rpl = GUEST_KERNEL_RPL(d);                           \
+    if (is_hybrid_domain(d)) {                                     \
+        printk("MUK: hybrid domain fixing up selector\n");         \
+        kdb_trap_immed(KDB_TRAP_NONFATAL);                         \
+    }                                                              \
     (sel) = (((sel) & 3) >= _rpl) ? (sel) : (((sel) & ~3) | _rpl); \
 })
 
diff -r f2cf898c7ff8 xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h      Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/include/asm-x86/domain.h      Wed Jul 27 18:45:58 2011 -0700
@@ -18,7 +18,7 @@
 #endif
 #define is_pv_32on64_vcpu(v)   (is_pv_32on64_domain((v)->domain))
 
-#define is_hvm_pv_evtchn_domain(d) (is_hvm_domain(d) && \
+#define is_hvm_pv_evtchn_domain(d) ((is_hvm_domain(d) || is_hybrid_domain(d))&&\
         d->arch.hvm_domain.irq.callback_via_type == HVMIRQ_callback_vector)
 #define is_hvm_pv_evtchn_vcpu(v) (is_hvm_pv_evtchn_domain(v->domain))
 
diff -r f2cf898c7ff8 xen/include/asm-x86/event.h
--- a/xen/include/asm-x86/event.h       Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/include/asm-x86/event.h       Wed Jul 27 18:45:58 2011 -0700
@@ -18,7 +18,7 @@ int hvm_local_events_need_delivery(struc
 static inline int local_events_need_delivery(void)
 {
     struct vcpu *v = current;
-    return (is_hvm_vcpu(v) ? hvm_local_events_need_delivery(v) :
+    return ( (is_hvm_vcpu(v)||is_hybrid_vcpu(v)) ? hvm_local_events_need_delivery(v) :
             (vcpu_info(v, evtchn_upcall_pending) &&
              !vcpu_info(v, evtchn_upcall_mask)));
 }
 
diff -r f2cf898c7ff8 xen/include/asm-x86/hvm/hvm.h
--- a/xen/include/asm-x86/hvm/hvm.h     Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/include/asm-x86/hvm/hvm.h     Wed Jul 27 18:45:58 2011 -0700
@@ -144,10 +144,12 @@ struct hvm_function_table {
 extern struct hvm_function_table hvm_funcs;
 extern int hvm_enabled;
 
+int hybrid_domain_initialise(struct domain *d);
 int hvm_domain_initialise(struct domain *d);
 void hvm_domain_relinquish_resources(struct domain *d);
 void hvm_domain_destroy(struct domain *d);
 
+int hybrid_vcpu_initialise(struct vcpu *v);
 int hvm_vcpu_initialise(struct vcpu *v);
 void hvm_vcpu_destroy(struct vcpu *v);
 void hvm_vcpu_down(struct vcpu *v);
 
diff -r f2cf898c7ff8 xen/include/asm-x86/hvm/vmx/vmx.h
--- a/xen/include/asm-x86/hvm/vmx/vmx.h Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h Wed Jul 27 18:45:58 2011 -0700
@@ -110,6 +110,7 @@ void vmx_update_debug_state(struct vcpu 
 #define EXIT_REASON_EPT_VIOLATION       48
 #define EXIT_REASON_EPT_MISCONFIG       49
 #define EXIT_REASON_RDTSCP              51
+#define EXIT_REASON_INVVPID             53
 #define EXIT_REASON_WBINVD              54
 #define EXIT_REASON_XSETBV              55
 
diff -r f2cf898c7ff8 xen/include/xen/sched.h
--- a/xen/include/xen/sched.h   Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/include/xen/sched.h   Wed Jul 27 18:45:58 2011 -0700
@@ -228,6 +228,7 @@ struct domain
 
     /* Is this an HVM guest? */
     bool_t           is_hvm;
+    bool_t           is_hybrid;
     /* Does this guest need iommu mappings? */
     bool_t           need_iommu;
     /* Is this guest fully privileged (aka dom0)? */
@@ -590,6 +591,8 @@ uint64_t get_cpu_idle_time(unsigned int 
 
 #define is_hvm_domain(d) ((d)->is_hvm)
 #define is_hvm_vcpu(v)   (is_hvm_domain(v->domain))
+#define is_hybrid_domain(d) ((d)->is_hybrid)
+#define is_hybrid_vcpu(v)   (is_hybrid_domain(v->domain))
 #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \
                            cpus_weight((v)->cpu_affinity) == 1)
 #define need_iommu(d)    ((d)->need_iommu)
 
diff -r f2cf898c7ff8 xen/kdb/kdb_cmds.c
--- a/xen/kdb/kdb_cmds.c        Fri Jul 15 23:21:24 2011 +0000
+++ b/xen/kdb/kdb_cmds.c        Wed Jul 27 18:45:58 2011 -0700
@@ -977,7 +977,7 @@ kdb_cmdf_ss(int argc, const char **argv,
         kdbp("kdb: Failed to read byte at: %lx\n", regs->KDBIP);
         return KDB_CPU_MAIN_KDB;
     }
-    if (guest_mode(regs) && is_hvm_vcpu(current))
+    if (guest_mode(regs) && (is_hybrid_vcpu(current) || is_hvm_vcpu(current)))
         current->arch.hvm_vcpu.single_step = 1;
     else
         regs->eflags |= X86_EFLAGS_TF;
@@ -2240,8 +2240,8 @@ kdb_display_dom(struct domain *dp)
             kdbp("  mapcnt:");
             kdb_print_spin_lock("mapcnt: lk:", &gp->lock, "\n");
     }
-    kdbp(" hvm:%d priv:%d dbg:%d dying:%d paused:%d\n",
-         dp->is_hvm, dp->is_privileged, dp->debugger_attached,
+    kdbp(" hvm:%d hybrid:%d priv:%d dbg:%d dying:%d paused:%d\n",
+         dp->is_hvm, dp->is_hybrid, dp->is_privileged, dp->debugger_attached,
          dp->is_dying, dp->is_paused_by_controller);
     kdb_print_spin_lock(" shutdown: lk:", &dp->shutdown_lock, "\n");
    kdbp(" shutn:%d shut:%d code:%d \n", dp->is_shutting_down,

[attachment: hyb.lin.diff]
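The guest side is smaller: a new xen_domain_type value, XEN_PV_IN_HVM,
plus pv_ops entries that fall back to the native_* implementations,
since privileged state now lives inside the HVM container. Roughly (a
condensed sketch; xen_hybrid() is a hypothetical helper, not part of
the diff):

    /* Sketch of the pattern repeated in hyb.lin.diff. */
    #include <linux/types.h>
    #include <xen/xen.h>

    static inline bool xen_hybrid(void)
    {
            return xen_domain_type == XEN_PV_IN_HVM;
    }

    static void load_gdt_pv_or_native(const struct desc_ptr *dtr)
    {
            if (xen_hybrid())
                    native_load_gdt(dtr);   /* container allows it directly */
            else
                    xen_load_gdt(dtr);      /* classic PV: HYPERVISOR_set_gdt */
    }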
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 615e188..7791d31 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -1,7 +1,7 @@
 menu "Kernel hacking"
 
 config TRACE_IRQFLAGS_SUPPORT
-	def_bool y
+	def_bool n
 
 source "lib/Kconfig.debug"
 
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 91f3e08..b69a989 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -271,7 +271,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_pebs		boot_cpu_has(X86_FEATURE_PEBS)
 #define cpu_has_clflush		boot_cpu_has(X86_FEATURE_CLFLSH)
 #define cpu_has_bts		boot_cpu_has(X86_FEATURE_BTS)
-#define cpu_has_gbpages		boot_cpu_has(X86_FEATURE_GBPAGES)
+#define cpu_has_gbpages		0
 #define cpu_has_arch_perfmon	boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_pat		boot_cpu_has(X86_FEATURE_PAT)
 #define cpu_has_xmm4_1		boot_cpu_has(X86_FEATURE_XMM4_1)
diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h
index 7a15153..f0bbb51 100644
--- a/arch/x86/include/asm/hypervisor.h
+++ b/arch/x86/include/asm/hypervisor.h
@@ -49,6 +49,7 @@ extern const struct hypervisor_x86 *x86_hyper;
 extern const struct hypervisor_x86 x86_hyper_vmware;
 extern const struct hypervisor_x86 x86_hyper_ms_hyperv;
 extern const struct hypervisor_x86 x86_hyper_xen_hvm;
+extern const struct hypervisor_x86 x86_hyper_xen_hybrid;
 
 static inline bool hypervisor_x2apic_available(void)
 {
diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
index 8095f86..5ec8dbb 100644
--- a/arch/x86/kernel/cpu/hypervisor.c
+++ b/arch/x86/kernel/cpu/hypervisor.c
@@ -37,6 +37,7 @@ static const __initconst struct hypervisor_x86 * const hypervisors[] =
 #ifdef CONFIG_XEN_PVHVM
 	&x86_hyper_xen_hvm,
 #endif
+	&x86_hyper_xen_hybrid,
 };
 
 const struct hypervisor_x86 *x86_hyper;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 7942335..414cbb8 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -202,7 +202,10 @@ void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte)
 	pmd = fill_pmd(pud, vaddr);
 	pte = fill_pte(pmd, vaddr);
 
-	set_pte(pte, new_pte);
+	if (xen_domain_type == XEN_PV_IN_HVM) {
+		set_pte_at(&init_mm, vaddr, pte, new_pte);
+	} else
+		set_pte(pte, new_pte);
 
 	/*
 	 * It's enough to flush this one mapping.
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0369843..5810ef4 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -425,7 +425,11 @@ static void __init __early_set_fixmap(enum fixed_addresses idx,
 	pte = early_ioremap_pte(addr);
 
 	if (pgprot_val(flags))
-		set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, flags));
+		set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, flags));
+#if 0
+		set_pte_at(&init_mm, addr, pte, pfn_pte(phys >> PAGE_SHIFT,
+			   flags));
+#endif
 	else
 		pte_clear(&init_mm, addr, pte);
 	__flush_tlb_one(addr);
diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
index bd33620..eb6d4d7 100644
--- a/arch/x86/pci/direct.c
+++ b/arch/x86/pci/direct.c
@@ -282,6 +282,9 @@ int __init pci_direct_probe(void)
 {
 	struct resource *region, *region2;
 
+	if (xen_domain_type == XEN_PV_IN_HVM)
+		return 0;
+
 	if ((pci_probe & PCI_PROBE_CONF1) == 0)
 		goto type2;
 	region = request_region(0xCF8, 8, "PCI conf1");
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index e3c6a06..bb430f8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -110,7 +110,7 @@ struct shared_info *HYPERVISOR_shared_info = (void *)&xen_dummy_shared_info;
  *
  * 0: not available, 1: available
  */
-static int have_vcpu_info_placement = 1;
+static int have_vcpu_info_placement = 0;
 
 static void clamp_max_cpus(void)
 {
@@ -222,8 +222,10 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
 		maskebx = 0;
 		break;
 	}
-
-	asm(XEN_EMULATE_PREFIX "cpuid"
+	if (xen_domain_type == XEN_PV_IN_HVM) {
+		native_cpuid(ax, bx, cx, dx);
+	} else
+		asm(XEN_EMULATE_PREFIX "cpuid"
 		: "=a" (*ax),
 		  "=b" (*bx),
 		  "=c" (*cx),
@@ -244,6 +246,7 @@ static __init void xen_init_cpuid_mask(void)
 		  ~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
 		    (1 << X86_FEATURE_MCA)  |  /* disable MCA */
 		    (1 << X86_FEATURE_MTRR) |  /* disable MTRR */
+		    (1 << X86_FEATURE_PSE)  |  /* disable 2M pages */
 		    (1 << X86_FEATURE_ACC));   /* thermal monitoring */
 
 	if (!xen_initial_domain())
@@ -393,6 +396,10 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
 		make_lowmem_page_readonly(virt);
 	}
 
+	if (xen_domain_type == XEN_PV_IN_HVM) {
+		native_load_gdt(dtr);
+		return;
+	}
 	if (HYPERVISOR_set_gdt(frames, size / sizeof(struct desc_struct)))
 		BUG();
 }
@@ -431,6 +438,10 @@ static __init void xen_load_gdt_boot(const struct desc_ptr *dtr)
 		frames[f] = mfn;
 	}
 
+	if (xen_domain_type == XEN_PV_IN_HVM) {
+		native_load_gdt(dtr);
+		return;
+	}
 	if (HYPERVISOR_set_gdt(frames, size / sizeof(struct desc_struct)))
 		BUG();
 }
@@ -944,6 +955,11 @@ static const struct pv_init_ops xen_init_ops __initdata = {
 	.patch = xen_patch,
 };
 
+extern void native_iret(void);
+extern void native_irq_enable_sysexit(void);
+extern void native_usergs_sysret32(void);
+extern void native_usergs_sysret64(void);
+
 static const struct pv_cpu_ops xen_cpu_ops __initdata = {
 	.cpuid = xen_cpuid,
 
@@ -957,48 +973,48 @@ static const struct pv_cpu_ops xen_cpu_ops __initdata = {
 
 	.read_cr4 = native_read_cr4,
 	.read_cr4_safe = native_read_cr4_safe,
-	.write_cr4 = xen_write_cr4,
+	.write_cr4 = native_write_cr4,
 
 	.wbinvd = native_wbinvd,
 
 	.read_msr = native_read_msr_safe,
-	.write_msr = xen_write_msr_safe,
+	.write_msr = native_write_msr_safe,
 
 	.read_tsc = native_read_tsc,
 	.read_pmc = native_read_pmc,
 
-	.iret = xen_iret,
-	.irq_enable_sysexit = xen_sysexit,
+	.iret = native_iret,
+	.irq_enable_sysexit = native_irq_enable_sysexit,
 #ifdef CONFIG_X86_64
-	.usergs_sysret32 = xen_sysret32,
-	.usergs_sysret64 = xen_sysret64,
+	.usergs_sysret32 = native_usergs_sysret32,
+	.usergs_sysret64 = native_usergs_sysret64,
 #endif
 
-	.load_tr_desc = paravirt_nop,
-	.set_ldt = xen_set_ldt,
-	.load_gdt = xen_load_gdt,
-	.load_idt = xen_load_idt,
-	.load_tls = xen_load_tls,
+	.load_tr_desc = native_load_tr_desc,
+	.set_ldt = native_set_ldt,
+	.load_gdt = native_load_gdt,
+	.load_idt = native_load_idt,
+	.load_tls = native_load_tls,
 #ifdef CONFIG_X86_64
-	.load_gs_index = xen_load_gs_index,
+	.load_gs_index = native_load_gs_index,
 #endif
 
-	.alloc_ldt = xen_alloc_ldt,
-	.free_ldt = xen_free_ldt,
+	.alloc_ldt = paravirt_nop,
+	.free_ldt = paravirt_nop,
 
 	.store_gdt = native_store_gdt,
 	.store_idt = native_store_idt,
-	.store_tr = xen_store_tr,
+	.store_tr = native_store_tr,
 
-	.write_ldt_entry = xen_write_ldt_entry,
-	.write_gdt_entry = xen_write_gdt_entry,
-	.write_idt_entry = xen_write_idt_entry,
-	.load_sp0 = xen_load_sp0,
+	.write_ldt_entry = native_write_ldt_entry,
+	.write_gdt_entry = native_write_gdt_entry,
+	.write_idt_entry = native_write_idt_entry,
+	.load_sp0 = native_load_sp0,
 
-	.set_iopl_mask = xen_set_iopl_mask,
+	.set_iopl_mask = native_set_iopl_mask,
 	.io_delay = xen_io_delay,
 
 	/* Xen takes care of %gs when switching to usermode for us */
-	.swapgs = paravirt_nop,
+	.swapgs = native_swapgs,
 
 	.start_context_switch = paravirt_start_context_switch,
 	.end_context_switch = xen_end_context_switch,
@@ -1071,6 +1087,10 @@ static const struct machine_ops __initdata xen_machine_ops = {
  */
 static void __init xen_setup_stackprotector(void)
 {
+	if (xen_domain_type == XEN_PV_IN_HVM) {
+		switch_to_new_gdt(0);
+		return;
+	}
 	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
 	pv_cpu_ops.load_gdt = xen_load_gdt_boot;
 
@@ -1091,7 +1111,7 @@ asmlinkage void __init xen_start_kernel(void)
 	if (!xen_start_info)
 		return;
 
-	xen_domain_type = XEN_PV_DOMAIN;
+	xen_domain_type = XEN_PV_IN_HVM;
 
 	xen_setup_machphys_mapping();
 
@@ -1214,11 +1234,13 @@ asmlinkage void __init xen_start_kernel(void)
 	 * were early_cpu_init (run before ->arch_setup()) calls early_amd_init
 	 * which pokes 0xcf8 port.
 	 */
+#if 0
 	set_iopl.iopl = 1;
 	rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
 	if (rc != 0)
 		xen_raw_printk("physdev_op failed %d\n", rc);
 
+#endif
 #ifdef CONFIG_X86_32
 	/* set up basic CPUID stuff */
 	cpu_detect(&new_cpu_data);
@@ -1388,3 +1410,20 @@ const __refconst struct hypervisor_x86 x86_hyper_xen_hvm = {
 };
 EXPORT_SYMBOL(x86_hyper_xen_hvm);
 #endif
+
+static bool __init xen_pvhvm_platform_detect(void)
+{
+	return (xen_domain_type == XEN_PV_IN_HVM);
+}
+
+static void __init xen_pvhvm_guest_init(void)
+{
+	if (xen_feature(XENFEAT_hvm_callback_vector))
+		xen_have_vector_callback = 1;
+}
+
+const __refconst struct hypervisor_x86 x86_hyper_xen_hybrid = {
+	.name		= "Xen Hybrid",
+	.detect		= xen_pvhvm_platform_detect,
+	.init_platform	= xen_pvhvm_guest_init,
+};
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 6a6fe89..fb4777c 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -100,6 +100,9 @@ PV_CALLEE_SAVE_REGS_THUNK(xen_irq_enable);
 
 static void xen_safe_halt(void)
 {
+	if (xen_domain_type == XEN_PV_IN_HVM)
+		local_irq_enable();
+
 	/* Blocking includes an implicit local_irq_enable(). */
 	if (HYPERVISOR_sched_op(SCHEDOP_block, NULL) != 0)
 		BUG();
 }
@@ -114,15 +117,21 @@ static void xen_halt(void)
 }
 
 static const struct pv_irq_ops xen_irq_ops __initdata = {
+#if 0
 	.save_fl = PV_CALLEE_SAVE(xen_save_fl),
 	.restore_fl = PV_CALLEE_SAVE(xen_restore_fl),
 	.irq_disable = PV_CALLEE_SAVE(xen_irq_disable),
 	.irq_enable = PV_CALLEE_SAVE(xen_irq_enable),
+#endif
+	.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
+	.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
+	.irq_disable = __PV_IS_CALLEE_SAVE(native_irq_disable),
+	.irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
 
 	.safe_halt = xen_safe_halt,
 	.halt = xen_halt,
 #ifdef CONFIG_X86_64
-	.adjust_exception_frame = xen_adjust_exception_frame,
+	.adjust_exception_frame = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index f298bd7..978d4be 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2047,8 +2047,8 @@ static void xen_leave_lazy_mmu(void)
 }
 
 static const struct pv_mmu_ops xen_mmu_ops __initdata = {
-	.read_cr2 = xen_read_cr2,
-	.write_cr2 = xen_write_cr2,
+	.read_cr2 = native_read_cr2,
+	.write_cr2 = native_write_cr2,
 
 	.read_cr3 = xen_read_cr3,
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index adaf127..3a4d9c3 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -394,9 +394,9 @@ void __cpuinit xen_enable_syscall(void)
 void __init xen_arch_setup(void)
 {
 	xen_panic_handler_init();
-
-	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments);
 	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_writable_pagetables);
+#if 0
+	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments);
 
 	if (!xen_feature(XENFEAT_auto_translated_physmap))
 		HYPERVISOR_vm_assist(VMASST_CMD_enable,
@@ -415,6 +415,7 @@ void __init xen_arch_setup(void)
 		disable_acpi();
 	}
 #endif
+#endif
 
 	memcpy(boot_command_line, xen_start_info->cmd_line,
 	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index aaa7291..9d0e0e3 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -16,6 +16,7 @@
 __INIT
 ENTRY(startup_xen)
 	cld
+	sti
 #ifdef CONFIG_X86_32
 	mov %esi,xen_start_info
 	mov $init_thread_union+THREAD_SIZE,%esp
diff --git a/config.sav b/config.sav
index ce59406..fb7aff2 100644
--- a/config.sav
+++ b/config.sav
@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
 # Linux/x86_64 2.6.39 Kernel Configuration
-# Thu May 26 11:28:17 2011
+# Mon Jul 25 15:31:44 2011
 #
 CONFIG_64BIT=y
 # CONFIG_X86_32 is not set
@@ -153,7 +153,7 @@ CONFIG_RD_BZIP2=y
 CONFIG_RD_LZMA=y
 CONFIG_RD_XZ=y
 CONFIG_RD_LZO=y
-CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
 CONFIG_SYSCTL=y
 CONFIG_ANON_INODES=y
 # CONFIG_EXPERT is not set
@@ -190,7 +190,6 @@ CONFIG_PCI_QUIRKS=y
 CONFIG_SLAB=y
 # CONFIG_SLUB is not set
 CONFIG_PROFILING=y
-CONFIG_TRACEPOINTS=y
 CONFIG_OPROFILE=m
 # CONFIG_OPROFILE_EVENT_MULTIPLEX is not set
 CONFIG_HAVE_OPROFILE=y
@@ -294,7 +293,7 @@ CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
 CONFIG_SCHED_OMIT_FRAME_POINTER=y
 CONFIG_PARAVIRT_GUEST=y
 CONFIG_XEN=y
-CONFIG_XEN_DOM0=y
+CONFIG_XEN_DOM0=n
 CONFIG_XEN_PRIVILEGED_GUEST=y
 CONFIG_XEN_PVHVM=y
 CONFIG_XEN_MAX_DOMAIN_MEMORY=128
@@ -1046,7 +1045,6 @@ CONFIG_XPS=y
 #
 CONFIG_NET_PKTGEN=m
 # CONFIG_NET_TCPPROBE is not set
-CONFIG_NET_DROP_MONITOR=y
 CONFIG_HAMRADIO=y
 
 #
@@ -1956,7 +1954,6 @@ CONFIG_IWLAGN=m
 CONFIG_IWLWIFI_DEBUG=y
 CONFIG_IWLWIFI_DEBUGFS=y
 # CONFIG_IWLWIFI_DEBUG_EXPERIMENTAL_UCODE is not set
-# CONFIG_IWLWIFI_DEVICE_TRACING is not set
 # CONFIG_IWL_P2P is not set
 CONFIG_IWLWIFI_LEGACY=m
 
@@ -1965,12 +1962,10 @@ CONFIG_IWLWIFI_LEGACY=m
 #
 # CONFIG_IWLWIFI_LEGACY_DEBUG is not set
 # CONFIG_IWLWIFI_LEGACY_DEBUGFS is not set
-# CONFIG_IWLWIFI_LEGACY_DEVICE_TRACING is not set
 CONFIG_IWL4965=m
 CONFIG_IWL3945=m
 CONFIG_IWM=m
 # CONFIG_IWM_DEBUG is not set
-# CONFIG_IWM_TRACING is not set
 CONFIG_LIBERTAS=m
 CONFIG_LIBERTAS_USB=m
 CONFIG_LIBERTAS_CS=m
@@ -4546,7 +4541,7 @@ CONFIG_DLM_DEBUG=y
 #
 # Kernel hacking
 #
-CONFIG_TRACE_IRQFLAGS_SUPPORT=y
+# CONFIG_TRACE_IRQFLAGS_SUPPORT is not set
 # CONFIG_PRINTK_TIME is not set
 CONFIG_DEFAULT_MESSAGE_LOGLEVEL=4
 # CONFIG_ENABLE_WARN_DEPRECATED is not set
@@ -4575,10 +4570,7 @@ CONFIG_TIMER_STATS=y
 # CONFIG_RT_MUTEX_TESTER is not set
 # CONFIG_DEBUG_SPINLOCK is not set
 # CONFIG_DEBUG_MUTEXES is not set
-# CONFIG_DEBUG_LOCK_ALLOC is not set
-# CONFIG_PROVE_LOCKING is not set
 # CONFIG_SPARSE_RCU_POINTER is not set
-# CONFIG_LOCK_STAT is not set
 # CONFIG_DEBUG_SPINLOCK_SLEEP is not set
 # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
 CONFIG_STACKTRACE=y
@@ -4611,8 +4603,6 @@ CONFIG_LATENCYTOP=y
 # CONFIG_SYSCTL_SYSCALL_CHECK is not set
 # CONFIG_DEBUG_PAGEALLOC is not set
 CONFIG_USER_STACKTRACE_SUPPORT=y
-CONFIG_NOP_TRACER=y
-CONFIG_HAVE_FTRACE_NMI_ENTER=y
 CONFIG_HAVE_FUNCTION_TRACER=y
 CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
 CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
@@ -4621,34 +4611,8 @@ CONFIG_HAVE_DYNAMIC_FTRACE=y
 CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
 CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
 CONFIG_HAVE_C_RECORDMCOUNT=y
-CONFIG_TRACER_MAX_TRACE=y
 CONFIG_RING_BUFFER=y
-CONFIG_FTRACE_NMI_ENTER=y
-CONFIG_EVENT_TRACING=y
-CONFIG_EVENT_POWER_TRACING_DEPRECATED=y
-CONFIG_CONTEXT_SWITCH_TRACER=y
 CONFIG_RING_BUFFER_ALLOW_SWAP=y
-CONFIG_TRACING=y
-CONFIG_GENERIC_TRACER=y
-CONFIG_TRACING_SUPPORT=y
-CONFIG_FTRACE=y
-CONFIG_FUNCTION_TRACER=y
-CONFIG_FUNCTION_GRAPH_TRACER=y
-# CONFIG_IRQSOFF_TRACER is not set
-CONFIG_SCHED_TRACER=y
-CONFIG_FTRACE_SYSCALLS=y
-CONFIG_BRANCH_PROFILE_NONE=y
-# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
-# CONFIG_PROFILE_ALL_BRANCHES is not set
-CONFIG_STACK_TRACER=y
-CONFIG_BLK_DEV_IO_TRACE=y
-CONFIG_KPROBE_EVENT=y
-CONFIG_DYNAMIC_FTRACE=y
-CONFIG_FUNCTION_PROFILER=y
-CONFIG_FTRACE_MCOUNT_RECORD=y
-# CONFIG_FTRACE_STARTUP_TEST is not set
-# CONFIG_MMIOTRACE is not set
-CONFIG_RING_BUFFER_BENCHMARK=m
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 # CONFIG_FIREWIRE_OHCI_REMOTE_DMA is not set
 CONFIG_BUILD_DOCSRC=y
@@ -4665,6 +4629,7 @@ CONFIG_KGDB_TESTS=y
 # CONFIG_KGDB_LOW_LEVEL_TRAP is not set
 # CONFIG_KGDB_KDB is not set
 CONFIG_HAVE_ARCH_KMEMCHECK=y
+# CONFIG_KMEMCHECK is not set
 # CONFIG_TEST_KSTRTOX is not set
 CONFIG_STRICT_DEVMEM=y
 # CONFIG_X86_VERBOSE_BOOTUP is not set
@@ -4863,13 +4828,12 @@ CONFIG_VIRTUALIZATION=y
 CONFIG_KVM=m
 CONFIG_KVM_INTEL=m
 CONFIG_KVM_AMD=m
-# CONFIG_KVM_MMU_AUDIT is not set
 # CONFIG_VHOST_NET is not set
 CONFIG_VIRTIO=m
 CONFIG_VIRTIO_RING=m
 CONFIG_VIRTIO_PCI=m
 CONFIG_VIRTIO_BALLOON=m
-CONFIG_BINARY_PRINTF=y
+# CONFIG_BINARY_PRINTF is not set
 
 #
 # Library routines
diff --git a/drivers/tty/serial/8250.c b/drivers/tty/serial/8250.c
index 6611535..1f52fd1 100644
--- a/drivers/tty/serial/8250.c
+++ b/drivers/tty/serial/8250.c
@@ -3294,6 +3294,8 @@ static int __init serial8250_init(void)
 {
 	int ret;
 
+return -ENODEV;
+
 	if (nr_uarts > UART_NR)
 		nr_uarts = UART_NR;
 
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 33167b4..9133641 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1579,7 +1579,8 @@ int xen_set_callback_via(uint64_t via)
 }
 EXPORT_SYMBOL_GPL(xen_set_callback_via);
 
-#ifdef CONFIG_XEN_PVHVM
+#ifdef CONFIG_XEN_PVHVM
+/* #if 1 */
 /* Vector callbacks are better than PCI interrupts to receive event
  * channel notifications because we can receive vector callbacks on any
  * vcpu and we don't need PCI support or APIC interactions. */
@@ -1632,5 +1633,7 @@ void __init xen_init_IRQ(void)
 		irq_ctx_init(smp_processor_id());
 		if (xen_initial_domain())
 			xen_setup_pirqs();
+		if (xen_domain_type == XEN_PV_IN_HVM)
+			xen_callback_vector();
 	}
 }
diff --git a/include/xen/xen.h b/include/xen/xen.h
index a164024..96ad928 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -4,6 +4,7 @@
 enum xen_domain_type {
 	XEN_NATIVE,		/* running on bare hardware    */
 	XEN_PV_DOMAIN,		/* running in a PV domain      */
+	XEN_PV_IN_HVM,		/* running PV in HVM container */
 	XEN_HVM_DOMAIN,		/* running in a Xen hvm domain */
 };
 
@@ -15,7 +16,8 @@ extern enum xen_domain_type xen_domain_type;
 
 #define xen_domain()		(xen_domain_type != XEN_NATIVE)
 #define xen_pv_domain()		(xen_domain() &&			\
-				 xen_domain_type == XEN_PV_DOMAIN)
+				 (xen_domain_type == XEN_PV_DOMAIN ||	\
+				  xen_domain_type == XEN_PV_IN_HVM))
 #define xen_hvm_domain()	(xen_domain() &&			\
 				 xen_domain_type == XEN_HVM_DOMAIN)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel