* [PATCH 2/3] KVM/Hyper-V: Add new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH
From: lantianyu1986 @ 2019-08-09 9:49 UTC (permalink / raw)
To: pbonzini, rkrcmar, corbet, kys, haiyangz, sthemmin, sashal, tglx,
mingo, bp, hpa, x86, michael.h.kelley
Cc: Tianyu Lan, kvm, linux-doc, linux-kernel, linux-hyperv, vkuznets
In-Reply-To: <20190809094939.76093-1-Tianyu.Lan@microsoft.com>
From: Tianyu Lan <Tianyu.Lan@microsoft.com>
This patch adds new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH and let
user space to enable direct tlb flush function when only Hyper-V
hypervsior capability is exposed to VM. This patch also adds
enable_direct_tlbflush callback in the struct kvm_x86_ops and
platforms may use it to implement direct tlb flush support.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Documentation/virtual/kvm/api.txt | 10 ++++++++++
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/x86.c | 8 ++++++++
include/uapi/linux/kvm.h | 1 +
4 files changed, 21 insertions(+)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 2cd6250b2896..45308ed6dd75 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -5289,3 +5289,13 @@ Architectures: x86
This capability indicates that KVM supports paravirtualized Hyper-V IPI send
hypercalls:
HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
+8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
+
+Architecture: x86
+
+This capability indicates that KVM supports Hyper-V direct tlb flush function.
+User space should enable this feature only when Hyper-V hypervisor capability
+is exposed to guest and KVM profile is hided. Both Hyper-V and KVM hypercalls
+use RAX and RCX registers to pass parameters. If KVM hypercall is exposed
+to L2 guest with direct tlbflush enabled, Hyper-V may mistake KVM hypercall
+for Hyper-V tlb flush Hypercall due to paremeter register overlap.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0cc5b611a113..667d154e89d4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1205,6 +1205,8 @@ struct kvm_x86_ops {
uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
+
+ int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
};
struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d7b9e6a0939..a9d8ee7f7bf0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3183,6 +3183,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = kvm_x86_ops->get_nested_state ?
kvm_x86_ops->get_nested_state(NULL, NULL, 0) : 0;
break;
+ case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
+ r = kvm_x86_ops->enable_direct_tlbflush ? 1 : 0;
+ break;
default:
break;
}
@@ -3953,6 +3956,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
r = -EFAULT;
}
return r;
+ case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
+ if (!kvm_x86_ops->enable_direct_tlbflush)
+ return -ENOTTY;
+
+ return kvm_x86_ops->enable_direct_tlbflush(vcpu);
default:
return -EINVAL;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a7c19540ce21..cb959bc925b1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -996,6 +996,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
#define KVM_CAP_ARM_PTRAUTH_GENERIC 172
#define KVM_CAP_PMU_EVENT_FILTER 173
+#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 174
#ifdef KVM_CAP_IRQ_ROUTING
--
2.14.2
^ permalink raw reply related
* [PATCH 3/3] KVM/Hyper-V/VMX: Add direct tlb flush support
From: lantianyu1986 @ 2019-08-09 9:49 UTC (permalink / raw)
To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86,
pbonzini, rkrcmar, michael.h.kelley
Cc: Vitaly Kuznetsov, linux-hyperv, linux-kernel, kvm, Tianyu Lan
In-Reply-To: <20190809094939.76093-1-Tianyu.Lan@microsoft.com>
From: Vitaly Kuznetsov <vkuznets@redhat.com>
This patch is to enable Hyper-V direct tlb flush function
for VMX.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
arch/x86/include/asm/hyperv-tlfs.h | 16 +++++++++++++++-
arch/x86/kvm/vmx/evmcs.h | 2 ++
arch/x86/kvm/vmx/vmx.c | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/kvm_host.h | 1 +
4 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index a79703c56ebe..d53d6e4a6210 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -171,6 +171,7 @@
#define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED BIT(14)
/* Nested features. These are HYPERV_CPUID_NESTED_FEATURES.EAX bits. */
+#define HV_X64_NESTED_DIRECT_FLUSH BIT(17)
#define HV_X64_NESTED_GUEST_MAPPING_FLUSH BIT(18)
#define HV_X64_NESTED_MSR_BITMAP BIT(19)
@@ -514,12 +515,22 @@ struct hv_timer_message_payload {
__u64 delivery_time; /* When the message was delivered */
} __packed;
+struct hv_nested_enlightenments_control {
+ struct {
+ __u32 directhypercall:1;
+ __u32 reserved:31;
+ } features;
+ struct {
+ __u32 reserved;
+ } hypercallControls;
+} __packed;
+
/* Define virtual processor assist page structure. */
struct hv_vp_assist_page {
__u32 apic_assist;
__u32 reserved1;
__u64 vtl_control[3];
- __u64 nested_enlightenments_control[2];
+ struct hv_nested_enlightenments_control nested_control;
__u8 enlighten_vmentry;
__u8 reserved2[7];
__u64 current_nested_vmcs;
@@ -872,4 +883,7 @@ struct hv_tlb_flush_ex {
u64 gva_list[];
} __packed;
+struct hv_partition_assist_pg {
+ u32 tlb_lock_count;
+};
#endif
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 39a24eec8884..07ebf6882a45 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -178,6 +178,8 @@ static inline void evmcs_load(u64 phys_addr)
struct hv_vp_assist_page *vp_ap =
hv_get_vp_assist_page(smp_processor_id());
+ if (current_evmcs->hv_enlightenments_control.nested_flush_hypercall)
+ vp_ap->nested_control.features.directhypercall = 1;
vp_ap->current_nested_vmcs = phys_addr;
vp_ap->enlighten_vmentry = 1;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 84f8d49a2fd2..a49be029864e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -486,6 +486,34 @@ static int hv_remote_flush_tlb(struct kvm *kvm)
return hv_remote_flush_tlb_with_range(kvm, NULL);
}
+static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
+{
+ struct hv_enlightened_vmcs *evmcs;
+
+ /*
+ * Synthetic VM-Exit is not enabled in current code and so All
+ * evmcs in singe VM shares same assist page.
+ */
+ if (!vcpu->kvm->hv_pa_pg) {
+ vcpu->kvm->hv_pa_pg = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!vcpu->kvm->hv_pa_pg)
+ return -ENOMEM;
+ pr_debug("KVM: Hyper-V: allocated PA_PG for %llx\n",
+ (u64)&vcpu->kvm);
+ }
+
+ evmcs = (struct hv_enlightened_vmcs *)to_vmx(vcpu)->loaded_vmcs->vmcs;
+
+ evmcs->partition_assist_page =
+ __pa(vcpu->kvm->hv_pa_pg);
+ evmcs->hv_vm_id = (u64)vcpu->kvm;
+ evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;
+
+ pr_debug("KVM: Hyper-V: enabled DIRECT flush for %llx\n",
+ (u64)vcpu->kvm);
+ return 0;
+}
+
#endif /* IS_ENABLED(CONFIG_HYPERV) */
/*
@@ -6516,6 +6544,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
current_evmcs->hv_clean_fields |=
HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
+ if (static_branch_unlikely(&enable_evmcs))
+ current_evmcs->hv_vp_id = vcpu->arch.hyperv.vp_index;
+
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
if (vmx->host_debugctlmsr)
update_debugctlmsr(vmx->host_debugctlmsr);
@@ -6583,6 +6614,7 @@ static struct kvm *vmx_vm_alloc(void)
static void vmx_vm_free(struct kvm *kvm)
{
+ kfree(kvm->hv_pa_pg);
vfree(to_kvm_vmx(kvm));
}
@@ -7815,6 +7847,7 @@ static void vmx_exit(void)
if (!vp_ap)
continue;
+ vp_ap->nested_control.features.directhypercall = 0;
vp_ap->current_nested_vmcs = 0;
vp_ap->enlighten_vmentry = 0;
}
@@ -7854,6 +7887,11 @@ static int __init vmx_init(void)
pr_info("KVM: vmx: using Hyper-V Enlightened VMCS\n");
static_branch_enable(&enable_evmcs);
}
+
+ if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
+ vmx_x86_ops.enable_direct_tlbflush
+ = hv_enable_direct_tlbflush;
+
} else {
enlightened_vmcs = false;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c5da875f19e3..479ad76661e6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -500,6 +500,7 @@ struct kvm {
struct srcu_struct srcu;
struct srcu_struct irq_srcu;
pid_t userspace_pid;
+ struct hv_partition_assist_pg *hv_pa_pg;
};
#define kvm_err(fmt, ...) \
--
2.14.2
^ permalink raw reply related
* Re: [PATCH 1/3] x86/Hyper-V: Fix definition of struct hv_vp_assist_page
From: Vitaly Kuznetsov @ 2019-08-09 10:25 UTC (permalink / raw)
To: lantianyu1986
Cc: Tianyu Lan, kvm, linux-doc, linux-kernel, linux-hyperv, pbonzini,
rkrcmar, corbet, kys, haiyangz, sthemmin, sashal, tglx, mingo, bp,
hpa, x86, michael.h.kelley
In-Reply-To: <20190809094939.76093-2-Tianyu.Lan@microsoft.com>
lantianyu1986@gmail.com writes:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> The struct hv_vp_assist_page was defined incorrectly.
> The "vtl_control" should be u64[3], "nested_enlightenments_control"
> should be a u64 and there is 7 reserved bytes following "enlighten_vmentry".
> This patch is to fix it.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> arch/x86/include/asm/hyperv-tlfs.h | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index af78cd72b8f3..a79703c56ebe 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -517,11 +517,11 @@ struct hv_timer_message_payload {
> /* Define virtual processor assist page structure. */
> struct hv_vp_assist_page {
> __u32 apic_assist;
> - __u32 reserved;
> - __u64 vtl_control[2];
> + __u32 reserved1;
> + __u64 vtl_control[3];
> __u64 nested_enlightenments_control[2];
In PATCH3 you define 'struct hv_nested_enlightenments_control' and it is
64bit long, not 128. We should change it here too as ...
> - __u32 enlighten_vmentry;
enlighten_vmentry filed will get a very different offset breaking
Enlightened VMCS.
> - __u32 padding;
> + __u8 enlighten_vmentry;
> + __u8 reserved2[7];
> __u64 current_nested_vmcs;
> } __packed;
--
Vitaly
^ permalink raw reply
* Re: [PATCH 2/3] KVM/Hyper-V: Add new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH
From: Vitaly Kuznetsov @ 2019-08-09 10:44 UTC (permalink / raw)
To: lantianyu1986
Cc: Tianyu Lan, kvm, linux-doc, linux-kernel, linux-hyperv, pbonzini,
rkrcmar, corbet, kys, haiyangz, sthemmin, sashal, tglx, mingo, bp,
hpa, x86, michael.h.kelley
In-Reply-To: <20190809094939.76093-3-Tianyu.Lan@microsoft.com>
lantianyu1986@gmail.com writes:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> This patch adds new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH and let
> user space to enable direct tlb flush function when only Hyper-V
> hypervsior capability is exposed to VM. This patch also adds
> enable_direct_tlbflush callback in the struct kvm_x86_ops and
> platforms may use it to implement direct tlb flush support.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Documentation/virtual/kvm/api.txt | 10 ++++++++++
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/x86.c | 8 ++++++++
> include/uapi/linux/kvm.h | 1 +
> 4 files changed, 21 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 2cd6250b2896..45308ed6dd75 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -5289,3 +5289,13 @@ Architectures: x86
> This capability indicates that KVM supports paravirtualized Hyper-V IPI send
> hypercalls:
> HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
> +8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
> +
> +Architecture: x86
> +
> +This capability indicates that KVM supports Hyper-V direct tlb flush function.
> +User space should enable this feature only when Hyper-V hypervisor capability
> +is exposed to guest and KVM profile is hided. Both Hyper-V and KVM hypercalls
> +use RAX and RCX registers to pass parameters. If KVM hypercall is exposed
> +to L2 guest with direct tlbflush enabled, Hyper-V may mistake KVM hypercall
> +for Hyper-V tlb flush Hypercall due to paremeter register overlap.
First, we need to explicitly state that this is for KVM on Hyper-V and
second, that this disables normal hypercall handling by KVM.
My take:
This capability indicates that KVM running on top of Hyper-V hypervisor
enables Direct TLB flush for its guests meaning that TLB flush
hypercalls are handled by Level 1 hypervisor (Hyper-V) bypassing KVM.
Due to the different ABI for hypercall parameters between Hyper-V and
KVM, enabling this capability effectively disables all hypercall
handling by KVM (as some KVM hypercall may be mistakenly treated as TLB
flush hypercalls by Hyper-C) so userspace should disable KVM
identification in CPUID.
I think we should also enforce this somehow leaving only Hyper-V style
hypercalls handling (for Windows guests) in place.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0cc5b611a113..667d154e89d4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1205,6 +1205,8 @@ struct kvm_x86_ops {
> uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
>
> bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
> +
> + int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9d7b9e6a0939..a9d8ee7f7bf0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3183,6 +3183,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> r = kvm_x86_ops->get_nested_state ?
> kvm_x86_ops->get_nested_state(NULL, NULL, 0) : 0;
> break;
> + case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
> + r = kvm_x86_ops->enable_direct_tlbflush ? 1 : 0;
> + break;
> default:
> break;
> }
> @@ -3953,6 +3956,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
> r = -EFAULT;
> }
> return r;
> + case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
> + if (!kvm_x86_ops->enable_direct_tlbflush)
> + return -ENOTTY;
> +
> + return kvm_x86_ops->enable_direct_tlbflush(vcpu);
>
> default:
> return -EINVAL;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index a7c19540ce21..cb959bc925b1 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -996,6 +996,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
> #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
> #define KVM_CAP_PMU_EVENT_FILTER 173
> +#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 174
>
> #ifdef KVM_CAP_IRQ_ROUTING
--
Vitaly
^ permalink raw reply
* Re: [PATCH net v2] hv_netvsc: Fix a warning of suspicious RCU usage
From: David Miller @ 2019-08-09 20:42 UTC (permalink / raw)
To: decui
Cc: netdev, haiyangz, sthemmin, jakub.kicinski, sashal, kys, mikelley,
linux-hyperv, linux-kernel, olaf, apw, jasowang, vkuznets,
marcelo.cerri
In-Reply-To: <PU1P153MB0169A6492DCBB490FE7FE52CBFD60@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
From: Dexuan Cui <decui@microsoft.com>
Date: Fri, 9 Aug 2019 01:58:08 +0000
> This fixes a warning of "suspicious rcu_dereference_check() usage"
> when nload runs.
>
> Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
Applied.
^ permalink raw reply
* Re: [PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Lorenzo Pieralisi @ 2019-08-12 13:06 UTC (permalink / raw)
To: Dexuan Cui
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Michael Kelley,
Stephen Hemminger, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org,
driverdev-devel@linuxdriverproject.org, Sasha Levin,
Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
jackm@mellanox.com
In-Reply-To: <PU1P153MB0169F9EDD707FFE1517F8D56BFD50@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
On Tue, Aug 06, 2019 at 08:41:17PM +0000, Dexuan Cui wrote:
> > From: linux-hyperv-owner@vger.kernel.org
> > <linux-hyperv-owner@vger.kernel.org> On Behalf Of Bjorn Helgaas
> > Sent: Tuesday, August 6, 2019 1:16 PM
> > To: Dexuan Cui <decui@microsoft.com>
> >
> > Thanks for updating this. But you didn't update the subject line,
> > which is really still a little too low-level. Maybe Lorenzo will fix
> > this. Something like this, maybe?
> >
> > PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it
>
> This is better. Thanks!
>
> I hope Lorenzo can help to fix this so I could avoid a v3. :-)
You should have fixed it yourself, this time I will.
Thanks,
Lorenzo
^ permalink raw reply
* RE: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-12 13:50 UTC (permalink / raw)
To: sashal@kernel.org, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org
Cc: KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
linux-kernel@vger.kernel.org
In-Reply-To: <1565135484-31351-1-git-send-email-haiyangz@microsoft.com>
> -----Original Message-----
> From: LKML haiyangz <lkmlhyz@microsoft.com> On Behalf Of Haiyang
> Zhang
> Sent: Tuesday, August 6, 2019 7:52 PM
> To: sashal@kernel.org; bhelgaas@google.com; lorenzo.pieralisi@arm.com;
> linux-hyperv@vger.kernel.org; linux-pci@vger.kernel.org
> Cc: Haiyang Zhang <haiyangz@microsoft.com>; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; linux-
> kernel@vger.kernel.org
> Subject: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number
> collision
>
> Currently in Azure cloud, for passthrough devices including GPU, the
> host sets the device instance ID's bytes 8 - 15 to a value derived from
> the host HWID, which is the same on all devices in a VM. So, the device
> instance ID's bytes 8 and 9 provided by the host are no longer unique.
>
> This can cause device passthrough to VMs to fail because the bytes 8 and
> 9 is used as PCI domain number. So, as recommended by Azure host team,
> we now use the bytes 4 and 5 which usually contain unique numbers as PCI
> domain. The chance of collision is greatly reduced. In the rare cases of
> collision, we will detect and find another number that is not in use.
>
> Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Acked-by: Sasha Levin <sashal@kernel.org>
> ---
> drivers/pci/controller/pci-hyperv.c | 92
Hi Lorenzo,
This patch has been updated based on Bjorn's comments. Do you have any further
comments? Could you take it from your tree?
Thanks,
- Haiyang
^ permalink raw reply
* Re: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Lorenzo Pieralisi @ 2019-08-12 15:38 UTC (permalink / raw)
To: Haiyang Zhang
Cc: sashal@kernel.org, bhelgaas@google.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
linux-kernel@vger.kernel.org
In-Reply-To: <1565135484-31351-1-git-send-email-haiyangz@microsoft.com>
On Tue, Aug 06, 2019 at 11:52:11PM +0000, Haiyang Zhang wrote:
> Currently in Azure cloud, for passthrough devices including GPU, the
> host sets the device instance ID's bytes 8 - 15 to a value derived from
> the host HWID, which is the same on all devices in a VM. So, the device
> instance ID's bytes 8 and 9 provided by the host are no longer unique.
>
> This can cause device passthrough to VMs to fail because the bytes 8 and
> 9 is used as PCI domain number. So, as recommended by Azure host team,
> we now use the bytes 4 and 5 which usually contain unique numbers as PCI
> domain. The chance of collision is greatly reduced. In the rare cases of
> collision, we will detect and find another number that is not in use.
This is not clear at all. Why "finding another number" is fine with
this patch while it is not with current kernel code ? Also does this
have backward compatibility issues ?
I do not understand if a collision is a problem or not from the
log above.
> Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.
Add it as Suggested-by: tag.
Thanks,
Lorenzo
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Acked-by: Sasha Levin <sashal@kernel.org>
> ---
> drivers/pci/controller/pci-hyperv.c | 92 +++++++++++++++++++++++++++++++------
> 1 file changed, 79 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 40b6254..4f3d97e 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
> complete(&hbus->remove_event);
> }
>
> +#define HVPCI_DOM_MAP_SIZE (64 * 1024)
> +static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> +
> +/*
> + * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
> + * as invalid for passthrough PCI devices of this driver.
> + */
> +#define HVPCI_DOM_INVALID 0
> +
> +/**
> + * hv_get_dom_num() - Get a valid PCI domain number
> + * Check if the PCI domain number is in use, and return another number if
> + * it is in use.
> + *
> + * @dom: Requested domain number
> + *
> + * return: domain number on success, HVPCI_DOM_INVALID on failure
> + */
> +static u16 hv_get_dom_num(u16 dom)
> +{
> + unsigned int i;
> +
> + if (test_and_set_bit(dom, hvpci_dom_map) == 0)
> + return dom;
> +
> + for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> + if (test_and_set_bit(i, hvpci_dom_map) == 0)
> + return i;
> + }
> +
> + return HVPCI_DOM_INVALID;
> +}
> +
> +/**
> + * hv_put_dom_num() - Mark the PCI domain number as free
> + * @dom: Domain number to be freed
> + */
> +static void hv_put_dom_num(u16 dom)
> +{
> + clear_bit(dom, hvpci_dom_map);
> +}
> +
> /**
> * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
> * @hdev: VMBus's tracking struct for this root PCI bus
> @@ -2521,6 +2563,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> const struct hv_vmbus_device_id *dev_id)
> {
> struct hv_pcibus_device *hbus;
> + u16 dom_req, dom;
> int ret;
>
> /*
> @@ -2535,19 +2578,34 @@ static int hv_pci_probe(struct hv_device *hdev,
> hbus->state = hv_pcibus_init;
>
> /*
> - * The PCI bus "domain" is what is called "segment" in ACPI and
> - * other specs. Pull it from the instance ID, to get something
> - * unique. Bytes 8 and 9 are what is used in Windows guests, so
> - * do the same thing for consistency. Note that, since this code
> - * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
> - * that (1) the only domain in use for something that looks like
> - * a physical PCI bus (which is actually emulated by the
> - * hypervisor) is domain 0 and (2) there will be no overlap
> - * between domains derived from these instance IDs in the same
> - * VM.
> + * The PCI bus "domain" is what is called "segment" in ACPI and other
> + * specs. Pull it from the instance ID, to get something usually
> + * unique. In rare cases of collision, we will find out another number
> + * not in use.
> + *
> + * Note that, since this code only runs in a Hyper-V VM, Hyper-V
> + * together with this guest driver can guarantee that (1) The only
> + * domain used by Gen1 VMs for something that looks like a physical
> + * PCI bus (which is actually emulated by the hypervisor) is domain 0.
> + * (2) There will be no overlap between domains (after fixing possible
> + * collisions) in the same VM.
> */
> - hbus->sysdata.domain = hdev->dev_instance.b[9] |
> - hdev->dev_instance.b[8] << 8;
> + dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> + dom = hv_get_dom_num(dom_req);
> +
> + if (dom == HVPCI_DOM_INVALID) {
> + dev_err(&hdev->device,
> + "Unable to use dom# 0x%hx or other numbers", dom_req);
> + ret = -EINVAL;
> + goto free_bus;
> + }
> +
> + if (dom != dom_req)
> + dev_info(&hdev->device,
> + "PCI dom# 0x%hx has collision, using 0x%hx",
> + dom_req, dom);
> +
> + hbus->sysdata.domain = dom;
>
> hbus->hdev = hdev;
> refcount_set(&hbus->remove_lock, 1);
> @@ -2562,7 +2620,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> hbus->sysdata.domain);
> if (!hbus->wq) {
> ret = -ENOMEM;
> - goto free_bus;
> + goto free_dom;
> }
>
> ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
> @@ -2639,6 +2697,8 @@ static int hv_pci_probe(struct hv_device *hdev,
> vmbus_close(hdev->channel);
> destroy_wq:
> destroy_workqueue(hbus->wq);
> +free_dom:
> + hv_put_dom_num(hbus->sysdata.domain);
> free_bus:
> free_page((unsigned long)hbus);
> return ret;
> @@ -2720,6 +2780,9 @@ static int hv_pci_remove(struct hv_device *hdev)
> put_hvpcibus(hbus);
> wait_for_completion(&hbus->remove_event);
> destroy_workqueue(hbus->wq);
> +
> + hv_put_dom_num(hbus->sysdata.domain);
> +
> free_page((unsigned long)hbus);
> return 0;
> }
> @@ -2747,6 +2810,9 @@ static void __exit exit_hv_pci_drv(void)
>
> static int __init init_hv_pci_drv(void)
> {
> + /* Set the invalid domain number's bit, so it will not be used */
> + set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
> +
> return vmbus_driver_register(&hv_pci_drv);
> }
>
> --
> 1.8.3.1
>
^ permalink raw reply
* RE: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-12 15:56 UTC (permalink / raw)
To: Lorenzo Pieralisi
Cc: sashal@kernel.org, bhelgaas@google.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
linux-kernel@vger.kernel.org
In-Reply-To: <20190812153833.GA30794@e121166-lin.cambridge.arm.com>
> -----Original Message-----
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Sent: Monday, August 12, 2019 11:39 AM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; bhelgaas@google.com; linux-
> hyperv@vger.kernel.org; linux-pci@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number
> collision
>
> On Tue, Aug 06, 2019 at 11:52:11PM +0000, Haiyang Zhang wrote:
> > Currently in Azure cloud, for passthrough devices including GPU, the
> > host sets the device instance ID's bytes 8 - 15 to a value derived from
> > the host HWID, which is the same on all devices in a VM. So, the device
> > instance ID's bytes 8 and 9 provided by the host are no longer unique.
> >
> > This can cause device passthrough to VMs to fail because the bytes 8 and
> > 9 is used as PCI domain number. So, as recommended by Azure host team,
> > we now use the bytes 4 and 5 which usually contain unique numbers as PCI
> > domain. The chance of collision is greatly reduced. In the rare cases of
> > collision, we will detect and find another number that is not in use.
>
> This is not clear at all. Why "finding another number" is fine with
> this patch while it is not with current kernel code ? Also does this
> have backward compatibility issues ?
The bytes 4, 5 have more uniqueness (info entropy) than bytes 8, 9, so we use
bytes 4, 5. On older hosts, bytes 4, 5 can also be used -- so it has no backward
compatibility issues.
> I do not understand if a collision is a problem or not from the
> log above.
Collision will cause the second device with the same domain number fails to load.
I will include these info into the patch description.
>
> > Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this
> idea.
>
> Add it as Suggested-by: tag.
I will add this line.
Thanks,
- Haiyang
^ permalink raw reply
* [PATCH v3] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-12 18:20 UTC (permalink / raw)
To: sashal@kernel.org, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
vkuznets, linux-kernel@vger.kernel.org
Currently in Azure cloud, for passthrough devices including GPU, the host
sets the device instance ID's bytes 8 - 15 to a value derived from the host
HWID, which is the same on all devices in a VM. So, the device instance
ID's bytes 8 and 9 provided by the host are no longer unique. This can
cause device passthrough to VMs to fail because the bytes 8 and 9 are used
as PCI domain number. Collision of domain numbers will cause the second
device with the same domain number fail to load.
As recommended by Azure host team, the bytes 4, 5 have more uniqueness
(info entropy) than bytes 8, 9. So now we use bytes 4, 5 as the PCI domain
numbers. On older hosts, bytes 4, 5 can also be used -- no backward
compatibility issues here. The chance of collision is greatly reduced. In
the rare cases of collision, we will detect and find another number that is
not in use.
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Acked-by: Sasha Levin <sashal@kernel.org>
---
drivers/pci/controller/pci-hyperv.c | 92 +++++++++++++++++++++++++++++++------
1 file changed, 79 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 40b6254..4f3d97e 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
complete(&hbus->remove_event);
}
+#define HVPCI_DOM_MAP_SIZE (64 * 1024)
+static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
+
+/*
+ * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
+ * as invalid for passthrough PCI devices of this driver.
+ */
+#define HVPCI_DOM_INVALID 0
+
+/**
+ * hv_get_dom_num() - Get a valid PCI domain number
+ * Check if the PCI domain number is in use, and return another number if
+ * it is in use.
+ *
+ * @dom: Requested domain number
+ *
+ * return: domain number on success, HVPCI_DOM_INVALID on failure
+ */
+static u16 hv_get_dom_num(u16 dom)
+{
+ unsigned int i;
+
+ if (test_and_set_bit(dom, hvpci_dom_map) == 0)
+ return dom;
+
+ for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
+ if (test_and_set_bit(i, hvpci_dom_map) == 0)
+ return i;
+ }
+
+ return HVPCI_DOM_INVALID;
+}
+
+/**
+ * hv_put_dom_num() - Mark the PCI domain number as free
+ * @dom: Domain number to be freed
+ */
+static void hv_put_dom_num(u16 dom)
+{
+ clear_bit(dom, hvpci_dom_map);
+}
+
/**
* hv_pci_probe() - New VMBus channel probe, for a root PCI bus
* @hdev: VMBus's tracking struct for this root PCI bus
@@ -2521,6 +2563,7 @@ static int hv_pci_probe(struct hv_device *hdev,
const struct hv_vmbus_device_id *dev_id)
{
struct hv_pcibus_device *hbus;
+ u16 dom_req, dom;
int ret;
/*
@@ -2535,19 +2578,34 @@ static int hv_pci_probe(struct hv_device *hdev,
hbus->state = hv_pcibus_init;
/*
- * The PCI bus "domain" is what is called "segment" in ACPI and
- * other specs. Pull it from the instance ID, to get something
- * unique. Bytes 8 and 9 are what is used in Windows guests, so
- * do the same thing for consistency. Note that, since this code
- * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
- * that (1) the only domain in use for something that looks like
- * a physical PCI bus (which is actually emulated by the
- * hypervisor) is domain 0 and (2) there will be no overlap
- * between domains derived from these instance IDs in the same
- * VM.
+ * The PCI bus "domain" is what is called "segment" in ACPI and other
+ * specs. Pull it from the instance ID, to get something usually
+ * unique. In rare cases of collision, we will find out another number
+ * not in use.
+ *
+ * Note that, since this code only runs in a Hyper-V VM, Hyper-V
+ * together with this guest driver can guarantee that (1) The only
+ * domain used by Gen1 VMs for something that looks like a physical
+ * PCI bus (which is actually emulated by the hypervisor) is domain 0.
+ * (2) There will be no overlap between domains (after fixing possible
+ * collisions) in the same VM.
*/
- hbus->sysdata.domain = hdev->dev_instance.b[9] |
- hdev->dev_instance.b[8] << 8;
+ dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
+ dom = hv_get_dom_num(dom_req);
+
+ if (dom == HVPCI_DOM_INVALID) {
+ dev_err(&hdev->device,
+ "Unable to use dom# 0x%hx or other numbers", dom_req);
+ ret = -EINVAL;
+ goto free_bus;
+ }
+
+ if (dom != dom_req)
+ dev_info(&hdev->device,
+ "PCI dom# 0x%hx has collision, using 0x%hx",
+ dom_req, dom);
+
+ hbus->sysdata.domain = dom;
hbus->hdev = hdev;
refcount_set(&hbus->remove_lock, 1);
@@ -2562,7 +2620,7 @@ static int hv_pci_probe(struct hv_device *hdev,
hbus->sysdata.domain);
if (!hbus->wq) {
ret = -ENOMEM;
- goto free_bus;
+ goto free_dom;
}
ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
@@ -2639,6 +2697,8 @@ static int hv_pci_probe(struct hv_device *hdev,
vmbus_close(hdev->channel);
destroy_wq:
destroy_workqueue(hbus->wq);
+free_dom:
+ hv_put_dom_num(hbus->sysdata.domain);
free_bus:
free_page((unsigned long)hbus);
return ret;
@@ -2720,6 +2780,9 @@ static int hv_pci_remove(struct hv_device *hdev)
put_hvpcibus(hbus);
wait_for_completion(&hbus->remove_event);
destroy_workqueue(hbus->wq);
+
+ hv_put_dom_num(hbus->sysdata.domain);
+
free_page((unsigned long)hbus);
return 0;
}
@@ -2747,6 +2810,9 @@ static void __exit exit_hv_pci_drv(void)
static int __init init_hv_pci_drv(void)
{
+ /* Set the invalid domain number's bit, so it will not be used */
+ set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
+
return vmbus_driver_register(&hv_pci_drv);
}
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v1] Tools: hv: move to tools buildsystem
From: Andy Shevchenko @ 2019-08-12 18:27 UTC (permalink / raw)
To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, linux-hyperv,
Sasha Levin
In-Reply-To: <20190628170542.28481-1-andriy.shevchenko@linux.intel.com>
On Fri, Jun 28, 2019 at 08:05:42PM +0300, Andy Shevchenko wrote:
> There is a nice buildsystem dedicated for userspace tools in Linux kernel tree.
> Switch gpio target to be built by it.
>
Any comments on this?
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
> tools/hv/Build | 3 +++
> tools/hv/Makefile | 51 +++++++++++++++++++++++++++++++++++++----------
> 2 files changed, 44 insertions(+), 10 deletions(-)
> create mode 100644 tools/hv/Build
>
> diff --git a/tools/hv/Build b/tools/hv/Build
> new file mode 100644
> index 000000000000..6cf51fa4b306
> --- /dev/null
> +++ b/tools/hv/Build
> @@ -0,0 +1,3 @@
> +hv_kvp_daemon-y += hv_kvp_daemon.o
> +hv_vss_daemon-y += hv_vss_daemon.o
> +hv_fcopy_daemon-y += hv_fcopy_daemon.o
> diff --git a/tools/hv/Makefile b/tools/hv/Makefile
> index 5db5e62cebda..b57143d9459c 100644
> --- a/tools/hv/Makefile
> +++ b/tools/hv/Makefile
> @@ -1,28 +1,55 @@
> # SPDX-License-Identifier: GPL-2.0
> # Makefile for Hyper-V tools
> -
> -WARNINGS = -Wall -Wextra
> -CFLAGS = $(WARNINGS) -g $(shell getconf LFS_CFLAGS)
> -
> -CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
> +include ../scripts/Makefile.include
>
> sbindir ?= /usr/sbin
> libexecdir ?= /usr/libexec
> sharedstatedir ?= /var/lib
>
> -ALL_PROGRAMS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> +ifeq ($(srctree),)
> +srctree := $(patsubst %/,%,$(dir $(CURDIR)))
> +srctree := $(patsubst %/,%,$(dir $(srctree)))
> +endif
> +
> +# Do not use make's built-in rules
> +# (this improves performance and avoids hard-to-debug behaviour);
> +MAKEFLAGS += -r
> +
> +override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
> +
> +ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> +ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
>
> ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
>
> all: $(ALL_PROGRAMS)
>
> -%: %.c
> - $(CC) $(CFLAGS) -o $@ $^
> +export srctree OUTPUT CC LD CFLAGS
> +include $(srctree)/tools/build/Makefile.include
> +
> +HV_KVP_DAEMON_IN := $(OUTPUT)hv_kvp_daemon-in.o
> +$(HV_KVP_DAEMON_IN): FORCE
> + $(Q)$(MAKE) $(build)=hv_kvp_daemon
> +$(OUTPUT)hv_kvp_daemon: $(HV_KVP_DAEMON_IN)
> + $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
> +
> +HV_VSS_DAEMON_IN := $(OUTPUT)hv_vss_daemon-in.o
> +$(HV_VSS_DAEMON_IN): FORCE
> + $(Q)$(MAKE) $(build)=hv_vss_daemon
> +$(OUTPUT)hv_vss_daemon: $(HV_VSS_DAEMON_IN)
> + $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
> +
> +HV_FCOPY_DAEMON_IN := $(OUTPUT)hv_fcopy_daemon-in.o
> +$(HV_FCOPY_DAEMON_IN): FORCE
> + $(Q)$(MAKE) $(build)=hv_fcopy_daemon
> +$(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
> + $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
>
> clean:
> - $(RM) hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> + rm -f $(ALL_PROGRAMS)
> + find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '\.*.d' -delete
>
> -install: all
> +install: $(ALL_PROGRAMS)
> install -d -m 755 $(DESTDIR)$(sbindir); \
> install -d -m 755 $(DESTDIR)$(libexecdir)/hypervkvpd; \
> install -d -m 755 $(DESTDIR)$(sharedstatedir); \
> @@ -33,3 +60,7 @@ install: all
> for script in $(ALL_SCRIPTS); do \
> install $$script -m 755 $(DESTDIR)$(libexecdir)/hypervkvpd/$${script%.sh}; \
> done
> +
> +FORCE:
> +
> +.PHONY: all install clean FORCE prepare
> --
> 2.20.1
>
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* RE: [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically
From: Michael Kelley @ 2019-08-12 18:39 UTC (permalink / raw)
To: lantianyu1986@gmail.com, luto@kernel.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, hpa@zytor.com, x86@kernel.org,
KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
sashal@kernel.org, daniel.lezcano@linaro.org, arnd@arndb.de,
ashal@kernel.org
Cc: Tianyu Lan, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-arch@vger.kernel.org
In-Reply-To: <20190729075243.22745-2-Tianyu.Lan@microsoft.com>
From: lantianyu1986@gmail.com <lantianyu1986@gmail.com> Sent: Monday, July 29, 2019 12:53 AM
>
> This is to prepare to add Hyper-V sched clock callback and move
> Hyper-V reference TSC initialization much earlier in the boot
> process when timestamp is 0. So no discontinuity is observed
> when pv_ops.time.sched_clock to calculate its offset. This earlier
> initialization requires that the Hyper-V TSC page be allocated
> statically instead of with vmalloc(), so fixup the references
> to the TSC page and the method of getting its physical address.
I'd suggest tweaking the commit message wording a bit:
Prepare to add Hyper-V sched clock callback and move Hyper-V
Reference TSC initialization much earlier in the boot process. Earlier
initialization is needed so that it happens while the timestamp value
is still 0 and no discontinuity in the timestamp will occur when
pv_ops.time.sched_clock calculates its offset. The earlier
initialization requires that the Hyper-V TSC page be allocated
statically instead of with vmalloc(), so fixup the references
to the TSC page and the method of getting its physical address.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> arch/x86/entry/vdso/vma.c | 2 +-
> drivers/clocksource/hyperv_timer.c | 12 ++++--------
> 2 files changed, 5 insertions(+), 9 deletions(-)
>
> @@ -280,12 +280,8 @@ static bool __init hv_init_tsc_clocksource(void)
> if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
> return false;
>
> - tsc_pg = vmalloc(PAGE_SIZE);
> - if (!tsc_pg)
> - return false;
> -
> hyperv_cs = &hyperv_cs_tsc;
> - phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
> + phys_addr = virt_to_phys(&tsc_pg) & PAGE_MASK;
The and'ing with PAGE_MASK isn't needed. You've set up tsc_pg
to ensure it is page aligned, so there's no need to mask out any
low order bits. That's why the previous code didn't do the masking
either.
>
> /*
> * The Hyper-V TLFS specifies to preserve the value of reserved
> --
> 2.14.5
^ permalink raw reply
* RE: [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Michael Kelley @ 2019-08-12 18:41 UTC (permalink / raw)
To: lantianyu1986@gmail.com, KY Srinivasan, Haiyang Zhang,
Stephen Hemminger, sashal@kernel.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, hpa@zytor.com, x86@kernel.org,
daniel.lezcano@linaro.org, arnd@arndb.de, ashal@kernel.org
Cc: Tianyu Lan, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org
In-Reply-To: <20190729075243.22745-3-Tianyu.Lan@microsoft.com>
From: Tianyu Lan <Tianyu.Lan@microsoft.com> Sent: Monday, July 29, 2019 12:53 AM
>
> Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> on x86. But native_sched_clock() directly uses the raw TSC value, which
> can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
> to set the sched clock function appropriately. On x86, this sets pv_ops.time.
> sched_clock to read the Hyper-V reference TSC value that is scaled and adjusted
> to be continuous.
>
> Also move the Hyper-V reference TSC initialization much earlier in the boot
> process so no discontinuity is observed when pv_ops.time.sched_clock
> calculates its offset.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> arch/x86/hyperv/hv_init.c | 2 --
> arch/x86/kernel/cpu/mshyperv.c | 8 ++++++++
> drivers/clocksource/hyperv_timer.c | 22 ++++++++++++----------
> include/asm-generic/mshyperv.h | 1 +
> 4 files changed, 21 insertions(+), 12 deletions(-)
>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
^ permalink raw reply
* RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Michael Kelley @ 2019-08-12 19:22 UTC (permalink / raw)
To: Tianyu Lan, vkuznets
Cc: Peter Zijlstra, Tianyu Lan, linux-arch@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger kernel org,
Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, the arch/x86 maintainers, KY Srinivasan,
Haiyang Zhang, Stephen Hemminger, Sasha Levin, Daniel Lezcano,
Arnd Bergmann, ashal@kernel.org
In-Reply-To: <CAOLK0py6ngy9kAnZcRMBK8U45s2L5Wo4X0NP_qPM0zv7WjeVQQ@mail.gmail.com>
From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
>
> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >
> > Peter Zijlstra <peterz@infradead.org> writes:
> >
> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> > >> lantianyu1986@gmail.com writes:
> > >>
> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> > >> >
> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> > >> > on x86. But native_sched_clock() directly uses the raw TSC value, which
> > >> > can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
> > >> > to set the sched clock function appropriately. On x86, this sets
> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> > >> > scaled and adjusted to be continuous.
> > >>
> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
> > >> I'm afraid.
> > >>
> > >> On the other hand, what we have now is probably worse: TSC can,
> > >> actually, jump backwards (e.g. on migration) and we're breaking the
> > >> requirements for sched_clock().
> > >
> > > That (obviously) also breaks the requirements for using TSC as
> > > clocksource.
> > >
> > > IOW, it breaks the entire purpose of having TSC in the first place.
> >
> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> > instead. The problem is that 'TSC page' can be disabled by the
> > hypervisor and in that case the only remaining clocksource is MSR-based
> > (slow).
> >
>
> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
> kernel uses MSR based
> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
> take this into
> account and determine which clocksource should be exposed or not.
>
We've confirmed with the Hyper-V team that the TSC page is always available
on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
hardware presents an InvariantTSC. But the Linux Kconfig's are set up so
the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
reads. For 32-bit, this set of changes will add more overhead because the
sched clock reads will now be MSR reads.
I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
supported in Azure so usage is pretty small. The alternative would be to continue
to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
live migration or similar scenarios.
Michael
^ permalink raw reply
* [PATCH] MAINTAINERS: Hyper-V: Fix typo in a filepath
From: Denis Efremov @ 2019-08-13 6:04 UTC (permalink / raw)
To: linux-kernel; +Cc: Denis Efremov, joe, Lan Tianyu, Sasha Levin, linux-hyperv
In-Reply-To: <20190325212516.26489-1-joe@perches.com>
Fix typo in hyperv-iommu.c filepath.
Cc: Lan Tianyu <Tianyu.Lan@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Signed-off-by: Denis Efremov <efremov@linux.com>
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 2764e0872ebd..51ab502485ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7452,7 +7452,7 @@ F: drivers/net/hyperv/
F: drivers/scsi/storvsc_drv.c
F: drivers/uio/uio_hv_generic.c
F: drivers/video/fbdev/hyperv_fb.c
-F: drivers/iommu/hyperv_iommu.c
+F: drivers/iommu/hyperv-iommu.c
F: net/vmw_vsock/hyperv_transport.c
F: include/clocksource/hyperv_timer.h
F: include/linux/hyperv.h
--
2.21.0
^ permalink raw reply related
* RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Vitaly Kuznetsov @ 2019-08-13 8:33 UTC (permalink / raw)
To: Michael Kelley, Tianyu Lan
Cc: Peter Zijlstra, Tianyu Lan, linux-arch@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger kernel org,
Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, the arch/x86 maintainers, KY Srinivasan,
Haiyang Zhang, Stephen Hemminger, Sasha Levin, Daniel Lezcano,
Arnd Bergmann, ashal@kernel.org
In-Reply-To: <DM5PR21MB0137E03AAD8C2EA61EC81ED7D7D30@DM5PR21MB0137.namprd21.prod.outlook.com>
Michael Kelley <mikelley@microsoft.com> writes:
> From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
>>
>> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >
>> > Peter Zijlstra <peterz@infradead.org> writes:
>> >
>> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> > >> lantianyu1986@gmail.com writes:
>> > >>
>> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> > >> >
>> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> > >> > on x86. But native_sched_clock() directly uses the raw TSC value, which
>> > >> > can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
>> > >> > to set the sched clock function appropriately. On x86, this sets
>> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> > >> > scaled and adjusted to be continuous.
>> > >>
>> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
>> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
>> > >> I'm afraid.
>> > >>
>> > >> On the other hand, what we have now is probably worse: TSC can,
>> > >> actually, jump backwards (e.g. on migration) and we're breaking the
>> > >> requirements for sched_clock().
>> > >
>> > > That (obviously) also breaks the requirements for using TSC as
>> > > clocksource.
>> > >
>> > > IOW, it breaks the entire purpose of having TSC in the first place.
>> >
>> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
>> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
>> > instead. The problem is that 'TSC page' can be disabled by the
>> > hypervisor and in that case the only remaining clocksource is MSR-based
>> > (slow).
>> >
>>
>> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
>> kernel uses MSR based
>> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
>> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
>> take this into
>> account and determine which clocksource should be exposed or not.
>>
>
> We've confirmed with the Hyper-V team that the TSC page is always available
> on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
> hardware presents an InvariantTSC.
Currently we check that TSC page is valid on every read and it seems
this is redundant, right? It is either available on boot or not. I can
only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will
likely disable the page (and we can get reenlightenment notification
then).
> But the Linux Kconfig's are set up so
> the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
> reads. For 32-bit, this set of changes will add more overhead because the
> sched clock reads will now be MSR reads.
>
> I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
> I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
> supported in Azure so usage is pretty small. The alternative would be to continue
> to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
> live migration or similar scenarios.
The issue needs fixing, I agree, however using MSR based clocksource as
sched clock may give us too big of a performance hit (not sure who cares
about 32 bit guest performance nowadays but still). What stops us from
enabling TSC page for 32 bit guests if it is available?
--
Vitaly
^ permalink raw reply
* [PATCH] video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
From: Wei Hu @ 2019-08-13 9:54 UTC (permalink / raw)
To: b.zolnierkie@samsung.com, linux-hyperv@vger.kernel.org,
dri-devel@lists.freedesktop.org, linux-fbdev@vger.kernel.org,
linux-kernel@vger.kernel.org, sashal@kernel.org,
Stephen Hemminger, Haiyang Zhang, KY Srinivasan, Dexuan Cui,
Iouri Tarassov, Michael Kelley, Wei Hu
Beginning from Windows 10 RS5+, VM screen resolution is obtained from host.
The "video=hyperv_fb" boot time option is not needed, but still can be
used to overwrite the VM resolution. The VM resolution on the host could be
set by executing the powershell "set-vmvideo" command.
Signed-off-by: Iouri Tarassov <iourit@microsoft.com>
Signed-off-by: Wei Hu <weh@microsoft.com>
---
drivers/video/fbdev/hyperv_fb.c | 136 +++++++++++++++++++++++++++++---
1 file changed, 125 insertions(+), 11 deletions(-)
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index 00f5bdcc6c6f..1042f3311fa2 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -23,6 +23,14 @@
*
* Portrait orientation is also supported:
* For example: video=hyperv_fb:864x1152
+ *
+ * When a Windows 10 RS5+ host is used, the virtual machine screen
+ * resolution is obtained from the host. The "video=hyperv_fb" option is
+ * not needed, but still can be used to overwrite the VM resolution. The
+ * VM resolution on the host could be set by executing the powershell
+ * "set-vmvideo" command. For example
+ * set-vmvideo -vmname name -horizontalresolution:1920 \
+ * -verticalresolution:1200 -resolutiontype single
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -44,6 +52,7 @@
#define SYNTHVID_VERSION(major, minor) ((minor) << 16 | (major))
#define SYNTHVID_VERSION_WIN7 SYNTHVID_VERSION(3, 0)
#define SYNTHVID_VERSION_WIN8 SYNTHVID_VERSION(3, 2)
+#define SYNTHVID_VERSION_WIN10 SYNTHVID_VERSION(3, 5)
#define SYNTHVID_DEPTH_WIN7 16
#define SYNTHVID_DEPTH_WIN8 32
@@ -82,16 +91,25 @@ enum synthvid_msg_type {
SYNTHVID_POINTER_SHAPE = 8,
SYNTHVID_FEATURE_CHANGE = 9,
SYNTHVID_DIRT = 10,
+ SYNTHVID_RESOLUTION_REQUEST = 13,
+ SYNTHVID_RESOLUTION_RESPONSE = 14,
- SYNTHVID_MAX = 11
+ SYNTHVID_MAX = 15
};
+#define SYNTHVID_EDID_BLOCK_SIZE 128
+#define SYNTHVID_MAX_RESOLUTION_COUNT 64
+
+struct hvd_screen_info {
+ u16 width;
+ u16 height;
+} __packed;
+
struct synthvid_msg_hdr {
u32 type;
u32 size; /* size of this header + payload after this field*/
} __packed;
-
struct synthvid_version_req {
u32 version;
} __packed;
@@ -102,6 +120,18 @@ struct synthvid_version_resp {
u8 max_video_outputs;
} __packed;
+struct synthvid_supported_resolution_req {
+ u8 maximum_resolution_count;
+} __packed;
+
+struct synthvid_supported_resolution_resp {
+ u8 edid_block[SYNTHVID_EDID_BLOCK_SIZE];
+ u8 resolution_count;
+ u8 default_resolution_index;
+ u8 is_standard;
+ struct hvd_screen_info supported_resolution[1];
+} __packed;
+
struct synthvid_vram_location {
u64 user_ctx;
u8 is_vram_gpa_specified;
@@ -187,6 +217,8 @@ struct synthvid_msg {
struct synthvid_pointer_shape ptr_shape;
struct synthvid_feature_change feature_chg;
struct synthvid_dirt dirt;
+ struct synthvid_supported_resolution_req resolution_req;
+ struct synthvid_supported_resolution_resp resolution_resp;
};
} __packed;
@@ -224,6 +256,8 @@ struct hvfb_par {
static uint screen_width = HVFB_WIDTH;
static uint screen_height = HVFB_HEIGHT;
+static uint screen_width_max = HVFB_WIDTH;
+static uint screen_height_max = HVFB_HEIGHT;
static uint screen_depth;
static uint screen_fb_size;
@@ -354,6 +388,7 @@ static void synthvid_recv_sub(struct hv_device *hdev)
/* Complete the wait event */
if (msg->vid_hdr.type == SYNTHVID_VERSION_RESPONSE ||
+ msg->vid_hdr.type == SYNTHVID_RESOLUTION_RESPONSE ||
msg->vid_hdr.type == SYNTHVID_VRAM_LOCATION_ACK) {
memcpy(par->init_buf, msg, MAX_VMBUS_PKT_SIZE);
complete(&par->wait);
@@ -428,6 +463,64 @@ static int synthvid_negotiate_ver(struct hv_device *hdev, u32 ver)
}
par->synthvid_version = ver;
+ pr_info("Synthvid Version major %d, minor %d\n",
+ ver & 0x0000ffff, (ver & 0xffff0000) >> 16);
+
+out:
+ return ret;
+}
+
+/* Get current resolution from the host */
+static int synthvid_get_supported_resolution(struct hv_device *hdev)
+{
+ struct fb_info *info = hv_get_drvdata(hdev);
+ struct hvfb_par *par = info->par;
+ struct synthvid_msg *msg = (struct synthvid_msg *)par->init_buf;
+ int ret = 0;
+ unsigned long t;
+ u8 index;
+ int i;
+
+ memset(msg, 0, sizeof(struct synthvid_msg));
+ msg->vid_hdr.type = SYNTHVID_RESOLUTION_REQUEST;
+ msg->vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
+ sizeof(struct synthvid_supported_resolution_req);
+
+ msg->resolution_req.maximum_resolution_count =
+ SYNTHVID_MAX_RESOLUTION_COUNT;
+ synthvid_send(hdev, msg);
+
+ t = wait_for_completion_timeout(&par->wait, VSP_TIMEOUT);
+ if (!t) {
+ pr_err("Time out on waiting resolution response\n");
+ ret = -ETIMEDOUT;
+ goto out;
+ }
+
+ if (msg->resolution_resp.resolution_count == 0) {
+ pr_err("No supported resolutions\n");
+ ret = -ENODEV;
+ goto out;
+ }
+
+ index = msg->resolution_resp.default_resolution_index;
+ if (index >= msg->resolution_resp.resolution_count) {
+ pr_err("Invalid resolution index: %d\n", index);
+ ret = -ENODEV;
+ goto out;
+ }
+
+ for (i = 0; i < msg->resolution_resp.resolution_count; i++) {
+ screen_width_max = max_t(unsigned int, screen_width_max,
+ msg->resolution_resp.supported_resolution[i].width);
+ screen_height_max = max_t(unsigned int, screen_height_max,
+ msg->resolution_resp.supported_resolution[i].height);
+ }
+
+ screen_width =
+ msg->resolution_resp.supported_resolution[index].width;
+ screen_height =
+ msg->resolution_resp.supported_resolution[index].height;
out:
return ret;
@@ -448,11 +541,21 @@ static int synthvid_connect_vsp(struct hv_device *hdev)
}
/* Negotiate the protocol version with host */
- if (vmbus_proto_version == VERSION_WS2008 ||
- vmbus_proto_version == VERSION_WIN7)
+ switch (vmbus_proto_version) {
+ case VERSION_WS2008:
+ case VERSION_WIN7:
ret = synthvid_negotiate_ver(hdev, SYNTHVID_VERSION_WIN7);
- else
+ break;
+ case VERSION_WIN8:
+ case VERSION_WIN8_1:
ret = synthvid_negotiate_ver(hdev, SYNTHVID_VERSION_WIN8);
+ break;
+ case VERSION_WIN10:
+ case VERSION_WIN10_V5:
+ default:
+ ret = synthvid_negotiate_ver(hdev, SYNTHVID_VERSION_WIN10);
+ break;
+ }
if (ret) {
pr_err("Synthetic video device version not accepted\n");
@@ -464,6 +567,12 @@ static int synthvid_connect_vsp(struct hv_device *hdev)
else
screen_depth = SYNTHVID_DEPTH_WIN8;
+ if (par->synthvid_version >= SYNTHVID_VERSION_WIN10) {
+ ret = synthvid_get_supported_resolution(hdev);
+ if (ret)
+ pr_info("Failed to get supported resolution from host, use default\n");
+ }
+
screen_fb_size = hdev->channel->offermsg.offer.
mmio_megabytes * 1024 * 1024;
@@ -653,6 +762,8 @@ static void hvfb_get_option(struct fb_info *info)
}
if (x < HVFB_WIDTH_MIN || y < HVFB_HEIGHT_MIN ||
+ (par->synthvid_version >= SYNTHVID_VERSION_WIN10 &&
+ (x > screen_width_max || y > screen_height_max)) ||
(par->synthvid_version == SYNTHVID_VERSION_WIN8 &&
x * y * screen_depth / 8 > SYNTHVID_FB_SIZE_WIN8) ||
(par->synthvid_version == SYNTHVID_VERSION_WIN7 &&
@@ -689,8 +800,12 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
}
if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM) ||
- pci_resource_len(pdev, 0) < screen_fb_size)
+ pci_resource_len(pdev, 0) < screen_fb_size) {
+ pr_err("Resource not available or (0x%lx < 0x%lx)\n",
+ (unsigned long) pci_resource_len(pdev, 0),
+ (unsigned long) screen_fb_size);
goto err1;
+ }
pot_end = pci_resource_end(pdev, 0);
pot_start = pot_end - screen_fb_size + 1;
@@ -781,17 +896,16 @@ static int hvfb_probe(struct hv_device *hdev,
goto error1;
}
+ hvfb_get_option(info);
+ pr_info("Screen resolution: %dx%d, Color depth: %d\n",
+ screen_width, screen_height, screen_depth);
+
ret = hvfb_getmem(hdev, info);
if (ret) {
pr_err("No memory for framebuffer\n");
goto error2;
}
- hvfb_get_option(info);
- pr_info("Screen resolution: %dx%d, Color depth: %d\n",
- screen_width, screen_height, screen_depth);
-
-
/* Set up fb_info */
info->flags = FBINFO_DEFAULT;
--
2.20.1
^ permalink raw reply related
* Re: [PATCH v3] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Lorenzo Pieralisi @ 2019-08-13 10:14 UTC (permalink / raw)
To: Haiyang Zhang
Cc: sashal@kernel.org, bhelgaas@google.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
linux-kernel@vger.kernel.org
In-Reply-To: <1565634013-19404-1-git-send-email-haiyangz@microsoft.com>
On Mon, Aug 12, 2019 at 06:20:53PM +0000, Haiyang Zhang wrote:
> Currently in Azure cloud, for passthrough devices including GPU, the host
> sets the device instance ID's bytes 8 - 15 to a value derived from the host
> HWID, which is the same on all devices in a VM. So, the device instance
> ID's bytes 8 and 9 provided by the host are no longer unique. This can
> cause device passthrough to VMs to fail because the bytes 8 and 9 are used
> as PCI domain number. Collision of domain numbers will cause the second
> device with the same domain number fail to load.
>
> As recommended by Azure host team, the bytes 4, 5 have more uniqueness
> (info entropy) than bytes 8, 9. So now we use bytes 4, 5 as the PCI domain
> numbers. On older hosts, bytes 4, 5 can also be used -- no backward
> compatibility issues here. The chance of collision is greatly reduced. In
> the rare cases of collision, we will detect and find another number that is
> not in use.
I have not explained what I meant correctly. This patch fixes an
issue and the "find another number" fallback can be also applied
to the current kernel without changing the bytes you use for
domain numbers.
This patch would leave old kernels susceptible to breakage.
Again, I have no Azure knowledge but it seems better to me to
add a fallback "find another number" allocation on top of mainline
and send it to stable kernels. Then we can add another patch to
change the bytes you use to reduce the number of collision.
Please let me know what you think, thanks.
Lorenzo
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Acked-by: Sasha Levin <sashal@kernel.org>
> ---
> drivers/pci/controller/pci-hyperv.c | 92 +++++++++++++++++++++++++++++++------
> 1 file changed, 79 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 40b6254..4f3d97e 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
> complete(&hbus->remove_event);
> }
>
> +#define HVPCI_DOM_MAP_SIZE (64 * 1024)
> +static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> +
> +/*
> + * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
> + * as invalid for passthrough PCI devices of this driver.
> + */
> +#define HVPCI_DOM_INVALID 0
> +
> +/**
> + * hv_get_dom_num() - Get a valid PCI domain number
> + * Check if the PCI domain number is in use, and return another number if
> + * it is in use.
> + *
> + * @dom: Requested domain number
> + *
> + * return: domain number on success, HVPCI_DOM_INVALID on failure
> + */
> +static u16 hv_get_dom_num(u16 dom)
> +{
> + unsigned int i;
> +
> + if (test_and_set_bit(dom, hvpci_dom_map) == 0)
> + return dom;
> +
> + for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> + if (test_and_set_bit(i, hvpci_dom_map) == 0)
> + return i;
> + }
> +
> + return HVPCI_DOM_INVALID;
> +}
> +
> +/**
> + * hv_put_dom_num() - Mark the PCI domain number as free
> + * @dom: Domain number to be freed
> + */
> +static void hv_put_dom_num(u16 dom)
> +{
> + clear_bit(dom, hvpci_dom_map);
> +}
> +
> /**
> * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
> * @hdev: VMBus's tracking struct for this root PCI bus
> @@ -2521,6 +2563,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> const struct hv_vmbus_device_id *dev_id)
> {
> struct hv_pcibus_device *hbus;
> + u16 dom_req, dom;
> int ret;
>
> /*
> @@ -2535,19 +2578,34 @@ static int hv_pci_probe(struct hv_device *hdev,
> hbus->state = hv_pcibus_init;
>
> /*
> - * The PCI bus "domain" is what is called "segment" in ACPI and
> - * other specs. Pull it from the instance ID, to get something
> - * unique. Bytes 8 and 9 are what is used in Windows guests, so
> - * do the same thing for consistency. Note that, since this code
> - * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
> - * that (1) the only domain in use for something that looks like
> - * a physical PCI bus (which is actually emulated by the
> - * hypervisor) is domain 0 and (2) there will be no overlap
> - * between domains derived from these instance IDs in the same
> - * VM.
> + * The PCI bus "domain" is what is called "segment" in ACPI and other
> + * specs. Pull it from the instance ID, to get something usually
> + * unique. In rare cases of collision, we will find out another number
> + * not in use.
> + *
> + * Note that, since this code only runs in a Hyper-V VM, Hyper-V
> + * together with this guest driver can guarantee that (1) The only
> + * domain used by Gen1 VMs for something that looks like a physical
> + * PCI bus (which is actually emulated by the hypervisor) is domain 0.
> + * (2) There will be no overlap between domains (after fixing possible
> + * collisions) in the same VM.
> */
> - hbus->sysdata.domain = hdev->dev_instance.b[9] |
> - hdev->dev_instance.b[8] << 8;
> + dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> + dom = hv_get_dom_num(dom_req);
> +
> + if (dom == HVPCI_DOM_INVALID) {
> + dev_err(&hdev->device,
> + "Unable to use dom# 0x%hx or other numbers", dom_req);
> + ret = -EINVAL;
> + goto free_bus;
> + }
> +
> + if (dom != dom_req)
> + dev_info(&hdev->device,
> + "PCI dom# 0x%hx has collision, using 0x%hx",
> + dom_req, dom);
> +
> + hbus->sysdata.domain = dom;
>
> hbus->hdev = hdev;
> refcount_set(&hbus->remove_lock, 1);
> @@ -2562,7 +2620,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> hbus->sysdata.domain);
> if (!hbus->wq) {
> ret = -ENOMEM;
> - goto free_bus;
> + goto free_dom;
> }
>
> ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
> @@ -2639,6 +2697,8 @@ static int hv_pci_probe(struct hv_device *hdev,
> vmbus_close(hdev->channel);
> destroy_wq:
> destroy_workqueue(hbus->wq);
> +free_dom:
> + hv_put_dom_num(hbus->sysdata.domain);
> free_bus:
> free_page((unsigned long)hbus);
> return ret;
> @@ -2720,6 +2780,9 @@ static int hv_pci_remove(struct hv_device *hdev)
> put_hvpcibus(hbus);
> wait_for_completion(&hbus->remove_event);
> destroy_workqueue(hbus->wq);
> +
> + hv_put_dom_num(hbus->sysdata.domain);
> +
> free_page((unsigned long)hbus);
> return 0;
> }
> @@ -2747,6 +2810,9 @@ static void __exit exit_hv_pci_drv(void)
>
> static int __init init_hv_pci_drv(void)
> {
> + /* Set the invalid domain number's bit, so it will not be used */
> + set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
> +
> return vmbus_driver_register(&hv_pci_drv);
> }
>
> --
> 1.8.3.1
>
^ permalink raw reply
* Re: [PATCH] PCI: hv: Fix build error without CONFIG_SYSFS
From: Lorenzo Pieralisi @ 2019-08-13 10:18 UTC (permalink / raw)
To: Yuehaibing
Cc: Michael Kelley, bhelgaas@google.com, Stephen Hemminger,
sashal@kernel.org, Dexuan Cui, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
In-Reply-To: <b049b0f9-e31e-897a-6f2e-e30d6d865f24@huawei.com>
On Sat, Jun 15, 2019 at 02:48:24PM +0800, Yuehaibing wrote:
> +cc Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Can we drop this patch and merge:
https://patchwork.ozlabs.org/patch/1131444/
instead ?
Thanks,
Lorenzo
>
> On 2019/6/15 14:18, Yuehaibing wrote:
> >
> > On 2019/6/2 6:59, Michael Kelley wrote:
> >> From: YueHaibing <yuehaibing@huawei.com> Sent: Friday, May 31, 2019 8:09 AM
> >>>
> >>> while building without CONFIG_SYSFS, fails as below:
> >>>
> >>> drivers/pci/controller/pci-hyperv.o: In function 'hv_pci_assign_slots':
> >>> pci-hyperv.c:(.text+0x40a): undefined reference to 'pci_create_slot'
> >>> drivers/pci/controller/pci-hyperv.o: In function 'pci_devices_present_work':
> >>> pci-hyperv.c:(.text+0xc02): undefined reference to 'pci_destroy_slot'
> >>> drivers/pci/controller/pci-hyperv.o: In function 'hv_pci_remove':
> >>> pci-hyperv.c:(.text+0xe50): undefined reference to 'pci_destroy_slot'
> >>> drivers/pci/controller/pci-hyperv.o: In function 'hv_eject_device_work':
> >>> pci-hyperv.c:(.text+0x11f9): undefined reference to 'pci_destroy_slot'
> >>>
> >>> Select SYSFS while PCI_HYPERV is set to fix this.
> >>>
> >>
> >> I'm wondering if is the right way to fix the problem. Conceptually
> >> is it possible to setup & operate virtual PCI devices like
> >> pci-hyperv.c does, even if sysfs is not present? Or is it right to
> >> always required sysfs?
> >>
> >> The function pci_dev_assign_slot() in slot.c has a null implementation
> >> in include/linux/pci.h when CONFIG_SYSFS is not defined, which
> >> seems to be trying to solve the same problem for that function. And
> >> if CONFIG_HOTPLUG_PCI is defined but CONFIG_SYSFS is not,
> >> pci_hp_create_module_link() and pci_hp_remove_module_link()
> >> look like they would have the same problem. Maybe there should
> >> be degenerate implementations of pci_create_slot() and
> >> pci_destroy_slot() for cases when CONFIG_SYSFS is not defined?
> >>
> >> But I'll admit I don't know the full story behind how PCI slots
> >> are represented and used, so maybe I'm off base. I just noticed
> >> the inconsistency in how other functions in slot.c are handled.
> >>
> >> Thoughts?
> >
> > 268a03a42d33 ("PCI: drivers/pci/slot.c should depend on CONFIG_SYSFS")
> >
> > make slot.o depends CONFIG_SYSFS
> >
> > commit 268a03a42d3377d5fb41e6e7cbdec4e0b65cab2e
> > Author: Alex Chiang <achiang@hp.com>
> > Date: Wed Jun 17 19:03:57 2009 -0600
> >
> > PCI: drivers/pci/slot.c should depend on CONFIG_SYSFS
> >
> > There is no way to interact with a physical PCI slot without
> > sysfs, so encode the dependency and prevent this build error:
> >
> > drivers/pci/slot.c: In function 'pci_hp_create_module_link':
> > drivers/pci/slot.c:327: error: 'module_kset' undeclared
> >
> > This patch _should_ make pci-sysfs.o depend on CONFIG_SYSFS too,
> > but we cannot (yet) because the PCI core merrily assumes the
> > existence of sysfs:
> >
> > drivers/built-in.o: In function `pci_bus_add_device':
> > drivers/pci/bus.c:89: undefined reference to `pci_create_sysfs_dev_files'
> > drivers/built-in.o: In function `pci_stop_dev':
> > drivers/pci/remove.c:24: undefined reference to `pci_remove_sysfs_dev_files'
> >
> > So do the minimal bit for now and figure out how to untangle it
> > later.
> >
> > If No CONFIG_SYSFS, slot.o is not build
> >
> >>
> >> Michael
> >>
> >>
> >
> >
> > .
> >
>
^ permalink raw reply
* [PATCH] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Wei Hu @ 2019-08-13 10:36 UTC (permalink / raw)
To: rdunlap@infradead.org, shc_work@mail.ru,
gregkh@linuxfoundation.org, lee.jones@linaro.org,
alexandre.belloni@bootlin.com, baijiaju1990@gmail.com,
fthain@telegraphics.com.au, info@metux.net,
linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
sashal@kernel.org, Stephen Hemminger, Haiyang Zhang,
KY Srinivasan, Dexuan Cui, Iouri Tarassov, Michael Kelley, Wei Hu
Without deferred IO support, hyperv_fb driver informs the host to refresh
the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
is screen update or not. This patch supports defered IO for screens in
graphic mode and also enables the framme buffer on-demand refresh. The
highest refresh rate is still set at 20Hz.
Due to limitation on Hyper-V host, we keep a shadow copy of frame buffer
in the guest. This means one more copy of the dirty rectangle inside
guest when doing the on-demand refresh. This can be optimized in the
future with help from host. For now the host performance gain from deferred
IO outweighs the shadow copy impact in the guest.
Signed-off-by: Wei Hu <weh@microsoft.com>
---
drivers/video/fbdev/Kconfig | 1 +
drivers/video/fbdev/hyperv_fb.c | 217 +++++++++++++++++++++++++++++---
2 files changed, 198 insertions(+), 20 deletions(-)
diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
index 1b2f5f31fb6f..e781f89a1824 100644
--- a/drivers/video/fbdev/Kconfig
+++ b/drivers/video/fbdev/Kconfig
@@ -2241,6 +2241,7 @@ config FB_HYPERV
select FB_CFB_FILLRECT
select FB_CFB_COPYAREA
select FB_CFB_IMAGEBLIT
+ select FB_DEFERRED_IO
help
This framebuffer driver supports Microsoft Hyper-V Synthetic Video.
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index 1042f3311fa2..85198a6ea8e7 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -233,6 +233,7 @@ struct synthvid_msg {
#define RING_BUFSIZE (256 * 1024)
#define VSP_TIMEOUT (10 * HZ)
#define HVFB_UPDATE_DELAY (HZ / 20)
+#define HVFB_ONDEMAND_THROTTLE (HZ / 20)
struct hvfb_par {
struct fb_info *info;
@@ -252,6 +253,17 @@ struct hvfb_par {
bool synchronous_fb;
struct notifier_block hvfb_panic_nb;
+
+ /* Memory for deferred IO and frame buffer itself */
+ unsigned char *dio_vp;
+ unsigned char *mmio_vp;
+ unsigned long mmio_pp;
+ spinlock_t docopy_lock; /* Lock to protect memory copy */
+
+ /* Dirty rectangle, protected by delayed_refresh_lock */
+ int x1, y1, x2, y2;
+ bool delayed_refresh;
+ spinlock_t delayed_refresh_lock;
};
static uint screen_width = HVFB_WIDTH;
@@ -260,6 +272,7 @@ static uint screen_width_max = HVFB_WIDTH;
static uint screen_height_max = HVFB_HEIGHT;
static uint screen_depth;
static uint screen_fb_size;
+static uint dio_fb_size; /* FB size for deferred IO */
/* Send message to Hyper-V host */
static inline int synthvid_send(struct hv_device *hdev,
@@ -346,28 +359,88 @@ static int synthvid_send_ptr(struct hv_device *hdev)
}
/* Send updated screen area (dirty rectangle) location to host */
-static int synthvid_update(struct fb_info *info)
+static int
+synthvid_update(struct fb_info *info, int x1, int y1, int x2, int y2)
{
struct hv_device *hdev = device_to_hv_device(info->device);
struct synthvid_msg msg;
memset(&msg, 0, sizeof(struct synthvid_msg));
+ if (x2 == INT_MAX)
+ x2 = info->var.xres;
+ if (y2 == INT_MAX)
+ y2 = info->var.yres;
msg.vid_hdr.type = SYNTHVID_DIRT;
msg.vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
sizeof(struct synthvid_dirt);
msg.dirt.video_output = 0;
msg.dirt.dirt_count = 1;
- msg.dirt.rect[0].x1 = 0;
- msg.dirt.rect[0].y1 = 0;
- msg.dirt.rect[0].x2 = info->var.xres;
- msg.dirt.rect[0].y2 = info->var.yres;
+ msg.dirt.rect[0].x1 = (x1 < 0 || x1 > x2) ? 0 : x1;
+ msg.dirt.rect[0].y1 = (y2 < 0 || y1 > y2) ? 0 : y1;
+ msg.dirt.rect[0].x2 =
+ (x2 < x1 || x2 > info->var.xres) ? info->var.xres : x2;
+ msg.dirt.rect[0].y2 =
+ (y2 < y1 || y2 > info->var.yres) ? info->var.yres : y2;
synthvid_send(hdev, &msg);
return 0;
}
+static void hvfb_docopy(struct hvfb_par *par,
+ unsigned long offset,
+ unsigned long size)
+{
+ if (!par || !par->mmio_vp || !par->dio_vp || !par->fb_ready ||
+ size == 0 || offset >= dio_fb_size)
+ return;
+
+ if (offset + size > dio_fb_size)
+ size = dio_fb_size - offset;
+
+ memcpy(par->mmio_vp + offset, par->dio_vp + offset, size);
+}
+
+/* Deferred IO callback */
+static void synthvid_deferred_io(struct fb_info *p,
+ struct list_head *pagelist)
+{
+ struct hvfb_par *par = p->par;
+ struct page *page;
+ unsigned long start, end;
+ int y1, y2, miny, maxy;
+ unsigned long flags;
+
+ miny = INT_MAX;
+ maxy = 0;
+
+ list_for_each_entry(page, pagelist, lru) {
+ start = page->index << PAGE_SHIFT;
+ end = start + PAGE_SIZE - 1;
+ y1 = start / p->fix.line_length;
+ y2 = end / p->fix.line_length;
+ if (y2 > p->var.yres)
+ y2 = p->var.yres;
+ miny = min_t(int, miny, y1);
+ maxy = max_t(int, maxy, y2);
+
+ /* Copy from dio space to mmio address */
+ if (par->fb_ready) {
+ spin_lock_irqsave(&par->docopy_lock, flags);
+ hvfb_docopy(par, start, PAGE_SIZE);
+ spin_unlock_irqrestore(&par->docopy_lock, flags);
+ }
+ }
+
+ if (par->fb_ready)
+ synthvid_update(p, 0, miny, p->var.xres, maxy);
+}
+
+static struct fb_deferred_io synthvid_defio = {
+ .delay = HZ / 20,
+ .deferred_io = synthvid_deferred_io,
+};
/*
* Actions on received messages from host:
@@ -597,7 +670,7 @@ static int synthvid_send_config(struct hv_device *hdev)
msg->vid_hdr.type = SYNTHVID_VRAM_LOCATION;
msg->vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
sizeof(struct synthvid_vram_location);
- msg->vram.user_ctx = msg->vram.vram_gpa = info->fix.smem_start;
+ msg->vram.user_ctx = msg->vram.vram_gpa = par->mmio_pp;
msg->vram.is_vram_gpa_specified = 1;
synthvid_send(hdev, msg);
@@ -607,7 +680,7 @@ static int synthvid_send_config(struct hv_device *hdev)
ret = -ETIMEDOUT;
goto out;
}
- if (msg->vram_ack.user_ctx != info->fix.smem_start) {
+ if (msg->vram_ack.user_ctx != par->mmio_pp) {
pr_err("Unable to set VRAM location\n");
ret = -ENODEV;
goto out;
@@ -624,19 +697,85 @@ static int synthvid_send_config(struct hv_device *hdev)
/*
* Delayed work callback:
- * It is called at HVFB_UPDATE_DELAY or longer time interval to process
- * screen updates. It is re-scheduled if further update is necessary.
+ * It is scheduled to call whenever update request is received and it has
+ * not been called in last HVFB_ONDEMAND_THROTTLE time interval.
*/
static void hvfb_update_work(struct work_struct *w)
{
struct hvfb_par *par = container_of(w, struct hvfb_par, dwork.work);
struct fb_info *info = par->info;
+ unsigned long flags;
+ int x1, x2, y1, y2;
+ int j;
+
+ x1 = y1 = 0;
+ x2 = y2 = INT_MAX;
+
+ spin_lock_irqsave(&par->delayed_refresh_lock, flags);
+ /* Reset the request flag */
+ par->delayed_refresh = false;
+
+ /* Store the dirty rectangle to local variables */
+ x1 = par->x1;
+ x2 = par->x2;
+ y1 = par->y1;
+ y2 = par->y2;
+
+ /* Clear dirty rectangle */
+ par->x1 = par->y1 = INT_MAX;
+ par->x2 = par->y2 = 0;
+ spin_unlock_irqrestore(&par->delayed_refresh_lock, flags);
+
+ if (x1 < 0 || x1 > info->var.xres || x2 < 0 ||
+ x2 > info->var.xres || y1 < 0 || y1 > info->var.yres ||
+ y2 < 0 || y2 > info->var.yres)
+ return;
+
+ /* Copy the dirty rectangle to frame buffer memory */
+ spin_lock_irqsave(&par->docopy_lock, flags);
+ for (j = y1; j <= y2 && x1 < x2; j++) {
+ if (j == info->var.yres)
+ break;
+ hvfb_docopy(par,
+ j * info->fix.line_length +
+ (x1 * screen_depth / 8),
+ (x2 - x1 + 1) * screen_depth / 8);
+ }
+ spin_unlock_irqrestore(&par->docopy_lock, flags);
+
+ /* Refresh */
if (par->fb_ready)
- synthvid_update(info);
+ synthvid_update(info, x1, y1, x2, y2);
+}
+
+/*
+ * Control the on-demand refresh frequency. It schedules a delayed
+ * screen update if it has not yet.
+ */
+static void hvfb_ondemand_refresh_throttle(struct hvfb_par *par,
+ int x1, int y1, int w, int h)
+{
+ unsigned long flags;
+ int x2 = x1 + w;
+ int y2 = y1 + h;
+
+ spin_lock_irqsave(&par->delayed_refresh_lock, flags);
+
+ /* Merge dirty rectangle */
+ par->x1 = min_t(int, par->x1, x1);
+ par->y1 = min_t(int, par->y1, y1);
+ par->x2 = max_t(int, par->x2, x2);
+ par->y2 = max_t(int, par->y2, y2);
+
+ /* Schedule a delayed screen update if not yet */
+ if (par->delayed_refresh == false) {
+ schedule_delayed_work(&par->dwork,
+ HVFB_ONDEMAND_THROTTLE);
+ par->delayed_refresh = true;
+ }
- if (par->update)
- schedule_delayed_work(&par->dwork, HVFB_UPDATE_DELAY);
+ spin_unlock_irqrestore(&par->delayed_refresh_lock, flags);
}
static int hvfb_on_panic(struct notifier_block *nb,
@@ -648,7 +787,8 @@ static int hvfb_on_panic(struct notifier_block *nb,
par = container_of(nb, struct hvfb_par, hvfb_panic_nb);
par->synchronous_fb = true;
info = par->info;
- synthvid_update(info);
+ hvfb_docopy(par, 0, dio_fb_size);
+ synthvid_update(info, 0, 0, INT_MAX, INT_MAX);
return NOTIFY_DONE;
}
@@ -709,7 +849,10 @@ static void hvfb_cfb_fillrect(struct fb_info *p,
cfb_fillrect(p, rect);
if (par->synchronous_fb)
- synthvid_update(p);
+ synthvid_update(p, 0, 0, INT_MAX, INT_MAX);
+ else
+ hvfb_ondemand_refresh_throttle(par, rect->dx, rect->dy,
+ rect->width, rect->height);
}
static void hvfb_cfb_copyarea(struct fb_info *p,
@@ -719,7 +862,10 @@ static void hvfb_cfb_copyarea(struct fb_info *p,
cfb_copyarea(p, area);
if (par->synchronous_fb)
- synthvid_update(p);
+ synthvid_update(p, 0, 0, INT_MAX, INT_MAX);
+ else
+ hvfb_ondemand_refresh_throttle(par, area->dx, area->dy,
+ area->width, area->height);
}
static void hvfb_cfb_imageblit(struct fb_info *p,
@@ -729,7 +875,10 @@ static void hvfb_cfb_imageblit(struct fb_info *p,
cfb_imageblit(p, image);
if (par->synchronous_fb)
- synthvid_update(p);
+ synthvid_update(p, 0, 0, INT_MAX, INT_MAX);
+ else
+ hvfb_ondemand_refresh_throttle(par, image->dx, image->dy,
+ image->width, image->height);
}
static struct fb_ops hvfb_ops = {
@@ -788,6 +937,9 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
resource_size_t pot_start, pot_end;
int ret;
+ dio_fb_size =
+ screen_width * screen_height * screen_depth / 8;
+
if (gen2vm) {
pot_start = 0;
pot_end = -1;
@@ -822,9 +974,15 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
if (!fb_virt)
goto err2;
+ /* Allocate memory for deferred IO */
+ par->dio_vp = vzalloc(((dio_fb_size >> PAGE_SHIFT) + 1)
+ << PAGE_SHIFT);
+ if (par->dio_vp == NULL)
+ goto err3;
+
info->apertures = alloc_apertures(1);
if (!info->apertures)
- goto err3;
+ goto err4;
if (gen2vm) {
info->apertures->ranges[0].base = screen_info.lfb_base;
@@ -836,16 +994,23 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
info->apertures->ranges[0].size = pci_resource_len(pdev, 0);
}
+ /* Physical address of FB device */
+ par->mmio_pp = par->mem->start;
+ /* Virtual address of FB device */
+ par->mmio_vp = (unsigned char *) fb_virt;
+
info->fix.smem_start = par->mem->start;
- info->fix.smem_len = screen_fb_size;
- info->screen_base = fb_virt;
- info->screen_size = screen_fb_size;
+ info->fix.smem_len = dio_fb_size;
+ info->screen_base = par->dio_vp;
+ info->screen_size = dio_fb_size;
if (!gen2vm)
pci_dev_put(pdev);
return 0;
+err4:
+ vfree(par->dio_vp);
err3:
iounmap(fb_virt);
err2:
@@ -863,6 +1028,7 @@ static void hvfb_putmem(struct fb_info *info)
{
struct hvfb_par *par = info->par;
+ vfree(par->dio_vp);
iounmap(info->screen_base);
vmbus_free_mmio(par->mem->start, screen_fb_size);
par->mem = NULL;
@@ -888,6 +1054,12 @@ static int hvfb_probe(struct hv_device *hdev,
init_completion(&par->wait);
INIT_DELAYED_WORK(&par->dwork, hvfb_update_work);
+ par->delayed_refresh = false;
+ spin_lock_init(&par->delayed_refresh_lock);
+ spin_lock_init(&par->docopy_lock);
+ par->x1 = par->y1 = INT_MAX;
+ par->x2 = par->y2 = 0;
+
/* Connect to VSP */
hv_set_drvdata(hdev, info);
ret = synthvid_connect_vsp(hdev);
@@ -939,6 +1111,10 @@ static int hvfb_probe(struct hv_device *hdev,
info->fbops = &hvfb_ops;
info->pseudo_palette = par->pseudo_palette;
+ /* Initialize deferred IO */
+ info->fbdefio = &synthvid_defio;
+ fb_deferred_io_init(info);
+
/* Send config to host */
ret = synthvid_send_config(hdev);
if (ret)
@@ -960,6 +1136,7 @@ static int hvfb_probe(struct hv_device *hdev,
return 0;
error:
+ fb_deferred_io_cleanup(info);
hvfb_putmem(info);
error2:
vmbus_close(hdev->channel);
--
2.20.1
^ permalink raw reply related
* RE: [PATCH v3] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-13 12:55 UTC (permalink / raw)
To: Lorenzo Pieralisi
Cc: sashal@kernel.org, bhelgaas@google.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
linux-kernel@vger.kernel.org
In-Reply-To: <20190813101417.GA14977@e121166-lin.cambridge.arm.com>
> -----Original Message-----
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Sent: Tuesday, August 13, 2019 6:14 AM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; bhelgaas@google.com; linux-
> hyperv@vger.kernel.org; linux-pci@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v3] PCI: hv: Detect and fix Hyper-V PCI domain number
> collision
>
> On Mon, Aug 12, 2019 at 06:20:53PM +0000, Haiyang Zhang wrote:
> > Currently in Azure cloud, for passthrough devices including GPU, the host
> > sets the device instance ID's bytes 8 - 15 to a value derived from the host
> > HWID, which is the same on all devices in a VM. So, the device instance
> > ID's bytes 8 and 9 provided by the host are no longer unique. This can
> > cause device passthrough to VMs to fail because the bytes 8 and 9 are used
> > as PCI domain number. Collision of domain numbers will cause the second
> > device with the same domain number fail to load.
> >
> > As recommended by Azure host team, the bytes 4, 5 have more uniqueness
> > (info entropy) than bytes 8, 9. So now we use bytes 4, 5 as the PCI domain
> > numbers. On older hosts, bytes 4, 5 can also be used -- no backward
> > compatibility issues here. The chance of collision is greatly reduced. In
> > the rare cases of collision, we will detect and find another number that is
> > not in use.
>
> I have not explained what I meant correctly. This patch fixes an
> issue and the "find another number" fallback can be also applied
> to the current kernel without changing the bytes you use for
> domain numbers.
>
> This patch would leave old kernels susceptible to breakage.
>
> Again, I have no Azure knowledge but it seems better to me to
> add a fallback "find another number" allocation on top of mainline
> and send it to stable kernels. Then we can add another patch to
> change the bytes you use to reduce the number of collision.
>
> Please let me know what you think, thanks.
Thanks for your clarification.
Actually, I hope the stable kernel will be patched to use bytes 4,5 too,
because host provided numbers are persistent across reboots, we like
to use them if possible.
I think we can either --
1) Apply this patch for mainline and stable kernels as well.
2) Or, break this patch into two patches, and apply both of them for
Mainline and stable kernels.
Which way do you prefer?
Thanks,
- Haiyang
^ permalink raw reply
* Re: [PATCH] PCI: hv: Fix build error without CONFIG_SYSFS
From: Yuehaibing @ 2019-08-13 12:43 UTC (permalink / raw)
To: Lorenzo Pieralisi
Cc: Michael Kelley, bhelgaas@google.com, Stephen Hemminger,
sashal@kernel.org, Dexuan Cui, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
In-Reply-To: <20190813101845.GB14977@e121166-lin.cambridge.arm.com>
On 2019/8/13 18:18, Lorenzo Pieralisi wrote:
> On Sat, Jun 15, 2019 at 02:48:24PM +0800, Yuehaibing wrote:
>> +cc Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>
> Can we drop this patch and merge:
>
> https://patchwork.ozlabs.org/patch/1131444/
>
> instead ?
Ok, it looks good to me, thanks!
>
> Thanks,
> Lorenzo
>
>>
>> On 2019/6/15 14:18, Yuehaibing wrote:
>>>
>>> On 2019/6/2 6:59, Michael Kelley wrote:
>>>> From: YueHaibing <yuehaibing@huawei.com> Sent: Friday, May 31, 2019 8:09 AM
>>>>>
>>>>> while building without CONFIG_SYSFS, fails as below:
>>>>>
>>>>> drivers/pci/controller/pci-hyperv.o: In function 'hv_pci_assign_slots':
>>>>> pci-hyperv.c:(.text+0x40a): undefined reference to 'pci_create_slot'
>>>>> drivers/pci/controller/pci-hyperv.o: In function 'pci_devices_present_work':
>>>>> pci-hyperv.c:(.text+0xc02): undefined reference to 'pci_destroy_slot'
>>>>> drivers/pci/controller/pci-hyperv.o: In function 'hv_pci_remove':
>>>>> pci-hyperv.c:(.text+0xe50): undefined reference to 'pci_destroy_slot'
>>>>> drivers/pci/controller/pci-hyperv.o: In function 'hv_eject_device_work':
>>>>> pci-hyperv.c:(.text+0x11f9): undefined reference to 'pci_destroy_slot'
>>>>>
>>>>> Select SYSFS while PCI_HYPERV is set to fix this.
>>>>>
>>>>
>>>> I'm wondering if is the right way to fix the problem. Conceptually
>>>> is it possible to setup & operate virtual PCI devices like
>>>> pci-hyperv.c does, even if sysfs is not present? Or is it right to
>>>> always required sysfs?
>>>>
>>>> The function pci_dev_assign_slot() in slot.c has a null implementation
>>>> in include/linux/pci.h when CONFIG_SYSFS is not defined, which
>>>> seems to be trying to solve the same problem for that function. And
>>>> if CONFIG_HOTPLUG_PCI is defined but CONFIG_SYSFS is not,
>>>> pci_hp_create_module_link() and pci_hp_remove_module_link()
>>>> look like they would have the same problem. Maybe there should
>>>> be degenerate implementations of pci_create_slot() and
>>>> pci_destroy_slot() for cases when CONFIG_SYSFS is not defined?
>>>>
>>>> But I'll admit I don't know the full story behind how PCI slots
>>>> are represented and used, so maybe I'm off base. I just noticed
>>>> the inconsistency in how other functions in slot.c are handled.
>>>>
>>>> Thoughts?
>>>
>>> 268a03a42d33 ("PCI: drivers/pci/slot.c should depend on CONFIG_SYSFS")
>>>
>>> make slot.o depends CONFIG_SYSFS
>>>
>>> commit 268a03a42d3377d5fb41e6e7cbdec4e0b65cab2e
>>> Author: Alex Chiang <achiang@hp.com>
>>> Date: Wed Jun 17 19:03:57 2009 -0600
>>>
>>> PCI: drivers/pci/slot.c should depend on CONFIG_SYSFS
>>>
>>> There is no way to interact with a physical PCI slot without
>>> sysfs, so encode the dependency and prevent this build error:
>>>
>>> drivers/pci/slot.c: In function 'pci_hp_create_module_link':
>>> drivers/pci/slot.c:327: error: 'module_kset' undeclared
>>>
>>> This patch _should_ make pci-sysfs.o depend on CONFIG_SYSFS too,
>>> but we cannot (yet) because the PCI core merrily assumes the
>>> existence of sysfs:
>>>
>>> drivers/built-in.o: In function `pci_bus_add_device':
>>> drivers/pci/bus.c:89: undefined reference to `pci_create_sysfs_dev_files'
>>> drivers/built-in.o: In function `pci_stop_dev':
>>> drivers/pci/remove.c:24: undefined reference to `pci_remove_sysfs_dev_files'
>>>
>>> So do the minimal bit for now and figure out how to untangle it
>>> later.
>>>
>>> If No CONFIG_SYSFS, slot.o is not build
>>>
>>>>
>>>> Michael
>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>
> .
>
^ permalink raw reply
* Re: [PATCH v1] Tools: hv: move to tools buildsystem
From: Vitaly Kuznetsov @ 2019-08-13 13:56 UTC (permalink / raw)
To: Andy Shevchenko
Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, linux-hyperv,
Sasha Levin
In-Reply-To: <20190812182716.GS30120@smile.fi.intel.com>
Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> On Fri, Jun 28, 2019 at 08:05:42PM +0300, Andy Shevchenko wrote:
>> There is a nice buildsystem dedicated for userspace tools in Linux kernel tree.
>> Switch gpio target to be built by it.
>>
>
> Any comments on this?
>
Minor nit: "gpio target" in the commit message should be replaced with
"Hyper-V daemons".
--
Vitaly
^ permalink raw reply
* Re: [PATCH v3] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Lorenzo Pieralisi @ 2019-08-13 14:25 UTC (permalink / raw)
To: Haiyang Zhang
Cc: sashal@kernel.org, bhelgaas@google.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
linux-kernel@vger.kernel.org
In-Reply-To: <DM6PR21MB1337D4F34CAA49BE369FB793CAD20@DM6PR21MB1337.namprd21.prod.outlook.com>
On Tue, Aug 13, 2019 at 12:55:59PM +0000, Haiyang Zhang wrote:
>
>
> > -----Original Message-----
> > From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Sent: Tuesday, August 13, 2019 6:14 AM
> > To: Haiyang Zhang <haiyangz@microsoft.com>
> > Cc: sashal@kernel.org; bhelgaas@google.com; linux-
> > hyperv@vger.kernel.org; linux-pci@vger.kernel.org; KY Srinivasan
> > <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> > olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; linux-
> > kernel@vger.kernel.org
> > Subject: Re: [PATCH v3] PCI: hv: Detect and fix Hyper-V PCI domain number
> > collision
> >
> > On Mon, Aug 12, 2019 at 06:20:53PM +0000, Haiyang Zhang wrote:
> > > Currently in Azure cloud, for passthrough devices including GPU, the host
> > > sets the device instance ID's bytes 8 - 15 to a value derived from the host
> > > HWID, which is the same on all devices in a VM. So, the device instance
> > > ID's bytes 8 and 9 provided by the host are no longer unique. This can
> > > cause device passthrough to VMs to fail because the bytes 8 and 9 are used
> > > as PCI domain number. Collision of domain numbers will cause the second
> > > device with the same domain number fail to load.
> > >
> > > As recommended by Azure host team, the bytes 4, 5 have more uniqueness
> > > (info entropy) than bytes 8, 9. So now we use bytes 4, 5 as the PCI domain
> > > numbers. On older hosts, bytes 4, 5 can also be used -- no backward
> > > compatibility issues here. The chance of collision is greatly reduced. In
> > > the rare cases of collision, we will detect and find another number that is
> > > not in use.
> >
> > I have not explained what I meant correctly. This patch fixes an
> > issue and the "find another number" fallback can be also applied
> > to the current kernel without changing the bytes you use for
> > domain numbers.
> >
> > This patch would leave old kernels susceptible to breakage.
> >
> > Again, I have no Azure knowledge but it seems better to me to
> > add a fallback "find another number" allocation on top of mainline
> > and send it to stable kernels. Then we can add another patch to
> > change the bytes you use to reduce the number of collision.
> >
> > Please let me know what you think, thanks.
>
> Thanks for your clarification.
> Actually, I hope the stable kernel will be patched to use bytes 4,5 too,
> because host provided numbers are persistent across reboots, we like
> to use them if possible.
>
> I think we can either --
> 1) Apply this patch for mainline and stable kernels as well.
> 2) Or, break this patch into two patches, and apply both of them for
> Mainline and stable kernels.
(2) since one patch is a fix and the other one an (optional - however
important it is) change.
This way if the optional change needs reverting we still have a working
kernel.
In the end it is up to you - I am just expressing what I think is the
most sensible way forward.
Lorenzo
^ permalink raw reply
* RE: [PATCH] MAINTAINERS: Hyper-V: Fix typo in a filepath
From: Tianyu Lan @ 2019-08-13 14:36 UTC (permalink / raw)
To: Denis Efremov, linux-kernel@vger.kernel.org
Cc: joe@perches.com, Sasha Levin, linux-hyperv@vger.kernel.org
In-Reply-To: <20190813060422.12634-1-efremov@linux.com>
Hi Denis:
Thanks for notice. I has posted a fix patch before. https://lkml.org/lkml/2019/8/13/73
Hi Sashe:
Could you take care the fix patch? Thanks.
-----Original Message-----
From: Denis Efremov <efremov@linux.com>
Sent: Tuesday, August 13, 2019 2:04 PM
To: linux-kernel@vger.kernel.org
Cc: Denis Efremov <efremov@linux.com>; joe@perches.com; Tianyu Lan <Tianyu.Lan@microsoft.com>; Sasha Levin <sashal@kernel.org>; linux-hyperv@vger.kernel.org
Subject: [PATCH] MAINTAINERS: Hyper-V: Fix typo in a filepath
Fix typo in hyperv-iommu.c filepath.
Cc: Lan Tianyu <Tianyu.Lan@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Signed-off-by: Denis Efremov <efremov@linux.com>
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 2764e0872ebd..51ab502485ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7452,7 +7452,7 @@ F: drivers/net/hyperv/
F: drivers/scsi/storvsc_drv.c
F: drivers/uio/uio_hv_generic.c
F: drivers/video/fbdev/hyperv_fb.c
-F: drivers/iommu/hyperv_iommu.c
+F: drivers/iommu/hyperv-iommu.c
F: net/vmw_vsock/hyperv_transport.c
F: include/clocksource/hyperv_timer.h
F: include/linux/hyperv.h
--
2.21.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox