* Re: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
[not found] ` <20210203150435.27941-17-wei.liu@kernel.org>
@ 2021-02-04 13:33 ` Joerg Roedel
2021-02-04 17:53 ` Michael Kelley via Virtualization
1 sibling, 0 replies; 11+ messages in thread
From: Joerg Roedel @ 2021-02-04 13:33 UTC (permalink / raw)
To: Wei Liu
Cc: Linux on Hyper-V List, Stephen Hemminger, pasha.tatashin,
Will Deacon, Haiyang Zhang,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, Michael Kelley, open list:IOMMU DRIVERS,
Ingo Molnar, Borislav Petkov, H. Peter Anvin, Nuno Das Neves,
Sunil Muthuswamy, virtualization, Vineeth Pillai, Thomas Gleixner
On Wed, Feb 03, 2021 at 03:04:35PM +0000, Wei Liu wrote:
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Joerg Roedel <joro@8bytes.org>
> ---
> v6:
> 1. Simplify code due to changes in a previous patch.
> ---
> arch/x86/hyperv/irqdomain.c | 25 +++++
> arch/x86/include/asm/mshyperv.h | 4 +
> drivers/iommu/hyperv-iommu.c | 177 +++++++++++++++++++++++++++++++-
> 3 files changed, 203 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 117f17e8c88a..0cabc9aece38 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -360,3 +360,28 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
> }
>
> #endif /* CONFIG_PCI_MSI */
> +
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> +{
> + union hv_device_id device_id;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + return hv_unmap_interrupt(device_id.as_uint64, entry);
> +}
> +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> +
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int cpu, int vector,
> + struct hv_interrupt_entry *entry)
> +{
> + union hv_device_id device_id;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + return hv_map_interrupt(device_id, level, cpu, vector, entry);
> +}
> +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccc849e25d5e..345d7c6f8c37 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
>
> struct irq_domain *hv_create_pci_msi_domain(void);
>
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> + struct hv_interrupt_entry *entry);
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +
> #else /* CONFIG_HYPERV */
> static inline void hyperv_init(void) {}
> static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..e285a220c913 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
> #include <asm/io_apic.h>
> #include <asm/irq_remapping.h>
> #include <asm/hypervisor.h>
> +#include <asm/mshyperv.h>
>
> #include "irq_remapping.h"
>
> @@ -115,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
> .free = hyperv_irq_remapping_free,
> };
>
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
> static int __init hyperv_prepare_irq_remapping(void)
> {
> struct fwnode_handle *fn;
> int i;
> + const char *name;
> + const struct irq_domain_ops *ops;
>
> if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
> x86_init.hyper.msi_ext_dest_id() ||
> !x2apic_supported())
> return -ENODEV;
>
> - fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> + if (hv_root_partition) {
> + name = "HYPERV-ROOT-IR";
> + ops = &hyperv_root_ir_domain_ops;
> + } else {
> + name = "HYPERV-IR";
> + ops = &hyperv_ir_domain_ops;
> + }
> +
> + fn = irq_domain_alloc_named_id_fwnode(name, 0);
> if (!fn)
> return -ENOMEM;
>
> ioapic_ir_domain =
> irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
> - 0, IOAPIC_REMAPPING_ENTRY, fn,
> - &hyperv_ir_domain_ops, NULL);
> + 0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);
>
> if (!ioapic_ir_domain) {
> irq_domain_free_fwnode(fn);
> return -ENOMEM;
> }
>
> + if (hv_root_partition)
> + return 0; /* The rest is only relevant to guests */
> +
> /*
> * Hyper-V doesn't provide irq remapping function for
> * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> @@ -166,4 +180,161 @@ struct irq_remap_ops hyperv_irq_remap_ops = {
> .enable = hyperv_enable_irq_remapping,
> };
>
> +/* IRQ remapping domain when Linux runs as the root partition */
> +struct hyperv_root_ir_data {
> + u8 ioapic_id;
> + bool is_level;
> + struct hv_interrupt_entry entry;
> +};
> +
> +static void
> +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
> +{
> + u64 status;
> + u32 vector;
> + struct irq_cfg *cfg;
> + int ioapic_id;
> + struct cpumask *affinity;
> + int cpu;
> + struct hv_interrupt_entry entry;
> + struct hyperv_root_ir_data *data = irq_data->chip_data;
> + struct IO_APIC_route_entry e;
> +
> + cfg = irqd_cfg(irq_data);
> + affinity = irq_data_get_effective_affinity_mask(irq_data);
> + cpu = cpumask_first_and(affinity, cpu_online_mask);
> +
> + vector = cfg->vector;
> + ioapic_id = data->ioapic_id;
> +
> + if (data->entry.source == HV_DEVICE_TYPE_IOAPIC
> + && data->entry.ioapic_rte.as_uint64) {
> + entry = data->entry;
> +
> + status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
> +
> + if (status != HV_STATUS_SUCCESS)
> + pr_debug("%s: unexpected unmap status %lld\n", __func__, status);
> +
> + data->entry.ioapic_rte.as_uint64 = 0;
> + data->entry.source = 0; /* Invalid source */
> + }
> +
> +
> + status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, cpu,
> + vector, &entry);
> +
> + if (status != HV_STATUS_SUCCESS) {
> + pr_err("%s: map hypercall failed, status %lld\n", __func__, status);
> + return;
> + }
> +
> + data->entry = entry;
> +
> + /* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
> + e.w1 = entry.ioapic_rte.low_uint32;
> + e.w2 = entry.ioapic_rte.high_uint32;
> +
> + memset(msg, 0, sizeof(*msg));
> + msg->arch_data.vector = e.vector;
> + msg->arch_data.delivery_mode = e.delivery_mode;
> + msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
> + msg->arch_addr_lo.dmar_format = e.ir_format;
> + msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
> +}
> +
> +static int hyperv_root_ir_set_affinity(struct irq_data *data,
> + const struct cpumask *mask, bool force)
> +{
> + struct irq_data *parent = data->parent_data;
> + struct irq_cfg *cfg = irqd_cfg(data);
> + int ret;
> +
> + ret = parent->chip->irq_set_affinity(parent, mask, force);
> + if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> + return ret;
> +
> + send_cleanup_vector(cfg);
> +
> + return 0;
> +}
> +
> +static struct irq_chip hyperv_root_ir_chip = {
> + .name = "HYPERV-ROOT-IR",
> + .irq_ack = apic_ack_irq,
> + .irq_set_affinity = hyperv_root_ir_set_affinity,
> + .irq_compose_msi_msg = hyperv_root_ir_compose_msi_msg,
> +};
> +
> +static int hyperv_root_irq_remapping_alloc(struct irq_domain *domain,
> + unsigned int virq, unsigned int nr_irqs,
> + void *arg)
> +{
> + struct irq_alloc_info *info = arg;
> + struct irq_data *irq_data;
> + struct hyperv_root_ir_data *data;
> + int ret = 0;
> +
> + if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
> + return -EINVAL;
> +
> + ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> + if (ret < 0)
> + return ret;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> + return -ENOMEM;
> + }
> +
> + irq_data = irq_domain_get_irq_data(domain, virq);
> + if (!irq_data) {
> + kfree(data);
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> + return -EINVAL;
> + }
> +
> + data->ioapic_id = info->devid;
> + data->is_level = info->ioapic.is_level;
> +
> + irq_data->chip = &hyperv_root_ir_chip;
> + irq_data->chip_data = data;
> +
> + return 0;
> +}
> +
> +static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
> + unsigned int virq, unsigned int nr_irqs)
> +{
> + struct irq_data *irq_data;
> + struct hyperv_root_ir_data *data;
> + struct hv_interrupt_entry *e;
> + int i;
> +
> + for (i = 0; i < nr_irqs; i++) {
> + irq_data = irq_domain_get_irq_data(domain, virq + i);
> +
> + if (irq_data && irq_data->chip_data) {
> + data = irq_data->chip_data;
> + e = &data->entry;
> +
> + if (e->source == HV_DEVICE_TYPE_IOAPIC
> + && e->ioapic_rte.as_uint64)
> + hv_unmap_ioapic_interrupt(data->ioapic_id,
> + &data->entry);
> +
> + kfree(data);
> + }
> + }
> +
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +}
> +
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
> + .select = hyperv_irq_remapping_select,
> + .alloc = hyperv_root_irq_remapping_alloc,
> + .free = hyperv_root_irq_remapping_free,
> +};
> +
> #endif
> --
> 2.20.1
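The mapping lifecycle in hyperv_root_ir_compose_msi_msg() and hyperv_root_irq_remapping_free() can be modelled in userspace roughly as below. This is a sketch, not the driver: sketch_map() and sketch_unmap() are hypothetical stand-ins for hv_map_ioapic_interrupt() and hv_unmap_ioapic_interrupt() that only count calls, and the RTE value they record is arbitrary. It shows the invariant the patch relies on: a live mapping (source is IO-APIC and the RTE is non-zero) is unmapped before the interrupt is remapped, and is unmapped at most once on release.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SOURCE_INVALID 0
#define SKETCH_SOURCE_IOAPIC  2 /* value of HV_DEVICE_TYPE_IOAPIC */

/* Reduced, hypothetical stand-in for struct hv_interrupt_entry. */
struct sketch_entry {
	int source;
	uint64_t rte; /* non-zero once a mapping exists */
};

/* Hypothetical hypercall stand-ins: they only count invocations. */
static int sketch_map_calls, sketch_unmap_calls;

static int sketch_is_live(const struct sketch_entry *e)
{
	return e->source == SKETCH_SOURCE_IOAPIC && e->rte != 0;
}

static int sketch_map(int cpu, int vector, struct sketch_entry *out)
{
	sketch_map_calls++;
	out->source = SKETCH_SOURCE_IOAPIC;
	/* Arbitrary non-zero encoding; the real RTE comes from the hypervisor. */
	out->rte = (1ull << 48) | ((uint64_t)cpu << 32) | (uint64_t)vector;
	return 0;
}

static void sketch_unmap(const struct sketch_entry *e)
{
	assert(sketch_is_live(e)); /* only live mappings are ever unmapped */
	sketch_unmap_calls++;
}

/* Shape of hyperv_root_ir_compose_msi_msg(): drop a stale mapping, then map. */
static int sketch_retarget(struct sketch_entry *cur, int cpu, int vector)
{
	struct sketch_entry fresh;

	if (sketch_is_live(cur)) {
		sketch_unmap(cur);
		cur->rte = 0;
		cur->source = SKETCH_SOURCE_INVALID;
	}

	if (sketch_map(cpu, vector, &fresh) != 0)
		return -1; /* cur stays invalid, so release will not unmap it */

	*cur = fresh;
	return 0;
}

/* Shape of hyperv_root_irq_remapping_free(): unmap only a live entry.
 * Clearing the entry afterwards makes a second release a no-op here. */
static void sketch_release(struct sketch_entry *cur)
{
	if (sketch_is_live(cur))
		sketch_unmap(cur);
	cur->rte = 0;
	cur->source = SKETCH_SOURCE_INVALID;
}
```

Because the stale entry is cleared before the new map call is issued, a failed map leaves the IRQ with no recorded mapping, so the free path skips the unmap instead of handing the hypervisor an entry it no longer holds.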
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* RE: [PATCH v6 02/16] x86/hyperv: detect if Linux is the root partition
[not found] ` <20210203150435.27941-3-wei.liu@kernel.org>
@ 2021-02-04 16:49 ` Michael Kelley via Virtualization
0 siblings, 0 replies; 11+ messages in thread
From: Michael Kelley via Virtualization @ 2021-02-04 16:49 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: Stephen Hemminger, pasha.tatashin@soleen.com, Haiyang Zhang,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, virtualization@lists.linux-foundation.org,
Ingo Molnar, Borislav Petkov, H. Peter Anvin, Nuno Das Neves,
Sunil Muthuswamy, Vineeth Pillai, Thomas Gleixner
From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 7:04 AM
>
> For now we can use the privilege flag to check. Stash the value to be
> used later.
>
> Put in a bunch of defines for future use when we want to have more
> fine-grained detection.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
> v3: move hv_root_partition to mshyperv.c
> ---
> arch/x86/include/asm/hyperv-tlfs.h | 10 ++++++++++
> arch/x86/include/asm/mshyperv.h | 2 ++
> arch/x86/kernel/cpu/mshyperv.c | 20 ++++++++++++++++++++
> 3 files changed, 32 insertions(+)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 6bf42aed387e..204010350604 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -21,6 +21,7 @@
> #define HYPERV_CPUID_FEATURES 0x40000003
> #define HYPERV_CPUID_ENLIGHTMENT_INFO 0x40000004
> #define HYPERV_CPUID_IMPLEMENT_LIMITS 0x40000005
> +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES 0x40000007
> #define HYPERV_CPUID_NESTED_FEATURES 0x4000000A
>
> #define HYPERV_CPUID_VIRT_STACK_INTERFACE 0x40000081
> @@ -110,6 +111,15 @@
> /* Recommend using enlightened VMCS */
> #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED BIT(14)
>
> +/*
> + * CPU management features identification.
> + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> + */
> +#define HV_X64_START_LOGICAL_PROCESSOR BIT(0)
> +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR BIT(1)
> +#define HV_X64_PERFORMANCE_COUNTER_SYNC BIT(2)
> +#define HV_X64_RESERVED_IDENTITY_BIT BIT(31)
> +
> /*
> * Virtual processor will never share a physical core with another virtual
> * processor, except for virtual processors that are reported as sibling SMT
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..ac2b0d110f03 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
> struct hv_guest_mapping_flush_list *flush,
> u64 start_gfn, u64 end_gfn);
>
> +extern bool hv_root_partition;
> +
> #ifdef CONFIG_X86_64
> void hv_apic_init(void);
> void __init hv_init_spinlocks(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f628e3dc150f..c376d191a260 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -32,6 +32,10 @@
> #include <asm/nmi.h>
> #include <clocksource/hyperv_timer.h>
>
> +/* Is Linux running as the root partition? */
> +bool hv_root_partition;
> +EXPORT_SYMBOL_GPL(hv_root_partition);
> +
> struct ms_hyperv_info ms_hyperv;
> EXPORT_SYMBOL_GPL(ms_hyperv);
>
> @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
> pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
> ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
>
> + /*
> + * Check CPU management privilege.
> + *
> + * To mirror what Windows does we should extract CPU management
> + * features and use the ReservedIdentityBit to detect if Linux is the
> + * root partition. But that requires negotiating CPU management
> + * interface (a process to be finalized).
> + *
> + * For now, use the privilege flag as the indicator for running as
> + * root.
> + */
> + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
> + hv_root_partition = true;
> + pr_info("Hyper-V: running as root partition\n");
> + }
> +
> /*
> * Extract host information.
> */
> --
> 2.20.1
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
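The detection in this patch amounts to testing one bit in EBX of the HYPERV_CPUID_FEATURES leaf (0x40000003). A minimal sketch follows. The position of HV_CPU_MANAGEMENT (bit 12) is our reading of hyperv-tlfs.h and should be treated as an assumption; HV_X64_RESERVED_IDENTITY_BIT (bit 31 of CPU_MANAGEMENT_FEATURES.EAX) is the one this patch defines. The second predicate models the Windows-style check that the code comment defers to later; it is not what the patch does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HYPERV_CPUID_FEATURES        0x40000003u /* HYPERV_CPUID_FEATURES */
#define SKETCH_HYPERV_CPUID_CPU_MGMT        0x40000007u /* HYPERV_CPUID_CPU_MANAGEMENT_FEATURES */
#define SKETCH_HV_CPU_MANAGEMENT            (1u << 12)  /* assumed EBX privilege bit */
#define SKETCH_HV_X64_RESERVED_IDENTITY_BIT (1u << 31)  /* EAX bit defined by this patch */

/* What the patch does: the CPU management privilege is taken to mean "root". */
static bool sketch_root_by_privilege(uint32_t features_ebx)
{
	return (features_ebx & SKETCH_HV_CPU_MANAGEMENT) != 0;
}

/* What the comment defers: the ReservedIdentityBit check, which would also
 * require negotiating the CPU management interface first. */
static bool sketch_root_by_identity_bit(uint32_t cpu_mgmt_eax)
{
	return (cpu_mgmt_eax & SKETCH_HV_X64_RESERVED_IDENTITY_BIT) != 0;
}
```

On x86 with GCC or Clang, the EBX value could be read from userspace with the __cpuid(0x40000003, eax, ebx, ecx, edx) macro from <cpuid.h>; __get_cpuid() does not help here because it rejects leaves above the maximum basic leaf. The result is only meaningful when actually running on Hyper-V.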
* RE: [PATCH v6 05/16] x86/hyperv: allocate output arg pages if required
[not found] ` <20210203150435.27941-6-wei.liu@kernel.org>
@ 2021-02-04 16:52 ` Michael Kelley via Virtualization
0 siblings, 0 replies; 11+ messages in thread
From: Michael Kelley via Virtualization @ 2021-02-04 16:52 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: Stephen Hemminger, pasha.tatashin@soleen.com,
Lillian Grassin-Drake,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, virtualization@lists.linux-foundation.org,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Nuno Das Neves,
Borislav Petkov, Sunil Muthuswamy, Vineeth Pillai, Haiyang Zhang
From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 7:04 AM
>
> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
>
> Allocate pages for storing results when Linux runs as the root
> partition.
>
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v3: Fix hv_cpu_die to use free_pages.
> v2: Address Vitaly's comments
> ---
> arch/x86/hyperv/hv_init.c | 35 ++++++++++++++++++++++++++++-----
> arch/x86/include/asm/mshyperv.h | 1 +
> 2 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e04d90af4c27..6f4cb40e53fe 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> void __percpu **hyperv_pcpu_input_arg;
> EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>
> +void __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
> u32 hv_max_vp_index;
> EXPORT_SYMBOL_GPL(hv_max_vp_index);
>
> @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
> void **input_arg;
> struct page *pg;
>
> - input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> + pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
> if (unlikely(!pg))
> return -ENOMEM;
> +
> + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> *input_arg = page_address(pg);
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = page_address(pg + 1);
> + }
>
> hv_get_vp_index(msr_vp_index);
>
> @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
> unsigned int new_cpu;
> unsigned long flags;
> void **input_arg;
> - void *input_pg = NULL;
> + void *pg;
>
> local_irq_save(flags);
> input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> - input_pg = *input_arg;
> + pg = *input_arg;
> *input_arg = NULL;
> +
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = NULL;
> + }
> +
> local_irq_restore(flags);
> - free_page((unsigned long)input_pg);
> +
> + free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
>
> if (hv_vp_assist_page && hv_vp_assist_page[cpu])
> wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -346,6 +365,12 @@ void __init hyperv_init(void)
>
> BUG_ON(hyperv_pcpu_input_arg == NULL);
>
> + /* Allocate the per-CPU state for output arg for root */
> + if (hv_root_partition) {
> + hyperv_pcpu_output_arg = alloc_percpu(void *);
> + BUG_ON(hyperv_pcpu_output_arg == NULL);
> + }
> +
> /* Allocate percpu VP index */
> hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
> GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ac2b0d110f03..62d9390f1ddf 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
> #if IS_ENABLED(CONFIG_HYPERV)
> extern void *hv_hypercall_pg;
> extern void __percpu **hyperv_pcpu_input_arg;
> +extern void __percpu **hyperv_pcpu_output_arg;
>
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {
> --
> 2.20.1
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
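The patch places the input and output hypercall pages in one order-1 block, so the output page is simply the page after the input page, and hv_cpu_die() must release the block with the same order (the v3 fix). A userspace analog, assuming a 4 KiB page and using C11 aligned_alloc() in place of alloc_pages()/free_pages():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

struct sketch_hcall_pages {
	void *input;
	void *output; /* only set when running as root */
};

/* Analog of alloc_pages(gfp, hv_root_partition ? 1 : 0). */
static int sketch_alloc(struct sketch_hcall_pages *p, bool root)
{
	unsigned int order = root ? 1 : 0;
	size_t size = (size_t)SKETCH_PAGE_SIZE << order;
	unsigned char *base = aligned_alloc(SKETCH_PAGE_SIZE, size);

	if (!base)
		return -1;

	p->input = base;
	/* Like page_address(pg + 1): the second page of the same block. */
	p->output = root ? base + SKETCH_PAGE_SIZE : NULL;
	return 0;
}

/* Analog of free_pages(addr, order): one release covers both pages. */
static void sketch_free(struct sketch_hcall_pages *p)
{
	free(p->input);
	p->input = NULL;
	p->output = NULL;
}
```

Unlike free(), free_pages() must be told the order, which is why the kernel version recomputes hv_root_partition ? 1 : 0 on the teardown path rather than using free_page().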
* RE: [PATCH v6 06/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary
[not found] ` <20210203150435.27941-7-wei.liu@kernel.org>
@ 2021-02-04 16:54 ` Michael Kelley via Virtualization
0 siblings, 0 replies; 11+ messages in thread
From: Michael Kelley via Virtualization @ 2021-02-04 16:54 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: open list:GENERIC INCLUDE/ASM HEADER FILES, Stephen Hemminger,
pasha.tatashin@soleen.com, Arnd Bergmann, Lillian Grassin-Drake,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, virtualization@lists.linux-foundation.org,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Nuno Das Neves,
Borislav Petkov, Sunil Muthuswamy, Vineeth Pillai, Haiyang Zhang
From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 7:04 AM
>
> We will need the partition ID for executing some hypercalls later.
>
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v6:
> 1. Use u64 status.
>
> v3:
> 1. Make hv_get_partition_id static.
> 2. Change code structure a bit.
> ---
> arch/x86/hyperv/hv_init.c | 26 ++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 2 ++
> include/asm-generic/hyperv-tlfs.h | 6 ++++++
> 3 files changed, 34 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 6f4cb40e53fe..5b90a7290177 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -26,6 +26,9 @@
> #include <linux/syscore_ops.h>
> #include <clocksource/hyperv_timer.h>
>
> +u64 hv_current_partition_id = ~0ull;
> +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> +
> void *hv_hypercall_pg;
> EXPORT_SYMBOL_GPL(hv_hypercall_pg);
>
> @@ -331,6 +334,24 @@ static struct syscore_ops hv_syscore_ops = {
> .resume = hv_resume,
> };
>
> +static void __init hv_get_partition_id(void)
> +{
> + struct hv_get_partition_id *output_page;
> + u64 status;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page);
> + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) {
> + /* No point in proceeding if this failed */
> + pr_err("Failed to get partition ID: %lld\n", status);
> + BUG();
> + }
> + hv_current_partition_id = output_page->partition_id;
> + local_irq_restore(flags);
> +}
> +
> /*
> * This function is to be invoked early in the boot sequence after the
> * hypervisor has been detected.
> @@ -426,6 +447,11 @@ void __init hyperv_init(void)
>
> register_syscore_ops(&hv_syscore_ops);
>
> + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
> + hv_get_partition_id();
> +
> + BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
> +
> return;
>
> remove_cpuhp_state:
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 62d9390f1ddf..67f5d35a73d3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -78,6 +78,8 @@ extern void *hv_hypercall_pg;
> extern void __percpu **hyperv_pcpu_input_arg;
> extern void __percpu **hyperv_pcpu_output_arg;
>
> +extern u64 hv_current_partition_id;
> +
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {
> u64 input_address = input ? virt_to_phys(input) : 0;
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index e6903589a82a..87b1a79b19eb 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page {
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
> #define HVCALL_SEND_IPI_EX 0x0015
> +#define HVCALL_GET_PARTITION_ID 0x0046
> #define HVCALL_GET_VP_REGISTERS 0x0050
> #define HVCALL_SET_VP_REGISTERS 0x0051
> #define HVCALL_POST_MESSAGE 0x005c
> @@ -407,6 +408,11 @@ struct hv_tlb_flush_ex {
> u64 gva_list[];
> } __packed;
>
> +/* HvGetPartitionId hypercall (output only) */
> +struct hv_get_partition_id {
> + u64 partition_id;
> +} __packed;
> +
> /* HvRetargetDeviceInterrupt hypercall */
> union hv_msi_entry {
> u64 as_uint64;
> --
> 2.20.1
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
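hv_get_partition_id() compares only the masked hypercall status with HV_STATUS_SUCCESS. As we understand the Hyper-V TLFS, the low 16 bits hold the result code while higher bits (such as the reps-completed count starting at bit 32) may be non-zero even on success, so comparing the raw u64 would be wrong. The sketch below assumes HV_HYPERCALL_RESULT_MASK covers bits 15:0 and keeps ~0ull as the "no partition ID" sentinel, as hv_current_partition_id does; sketch_partition_id() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HV_HYPERCALL_RESULT_MASK 0xFFFFull /* assumed: bits 15:0 */
#define SKETCH_HV_STATUS_SUCCESS        0ull
#define SKETCH_NO_PARTITION_ID          (~0ull)   /* default of hv_current_partition_id */

static bool sketch_hcall_succeeded(uint64_t status)
{
	return (status & SKETCH_HV_HYPERCALL_RESULT_MASK) == SKETCH_HV_STATUS_SUCCESS;
}

/* Accept the output page's ID only if the hypercall succeeded. */
static uint64_t sketch_partition_id(uint64_t status, uint64_t output_id)
{
	return sketch_hcall_succeeded(status) ? output_id : SKETCH_NO_PARTITION_ID;
}
```

The BUG_ON() after the call then reduces to "root partition and still holding the sentinel".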
* RE: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()
[not found] ` <20210203150435.27941-9-wei.liu@kernel.org>
@ 2021-02-04 16:56 ` Michael Kelley via Virtualization
[not found] ` <20210204183841.y4fgwjuggtbrnere@liuwe-devbox-debian-v2>
1 sibling, 0 replies; 11+ messages in thread
From: Michael Kelley via Virtualization @ 2021-02-04 16:56 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: pasha.tatashin@soleen.com,
open list:ACPI COMPONENT ARCHITECTURE (ACPICA), Kaneda, Erik,
Rafael J. Wysocki, Linux Kernel List,
virtualization@lists.linux-foundation.org, robert.moore,
Nuno Das Neves, Sunil Muthuswamy,
open list:ACPI COMPONENT ARCHITECTURE (ACPICA), Vineeth Pillai,
Len Brown
From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 7:04 AM
>
> There is already a stub function for pxm_to_node but conversion to the
> other direction is missing.
>
> It will be used by Microsoft Hypervisor code later.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v6: new
> ---
> include/acpi/acpi_numa.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> index a4c6ef809e27..40a91ce87e04 100644
> --- a/include/acpi/acpi_numa.h
> +++ b/include/acpi/acpi_numa.h
> @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm)
> {
> return 0;
> }
> +static inline int node_to_pxm(int node)
> +{
> + return 0;
> +}
> #endif /* CONFIG_ACPI_NUMA */
>
> #ifdef CONFIG_ACPI_HMAT
> --
> 2.20.1
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
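The new stub follows the usual kernel pattern of providing a trivial inline when a config option is off, so callers need no #ifdef of their own. A standalone illustration, where SKETCH_CONFIG_ACPI_NUMA and the lookup table are hypothetical stand-ins for CONFIG_ACPI_NUMA and the firmware-provided mapping:

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_ACPI_NUMA; define it to take the "real" path. */
#ifdef SKETCH_CONFIG_ACPI_NUMA
/* Hypothetical node -> proximity-domain table standing in for firmware data. */
static const int sketch_node_pxm_map[] = { 0, 4 };

static inline int sketch_node_to_pxm(int node)
{
	return sketch_node_pxm_map[node];
}
#else
/* Same shape as the stub added to include/acpi/acpi_numa.h. */
static inline int sketch_node_to_pxm(int node)
{
	(void)node;
	return 0;
}
#endif

/* A caller that compiles unchanged in both configurations. */
static int sketch_pxm_of_node(int node)
{
	return sketch_node_to_pxm(node);
}
```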
* RE: [PATCH v6 09/16] x86/hyperv: provide a bunch of helper functions
[not found] ` <20210203150435.27941-10-wei.liu@kernel.org>
@ 2021-02-04 17:13 ` Michael Kelley via Virtualization
0 siblings, 0 replies; 11+ messages in thread
From: Michael Kelley via Virtualization @ 2021-02-04 17:13 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: open list:GENERIC INCLUDE/ASM HEADER FILES, Stephen Hemminger,
pasha.tatashin@soleen.com, Arnd Bergmann, Lillian Grassin-Drake,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, virtualization@lists.linux-foundation.org,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Nuno Das Neves,
Borislav Petkov, Sunil Muthuswamy, Vineeth Pillai, Haiyang Zhang
From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 7:04 AM
>
> They are used to deposit pages into Microsoft Hypervisor and bring up
> logical and virtual processors.
>
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v6:
> 1. Address Michael's comments.
>
> v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set.
>
> v3:
> 1. Add __packed to structures.
> 2. Drop unnecessary exports.
>
> v2:
> 1. Adapt to hypervisor side changes
> 2. Address Vitaly's comments
>
> use u64 status
>
> pages
>
> major comments
>
> minor comments
>
> rely on acpi code
> ---
> arch/x86/hyperv/Makefile | 2 +-
> arch/x86/hyperv/hv_proc.c | 219 ++++++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 4 +
> include/asm-generic/hyperv-tlfs.h | 67 +++++++++
> 4 files changed, 291 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/hyperv/hv_proc.c
>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
* RE: [PATCH v6 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
[not found] ` <20210203150435.27941-14-wei.liu@kernel.org>
@ 2021-02-04 17:15 ` Michael Kelley via Virtualization
0 siblings, 0 replies; 11+ messages in thread
From: Michael Kelley via Virtualization @ 2021-02-04 17:15 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: open list:GENERIC INCLUDE/ASM HEADER FILES, Stephen Hemminger,
pasha.tatashin@soleen.com, Arnd Bergmann, Haiyang Zhang,
Linux Kernel List, virtualization@lists.linux-foundation.org,
Nuno Das Neves, Sunil Muthuswamy, Vineeth Pillai
From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 7:05 AM
>
> We will need to identify the device we want Microsoft Hypervisor to
> manipulate. Introduce the data structures for that purpose.
>
> They will be used in a later patch.
>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v6:
> 1. Add reserved0 as field name.
> ---
> include/asm-generic/hyperv-tlfs.h | 79 +++++++++++++++++++++++++++++++
> 1 file changed, 79 insertions(+)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 94c7d77bbf68..ce53c0db28ae 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input {
> } element[];
> } __packed;
>
> +enum hv_device_type {
> + HV_DEVICE_TYPE_LOGICAL = 0,
> + HV_DEVICE_TYPE_PCI = 1,
> + HV_DEVICE_TYPE_IOAPIC = 2,
> + HV_DEVICE_TYPE_ACPI = 3,
> +};
> +
> +typedef u16 hv_pci_rid;
> +typedef u16 hv_pci_segment;
> +typedef u64 hv_logical_device_id;
> +union hv_pci_bdf {
> + u16 as_uint16;
> +
> + struct {
> + u8 function:3;
> + u8 device:5;
> + u8 bus;
> + };
> +} __packed;
> +
> +union hv_pci_bus_range {
> + u16 as_uint16;
> +
> + struct {
> + u8 subordinate_bus;
> + u8 secondary_bus;
> + };
> +} __packed;
> +
> +union hv_device_id {
> + u64 as_uint64;
> +
> + struct {
> + u64 reserved0:62;
> + u64 device_type:2;
> + };
> +
> + /* HV_DEVICE_TYPE_LOGICAL */
> + struct {
> + u64 id:62;
> + u64 device_type:2;
> + } logical;
> +
> + /* HV_DEVICE_TYPE_PCI */
> + struct {
> + union {
> + hv_pci_rid rid;
> + union hv_pci_bdf bdf;
> + };
> +
> + hv_pci_segment segment;
> + union hv_pci_bus_range shadow_bus_range;
> +
> + u16 phantom_function_bits:2;
> + u16 source_shadow:1;
> +
> + u16 rsvdz0:11;
> + u16 device_type:2;
> + } pci;
> +
> + /* HV_DEVICE_TYPE_IOAPIC */
> + struct {
> + u8 ioapic_id;
> + u8 rsvdz0;
> + u16 rsvdz1;
> + u16 rsvdz2;
> +
> + u16 rsvdz3:14;
> + u16 device_type:2;
> + } ioapic;
> +
> + /* HV_DEVICE_TYPE_ACPI */
> + struct {
> + u32 input_mapping_base;
> + u32 input_mapping_count:30;
> + u32 device_type:2;
> + } acpi;
> +} __packed;
> +
> #endif
> --
> 2.20.1
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
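Every variant of union hv_device_id puts device_type in bits 63:62, so the IDs built with bitfield assignments (as hv_map_ioapic_interrupt() does after zeroing as_uint64) can also be expressed with shifts. The sketch below assumes GCC's little-endian x86 bitfield allocation, with the first field in the least significant bits, which is what makes the two forms equivalent; the sketch_* names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_hv_device_type {
	SKETCH_HV_DEVICE_TYPE_LOGICAL = 0,
	SKETCH_HV_DEVICE_TYPE_PCI = 1,
	SKETCH_HV_DEVICE_TYPE_IOAPIC = 2,
	SKETCH_HV_DEVICE_TYPE_ACPI = 3,
};

/* device_type occupies bits 63:62 in every variant. */
static unsigned int sketch_device_type(uint64_t id)
{
	return (unsigned int)(id >> 62);
}

/* ioapic variant: ioapic_id in bits 7:0, all other bits reserved (zero). */
static uint64_t sketch_ioapic_device_id(uint8_t ioapic_id)
{
	return ((uint64_t)SKETCH_HV_DEVICE_TYPE_IOAPIC << 62) | ioapic_id;
}

/* hv_pci_bdf: function in bits 2:0, device in bits 7:3, bus in bits 15:8. */
static uint16_t sketch_pci_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)(((unsigned int)bus << 8) | ((dev & 0x1fu) << 3) | (fn & 0x7u));
}

/* pci variant: rid/bdf in bits 15:0, segment in bits 31:16. The shadow bus
 * range, phantom-function and source-shadow fields are left zero here. */
static uint64_t sketch_pci_device_id(uint16_t segment, uint16_t bdf)
{
	return ((uint64_t)SKETCH_HV_DEVICE_TYPE_PCI << 62) |
	       ((uint64_t)segment << 16) | bdf;
}
```

For example, IO-APIC 5 encodes as 0x8000000000000005, and bus 0x3a, device 0x1f, function 3 gives a BDF of 0x3afb.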
* RE: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition
[not found] ` <20210203150435.27941-16-wei.liu@kernel.org>
@ 2021-02-04 17:43 ` Michael Kelley via Virtualization
[not found] ` <20210204175641.pzonxqrqlo7uvvze@liuwe-devbox-debian-v2>
0 siblings, 1 reply; 11+ messages in thread
From: Michael Kelley via Virtualization @ 2021-02-04 17:43 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: Stephen Hemminger, pasha.tatashin@soleen.com, Haiyang Zhang,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, virtualization@lists.linux-foundation.org,
Ingo Molnar, Borislav Petkov, H. Peter Anvin, Nuno Das Neves,
Sunil Muthuswamy, Vineeth Pillai, Thomas Gleixner
From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 7:05 AM
>
> When Linux runs as the root partition on Microsoft Hypervisor, its
> interrupts are remapped. Linux will need to explicitly map and unmap
> interrupts for hardware.
>
> Implement an MSI domain to issue the correct hypercalls. And initialize
> this irqdomain as the default MSI irq domain.
>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v6:
> 1. Use u64 status.
> 2. Use vpset instead of bitmap.
> 3. Factor out hv_map_interrupt
> 4. Address other misc comments.
>
> v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> v3: build irqdomain.o for 32bit as well.
> v2: This patch is simplified due to upstream changes.
> ---
> arch/x86/hyperv/Makefile | 2 +-
> arch/x86/hyperv/hv_init.c | 9 +
> arch/x86/hyperv/irqdomain.c | 362 ++++++++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 2 +
> 4 files changed, 374 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/hyperv/irqdomain.c
>
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 565358020921..48e2c51464e8 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -obj-y := hv_init.o mmu.o nested.o
> +obj-y := hv_init.o mmu.o nested.o irqdomain.o
> obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
>
> ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 11c5997691f4..894ce899f0cb 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -483,6 +483,15 @@ void __init hyperv_init(void)
>
> BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
>
> +#ifdef CONFIG_PCI_MSI
> + /*
> + * If we're running as root, we want to create our own PCI MSI domain.
> + * We can't set this in hv_pci_init because that would be too late.
> + */
> + if (hv_root_partition)
> + x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain;
> +#endif
> +
> return;
>
> remove_cpuhp_state:
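The comment in this hunk says the hook has to be installed in hyperv_init() because hv_pci_init would be too late. Below is a minimal userspace sketch of why the ordering matters. The names (struct init_irq_ops, fake_hyperv_init(), fake_arch_setup()) are hypothetical stand-ins for x86_init.irqs.create_pci_msi_domain and the code that later consumes it. This is not the real x86 init code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the x86_init.irqs.create_pci_msi_domain hook. */
struct init_irq_ops {
	const char *(*create_pci_msi_domain)(void);
};

static const char *default_create_msi_domain(void) { return "PCI-MSI"; }
static const char *hv_create_msi_domain(void) { return "HV-PCI-MSI"; }

static struct init_irq_ops init_ops = {
	.create_pci_msi_domain = default_create_msi_domain,
};

static bool root_partition;
static const char *msi_domain_name; /* records which domain got created */

/* Models hyperv_init(): override the hook only when running as root. */
static void fake_hyperv_init(void)
{
	if (root_partition)
		init_ops.create_pci_msi_domain = hv_create_msi_domain;
}

/* Models the later setup step that calls the hook exactly once. */
static void fake_arch_setup(void)
{
	msi_domain_name = init_ops.create_pci_msi_domain();
}
```

If the override happens after fake_arch_setup() has run, the default domain has already been created, and changing the pointer no longer has any effect.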
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> new file mode 100644
> index 000000000000..117f17e8c88a
> --- /dev/null
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -0,0 +1,362 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * for Linux to run as the root partition on Microsoft Hypervisor.
Nit: Looks like the initial word "Irqdomain" got dropped from the above
comment line. But don't respin just for this.
> + *
> + * Authors:
> + * Sunil Muthuswamy <sunilmut@microsoft.com>
> + * Wei Liu <wei.liu@kernel.org>
> + */
> +
> +#include <linux/pci.h>
> +#include <linux/irq.h>
> +#include <asm/mshyperv.h>
> +
> +static int hv_map_interrupt(union hv_device_id device_id, bool level,
> + int cpu, int vector, struct hv_interrupt_entry *entry)
> +{
> + struct hv_input_map_device_interrupt *input;
> + struct hv_output_map_device_interrupt *output;
> + struct hv_device_interrupt_descriptor *intr_desc;
> + unsigned long flags;
> + u64 status;
> + cpumask_t mask = CPU_MASK_NONE;
> + int nr_bank, var_size;
> +
> + local_irq_save(flags);
> +
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +
> + intr_desc = &input->interrupt_descriptor;
> + memset(input, 0, sizeof(*input));
> + input->partition_id = hv_current_partition_id;
> + input->device_id = device_id.as_uint64;
> + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> + intr_desc->vector_count = 1;
> + intr_desc->target.vector = vector;
> +
> + if (level)
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> + else
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +
> + cpumask_set_cpu(cpu, &mask);
> + intr_desc->target.vp_set.valid_bank_mask = 0;
> + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
There's a function get_cpu_mask() that returns a pointer to a cpumask with only
the specified cpu set in the mask. It returns a const pointer to the correct entry
in a pre-allocated array of all such cpumasks, so it's a lot more efficient than
allocating and initializing a local cpumask instance on the stack.
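Here is a simplified userspace sketch of the get_cpu_mask() idea: keep a table of pre-built single-CPU masks and return a const pointer into it, instead of building a mask on the stack. SKETCH_NR_CPUS, struct sketch_cpumask and sketch_cpumask_of() are hypothetical. The kernel's own table (cpu_bit_bitmap, used by get_cpu_mask() and cpumask_of()) has a more compact layout. This flat table only illustrates the principle.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 128 /* assumption for the sketch */
#define SKETCH_WORDS   (SKETCH_NR_CPUS / 64)

struct sketch_cpumask {
	uint64_t bits[SKETCH_WORDS];
};

/* One pre-built single-CPU mask per CPU; static storage starts zeroed. */
static struct sketch_cpumask single_cpu_masks[SKETCH_NR_CPUS];

static void sketch_init_masks(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		single_cpu_masks[cpu].bits[cpu / 64] = UINT64_C(1) << (cpu % 64);
}

/* Analogue of cpumask_of(): no stack copy, just a pointer to shared data. */
static const struct sketch_cpumask *sketch_cpumask_of(int cpu)
{
	return &single_cpu_masks[cpu];
}
```

Call sketch_init_masks() once. After that, sketch_cpumask_of(cpu) can be passed anywhere a read-only one-CPU mask is needed.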
> + if (nr_bank < 0) {
> + local_irq_restore(flags);
> + pr_err("%s: unable to generate VP set\n", __func__);
> + return EINVAL;
> + }
> + intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
> +
> + /*
> + * var-sized hypercall, var-size starts after vp_mask (thus
> + * vp_set.format does not count, but vp_set.valid_bank_mask
> + * does).
> + */
> + var_size = nr_bank + 1;
> +
> + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, var_size,
> + input, output);
> + *entry = output->interrupt_entry;
> +
> + local_irq_restore(flags);
> +
> + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS)
> + pr_err("%s: hypercall failed, status %lld\n", __func__, status);
> +
> + return status & HV_HYPERCALL_RESULT_MASK;
> +}
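The var_size accounting in hv_map_interrupt() follows from the layout of a sparse VP set:

- VPs are grouped into 64-bit banks.
- valid_bank_mask marks banks 0..nr_bank-1.
- The variable-size part counts valid_bank_mask plus one word per bank.

Below is a simplified userspace model of that layout for the single-CPU case. It assumes the VP index equals the CPU number and that the variable header is counted in 64-bit words. The sketch_* names are hypothetical, and this is not the kernel's cpumask_to_vpset().

```c
#include <stdint.h>

#define SKETCH_MAX_BANKS 64 /* valid_bank_mask has 64 bits */

/* Simplified model of a sparse-4K VP set: one 64-bit bank per 64 VPs. */
struct sketch_vpset {
	uint64_t format;
	uint64_t valid_bank_mask;
	uint64_t bank_contents[SKETCH_MAX_BANKS];
};

/*
 * Put a single VP into the set and return the number of banks used.
 * Banks 0..nr_bank-1 are all marked valid, even if some are empty.
 */
static int sketch_single_vp_to_vpset(struct sketch_vpset *set, unsigned int vp)
{
	unsigned int bank = vp / 64;
	int nr_bank;

	if (bank >= SKETCH_MAX_BANKS)
		return -1;

	for (unsigned int i = 0; i <= bank; i++)
		set->bank_contents[i] = 0;
	set->bank_contents[bank] = UINT64_C(1) << (vp % 64);

	nr_bank = (int)bank + 1;
	/* Bits 0..nr_bank-1; avoid the undefined 1 << 64 shift for 64 banks. */
	set->valid_bank_mask = nr_bank == 64 ? UINT64_MAX
					     : (UINT64_C(1) << nr_bank) - 1;
	return nr_bank;
}

/* Variable part as the patch comment counts it: valid_bank_mask + banks. */
static int sketch_var_size(int nr_bank)
{
	return nr_bank + 1;
}
```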
> +
> +static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
> +{
> + unsigned long flags;
> + struct hv_input_unmap_device_interrupt *input;
> + struct hv_interrupt_entry *intr_entry;
> + u64 status;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + memset(input, 0, sizeof(*input));
> + intr_entry = &input->interrupt_entry;
> + input->partition_id = hv_current_partition_id;
> + input->device_id = id;
> + *intr_entry = *old_entry;
> +
> + status = hv_do_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, input, NULL);
> + local_irq_restore(flags);
> +
> + return status & HV_HYPERCALL_RESULT_MASK;
> +}
> +
> +#ifdef CONFIG_PCI_MSI
> +struct rid_data {
> + struct pci_dev *bridge;
> + u32 rid;
> +};
> +
> +static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
> +{
> + struct rid_data *rd = data;
> + u8 bus = PCI_BUS_NUM(rd->rid);
> +
> + if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus) {
> + rd->bridge = pdev;
> + rd->rid = alias;
> + }
> +
> + return 0;
> +}
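pci_for_each_dma_alias() calls get_rid_cb() for the device and then for aliases found while walking upstream. The callback keeps the latest alias that leaves the bus of the RID recorded so far, together with the device that reported it.

Below is a userspace port of that decision using hypothetical stand-in types (struct sketch_pci_dev, struct sketch_rid_data). The test replays two alias patterns assumed for illustration:

- a PCIe-to-PCI bridge aliasing to its secondary bus;
- a conventional bridge aliasing to its own ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Same encodings as the kernel's PCI_DEVID()/PCI_BUS_NUM() helpers. */
#define SKETCH_PCI_DEVID(bus, devfn) ((uint16_t)(((uint16_t)(bus) << 8) | (devfn)))
#define SKETCH_PCI_BUS_NUM(x)        (((x) >> 8) & 0xff)

/* Hypothetical, minimal stand-in for struct pci_dev. */
struct sketch_pci_dev {
	uint8_t bus_number;
	uint8_t devfn;
};

struct sketch_rid_data {
	const struct sketch_pci_dev *bridge;
	uint32_t rid;
};

/* Same test as get_rid_cb(): record the alias once it leaves the current bus. */
static int sketch_get_rid_cb(const struct sketch_pci_dev *pdev, uint16_t alias,
			     void *data)
{
	struct sketch_rid_data *rd = data;
	uint8_t bus = SKETCH_PCI_BUS_NUM(rd->rid);

	if (pdev->bus_number != bus || SKETCH_PCI_BUS_NUM(alias) != bus) {
		rd->bridge = pdev;
		rd->rid = alias;
	}
	return 0;
}
```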
> +
> +static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
> +{
> + union hv_device_id dev_id;
> + struct rid_data data = {
> + .bridge = NULL,
> + .rid = PCI_DEVID(dev->bus->number, dev->devfn)
> + };
> +
> + pci_for_each_dma_alias(dev, get_rid_cb, &data);
> +
> + dev_id.as_uint64 = 0;
> + dev_id.device_type = HV_DEVICE_TYPE_PCI;
> + dev_id.pci.segment = pci_domain_nr(dev->bus);
> +
> + dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
> + dev_id.pci.bdf.device = PCI_SLOT(data.rid);
> + dev_id.pci.bdf.function = PCI_FUNC(data.rid);
> + dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
> +
> + if (data.bridge) {
> + int pos;
> +
> + /*
> + * Microsoft Hypervisor requires a bus range when the bridge is
> + * running in PCI-X mode.
> + *
> + * To distinguish conventional vs PCI-X bridge, we can check
> + * the bridge's PCI-X Secondary Status Register, Secondary Bus
> + * Mode and Frequency bits. See PCI Express to PCI/PCI-X Bridge
> + * Specification Revision 1.0 5.2.2.1.3.
> + *
> + * Value zero means it is in conventional mode, otherwise it is
> + * in PCI-X mode.
> + */
> +
> + pos = pci_find_capability(data.bridge, PCI_CAP_ID_PCIX);
> + if (pos) {
> + u16 status;
> +
> + pci_read_config_word(data.bridge, pos +
> + PCI_X_BRIDGE_SSTATUS, &status);
> +
> + if (status & PCI_X_SSTATUS_FREQ) {
> + /* Non-zero, PCI-X mode */
> + u8 sec_bus, sub_bus;
> +
> + dev_id.pci.source_shadow = HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE;
> +
> + pci_read_config_byte(data.bridge, PCI_SECONDARY_BUS, &sec_bus);
> + dev_id.pci.shadow_bus_range.secondary_bus = sec_bus;
> + pci_read_config_byte(data.bridge, PCI_SUBORDINATE_BUS, &sub_bus);
> + dev_id.pci.shadow_bus_range.subordinate_bus = sub_bus;
> + }
> + }
> + }
> +
> + return dev_id;
> +}
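hv_build_pci_dev_id() splits the chosen RID into bus, device and function. It asks for a shadow bus range only when the bridge's Secondary Bus Mode and Frequency field is non-zero.

Below is a userspace sketch of both steps. The macros copy the bit layouts of PCI_BUS_NUM(), PCI_SLOT() and PCI_FUNC(). The 0x03c0 mask is assumed to match PCI_X_SSTATUS_FREQ in include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit layouts as the kernel's PCI_BUS_NUM/PCI_SLOT/PCI_FUNC helpers. */
#define SKETCH_PCI_BUS_NUM(x)  (((x) >> 8) & 0xff)
#define SKETCH_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define SKETCH_PCI_FUNC(devfn) ((devfn) & 0x07)

/* Assumed to match PCI_X_SSTATUS_FREQ (Secondary Bus Mode and Frequency). */
#define SKETCH_PCI_X_SSTATUS_FREQ 0x03c0

struct sketch_bdf {
	uint8_t bus, device, function;
};

static struct sketch_bdf sketch_rid_to_bdf(uint32_t rid)
{
	struct sketch_bdf bdf = {
		.bus = SKETCH_PCI_BUS_NUM(rid),
		.device = SKETCH_PCI_SLOT(rid),
		.function = SKETCH_PCI_FUNC(rid),
	};
	return bdf;
}

/* Non-zero mode/frequency field means the secondary bus runs in PCI-X mode. */
static bool sketch_bridge_in_pcix_mode(uint16_t secondary_status)
{
	return (secondary_status & SKETCH_PCI_X_SSTATUS_FREQ) != 0;
}
```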
> +
> +static int hv_map_msi_interrupt(struct pci_dev *dev, int cpu, int vector,
> + struct hv_interrupt_entry *entry)
> +{
> + union hv_device_id device_id = hv_build_pci_dev_id(dev);
> +
> + return hv_map_interrupt(device_id, false, cpu, vector, entry);
> +}
> +
> +static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
> +{
> + /* High address is always 0 */
> + msg->address_hi = 0;
> + msg->address_lo = entry->msi_entry.address.as_uint32;
> + msg->data = entry->msi_entry.data.as_uint32;
> +}
> +
> +static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry);
> +static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> + struct msi_desc *msidesc;
> + struct pci_dev *dev;
> + struct hv_interrupt_entry out_entry, *stored_entry;
> + struct irq_cfg *cfg = irqd_cfg(data);
> + cpumask_t *affinity;
> + int cpu;
> + u64 status;
> +
> + msidesc = irq_data_get_msi_desc(data);
> + dev = msi_desc_to_pci_dev(msidesc);
> +
> + if (!cfg) {
> + pr_debug("%s: cfg is NULL", __func__);
> + return;
> + }
> +
> + affinity = irq_data_get_effective_affinity_mask(data);
> + cpu = cpumask_first_and(affinity, cpu_online_mask);
> +
> + if (data->chip_data) {
> + /*
> + * This interrupt is already mapped. Let's unmap first.
> + *
> + * We don't use retarget interrupt hypercalls here because
> + * Microsoft Hypervisor doesn't allow root to change the vector
> + * or specify VPs outside of the set that is initially used
> + * during mapping.
> + */
> + stored_entry = data->chip_data;
> + data->chip_data = NULL;
> +
> + status = hv_unmap_msi_interrupt(dev, stored_entry);
> +
> + kfree(stored_entry);
> +
> + if (status != HV_STATUS_SUCCESS) {
> + pr_debug("%s: failed to unmap, status %lld", __func__, status);
> + return;
> + }
> + }
> +
> + stored_entry = kzalloc(sizeof(*stored_entry), GFP_ATOMIC);
> + if (!stored_entry) {
> + pr_debug("%s: failed to allocate chip data\n", __func__);
> + return;
> + }
> +
> + status = hv_map_msi_interrupt(dev, cpu, cfg->vector, &out_entry);
> + if (status != HV_STATUS_SUCCESS) {
> + kfree(stored_entry);
> + return;
> + }
> +
> + *stored_entry = out_entry;
> + data->chip_data = stored_entry;
> + entry_to_msi_msg(&out_entry, msg);
> +
> + return;
> +}
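The compose path never retargets an existing mapping. If chip_data already holds an entry, that entry is unmapped and freed, and then a fresh mapping is created and stored.

Below is a userspace model of that control flow. sketch_hv_map() and sketch_hv_unmap() are hypothetical counters standing in for the MAP/UNMAP_DEVICE_INTERRUPT hypercalls, and 0 plays the role of HV_STATUS_SUCCESS.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct hv_interrupt_entry. */
struct sketch_entry {
	int cpu;
	int vector;
};

/* Counters standing in for the map/unmap hypercalls. */
static int map_calls, unmap_calls;

static int sketch_hv_map(int cpu, int vector, struct sketch_entry *out)
{
	map_calls++;
	out->cpu = cpu;
	out->vector = vector;
	return 0;
}

static int sketch_hv_unmap(const struct sketch_entry *old)
{
	(void)old;
	unmap_calls++;
	return 0;
}

/* Unmap any existing entry first, then map again and keep a heap copy. */
static int sketch_compose(void **chip_data, int cpu, int vector)
{
	struct sketch_entry out, *stored = *chip_data;

	if (stored) {
		*chip_data = NULL;
		int status = sketch_hv_unmap(stored);
		free(stored);
		if (status != 0)
			return status;
	}

	stored = calloc(1, sizeof(*stored));
	if (!stored)
		return -1;

	if (sketch_hv_map(cpu, vector, &out) != 0) {
		free(stored);
		return -1;
	}

	*stored = out;
	*chip_data = stored;
	return 0;
}
```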
> +
> +static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry)
> +{
> + return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry);
> +}
> +
> +static void hv_teardown_msi_irq_common(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
> +{
> + u64 status;
> + struct hv_interrupt_entry old_entry;
> + struct irq_desc *desc;
> + struct irq_data *data;
> + struct msi_msg msg;
> +
> + desc = irq_to_desc(irq);
> + if (!desc) {
> + pr_debug("%s: no irq desc\n", __func__);
> + return;
> + }
> +
> + data = &desc->irq_data;
> + if (!data) {
> + pr_debug("%s: no irq data\n", __func__);
> + return;
> + }
> +
> + if (!data->chip_data) {
> + pr_debug("%s: no chip data!\n", __func__);
> + return;
> + }
> +
> + old_entry = *(struct hv_interrupt_entry *)data->chip_data;
> + entry_to_msi_msg(&old_entry, &msg);
> +
> + kfree(data->chip_data);
> + data->chip_data = NULL;
> +
> + status = hv_unmap_msi_interrupt(dev, &old_entry);
> +
> + if (status != HV_STATUS_SUCCESS) {
> + pr_err("%s: hypercall failed, status %lld\n", __func__, status);
> + return;
> + }
> +}
> +
> +static void hv_msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
> +{
> + int i;
> + struct msi_desc *entry;
> + struct pci_dev *pdev;
> +
> + if (WARN_ON_ONCE(!dev_is_pci(dev)))
> + return;
> +
> + pdev = to_pci_dev(dev);
> +
> + for_each_pci_msi_entry(entry, pdev) {
> + if (entry->irq) {
> + for (i = 0; i < entry->nvec_used; i++) {
> + hv_teardown_msi_irq_common(pdev, entry, entry->irq + i);
> + irq_domain_free_irqs(entry->irq + i, 1);
> + }
> + }
> + }
> +}
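In hv_msi_domain_free_irqs(), one MSI descriptor can cover several consecutive Linux IRQs, because multi-message MSI sets nvec_used > 1. Each IRQ from entry->irq to entry->irq + nvec_used - 1 is therefore torn down separately, and descriptors with irq == 0 are skipped.

Below is a userspace sketch of that walk. The hypothetical struct sketch_msi_desc keeps only the two fields the loop reads.

```c
/* Hypothetical, minimal stand-in for struct msi_desc. */
struct sketch_msi_desc {
	unsigned int irq;       /* base Linux IRQ number, 0 if unallocated */
	unsigned int nvec_used; /* consecutive IRQs starting at irq */
};

static unsigned int torn_down[16];
static unsigned int nr_torn_down;

static void sketch_teardown(unsigned int irq)
{
	if (nr_torn_down < sizeof(torn_down) / sizeof(torn_down[0]))
		torn_down[nr_torn_down++] = irq;
}

/* Visit every vector of every allocated descriptor, in order. */
static void sketch_free_irqs(const struct sketch_msi_desc *descs, unsigned int n)
{
	for (unsigned int e = 0; e < n; e++) {
		if (!descs[e].irq)
			continue;
		for (unsigned int i = 0; i < descs[e].nvec_used; i++)
			sketch_teardown(descs[e].irq + i);
	}
}
```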
> +
> +/*
> + * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
> + * which implement the MSI or MSI-X Capability Structure.
> + */
> +static struct irq_chip hv_pci_msi_controller = {
> + .name = "HV-PCI-MSI",
> + .irq_unmask = pci_msi_unmask_irq,
> + .irq_mask = pci_msi_mask_irq,
> + .irq_ack = irq_chip_ack_parent,
> + .irq_retrigger = irq_chip_retrigger_hierarchy,
> + .irq_compose_msi_msg = hv_irq_compose_msi_msg,
> + .irq_set_affinity = msi_domain_set_affinity,
> + .flags = IRQCHIP_SKIP_SET_WAKE,
> +};
> +
> +static struct msi_domain_ops pci_msi_domain_ops = {
> + .domain_free_irqs = hv_msi_domain_free_irqs,
> + .msi_prepare = pci_msi_prepare,
> +};
> +
> +static struct msi_domain_info hv_pci_msi_domain_info = {
> + .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
> + MSI_FLAG_PCI_MSIX,
> + .ops = &pci_msi_domain_ops,
> + .chip = &hv_pci_msi_controller,
> + .handler = handle_edge_irq,
> + .handler_name = "edge",
> +};
> +
> +struct irq_domain * __init hv_create_pci_msi_domain(void)
> +{
> + struct irq_domain *d = NULL;
> + struct fwnode_handle *fn;
> +
> + fn = irq_domain_alloc_named_fwnode("HV-PCI-MSI");
> + if (fn)
> + d = pci_msi_create_irq_domain(fn, &hv_pci_msi_domain_info, x86_vector_domain);
> +
> + /* No point in going further if we can't get an irq domain */
> + BUG_ON(!d);
> +
> + return d;
> +}
> +
> +#endif /* CONFIG_PCI_MSI */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index cbee72550a12..ccc849e25d5e 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -261,6 +261,8 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
> msi_entry->data.as_uint32 = msi_desc->msg.data;
> }
>
> +struct irq_domain *hv_create_pci_msi_domain(void);
> +
> #else /* CONFIG_HYPERV */
> static inline void hyperv_init(void) {}
> static inline void hyperv_setup_mmu_ops(void) {}
> --
> 2.20.1
* RE: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
[not found] ` <20210203150435.27941-17-wei.liu@kernel.org>
2021-02-04 13:33 ` [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition Joerg Roedel
@ 2021-02-04 17:53 ` Michael Kelley via Virtualization
1 sibling, 0 replies; 11+ messages in thread
From: Michael Kelley via Virtualization @ 2021-02-04 17:53 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: Stephen Hemminger, pasha.tatashin@soleen.com, Will Deacon,
Haiyang Zhang, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, virtualization@lists.linux-foundation.org,
open list:IOMMU DRIVERS, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Nuno Das Neves, Sunil Muthuswamy, Vineeth Pillai,
Thomas Gleixner, Joerg Roedel
From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 7:05 AM
>
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v6:
> 1. Simplify code due to changes in a previous patch.
> ---
> arch/x86/hyperv/irqdomain.c | 25 +++++
> arch/x86/include/asm/mshyperv.h | 4 +
> drivers/iommu/hyperv-iommu.c | 177 +++++++++++++++++++++++++++++++-
> 3 files changed, 203 insertions(+), 3 deletions(-)
>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
* RE: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition
[not found] ` <20210204175641.pzonxqrqlo7uvvze@liuwe-devbox-debian-v2>
@ 2021-02-04 18:40 ` Michael Kelley via Virtualization
0 siblings, 0 replies; 11+ messages in thread
From: Michael Kelley via Virtualization @ 2021-02-04 18:40 UTC (permalink / raw)
To: Wei Liu
Cc: Linux on Hyper-V List, Stephen Hemminger,
pasha.tatashin@soleen.com, Haiyang Zhang,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, virtualization@lists.linux-foundation.org,
Ingo Molnar, Borislav Petkov, H. Peter Anvin, Nuno Das Neves,
Sunil Muthuswamy, Vineeth Pillai, Thomas Gleixner
From: Wei Liu <wei.liu@kernel.org> Sent: Thursday, February 4, 2021 9:57 AM
>
> On Thu, Feb 04, 2021 at 05:43:16PM +0000, Michael Kelley wrote:
> [...]
> > > remove_cpuhp_state:
> > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > > new file mode 100644
> > > index 000000000000..117f17e8c88a
> > > --- /dev/null
> > > +++ b/arch/x86/hyperv/irqdomain.c
> > > @@ -0,0 +1,362 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +
> > > +/*
> > > + * for Linux to run as the root partition on Microsoft Hypervisor.
> >
> > Nit: Looks like the initial word "Irqdomain" got dropped from the above
> > comment line. But don't respin just for this.
> >
>
> I've added it back. Thanks.
>
> > > +static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > > + int cpu, int vector, struct hv_interrupt_entry *entry)
> > > +{
> > > + struct hv_input_map_device_interrupt *input;
> > > + struct hv_output_map_device_interrupt *output;
> > > + struct hv_device_interrupt_descriptor *intr_desc;
> > > + unsigned long flags;
> > > + u64 status;
> > > + cpumask_t mask = CPU_MASK_NONE;
> > > + int nr_bank, var_size;
> > > +
> > > + local_irq_save(flags);
> > > +
> > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > +
> > > + intr_desc = &input->interrupt_descriptor;
> > > + memset(input, 0, sizeof(*input));
> > > + input->partition_id = hv_current_partition_id;
> > > + input->device_id = device_id.as_uint64;
> > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > > + intr_desc->vector_count = 1;
> > > + intr_desc->target.vector = vector;
> > > +
> > > + if (level)
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > + else
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > +
> > > + cpumask_set_cpu(cpu, &mask);
> > > + intr_desc->target.vp_set.valid_bank_mask = 0;
> > > + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > > + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
> >
> > There's a function get_cpu_mask() that returns a pointer to a cpumask with only
> > the specified cpu set in the mask. It returns a const pointer to the correct entry
> > in a pre-allocated array of all such cpumasks, so it's a lot more efficient than
> > allocating and initializing a local cpumask instance on the stack.
> >
>
> That's nice.
>
> I've got the following diff to fix both issues. If you're happy with the
> changes, can you give your Reviewed-by? That saves a round of posting.
>
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 0cabc9aece38..fa71db798465 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -1,7 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
>
> /*
> - * for Linux to run as the root partition on Microsoft Hypervisor.
> + * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
> *
> * Authors:
> * Sunil Muthuswamy <sunilmut@microsoft.com>
> @@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> struct hv_device_interrupt_descriptor *intr_desc;
> unsigned long flags;
> u64 status;
> - cpumask_t mask = CPU_MASK_NONE;
> + const cpumask_t *mask;
> int nr_bank, var_size;
>
> local_irq_save(flags);
> @@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> else
> intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
>
> - cpumask_set_cpu(cpu, &mask);
> + mask = cpumask_of(cpu);
> intr_desc->target.vp_set.valid_bank_mask = 0;
> intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> - nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
> + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask);
Can you just do the following and get rid of the 'mask' local entirely?
nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
Either way,
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> if (nr_bank < 0) {
> local_irq_restore(flags);
> pr_err("%s: unable to generate VP set\n", __func__);
* Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()
[not found] ` <20210204183841.y4fgwjuggtbrnere@liuwe-devbox-debian-v2>
@ 2021-02-04 18:45 ` Rafael J. Wysocki
0 siblings, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2021-02-04 18:45 UTC (permalink / raw)
To: Wei Liu
Cc: Linux on Hyper-V List, Rafael J. Wysocki, Pavel Tatashin,
open list:ACPI COMPONENT ARCHITECTURE (ACPICA),
open list:ACPI COMPONENT ARCHITECTURE (ACPICA), Erik Kaneda,
Linux Kernel List, Michael Kelley, Robert Moore, Nuno Das Neves,
Sunil Muthuswamy, virtualization, Vineeth Pillai, Len Brown
On Thu, Feb 4, 2021 at 7:41 PM Wei Liu <wei.liu@kernel.org> wrote:
>
> On Wed, Feb 03, 2021 at 03:04:27PM +0000, Wei Liu wrote:
> > There is already a stub function for pxm_to_node but conversion to the
> > other direction is missing.
> >
> > It will be used by Microsoft Hypervisor code later.
> >
> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
>
> Hi ACPI maintainers, if you're happy with this patch I can take it via
> the hyperv-next tree, given the issue is discovered when pxm_to_node is
> called in our code.
Yes, you can.
Thanks!
>
> > ---
> > v6: new
> > ---
> > include/acpi/acpi_numa.h | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> > index a4c6ef809e27..40a91ce87e04 100644
> > --- a/include/acpi/acpi_numa.h
> > +++ b/include/acpi/acpi_numa.h
> > @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm)
> > {
> > return 0;
> > }
> > +static inline int node_to_pxm(int node)
> > +{
> > + return 0;
> > +}
> > #endif /* CONFIG_ACPI_NUMA */
> >
> > #ifdef CONFIG_ACPI_HMAT
> > --
> > 2.20.1
> >
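The stub lets callers use node_to_pxm() unconditionally. On a !CONFIG_ACPI_NUMA build the call collapses to a constant 0, the same way pxm_to_node() already does.

Below is a self-contained sketch of that header pattern. SKETCH_ACPI_NUMA is a hypothetical stand-in for CONFIG_ACPI_NUMA and is left undefined, so the stubs are used.

```c
/*
 * Real helpers when the option is enabled, inline stubs returning 0
 * otherwise. SKETCH_ACPI_NUMA is hypothetical and intentionally undefined.
 */
#ifdef SKETCH_ACPI_NUMA
int pxm_to_node(int pxm);
int node_to_pxm(int node);
#else
static inline int pxm_to_node(int pxm)
{
	(void)pxm;
	return 0;
}

static inline int node_to_pxm(int node)
{
	(void)node;
	return 0;
}
#endif
```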
end of thread, other threads:[~2021-02-04 18:45 UTC | newest]
Thread overview: 11+ messages
[not found] <20210203150435.27941-1-wei.liu@kernel.org>
[not found] ` <20210203150435.27941-17-wei.liu@kernel.org>
2021-02-04 13:33 ` [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition Joerg Roedel
2021-02-04 17:53 ` Michael Kelley via Virtualization
[not found] ` <20210203150435.27941-3-wei.liu@kernel.org>
2021-02-04 16:49 ` [PATCH v6 02/16] x86/hyperv: detect if Linux is the " Michael Kelley via Virtualization
[not found] ` <20210203150435.27941-6-wei.liu@kernel.org>
2021-02-04 16:52 ` [PATCH v6 05/16] x86/hyperv: allocate output arg pages if required Michael Kelley via Virtualization
[not found] ` <20210203150435.27941-7-wei.liu@kernel.org>
2021-02-04 16:54 ` [PATCH v6 06/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary Michael Kelley via Virtualization
[not found] ` <20210203150435.27941-9-wei.liu@kernel.org>
2021-02-04 16:56 ` [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm() Michael Kelley via Virtualization
[not found] ` <20210204183841.y4fgwjuggtbrnere@liuwe-devbox-debian-v2>
2021-02-04 18:45 ` Rafael J. Wysocki
[not found] ` <20210203150435.27941-10-wei.liu@kernel.org>
2021-02-04 17:13 ` [PATCH v6 09/16] x86/hyperv: provide a bunch of helper functions Michael Kelley via Virtualization
[not found] ` <20210203150435.27941-14-wei.liu@kernel.org>
2021-02-04 17:15 ` [PATCH v6 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures Michael Kelley via Virtualization
[not found] ` <20210203150435.27941-16-wei.liu@kernel.org>
2021-02-04 17:43 ` [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition Michael Kelley via Virtualization
[not found] ` <20210204175641.pzonxqrqlo7uvvze@liuwe-devbox-debian-v2>
2021-02-04 18:40 ` Michael Kelley via Virtualization