* Re: [PATCH RFC v1 05/18] clocksource/hyperv: use MSR-based access if running as root
[not found] ` <20200914112802.80611-6-wei.liu@kernel.org>
@ 2020-09-15 10:10 ` Vitaly Kuznetsov
0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-15 10:10 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: Wei Liu, Stephen Hemminger, Haiyang Zhang, Daniel Lezcano,
virtualization, Linux Kernel List, Nuno Das Neves,
Sunil Muthuswamy, Michael Kelley, Vineeth Pillai, Thomas Gleixner
Wei Liu <wei.liu@kernel.org> writes:
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> drivers/clocksource/hyperv_timer.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> index 09aa44cb8a91..fe96082ce85e 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
> if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
> return false;
>
> + if (hv_root_partition)
> + return false;
> +
Out of pure curiosity,
TSC page clocksource seems to be available to the root partition (as
HV_MSR_REFERENCE_TSC_AVAILABLE is set), why don't we use it? (I
understand that with TSC scaling support in modern CPUs even migration
is a no-issue and we can use raw TSC but this all seems to be
independent from root/non-root partition question).
> hv_read_reference_counter = read_hv_clock_tsc;
> phys_addr = virt_to_phys(hv_get_tsc_page());
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 06/18] x86/hyperv: allocate output arg pages if required
[not found] ` <20200914112802.80611-7-wei.liu@kernel.org>
@ 2020-09-15 10:16 ` Vitaly Kuznetsov
[not found] ` <20200915124318.z6tisek5y4d7e254@liuwe-devbox-debian-v2>
0 siblings, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-15 10:16 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: Wei Liu, Stephen Hemminger, Haiyang Zhang,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), virtualization,
Linux Kernel List, Nuno Das Neves, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Borislav Petkov, Sunil Muthuswamy, Michael Kelley,
Vineeth Pillai, Lillian Grassin-Drake
Wei Liu <wei.liu@kernel.org> writes:
> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
>
> Allocate pages for storing results when Linux runs as the root
> partition.
>
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> arch/x86/hyperv/hv_init.c | 45 +++++++++++++++++++++++++++++----
> arch/x86/include/asm/mshyperv.h | 1 +
> 2 files changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index cac8e4c56261..ebba4be4185d 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -45,6 +45,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> void __percpu **hyperv_pcpu_input_arg;
> EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>
> +void __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
> u32 hv_max_vp_index;
> EXPORT_SYMBOL_GPL(hv_max_vp_index);
>
> @@ -75,14 +78,29 @@ static int hv_cpu_init(unsigned int cpu)
> u64 msr_vp_index;
> struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> void **input_arg;
> - struct page *pg;
> + struct page *input_pg;
>
> input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> - if (unlikely(!pg))
> + input_pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> + if (unlikely(!input_pg))
> return -ENOMEM;
> - *input_arg = page_address(pg);
> + *input_arg = page_address(input_pg);
> +
> + if (hv_root_partition) {
> + struct page *output_pg;
> + void **output_arg;
> +
> + output_pg = alloc_page(irqs_disabled() ? GFP_ATOMIC :
> GFP_KERNEL);
To simplify the code, can we just rename 'input_arg' to 'hypercall_args'
and do alloc_pages(rqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, 1) to
allocate two pages above?
> + if (unlikely(!output_pg)) {
> + free_page((unsigned long)*input_arg);
> + *input_arg = NULL;
> + return -ENOMEM;
> + }
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = page_address(output_pg);
> + }
>
> hv_get_vp_index(msr_vp_index);
>
> @@ -209,14 +227,25 @@ static int hv_cpu_die(unsigned int cpu)
> unsigned int new_cpu;
> unsigned long flags;
> void **input_arg;
> - void *input_pg = NULL;
> + void *input_pg = NULL, *output_pg = NULL;
>
> local_irq_save(flags);
> input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> input_pg = *input_arg;
> *input_arg = NULL;
> +
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + output_pg = *output_arg;
> + *output_arg = NULL;
> + }
> +
> local_irq_restore(flags);
> +
> free_page((unsigned long)input_pg);
> + free_page((unsigned long)output_pg);
>
> if (hv_vp_assist_page && hv_vp_assist_page[cpu])
> wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -350,6 +379,12 @@ void __init hyperv_init(void)
>
> BUG_ON(hyperv_pcpu_input_arg == NULL);
>
> + /* Allocate the per-CPU state for output arg for root */
> + if (hv_root_partition) {
> + hyperv_pcpu_output_arg = alloc_percpu(void *);
redundant space ^^^^^
> + BUG_ON(hyperv_pcpu_output_arg == NULL);
> + }
> +
> /* Allocate percpu VP index */
> hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
> GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 2a2cc81beac6..f5c62140f28d 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -63,6 +63,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
> #if IS_ENABLED(CONFIG_HYPERV)
> extern void *hv_hypercall_pg;
> extern void __percpu **hyperv_pcpu_input_arg;
> +extern void __percpu **hyperv_pcpu_output_arg;
>
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 07/18] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary
[not found] ` <20200914112802.80611-8-wei.liu@kernel.org>
@ 2020-09-15 10:27 ` Vitaly Kuznetsov
0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-15 10:27 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: open list:GENERIC INCLUDE/ASM HEADER FILES, Wei Liu,
Stephen Hemminger, Arnd Bergmann, Haiyang Zhang,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), virtualization,
Linux Kernel List, Nuno Das Neves, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Borislav Petkov, Sunil Muthuswamy, Michael Kelley,
Vineeth Pillai, Lillian Grassin-Drake
Wei Liu <wei.liu@kernel.org> writes:
> We will need the partition ID for executing some hypercalls later.
>
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> arch/x86/hyperv/hv_init.c | 26 ++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 2 ++
> include/asm-generic/hyperv-tlfs.h | 6 ++++++
> 3 files changed, 34 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index ebba4be4185d..0eec1ed32023 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -30,6 +30,9 @@
> bool hv_root_partition;
> EXPORT_SYMBOL_GPL(hv_root_partition);
>
> +u64 hv_current_partition_id;
> +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> +
> void *hv_hypercall_pg;
> EXPORT_SYMBOL_GPL(hv_hypercall_pg);
>
> @@ -345,6 +348,26 @@ static struct syscore_ops hv_syscore_ops = {
> .resume = hv_resume,
> };
>
> +void __init hv_get_partition_id(void)
> +{
> + struct hv_get_partition_id *output_page;
> + int status;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
> + HV_HYPERCALL_RESULT_MASK;
Nit: in this case status is 'u16', we can define it as such (instead of
signed int).
> + if (status != HV_STATUS_SUCCESS)
> + pr_err("Failed to get partition ID: %d\n", status);
> + else
> + hv_current_partition_id = output_page->partition_id;
> + local_irq_restore(flags);
> +
> + /* No point in proceeding if this failed */
> + BUG_ON(status != HV_STATUS_SUCCESS);
> +}
> +
> /*
> * This function is to be invoked early in the boot sequence after the
> * hypervisor has been detected.
> @@ -440,6 +463,9 @@ void __init hyperv_init(void)
>
> register_syscore_ops(&hv_syscore_ops);
>
> + if (hv_root_partition)
> + hv_get_partition_id();
According to TLFS, partition ID is available when AccessPartitionId
privilege is granted. I'd suggest we check that instead of
hv_root_partition (and we can set hv_current_partition_id to something
like U64_MAX so we know it wasn't acuired). So the BUG_ON condition will
move here:
hv_get_partition_id();
BUG_ON(hv_root_partition && hv_current_partition_id == U64_MAX);
> +
> return;
>
> remove_cpuhp_state:
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index f5c62140f28d..4039302e0ae9 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -65,6 +65,8 @@ extern void *hv_hypercall_pg;
> extern void __percpu **hyperv_pcpu_input_arg;
> extern void __percpu **hyperv_pcpu_output_arg;
>
> +extern u64 hv_current_partition_id;
> +
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {
> u64 input_address = input ? virt_to_phys(input) : 0;
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index e6903589a82a..87b1a79b19eb 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page {
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
> #define HVCALL_SEND_IPI_EX 0x0015
> +#define HVCALL_GET_PARTITION_ID 0x0046
> #define HVCALL_GET_VP_REGISTERS 0x0050
> #define HVCALL_SET_VP_REGISTERS 0x0051
> #define HVCALL_POST_MESSAGE 0x005c
> @@ -407,6 +408,11 @@ struct hv_tlb_flush_ex {
> u64 gva_list[];
> } __packed;
>
> +/* HvGetPartitionId hypercall (output only) */
> +struct hv_get_partition_id {
> + u64 partition_id;
> +} __packed;
> +
> /* HvRetargetDeviceInterrupt hypercall */
> union hv_msi_entry {
> u64 as_uint64;
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 08/18] x86/hyperv: handling hypercall page setup for root
[not found] ` <20200914112802.80611-9-wei.liu@kernel.org>
@ 2020-09-15 10:32 ` Vitaly Kuznetsov
[not found] ` <20200915103710.cqmdvzh5lys4wsqo@liuwe-devbox-debian-v2>
0 siblings, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-15 10:32 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: Wei Liu, Stephen Hemminger, Haiyang Zhang,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), virtualization,
Linux Kernel List, Nuno Das Neves, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Borislav Petkov, Sunil Muthuswamy, Michael Kelley,
Vineeth Pillai, Lillian Grassin-Drake
Wei Liu <wei.liu@kernel.org> writes:
> When Linux is running as the root partition, the hypercall page will
> have already been setup by Hyper-V. Copy the content over to the
> allocated page.
And we can't setup a new hypercall page by writing something different
to HV_X64_MSR_HYPERCALL, right?
>
> The suspend, resume and cleanup paths remain untouched because they are
> not supported in this setup yet.
>
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Nuno Das Neves <nudasnev@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> arch/x86/hyperv/hv_init.c | 26 ++++++++++++++++++++++++--
> 1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0eec1ed32023..26233aebc86c 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -25,6 +25,7 @@
> #include <linux/cpuhotplug.h>
> #include <linux/syscore_ops.h>
> #include <clocksource/hyperv_timer.h>
> +#include <linux/highmem.h>
>
> /* Is Linux running as the root partition? */
> bool hv_root_partition;
> @@ -448,8 +449,29 @@ void __init hyperv_init(void)
>
> rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> hypercall_msr.enable = 1;
> - hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> - wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +
> + if (hv_root_partition) {
> + struct page *pg;
> + void *src, *dst;
> +
> + /*
> + * Order is important here. We must enable the hypercall page
> + * so it is populated with code, then copy the code to an
> + * executable page.
> + */
> + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +
> + pg = vmalloc_to_page(hv_hypercall_pg);
> + dst = kmap(pg);
> + src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT, PAGE_SIZE,
> + MEMREMAP_WB);
memremap() can fail...
> + memcpy(dst, src, PAGE_SIZE);
> + memunmap(src);
> + kunmap(pg);
> + } else {
> + hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> + }
Why can't we do wrmsrl() for both cases here?
>
> /*
> * Ignore any errors in setting up stimer clockevents
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 09/18] x86/hyperv: provide a bunch of helper functions
[not found] ` <20200914115928.83184-1-wei.liu@kernel.org>
@ 2020-09-15 11:00 ` Vitaly Kuznetsov
0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-15 11:00 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: open list:GENERIC INCLUDE/ASM HEADER FILES, Wei Liu,
Stephen Hemminger, Arnd Bergmann, Haiyang Zhang,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), virtualization,
Linux Kernel List, Nuno Das Neves, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Borislav Petkov, Sunil Muthuswamy, Michael Kelley,
Vineeth Pillai, Lillian Grassin-Drake
Wei Liu <wei.liu@kernel.org> writes:
> They are used to deposit pages into Microsoft Hypervisor and bring up
> logical and virtual processors.
>
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Nuno Das Neves <nudasnev@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> arch/x86/hyperv/Makefile | 2 +-
> arch/x86/hyperv/hv_proc.c | 209 ++++++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 4 +
> include/asm-generic/hyperv-tlfs.h | 56 ++++++++
> 4 files changed, 270 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/hyperv/hv_proc.c
>
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 89b1f74d3225..565358020921 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,6 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> obj-y := hv_init.o mmu.o nested.o
> -obj-$(CONFIG_X86_64) += hv_apic.o
> +obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
>
> ifdef CONFIG_X86_64
> obj-$(CONFIG_PARAVIRT_SPINLOCKS) += hv_spinlock.o
> diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c
> new file mode 100644
> index 000000000000..847c72465d0e
> --- /dev/null
> +++ b/arch/x86/hyperv/hv_proc.c
> @@ -0,0 +1,209 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/types.h>
> +#include <linux/version.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm.h>
> +#include <linux/clockchips.h>
> +#include <linux/acpi.h>
> +#include <linux/hyperv.h>
> +#include <linux/slab.h>
> +#include <linux/cpuhotplug.h>
> +#include <asm/hypervisor.h>
> +#include <asm/mshyperv.h>
> +#include <asm/apic.h>
> +
> +#include <asm/trace/hyperv.h>
> +
> +#define HV_DEPOSIT_MAX_ORDER (8)
> +#define HV_DEPOSIT_MAX (1 << HV_DEPOSIT_MAX_ORDER)
> +
> +#define MAX(a, b) ((a) > (b) ? (a) : (b))
> +#define MIN(a, b) ((a) < (b) ? (a) : (b))
Nit: include/linux/kernel.h defines min() and max() macros with type
checking.
> +
> +/*
> + * Deposits exact number of pages
> + * Must be called with interrupts enabled
> + * Max 256 pages
> + */
> +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> +{
> + struct page **pages;
> + int *counts;
> + int num_allocations;
> + int i, j, page_count;
> + int order;
> + int desired_order;
> + int status;
> + int ret;
> + u64 base_pfn;
> + struct hv_deposit_memory *input_page;
> + unsigned long flags;
> +
> + if (num_pages > HV_DEPOSIT_MAX)
> + return -EINVAL;
> + if (!num_pages)
> + return 0;
> +
> + ret = -ENOMEM;
> +
> + /* One buffer for page pointers and counts */
> + pages = page_address(alloc_page(GFP_KERNEL));
> + if (!pages)
> + goto free_buf;
There is nothing to free, just do 'return -ENOMEM' here;
> + counts = (int *)&pages[256];
> +
Oh this is weird. So 'pages' is an array of 512 'struct page *' items
and we use its second half (pages[256]) for an array of signed(!)
integers(!). Can we use a locally defined struct or something better for
that?
> + /* Allocate all the pages before disabling interrupts */
> + num_allocations = 0;
> + i = 0;
> + order = HV_DEPOSIT_MAX_ORDER;
> +
> + while (num_pages) {
> + /* Find highest order we can actually allocate */
> + desired_order = 31 - __builtin_clz(num_pages);
> + order = MIN(desired_order, order);
> + do {
> + pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
> + if (!pages[i]) {
> + if (!order) {
> + goto err_free_allocations;
> + }
> + --order;
> + }
> + } while (!pages[i]);
> +
> + split_page(pages[i], order);
> + counts[i] = 1 << order;
> + num_pages -= counts[i];
> + i++;
So here we believe we will never overrun the 2048 bytes we 'allocated'
for 'counts' above. While 'if (num_pages > HV_DEPOSIT_MAX)' presumably
guarantees that, this is not really obvious.
> + num_allocations++;
> + }
> +
> + local_irq_save(flags);
> +
> + input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + input_page->partition_id = partition_id;
> +
> + /* Populate gpa_page_list - these will fit on the input page */
> + for (i = 0, page_count = 0; i < num_allocations; ++i) {
> + base_pfn = page_to_pfn(pages[i]);
> + for (j = 0; j < counts[i]; ++j, ++page_count)
> + input_page->gpa_page_list[page_count] = base_pfn + j;
> + }
> + status = hv_do_rep_hypercall(HVCALL_DEPOSIT_MEMORY,
> + page_count, 0, input_page,
> + NULL) & HV_HYPERCALL_RESULT_MASK;
> + local_irq_restore(flags);
> +
> + if (status != HV_STATUS_SUCCESS) {
Nit: same like in one ov the previous patches, status can be 'u16'.
> + pr_err("Failed to deposit pages: %d\n", status);
> + ret = status;
> + goto err_free_allocations;
> + }
> +
> + ret = 0;
> + goto free_buf;
> +
> +err_free_allocations:
> + for (i = 0; i < num_allocations; ++i) {
> + base_pfn = page_to_pfn(pages[i]);
> + for (j = 0; j < counts[i]; ++j)
> + __free_page(pfn_to_page(base_pfn + j));
> + }
> +
> +free_buf:
> + free_page((unsigned long)pages);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(hv_call_deposit_pages);
> +
> +int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
> +{
> + struct hv_add_logical_processor_in *input;
> + struct hv_add_logical_processor_out *output;
> + int status;
> + unsigned long flags;
> + int ret = 0;
> +
> + do {
> + local_irq_save(flags);
> +
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + /* We don't do anything with the output right now */
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +
> + input->lp_index = lp_index;
> + input->apic_id = apic_id;
> + input->flags = 0;
> + input->proximity_domain_info.domain_id = node_to_pxm(node);
> + input->proximity_domain_info.flags.reserved = 0;
> + input->proximity_domain_info.flags.proximity_info_valid = 1;
> + input->proximity_domain_info.flags.proximity_preferred = 1;
> + status = hv_do_hypercall(HVCALL_ADD_LOGICAL_PROCESSOR,
> + input, output);
> + local_irq_restore(flags);
> +
> + if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
> + if (status != HV_STATUS_SUCCESS) {
> + pr_err("%s: cpu %u apic ID %u, %d\n", __func__,
> + lp_index, apic_id, status);
> + ret = status;
> + }
> + break;
So if status == HV_STATUS_SUCCESS we break and avoid
hv_call_deposit_pages() below?
> + }
> + ret = hv_call_deposit_pages(node, hv_current_partition_id, 1);
> +
> + } while (!ret);
And if hv_call_deposit_pages() returns '0' we keep doing something? Sorry
but I'm probably missing something important in the 'depositing'
process, could you please add a comment explaining what's going on here?
> +
> + return ret;
> +}
> +
> +int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
> +{
> + struct hv_create_vp *input;
> + int status;
> + unsigned long irq_flags;
> + int ret = 0;
> +
> + /* Root VPs don't seem to need pages deposited */
> + if (partition_id != hv_current_partition_id) {
> + ret = hv_call_deposit_pages(node, partition_id, 90);
> + if (ret)
> + return ret;
> + }
> +
> + do {
> + local_irq_save(irq_flags);
> +
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + input->partition_id = partition_id;
> + input->vp_index = vp_index;
> + input->flags = flags;
> + if (node != NUMA_NO_NODE) {
> + input->proximity_domain_info.domain_id = node_to_pxm(node);
> + input->proximity_domain_info.flags.reserved = 0;
> + input->proximity_domain_info.flags.proximity_info_valid = 1;
> + input->proximity_domain_info.flags.proximity_preferred = 1;
> + } else {
> + input->proximity_domain_info.as_uint64 = 0;
> + }
> + status = hv_do_hypercall(HVCALL_CREATE_VP, input, NULL);
> + local_irq_restore(irq_flags);
> +
> + if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
> + if (status != HV_STATUS_SUCCESS) {
> + pr_err("%s: vcpu %u, lp %u, %d\n", __func__,
> + vp_index, flags, status);
> + ret = status;
> + }
> + break;
> + }
> + ret = hv_call_deposit_pages(node, partition_id, 1);
> +
> + } while (!ret);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(hv_call_create_vp);
> +
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 4039302e0ae9..60afc3e417d0 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -67,6 +67,10 @@ extern void __percpu **hyperv_pcpu_output_arg;
>
> extern u64 hv_current_partition_id;
>
> +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> +int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> +int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> +
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {
> u64 input_address = input ? virt_to_phys(input) : 0;
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 87b1a79b19eb..2b05bed712c0 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -142,6 +142,8 @@ struct ms_hyperv_tsc_page {
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
> #define HVCALL_SEND_IPI_EX 0x0015
> #define HVCALL_GET_PARTITION_ID 0x0046
> +#define HVCALL_DEPOSIT_MEMORY 0x0048
> +#define HVCALL_CREATE_VP 0x004e
> #define HVCALL_GET_VP_REGISTERS 0x0050
> #define HVCALL_SET_VP_REGISTERS 0x0051
> #define HVCALL_POST_MESSAGE 0x005c
> @@ -149,6 +151,7 @@ struct ms_hyperv_tsc_page {
> #define HVCALL_POST_DEBUG_DATA 0x0069
> #define HVCALL_RETRIEVE_DEBUG_DATA 0x006a
> #define HVCALL_RESET_DEBUG_SESSION 0x006b
> +#define HVCALL_ADD_LOGICAL_PROCESSOR 0x0076
> #define HVCALL_RETARGET_INTERRUPT 0x007e
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> @@ -413,6 +416,59 @@ struct hv_get_partition_id {
> u64 partition_id;
> } __packed;
>
> +/* HvDepositMemory hypercall */
> +struct hv_deposit_memory {
> + u64 partition_id;
> + u64 gpa_page_list[];
> +};
Other structures above have '__packed' and I remember there were
different opinions if it is needed or not (for properly padded
structures). I'd suggest we stay consitent and keep adding it unless we
decide to get rid of them (but you've added it to the newly introduced
hv_get_partition_id above).
> +
> +
> +struct hv_proximity_domain_flags {
> + u32 proximity_preferred : 1;
> + u32 reserved : 30;
> + u32 proximity_info_valid : 1;
> +};
> +
> +/* Not a union in windows but useful for zeroing */
> +union hv_proximity_domain_info {
> + struct {
> + u32 domain_id;
> + struct hv_proximity_domain_flags flags;
> + };
> + u64 as_uint64;
> +};
> +
> +struct hv_lp_startup_status {
> + u64 hv_status;
> + u64 substatus1;
> + u64 substatus2;
> + u64 substatus3;
> + u64 substatus4;
> + u64 substatus5;
> + u64 substatus6;
> +};
> +
> +/* HvAddLogicalProcessor hypercalls */
s/hypercalls/hypercall/
> +struct hv_add_logical_processor_in {
> + u32 lp_index;
> + u32 apic_id;
> + union hv_proximity_domain_info proximity_domain_info;
> + u64 flags;
> +};
> +
> +struct hv_add_logical_processor_out {
> + struct hv_lp_startup_status startup_status;
> +};
> +
> +/* HvCreateVp hypercall */
> +struct hv_create_vp {
> + u64 partition_id;
> + u32 vp_index;
> + u32 padding;
> + union hv_proximity_domain_info proximity_domain_info;
> + u64 flags;
> +};
> +
> /* HvRetargetDeviceInterrupt hypercall */
> union hv_msi_entry {
> u64 as_uint64;
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 08/18] x86/hyperv: handling hypercall page setup for root
[not found] ` <20200915103710.cqmdvzh5lys4wsqo@liuwe-devbox-debian-v2>
@ 2020-09-15 11:02 ` Vitaly Kuznetsov
[not found] ` <20200915111657.boa4cneqjqtmcaaq@liuwe-devbox-debian-v2>
2020-09-16 21:34 ` [EXTERNAL] " Sunil Muthuswamy via Virtualization
1 sibling, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-15 11:02 UTC (permalink / raw)
To: Wei Liu
Cc: Wei Liu, Stephen Hemminger, Linux on Hyper-V List, Nuno Das Neves,
Haiyang Zhang, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, Michael Kelley, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Borislav Petkov, Sunil Muthuswamy, virtualization,
Vineeth Pillai, Lillian Grassin-Drake
Wei Liu <wei.liu@kernel.org> writes:
> On Tue, Sep 15, 2020 at 12:32:29PM +0200, Vitaly Kuznetsov wrote:
>> Wei Liu <wei.liu@kernel.org> writes:
>>
>> > When Linux is running as the root partition, the hypercall page will
>> > have already been setup by Hyper-V. Copy the content over to the
>> > allocated page.
>>
>> And we can't setup a new hypercall page by writing something different
>> to HV_X64_MSR_HYPERCALL, right?
>>
>
> My understanding is that we can't, but Sunil can maybe correct me.
>
>> >
>> > The suspend, resume and cleanup paths remain untouched because they are
>> > not supported in this setup yet.
>> >
>> > Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
>> > Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
>> > Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
>> > Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
>> > Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
>> > Co-Developed-by: Nuno Das Neves <nudasnev@microsoft.com>
>> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
>> > ---
>> > arch/x86/hyperv/hv_init.c | 26 ++++++++++++++++++++++++--
>> > 1 file changed, 24 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> > index 0eec1ed32023..26233aebc86c 100644
>> > --- a/arch/x86/hyperv/hv_init.c
>> > +++ b/arch/x86/hyperv/hv_init.c
>> > @@ -25,6 +25,7 @@
>> > #include <linux/cpuhotplug.h>
>> > #include <linux/syscore_ops.h>
>> > #include <clocksource/hyperv_timer.h>
>> > +#include <linux/highmem.h>
>> >
>> > /* Is Linux running as the root partition? */
>> > bool hv_root_partition;
>> > @@ -448,8 +449,29 @@ void __init hyperv_init(void)
>> >
>> > rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> > hypercall_msr.enable = 1;
>> > - hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>> > - wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> > +
>> > + if (hv_root_partition) {
>> > + struct page *pg;
>> > + void *src, *dst;
>> > +
>> > + /*
>> > + * Order is important here. We must enable the hypercall page
>> > + * so it is populated with code, then copy the code to an
>> > + * executable page.
>> > + */
>> > + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> > +
>> > + pg = vmalloc_to_page(hv_hypercall_pg);
>> > + dst = kmap(pg);
>> > + src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT, PAGE_SIZE,
>> > + MEMREMAP_WB);
>>
>> memremap() can fail...
>
> And we don't care here, if it fails, we would rather it panic or oops.
>
> I was relying on the fact that copying from / to a NULL pointer will
> cause the kernel to crash. But of course it wouldn't hurt to explicitly
> panic here.
>
>>
>> > + memcpy(dst, src, PAGE_SIZE);
>> > + memunmap(src);
>> > + kunmap(pg);
>> > + } else {
>> > + hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>> > + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> > + }
>>
>> Why can't we do wrmsrl() for both cases here?
>>
>
> Because the hypercall page has already been set up when Linux is the
> root.
But you already do wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64)
in 'if (hv_root_partition)' case above, that's why I asked.
>
> I could've tried writing to the MSR again, but because the behaviour
> here is not documented and subject to change so I didn't bother trying.
>
> Wei.
>
>> >
>> > /*
>> > * Ignore any errors in setting up stimer clockevents
>>
>> --
>> Vitaly
>>
>
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 10/18] x86/hyperv: implement and use hv_smp_prepare_cpus
[not found] ` <20200914115928.83184-2-wei.liu@kernel.org>
@ 2020-09-15 11:14 ` Vitaly Kuznetsov
0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-15 11:14 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: Wei Liu, Stephen Hemminger, Haiyang Zhang,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), virtualization,
Linux Kernel List, Nuno Das Neves, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Borislav Petkov, Sunil Muthuswamy, Michael Kelley,
Vineeth Pillai, Lillian Grassin-Drake
Wei Liu <wei.liu@kernel.org> writes:
> Microsoft Hypervisor requires the root partition to make a few
> hypercalls to setup application processors before they can be used.
>
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> CPU hotplug and unplug is not yet supported in this setup, so those
> paths remain untouched.
> ---
> arch/x86/kernel/cpu/mshyperv.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 1bf57d310f78..7522cae02759 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -203,6 +203,31 @@ static void __init hv_smp_prepare_boot_cpu(void)
> hv_init_spinlocks();
> #endif
> }
> +
> +static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> +{
> +#if defined(CONFIG_X86_64)
I think it makes little sense to try to make Linux work as Hyper-V root
partition when !CONFIG_X86_64. If we still care about Hyper-V enablement
for !CONFIG_X86_64 we can probably introduce something like
CONFIG_HYPERV_ROOT and enable it automatically, e.g.
config HYPERV_ROOT
def_bool HYPERV && X86_64
and use it instead.
> + int i;
> + int vp_index = 1;
> + int ret;
> +
> + native_smp_prepare_cpus(max_cpus);
> +
> + for_each_present_cpu(i) {
> + if (i == 0)
> + continue;
> + ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i));
> + BUG_ON(ret);
> + }
> +
> + for_each_present_cpu(i) {
> + if (i == 0)
> + continue;
> + ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, vp_index++, i);
So vp_index variable is needed here to make sure there are no gaps? (or
we could've just used 'i')?
> + BUG_ON(ret);
> + }
> +#endif
> +}
> #endif
>
> static void __init ms_hyperv_init_platform(void)
> @@ -359,6 +384,8 @@ static void __init ms_hyperv_init_platform(void)
>
> # ifdef CONFIG_SMP
> smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> + if (hv_root_partition)
> + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> # endif
>
> /*
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 13/18] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
[not found] ` <20200914115928.83184-5-wei.liu@kernel.org>
@ 2020-09-15 11:16 ` Vitaly Kuznetsov
0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-15 11:16 UTC (permalink / raw)
To: Wei Liu, Linux on Hyper-V List
Cc: open list:GENERIC INCLUDE/ASM HEADER FILES, Wei Liu,
Stephen Hemminger, Arnd Bergmann, Haiyang Zhang, virtualization,
Linux Kernel List, Nuno Das Neves, Sunil Muthuswamy,
Michael Kelley, Vineeth Pillai
Wei Liu <wei.liu@kernel.org> writes:
> We will need to identify the device we want Microsoft Hypervisor to
> manipulate. Introduce the data structures for that purpose.
>
> They will be used in a later patch.
>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> include/asm-generic/hyperv-tlfs.h | 79 +++++++++++++++++++++++++++++++
> 1 file changed, 79 insertions(+)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 83945ada5a50..faf892ce152d 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -612,4 +612,83 @@ struct hv_set_vp_registers_input {
> } element[];
> } __packed;
>
> +enum hv_device_type {
> + HV_DEVICE_TYPE_LOGICAL = 0,
> + HV_DEVICE_TYPE_PCI = 1,
> + HV_DEVICE_TYPE_IOAPIC = 2,
> + HV_DEVICE_TYPE_ACPI = 3,
> +};
> +
> +typedef u16 hv_pci_rid;
> +typedef u16 hv_pci_segment;
> +typedef u64 hv_logical_device_id;
> +union hv_pci_bdf {
> + u16 as_uint16;
> +
> + struct {
> + u8 function:3;
> + u8 device:5;
> + u8 bus;
> + };
> +} __packed;
> +
> +union hv_pci_bus_range {
> + u16 as_uint16;
> +
> + struct {
> + u8 subordinate_bus;
> + u8 secondary_bus;
> + };
> +} __packed;
> +
> +union hv_device_id {
> + u64 as_uint64;
> +
> + struct {
> + u64 :62;
> + u64 device_type:2;
> + };
> +
> + // HV_DEVICE_TYPE_LOGICAL
Nit: please no '//' comments.
> + struct {
> + u64 id:62;
> + u64 device_type:2;
> + } logical;
> +
> + // HV_DEVICE_TYPE_PCI
> + struct {
> + union {
> + hv_pci_rid rid;
> + union hv_pci_bdf bdf;
> + };
> +
> + hv_pci_segment segment;
> + union hv_pci_bus_range shadow_bus_range;
> +
> + u16 phantom_function_bits:2;
> + u16 source_shadow:1;
> +
> + u16 rsvdz0:11;
> + u16 device_type:2;
> + } pci;
> +
> + // HV_DEVICE_TYPE_IOAPIC
> + struct {
> + u8 ioapic_id;
> + u8 rsvdz0;
> + u16 rsvdz1;
> + u16 rsvdz2;
> +
> + u16 rsvdz3:14;
> + u16 device_type:2;
> + } ioapic;
> +
> + // HV_DEVICE_TYPE_ACPI
> + struct {
> + u32 input_mapping_base;
> + u32 input_mapping_count:30;
> + u32 device_type:2;
> + } acpi;
> +} __packed;
> +
> #endif
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 08/18] x86/hyperv: handling hypercall page setup for root
[not found] ` <20200915111657.boa4cneqjqtmcaaq@liuwe-devbox-debian-v2>
@ 2020-09-15 11:23 ` Vitaly Kuznetsov
0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-15 11:23 UTC (permalink / raw)
To: Wei Liu
Cc: Wei Liu, Stephen Hemminger, Linux on Hyper-V List, Nuno Das Neves,
Haiyang Zhang, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, Michael Kelley, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Borislav Petkov, Sunil Muthuswamy, virtualization,
Vineeth Pillai, Lillian Grassin-Drake
Wei Liu <wei.liu@kernel.org> writes:
> On Tue, Sep 15, 2020 at 01:02:08PM +0200, Vitaly Kuznetsov wrote:
>> Wei Liu <wei.liu@kernel.org> writes:
>>
>> > On Tue, Sep 15, 2020 at 12:32:29PM +0200, Vitaly Kuznetsov wrote:
>> >> Wei Liu <wei.liu@kernel.org> writes:
>> >>
>> >> > When Linux is running as the root partition, the hypercall page will
>> >> > have already been setup by Hyper-V. Copy the content over to the
>> >> > allocated page.
>> >>
>> >> And we can't setup a new hypercall page by writing something different
>> >> to HV_X64_MSR_HYPERCALL, right?
>> >>
>> >
>> > My understanding is that we can't, but Sunil can maybe correct me.
>> >
>> >> >
>> >> > The suspend, resume and cleanup paths remain untouched because they are
>> >> > not supported in this setup yet.
>> >> >
>> >> > Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
>> >> > Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
>> >> > Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
>> >> > Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
>> >> > Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
>> >> > Co-Developed-by: Nuno Das Neves <nudasnev@microsoft.com>
>> >> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
>> >> > ---
>> >> > arch/x86/hyperv/hv_init.c | 26 ++++++++++++++++++++++++--
>> >> > 1 file changed, 24 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> >> > index 0eec1ed32023..26233aebc86c 100644
>> >> > --- a/arch/x86/hyperv/hv_init.c
>> >> > +++ b/arch/x86/hyperv/hv_init.c
>> >> > @@ -25,6 +25,7 @@
>> >> > #include <linux/cpuhotplug.h>
>> >> > #include <linux/syscore_ops.h>
>> >> > #include <clocksource/hyperv_timer.h>
>> >> > +#include <linux/highmem.h>
>> >> >
>> >> > /* Is Linux running as the root partition? */
>> >> > bool hv_root_partition;
>> >> > @@ -448,8 +449,29 @@ void __init hyperv_init(void)
>> >> >
>> >> > rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >> > hypercall_msr.enable = 1;
>> >> > - hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>> >> > - wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >> > +
>> >> > + if (hv_root_partition) {
>> >> > + struct page *pg;
>> >> > + void *src, *dst;
>> >> > +
>> >> > + /*
>> >> > + * Order is important here. We must enable the hypercall page
>> >> > + * so it is populated with code, then copy the code to an
>> >> > + * executable page.
>> >> > + */
>> >> > + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >> > +
>> >> > + pg = vmalloc_to_page(hv_hypercall_pg);
>> >> > + dst = kmap(pg);
>> >> > + src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT, PAGE_SIZE,
>> >> > + MEMREMAP_WB);
>> >>
>> >> memremap() can fail...
>> >
>> > And we don't care here, if it fails, we would rather it panic or oops.
>> >
>> > I was relying on the fact that copying from / to a NULL pointer will
>> > cause the kernel to crash. But of course it wouldn't hurt to explicitly
>> > panic here.
>> >
>> >>
>> >> > + memcpy(dst, src, PAGE_SIZE);
>> >> > + memunmap(src);
>> >> > + kunmap(pg);
>> >> > + } else {
>> >> > + hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>> >> > + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >> > + }
>> >>
>> >> Why can't we do wrmsrl() for both cases here?
>> >>
>> >
>> > Because the hypercall page has already been set up when Linux is the
>> > root.
>>
>> But you already do wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64)
>> in 'if (hv_root_partition)' case above, that's why I asked.
>>
>
> You mean extracting wrmsrl to this point? The ordering matters. See the
> comment in the root branch -- we have to enable the page before copying
> the content.
>
> What can be done is:
>
> if (!root) {
> /* some stuff */
> }
>
> wrmsrl(...)
>
> if (root) {
> /* some stuff */
> }
>
> This is not looking any better than the existing code.
>
Oh, I missed the comment indeed. So Hypervisor already picked a page for
us, however, it didn't enable it and it's not populated? How can we be
sure that we didn't use it for something else already? Maybe we can
still give a different known-to-be-empty page?
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 06/18] x86/hyperv: allocate output arg pages if required
[not found] ` <20200916154200.7nf74vjmqu3f6sfq@liuwe-devbox-debian-v2>
@ 2020-09-16 16:10 ` Vitaly Kuznetsov
0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-16 16:10 UTC (permalink / raw)
To: Wei Liu
Cc: Wei Liu, Stephen Hemminger, Linux on Hyper-V List, Nuno Das Neves,
Haiyang Zhang, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Linux Kernel List, Michael Kelley, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Borislav Petkov, Sunil Muthuswamy, virtualization,
Vineeth Pillai, Lillian Grassin-Drake
Wei Liu <wei.liu@kernel.org> writes:
> On Tue, Sep 15, 2020 at 12:43:18PM +0000, Wei Liu wrote:
>> On Tue, Sep 15, 2020 at 12:16:58PM +0200, Vitaly Kuznetsov wrote:
>> > Wei Liu <wei.liu@kernel.org> writes:
>> >
>> > > When Linux runs as the root partition, it will need to make hypercalls
>> > > which return data from the hypervisor.
>> > >
>> > > Allocate pages for storing results when Linux runs as the root
>> > > partition.
>> > >
>> > > Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
>> > > Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
>> > > Signed-off-by: Wei Liu <wei.liu@kernel.org>
>> > > ---
>> > > arch/x86/hyperv/hv_init.c | 45 +++++++++++++++++++++++++++++----
>> > > arch/x86/include/asm/mshyperv.h | 1 +
>> > > 2 files changed, 41 insertions(+), 5 deletions(-)
>> > >
>> > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> > > index cac8e4c56261..ebba4be4185d 100644
>> > > --- a/arch/x86/hyperv/hv_init.c
>> > > +++ b/arch/x86/hyperv/hv_init.c
>> > > @@ -45,6 +45,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
>> > > void __percpu **hyperv_pcpu_input_arg;
>> > > EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>> > >
>> > > +void __percpu **hyperv_pcpu_output_arg;
>> > > +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
>> > > +
>> > > u32 hv_max_vp_index;
>> > > EXPORT_SYMBOL_GPL(hv_max_vp_index);
>> > >
>> > > @@ -75,14 +78,29 @@ static int hv_cpu_init(unsigned int cpu)
>> > > u64 msr_vp_index;
>> > > struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> > > void **input_arg;
>> > > - struct page *pg;
>> > > + struct page *input_pg;
>> > >
>> > > input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>> > > /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
>> > > - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
>> > > - if (unlikely(!pg))
>> > > + input_pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
>> > > + if (unlikely(!input_pg))
>> > > return -ENOMEM;
>> > > - *input_arg = page_address(pg);
>> > > + *input_arg = page_address(input_pg);
>> > > +
>> > > + if (hv_root_partition) {
>> > > + struct page *output_pg;
>> > > + void **output_arg;
>> > > +
>> > > + output_pg = alloc_page(irqs_disabled() ? GFP_ATOMIC :
>> > > GFP_KERNEL);
>> >
>> > To simplify the code, can we just rename 'input_arg' to 'hypercall_args'
>> > and do alloc_pages(rqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, 1) to
>> > allocate two pages above?
>>
>> It should be doable, but I need to code it up and test it to be 100%
>> sure.
>>
>
> I switch to alloc_pages to allocate an order of 2 page if necessary, but
> I keep input_arg and output_arg separate because I want to avoid code
> churn in other places.
>
My idea was that we're free to choose how to use these pages, e.g. with
two pages allocated we can now do a hypercall which takes two pages of
input and produces no output other than the return value. This doesn't
contradict your suggestion to keep input_arg/output_arg as these are
just pointers to the continuous space, we can still do the trick.
I don't feel strong about this and you probably have more patches in
your stash, no need for massive re-work I believe.
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [EXTERNAL] Re: [PATCH RFC v1 08/18] x86/hyperv: handling hypercall page setup for root
[not found] ` <20200915103710.cqmdvzh5lys4wsqo@liuwe-devbox-debian-v2>
2020-09-15 11:02 ` Vitaly Kuznetsov
@ 2020-09-16 21:34 ` Sunil Muthuswamy via Virtualization
2020-09-17 11:06 ` Vitaly Kuznetsov
1 sibling, 1 reply; 14+ messages in thread
From: Sunil Muthuswamy via Virtualization @ 2020-09-16 21:34 UTC (permalink / raw)
To: Wei Liu, vkuznets
Cc: Linux on Hyper-V List, Stephen Hemminger, Lillian Grassin-Drake,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
virtualization@lists.linux-foundation.org, Linux Kernel List,
Michael Kelley, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
Thomas Gleixner, Nuno Das Neves, Vineeth Pillai, Haiyang Zhang
>
> On Tue, Sep 15, 2020 at 12:32:29PM +0200, Vitaly Kuznetsov wrote:
> > Wei Liu <wei.liu@kernel.org> writes:
> >
> > > When Linux is running as the root partition, the hypercall page will
> > > have already been setup by Hyper-V. Copy the content over to the
> > > allocated page.
> >
> > And we can't setup a new hypercall page by writing something different
> > to HV_X64_MSR_HYPERCALL, right?
> >
>
> My understanding is that we can't, but Sunil can maybe correct me.
That is correct. For root partition, the hypervisor has already allocated the
hypercall page. The root is required to query the page, map it in its address
space and wrmsr to enable it. It cannot change the location of the page. For
guest, it can allocate and assign the hypercall page. This is covered a bit in the
hypervisor TLFS (section 3.13 in TLFS v6), for the guest side. The root side is
not covered there, yet.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [EXTERNAL] Re: [PATCH RFC v1 08/18] x86/hyperv: handling hypercall page setup for root
2020-09-16 21:34 ` [EXTERNAL] " Sunil Muthuswamy via Virtualization
@ 2020-09-17 11:06 ` Vitaly Kuznetsov
0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2020-09-17 11:06 UTC (permalink / raw)
To: Sunil Muthuswamy, Wei Liu
Cc: Linux on Hyper-V List, Stephen Hemminger, Lillian Grassin-Drake,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
virtualization@lists.linux-foundation.org, Linux Kernel List,
Michael Kelley, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
Thomas Gleixner, Nuno Das Neves, Vineeth Pillai, Haiyang Zhang
Sunil Muthuswamy <sunilmut@microsoft.com> writes:
>>
>> On Tue, Sep 15, 2020 at 12:32:29PM +0200, Vitaly Kuznetsov wrote:
>> > Wei Liu <wei.liu@kernel.org> writes:
>> >
>> > > When Linux is running as the root partition, the hypercall page will
>> > > have already been setup by Hyper-V. Copy the content over to the
>> > > allocated page.
>> >
>> > And we can't setup a new hypercall page by writing something different
>> > to HV_X64_MSR_HYPERCALL, right?
>> >
>>
>> My understanding is that we can't, but Sunil can maybe correct me.
>
> That is correct. For root partition, the hypervisor has already allocated the
> hypercall page. The root is required to query the page, map it in its address
> space and wrmsr to enable it. It cannot change the location of the page. For
> guest, it can allocate and assign the hypercall page. This is covered a bit in the
> hypervisor TLFS (section 3.13 in TLFS v6), for the guest side. The root side is
> not covered there, yet.
Ok, so it is guaranteed that root partition doesn't have this page in
its address space yet, otherwise it could've been used for something
else (in case it's just normal memory from its PoV).
Please add a comment about this as it is not really obvious.
Thanks,
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 04/18] iommu/hyperv: don't setup IRQ remapping when running as root
[not found] ` <20200914112802.80611-5-wei.liu@kernel.org>
@ 2020-09-18 9:12 ` Joerg Roedel
0 siblings, 0 replies; 14+ messages in thread
From: Joerg Roedel @ 2020-09-18 9:12 UTC (permalink / raw)
To: Wei Liu
Cc: Linux on Hyper-V List, Stephen Hemminger, Nuno Das Neves,
Haiyang Zhang, Linux Kernel List, Michael Kelley,
open list:IOMMU DRIVERS, Sunil Muthuswamy, virtualization,
Vineeth Pillai
On Mon, Sep 14, 2020 at 11:27:48AM +0000, Wei Liu wrote:
> The IOMMU code needs more work. We're sure for now the IRQ remapping
> hooks are not applicable when Linux is the root.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> drivers/iommu/hyperv-iommu.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
Acked-by: Joerg Roedel <jroedel@suse.de>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC v1 12/18] asm-generic/hyperv: update hv_interrupt_entry
[not found] ` <20200914115928.83184-4-wei.liu@kernel.org>
@ 2020-10-01 14:33 ` Rob Herring
0 siblings, 0 replies; 14+ messages in thread
From: Rob Herring @ 2020-10-01 14:33 UTC (permalink / raw)
To: Wei Liu
Cc: linux-arch, Linux on Hyper-V List, Lorenzo Pieralisi,
Stephen Hemminger, Arnd Bergmann, linux-pci, Haiyang Zhang,
Linux Kernel List, virtualization, Bjorn Helgaas,
Sunil Muthuswamy, Michael Kelley, Nuno Das Neves, Vineeth Pillai
On Mon, 14 Sep 2020 11:59:21 +0000, Wei Liu wrote:
> We will soon use the same structure to handle IO-APIC interrupts as
> well. Introduce an enum to identify the source and a data structure for
> IO-APIC RTE.
>
> While at it, update pci-hyperv.c to use the enum.
>
> No functional change.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> drivers/pci/controller/pci-hyperv.c | 2 +-
> include/asm-generic/hyperv-tlfs.h | 36 +++++++++++++++++++++++++++--
> 2 files changed, 35 insertions(+), 3 deletions(-)
>
Acked-by: Rob Herring <robh@kernel.org>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-10-01 14:33 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20200914112802.80611-1-wei.liu@kernel.org>
[not found] ` <20200914112802.80611-6-wei.liu@kernel.org>
2020-09-15 10:10 ` [PATCH RFC v1 05/18] clocksource/hyperv: use MSR-based access if running as root Vitaly Kuznetsov
[not found] ` <20200914112802.80611-7-wei.liu@kernel.org>
2020-09-15 10:16 ` [PATCH RFC v1 06/18] x86/hyperv: allocate output arg pages if required Vitaly Kuznetsov
[not found] ` <20200915124318.z6tisek5y4d7e254@liuwe-devbox-debian-v2>
[not found] ` <20200916154200.7nf74vjmqu3f6sfq@liuwe-devbox-debian-v2>
2020-09-16 16:10 ` Vitaly Kuznetsov
[not found] ` <20200914112802.80611-8-wei.liu@kernel.org>
2020-09-15 10:27 ` [PATCH RFC v1 07/18] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary Vitaly Kuznetsov
[not found] ` <20200914112802.80611-9-wei.liu@kernel.org>
2020-09-15 10:32 ` [PATCH RFC v1 08/18] x86/hyperv: handling hypercall page setup for root Vitaly Kuznetsov
[not found] ` <20200915103710.cqmdvzh5lys4wsqo@liuwe-devbox-debian-v2>
2020-09-15 11:02 ` Vitaly Kuznetsov
[not found] ` <20200915111657.boa4cneqjqtmcaaq@liuwe-devbox-debian-v2>
2020-09-15 11:23 ` Vitaly Kuznetsov
2020-09-16 21:34 ` [EXTERNAL] " Sunil Muthuswamy via Virtualization
2020-09-17 11:06 ` Vitaly Kuznetsov
[not found] ` <20200914115928.83184-1-wei.liu@kernel.org>
2020-09-15 11:00 ` [PATCH RFC v1 09/18] x86/hyperv: provide a bunch of helper functions Vitaly Kuznetsov
[not found] ` <20200914115928.83184-2-wei.liu@kernel.org>
2020-09-15 11:14 ` [PATCH RFC v1 10/18] x86/hyperv: implement and use hv_smp_prepare_cpus Vitaly Kuznetsov
[not found] ` <20200914115928.83184-5-wei.liu@kernel.org>
2020-09-15 11:16 ` [PATCH RFC v1 13/18] asm-generic/hyperv: introduce hv_device_id and auxiliary structures Vitaly Kuznetsov
[not found] ` <20200914112802.80611-5-wei.liu@kernel.org>
2020-09-18 9:12 ` [PATCH RFC v1 04/18] iommu/hyperv: don't setup IRQ remapping when running as root Joerg Roedel
[not found] ` <20200914115928.83184-4-wei.liu@kernel.org>
2020-10-01 14:33 ` [PATCH RFC v1 12/18] asm-generic/hyperv: update hv_interrupt_entry Rob Herring
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).