* [PATCH v4 2/2] PCI: hv: Refactor hv_irq_unmask() to use cpumask_to_vpset()
From: Maya Nakamura @ 2019-02-28 2:37 UTC (permalink / raw)
To: lorenzo.pieralisi, bhelgaas, linux-pci, kys, sthemmin, olaf, apw,
jasowang, mikelley, Alexander.Levin
Cc: linux-kernel, haiyangz, vkuznets, marcelo.cerri, linux-hyperv
In-Reply-To: <cover.1551319643.git.m.maya.nakamura@gmail.com>
Remove the duplicate implementation of cpumask_to_vpset() and use the
shared implementation. Export hv_max_vp_index, which is required by
cpumask_to_vpset().
Apply changes to hv_irq_unmask() based on feedback.
Based on Vitaly's finding, use GFP_ATOMIC instead of GFP_KERNEL for
alloc_cpumask_var() because hv_irq_unmask() runs while a spinlock is
held.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
Changes in v4:
- Replace GFP_KERNEL with GFP_ATOMIC for alloc_cpumask_var().
- Update the commit message.
Changes in v3:
- Modify to catch all failures from cpumask_to_vpset().
- Correct the v2 change log about the commit message.
Changes in v2:
- Remove unnecessary nr_bank initialization.
- Delete two unnecessary dev_err()'s.
- Unlock before returning.
- Update the commit message.
arch/x86/hyperv/hv_init.c | 1 +
drivers/pci/controller/pci-hyperv.c | 38 +++++++++++++----------------
2 files changed, 18 insertions(+), 21 deletions(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 7abb09e2eeb8..7f2eed1fc81b 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -96,6 +96,7 @@ void __percpu **hyperv_pcpu_input_arg;
EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
u32 hv_max_vp_index;
+EXPORT_SYMBOL_GPL(hv_max_vp_index);
static int hv_cpu_init(unsigned int cpu)
{
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index d71695db1ba0..95441a35eceb 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -391,8 +391,6 @@ struct hv_interrupt_entry {
u32 data;
};
-#define HV_VP_SET_BANK_COUNT_MAX 5 /* current implementation limit */
-
/*
* flags for hv_device_interrupt_target.flags
*/
@@ -908,12 +906,12 @@ static void hv_irq_unmask(struct irq_data *data)
struct retarget_msi_interrupt *params;
struct hv_pcibus_device *hbus;
struct cpumask *dest;
+ cpumask_var_t tmp;
struct pci_bus *pbus;
struct pci_dev *pdev;
unsigned long flags;
u32 var_size = 0;
- int cpu_vmbus;
- int cpu;
+ int cpu, nr_bank;
u64 res;
dest = irq_data_get_effective_affinity_mask(data);
@@ -953,29 +951,27 @@ static void hv_irq_unmask(struct irq_data *data)
*/
params->int_target.flags |=
HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
- params->int_target.vp_set.valid_bank_mask =
- (1ull << HV_VP_SET_BANK_COUNT_MAX) - 1;
+
+ if (!alloc_cpumask_var(&tmp, GFP_ATOMIC)) {
+ res = 1;
+ goto exit_unlock;
+ }
+
+ cpumask_and(tmp, dest, cpu_online_mask);
+ nr_bank = cpumask_to_vpset(&params->int_target.vp_set, tmp);
+ free_cpumask_var(tmp);
+
+ if (nr_bank <= 0) {
+ res = 1;
+ goto exit_unlock;
+ }
/*
* var-sized hypercall, var-size starts after vp_mask (thus
* vp_set.format does not count, but vp_set.valid_bank_mask
* does).
*/
- var_size = 1 + HV_VP_SET_BANK_COUNT_MAX;
-
- for_each_cpu_and(cpu, dest, cpu_online_mask) {
- cpu_vmbus = hv_cpu_number_to_vp_number(cpu);
-
- if (cpu_vmbus >= HV_VP_SET_BANK_COUNT_MAX * 64) {
- dev_err(&hbus->hdev->device,
- "too high CPU %d", cpu_vmbus);
- res = 1;
- goto exit_unlock;
- }
-
- params->int_target.vp_set.bank_contents[cpu_vmbus / 64] |=
- (1ULL << (cpu_vmbus & 63));
- }
+ var_size = 1 + nr_bank;
} else {
for_each_cpu_and(cpu, dest, cpu_online_mask) {
params->int_target.vp_mask |=
--
2.17.1
* [PATCH v4 1/2] PCI: hv: Replace hv_vp_set with hv_vpset
From: Maya Nakamura @ 2019-02-28 2:35 UTC (permalink / raw)
To: lorenzo.pieralisi, bhelgaas, linux-pci, kys, sthemmin, olaf, apw,
jasowang, mikelley, Alexander.Levin
Cc: linux-kernel, haiyangz, vkuznets, marcelo.cerri, linux-hyperv
In-Reply-To: <cover.1551319643.git.m.maya.nakamura@gmail.com>
Remove a duplicate definition of VP set (hv_vp_set) and use the common
definition (hv_vpset) that is used in other places.
Change the order of the members in struct hv_pcibus_device so that the
declaration of retarget_msi_interrupt_params is the last member. Struct
hv_vpset, which contains a flexible array, is nested two levels deep in
struct hv_pcibus_device via retarget_msi_interrupt_params.
Add a comment that retarget_msi_interrupt_params should be the last
member of struct hv_pcibus_device.
Based on Vitaly's finding, add __aligned(8) to struct
retarget_msi_interrupt because Hyper-V requires that hypercall arguments
be aligned on an 8-byte boundary.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
Changes in v4:
- Add __aligned(8) to struct retarget_msi_interrupt.
- Update the commit message.
Change in v3:
- Correct the v2 change log.
Change in v2:
- Update the commit message.
drivers/pci/controller/pci-hyperv.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 9ba4d12c179c..d71695db1ba0 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -393,12 +393,6 @@ struct hv_interrupt_entry {
#define HV_VP_SET_BANK_COUNT_MAX 5 /* current implementation limit */
-struct hv_vp_set {
- u64 format; /* 0 (HvGenericSetSparse4k) */
- u64 valid_banks;
- u64 masks[HV_VP_SET_BANK_COUNT_MAX];
-};
-
/*
* flags for hv_device_interrupt_target.flags
*/
@@ -410,7 +404,7 @@ struct hv_device_interrupt_target {
u32 flags;
union {
u64 vp_mask;
- struct hv_vp_set vp_set;
+ struct hv_vpset vp_set;
};
};
@@ -420,7 +414,7 @@ struct retarget_msi_interrupt {
struct hv_interrupt_entry int_entry;
u64 reserved2;
struct hv_device_interrupt_target int_target;
-} __packed;
+} __packed __aligned(8);
/*
* Driver specific state.
@@ -460,12 +454,16 @@ struct hv_pcibus_device {
struct msi_controller msi_chip;
struct irq_domain *irq_domain;
- /* hypercall arg, must not cross page boundary */
- struct retarget_msi_interrupt retarget_msi_interrupt_params;
-
spinlock_t retarget_msi_interrupt_lock;
struct workqueue_struct *wq;
+
+ /* hypercall arg, must not cross page boundary */
+ struct retarget_msi_interrupt retarget_msi_interrupt_params;
+
+ /*
+ * Don't put anything here: retarget_msi_interrupt_params must be last
+ */
};
/*
@@ -955,12 +953,13 @@ static void hv_irq_unmask(struct irq_data *data)
*/
params->int_target.flags |=
HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
- params->int_target.vp_set.valid_banks =
+ params->int_target.vp_set.valid_bank_mask =
(1ull << HV_VP_SET_BANK_COUNT_MAX) - 1;
/*
* var-sized hypercall, var-size starts after vp_mask (thus
- * vp_set.format does not count, but vp_set.valid_banks does).
+ * vp_set.format does not count, but vp_set.valid_bank_mask
+ * does).
*/
var_size = 1 + HV_VP_SET_BANK_COUNT_MAX;
@@ -974,7 +973,7 @@ static void hv_irq_unmask(struct irq_data *data)
goto exit_unlock;
}
- params->int_target.vp_set.masks[cpu_vmbus / 64] |=
+ params->int_target.vp_set.bank_contents[cpu_vmbus / 64] |=
(1ULL << (cpu_vmbus & 63));
}
} else {
--
2.17.1
* [PATCH v4 0/2] PCI: hv: Refactor hv_irq_unmask() to use hv_vpset and cpumask_to_vpset()
From: Maya Nakamura @ 2019-02-28 2:32 UTC (permalink / raw)
To: lorenzo.pieralisi, bhelgaas, linux-pci, kys, sthemmin, olaf, apw,
jasowang, mikelley, Alexander.Levin
Cc: linux-kernel, haiyangz, vkuznets, marcelo.cerri, linux-hyperv
This patchset removes a duplicate definition of VP set (hv_vp_set) and
uses the common definition (hv_vpset) that is used in other places. It
changes the order of the members in struct hv_pcibus_device because
hv_vpset contains a flexible array.
It also removes the duplicate implementation of cpumask_to_vpset(), uses
the shared implementation, and exports hv_max_vp_index, which is
required by cpumask_to_vpset().
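
To illustrate what the shared helper computes, here is a minimal userspace
sketch of packing VP indices into a sparse VP set. It is not the kernel's
cpumask_to_vpset(): struct sketch_vpset, sketch_vps_to_vpset() and
SKETCH_MAX_BANKS are hypothetical names, and the input is assumed to be a
plain array of VP indices instead of a cpumask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_BANKS 64	/* assumption: one bit per bank in a 64-bit mask */

/* Hypothetical userspace stand-in for the kernel's struct hv_vpset. */
struct sketch_vpset {
	uint64_t format;
	uint64_t valid_bank_mask;
	uint64_t bank_contents[SKETCH_MAX_BANKS];
};

/*
 * Fill set from n VP indices. Returns the number of banks in use
 * (highest used bank + 1), 0 when no VP was given, or -1 when a VP
 * index does not fit into SKETCH_MAX_BANKS banks.
 */
static int sketch_vps_to_vpset(struct sketch_vpset *set,
			       const uint32_t *vps, int n)
{
	int i, nr_bank = 0;

	assert(set != NULL);
	memset(set, 0, sizeof(*set));

	for (i = 0; i < n; i++) {
		uint32_t bank = vps[i] / 64;	/* which 64-VP bank */

		if (bank >= SKETCH_MAX_BANKS)
			return -1;
		set->bank_contents[bank] |= 1ULL << (vps[i] % 64);
		if ((int)bank + 1 > nr_bank)
			nr_bank = (int)bank + 1;
	}

	/* Mark banks 0..nr_bank-1 as valid; some of them may be empty. */
	set->valid_bank_mask = nr_bank == 64 ? ~0ULL : (1ULL << nr_bank) - 1;
	return nr_bank;
}
```

With VP indices 0, 3, 64 and 130 the sketch uses three banks, so a caller
following the patch would pass var_size = 1 + 3.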
Based on Vitaly's findings, two changes were applied: replace GFP_KERNEL
with GFP_ATOMIC for alloc_cpumask_var() because hv_irq_unmask() runs
while a spinlock is held, and add __aligned(8) to struct
retarget_msi_interrupt because Hyper-V requires that hypercall arguments
be aligned on an 8-byte boundary.
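
The effect of combining the two attributes can be seen in a small userspace
sketch. The structs below are hypothetical and much smaller than
retarget_msi_interrupt, and the sizes assume GCC or Clang attribute syntax;
the point is only that a packed struct drops to byte alignment unless an
explicit alignment is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, packed only: the struct's alignment drops to 1. */
struct sketch_packed {
	uint16_t a;
	uint64_t b;
} __attribute__((packed));

/* Same members, packed and explicitly aligned to 8 bytes. */
struct sketch_packed_aligned {
	uint16_t a;
	uint64_t b;
} __attribute__((packed, aligned(8)));

/* Returns 1 when p sits on an 8-byte boundary. */
static int sketch_is_8byte_aligned(const void *p)
{
	return ((uintptr_t)p % 8) == 0;
}

/* Objects of the aligned type must start on an 8-byte boundary. */
static void sketch_check(void)
{
	static struct sketch_packed_aligned arg;

	assert(sketch_is_8byte_aligned(&arg));
	/* Packing still removes the padding between the members. */
	assert(offsetof(struct sketch_packed_aligned, b) == 2);
}
```

The member layout is identical in both structs; only the placement of the
whole object (and the tail padding that follows from it) changes.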
Vitaly, thank you for finding the issues, and Lorenzo and Michael, thank
you for your guidance and support!
Maya Nakamura (2):
PCI: hv: Replace hv_vp_set with hv_vpset
PCI: hv: Refactor hv_irq_unmask() to use cpumask_to_vpset()
arch/x86/hyperv/hv_init.c | 1 +
drivers/pci/controller/pci-hyperv.c | 61 +++++++++++++----------------
2 files changed, 29 insertions(+), 33 deletions(-)
--
2.17.1
* RE: linux-next: Tree for Feb 27 (mshyperv)
From: Michael Kelley @ 2019-02-27 18:51 UTC (permalink / raw)
To: Randy Dunlap, Stephen Rothwell, Linux Next Mailing List,
joro@8bytes.org, Tianyu Lan
Cc: Linux Kernel Mailing List, linux-hyperv@vger.kernel.org
In-Reply-To: <a6cfb923-c21e-b0e3-44e1-33197aac2625@infradead.org>
From: Randy Dunlap <rdunlap@infradead.org> Sent: Wednesday, February 27, 2019 9:25 AM
>
> on i386:
>
> ../arch/x86/kernel/cpu/mshyperv.c: In function 'ms_hyperv_init_platform':
> ../arch/x86/kernel/cpu/mshyperv.c:339:3: error: 'x2apic_phys' undeclared (first use in this
> function)
> x2apic_phys = 1;
> ^
FYI, the kbuild test robot flagged this issue overnight, and Tianyu Lan submitted a
fix earlier this morning.
Michael
* RE: [PATCH V6 2/3] IOMMU/Hyper-V: Add Hyper-V stub IOMMU driver
From: Michael Kelley @ 2019-02-27 17:31 UTC (permalink / raw)
To: lantianyu1986@gmail.com, joro@8bytes.org,
mchehab+samsung@kernel.org, davem@davemloft.net,
gregkh@linuxfoundation.org, nicolas.ferre@microchip.com,
arnd@arndb.de, KY Srinivasan, vkuznets,
alex.williamson@redhat.com, sashal@kernel.org,
dan.carpenter@oracle.com
Cc: Tianyu Lan, linux-kernel@vger.kernel.org,
iommu@lists.linux-foundation.org, linux-hyperv@vger.kernel.org
In-Reply-To: <1551279245-25888-3-git-send-email-Tianyu.Lan@microsoft.com>
From: lantianyu1986@gmail.com <lantianyu1986@gmail.com> Sent: Wednesday, February 27, 2019 6:54 AM
>
> On bare metal, enabling X2APIC mode requires interrupt remapping,
> which helps deliver irqs to cpus with 32-bit APIC IDs. Hyper-V
> doesn't provide interrupt remapping so far, and the Hyper-V MSI
> protocol already supports delivering interrupts to a CPU whose
> virtual processor index is greater than 255. IO-APIC interrupts still
> have the 8-bit APIC ID limitation.
>
> This patch adds a Hyper-V stub IOMMU driver in order to enable
> X2APIC mode successfully in Hyper-V Linux guests. The driver returns X2APIC
> interrupt remapping capability when X2APIC mode is available. Otherwise,
> it creates a Hyper-V irq domain to limit IO-APIC interrupts' affinity
> and make sure cpus assigned IO-APIC interrupts have 8-bit APIC IDs.
>
> Define 24 IO-APIC remapping entries because Hyper-V exposes only a
> single IO-APIC, and one IO-APIC has 24 pins according to the IO-APIC spec
> (https://pdos.csail.mit.edu/6.828/2016/readings/ia32/ioapic.pdf).
>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Reconfirming my reviewed-by after the change to fix the
compile error detected by the kbuild test robot.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
* RE: [PATCH V6 1/3] x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available
From: Michael Kelley @ 2019-02-27 17:30 UTC (permalink / raw)
To: lantianyu1986@gmail.com, KY Srinivasan, Haiyang Zhang,
Stephen Hemminger, sashal@kernel.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, hpa@zytor.com, x86@kernel.org,
joro@8bytes.org, davem@davemloft.net, mchehab+samsung@kernel.org,
gregkh@linuxfoundation.org, nicolas.ferre@microchip.com,
arnd@arndb.de, vkuznets, alex.williamson@redhat.com,
dan.carpenter@oracle.com
Cc: Tianyu Lan, linux-kernel@vger.kernel.org,
devel@linuxdriverproject.org, iommu@lists.linux-foundation.org,
linux-hyperv@vger.kernel.org
In-Reply-To: <1551279245-25888-2-git-send-email-Tianyu.Lan@microsoft.com>
From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Wednesday, February 27, 2019 6:54 AM
>
> Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
> set x2apic destination mode to physical mode when x2apic is available;
> the Hyper-V IOMMU driver makes sure cpus assigned IO-APIC irqs have
> 8-bit APIC ids.
>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
> ---
> Change since v5:
> - Fix compile error due to x2apic_phys
>
> Change since v2:
> - Fix compile error due to x2apic_phys
> - Fix comment indent
> Change since v1:
> - Remove redundant extern for x2apic_phys
> ---
> arch/x86/kernel/cpu/mshyperv.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
Reconfirming my reviewed-by after the change to fix the
compile error detected by the kbuild test robot.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
* Re: linux-next: Tree for Feb 27 (mshyperv)
From: Randy Dunlap @ 2019-02-27 17:25 UTC (permalink / raw)
To: Stephen Rothwell, Linux Next Mailing List
Cc: Linux Kernel Mailing List, linux-hyperv
In-Reply-To: <20190227174427.4e8f7f24@canb.auug.org.au>
On 2/26/19 10:44 PM, Stephen Rothwell wrote:
> Hi all,
>
> Changes since 20190226:
>
on i386:
../arch/x86/kernel/cpu/mshyperv.c: In function ‘ms_hyperv_init_platform’:
../arch/x86/kernel/cpu/mshyperv.c:339:3: error: ‘x2apic_phys’ undeclared (first use in this function)
x2apic_phys = 1;
^
--
~Randy
* [PATCH V6 3/3] MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS scope
From: lantianyu1986 @ 2019-02-27 14:54 UTC (permalink / raw)
To: davem, mchehab+samsung, gregkh, nicolas.ferre, arnd,
michael.h.kelley, kys, vkuznets, alex.williamson, joro, sashal,
dan.carpenter
Cc: Lan Tianyu, linux-kernel, linux-hyperv
In-Reply-To: <1551279245-25888-1-git-send-email-Tianyu.Lan@microsoft.com>
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
Add the Hyper-V IOMMU driver file to the Hyper-V CORE AND DRIVERS
scope.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 9f64f8d..5fb6306 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7015,6 +7015,7 @@ F: drivers/net/hyperv/
F: drivers/scsi/storvsc_drv.c
F: drivers/uio/uio_hv_generic.c
F: drivers/video/fbdev/hyperv_fb.c
+F: drivers/iommu/hyperv-iommu.c
F: net/vmw_vsock/hyperv_transport.c
F: include/linux/hyperv.h
F: include/uapi/linux/hyperv.h
--
2.7.4
* [PATCH V6 2/3] IOMMU/Hyper-V: Add Hyper-V stub IOMMU driver
From: lantianyu1986 @ 2019-02-27 14:54 UTC (permalink / raw)
To: joro, mchehab+samsung, davem, gregkh, nicolas.ferre, arnd,
michael.h.kelley, kys, vkuznets, alex.williamson, sashal,
dan.carpenter
Cc: Lan Tianyu, linux-kernel, iommu, linux-hyperv
In-Reply-To: <1551279245-25888-1-git-send-email-Tianyu.Lan@microsoft.com>
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
On bare metal, enabling X2APIC mode requires interrupt remapping,
which helps deliver irqs to cpus with 32-bit APIC IDs. Hyper-V
doesn't provide interrupt remapping so far, and the Hyper-V MSI
protocol already supports delivering interrupts to a CPU whose
virtual processor index is greater than 255. IO-APIC interrupts still
have the 8-bit APIC ID limitation.
This patch adds a Hyper-V stub IOMMU driver in order to enable
X2APIC mode successfully in Hyper-V Linux guests. The driver returns X2APIC
interrupt remapping capability when X2APIC mode is available. Otherwise,
it creates a Hyper-V irq domain to limit IO-APIC interrupts' affinity
and make sure cpus assigned IO-APIC interrupts have 8-bit APIC IDs.
Define 24 IO-APIC remapping entries because Hyper-V exposes only a
single IO-APIC, and one IO-APIC has 24 pins according to the IO-APIC spec
(https://pdos.csail.mit.edu/6.828/2016/readings/ia32/ioapic.pdf).
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
Change since v5:
- Include asm/cpu.h and asm/apic.h to avoid compile error.
Change since v4:
- Fix the loop of scan cpu's APIC id
Change since v3:
- Make Hyper-V IOMMU as Hyper-V default driver
- Fix hypervisor_is_type() input parameter
- Check possible cpu numbers during scan 0~255 cpu's apic id.
Change since v2:
- Improve comment about why save IO-APIC entry in the irq chip data.
- Some code improvement.
- Improve statement in the IOMMU Kconfig.
Change since v1:
- Remove unused pr_fmt
- Make ioapic_ir_domain as static variable
- Remove unused variables cfg and entry in the hyperv_irq_remapping_alloc()
- Fix comments
Fix 2
---
drivers/iommu/Kconfig | 9 ++
drivers/iommu/Makefile | 1 +
drivers/iommu/hyperv-iommu.c | 196 ++++++++++++++++++++++++++++++++++++++++++
drivers/iommu/irq_remapping.c | 3 +
drivers/iommu/irq_remapping.h | 1 +
5 files changed, 210 insertions(+)
create mode 100644 drivers/iommu/hyperv-iommu.c
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 45d7021..6f07f3b 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -437,4 +437,13 @@ config QCOM_IOMMU
help
Support for IOMMU on certain Qualcomm SoCs.
+config HYPERV_IOMMU
+ bool "Hyper-V x2APIC IRQ Handling"
+ depends on HYPERV
+ select IOMMU_API
+ default HYPERV
+ help
+ Stub IOMMU driver to handle IRQs so that Hyper-V Linux
+ guests can run with x2APIC mode enabled.
+
endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index a158a68..8c71a15 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
new file mode 100644
index 0000000..a386b83
--- /dev/null
+++ b/drivers/iommu/hyperv-iommu.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Hyper-V stub IOMMU driver.
+ *
+ * Copyright (C) 2019, Microsoft, Inc.
+ *
+ * Author : Lan Tianyu <Tianyu.Lan@microsoft.com>
+ */
+
+#include <linux/types.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+
+#include <asm/apic.h>
+#include <asm/cpu.h>
+#include <asm/hw_irq.h>
+#include <asm/io_apic.h>
+#include <asm/irq_remapping.h>
+#include <asm/hypervisor.h>
+
+#include "irq_remapping.h"
+
+#ifdef CONFIG_IRQ_REMAP
+
+/*
+ * According to the 82093AA IO-APIC spec, the IO-APIC has a 24-entry
+ * Interrupt Redirection Table. Hyper-V exposes a single IO-APIC, so
+ * define 24 IO-APIC remapping entries.
+ */
+#define IOAPIC_REMAPPING_ENTRY 24
+
+static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
+static struct irq_domain *ioapic_ir_domain;
+
+static int hyperv_ir_set_affinity(struct irq_data *data,
+ const struct cpumask *mask, bool force)
+{
+ struct irq_data *parent = data->parent_data;
+ struct irq_cfg *cfg = irqd_cfg(data);
+ struct IO_APIC_route_entry *entry;
+ int ret;
+
+ /* Return an error if the new affinity is outside ioapic_max_cpumask. */
+ if (!cpumask_subset(mask, &ioapic_max_cpumask))
+ return -EINVAL;
+
+ ret = parent->chip->irq_set_affinity(parent, mask, force);
+ if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+ return ret;
+
+ entry = data->chip_data;
+ entry->dest = cfg->dest_apicid;
+ entry->vector = cfg->vector;
+ send_cleanup_vector(cfg);
+
+ return 0;
+}
+
+static struct irq_chip hyperv_ir_chip = {
+ .name = "HYPERV-IR",
+ .irq_ack = apic_ack_irq,
+ .irq_set_affinity = hyperv_ir_set_affinity,
+};
+
+static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs,
+ void *arg)
+{
+ struct irq_alloc_info *info = arg;
+ struct irq_data *irq_data;
+ struct irq_desc *desc;
+ int ret = 0;
+
+ if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
+ return -EINVAL;
+
+ ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+ if (ret < 0)
+ return ret;
+
+ irq_data = irq_domain_get_irq_data(domain, virq);
+ if (!irq_data) {
+ irq_domain_free_irqs_common(domain, virq, nr_irqs);
+ return -EINVAL;
+ }
+
+ irq_data->chip = &hyperv_ir_chip;
+
+ /*
+ * With an IOMMU that provides interrupt remapping, setting irq
+ * affinity only requires changing the IOMMU's IRTE. Hyper-V doesn't
+ * support interrupt remapping, so setting the affinity of IO-APIC
+ * interrupts still requires changing the IO-APIC registers. However,
+ * ioapic_configure_entry() ignores cfg->vector and
+ * cfg->dest_apicid when the IO-APIC's parent irq domain is not the
+ * vector domain (see ioapic_configure_entry()). To set vector and
+ * dest_apicid in the IO-APIC register, the IO-APIC entry pointer is
+ * saved in chip_data, and hyperv_irq_remapping_activate() and
+ * hyperv_ir_set_affinity() write them directly into the IO-APIC entry.
+ */
+ irq_data->chip_data = info->ioapic_entry;
+
+ /*
+ * Hyper-V IO-APIC irq affinity must stay within
+ * ioapic_max_cpumask because there is no irq remapping support.
+ */
+ desc = irq_data_to_desc(irq_data);
+ cpumask_copy(desc->irq_common_data.affinity, &ioapic_max_cpumask);
+
+ return 0;
+}
+
+static void hyperv_irq_remapping_free(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs)
+{
+ irq_domain_free_irqs_common(domain, virq, nr_irqs);
+}
+
+static int hyperv_irq_remapping_activate(struct irq_domain *domain,
+ struct irq_data *irq_data, bool reserve)
+{
+ struct irq_cfg *cfg = irqd_cfg(irq_data);
+ struct IO_APIC_route_entry *entry = irq_data->chip_data;
+
+ entry->dest = cfg->dest_apicid;
+ entry->vector = cfg->vector;
+
+ return 0;
+}
+
+static struct irq_domain_ops hyperv_ir_domain_ops = {
+ .alloc = hyperv_irq_remapping_alloc,
+ .free = hyperv_irq_remapping_free,
+ .activate = hyperv_irq_remapping_activate,
+};
+
+static int __init hyperv_prepare_irq_remapping(void)
+{
+ struct fwnode_handle *fn;
+ int i;
+
+ if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
+ !x2apic_supported())
+ return -ENODEV;
+
+ fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
+ if (!fn)
+ return -ENOMEM;
+
+ ioapic_ir_domain =
+ irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
+ 0, IOAPIC_REMAPPING_ENTRY, fn,
+ &hyperv_ir_domain_ops, NULL);
+
+ irq_domain_free_fwnode(fn);
+
+ /*
+ * Hyper-V doesn't provide irq remapping for the IO-APIC, so the
+ * IO-APIC only accepts 8-bit APIC IDs. Each cpu's APIC ID is read
+ * from the ACPI MADT table, and the APIC IDs in the MADT table on
+ * Hyper-V are sorted in monotonically increasing order. The APIC ID
+ * reflects cpu topology, so there may be APIC ID gaps when the number
+ * of cpus in a socket is not a power of two. Prepare the max cpu
+ * affinity for IO-APIC irqs: scan cpus 0-255 and add a cpu to
+ * ioapic_max_cpumask if its APIC ID is less than 256.
+ */
+ for (i = min_t(unsigned int, num_possible_cpus() - 1, 255); i >= 0; i--)
+ if (cpu_physical_id(i) < 256)
+ cpumask_set_cpu(i, &ioapic_max_cpumask);
+
+ return 0;
+}
+
+static int __init hyperv_enable_irq_remapping(void)
+{
+ return IRQ_REMAP_X2APIC_MODE;
+}
+
+static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info)
+{
+ if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC)
+ return ioapic_ir_domain;
+ else
+ return NULL;
+}
+
+struct irq_remap_ops hyperv_irq_remap_ops = {
+ .prepare = hyperv_prepare_irq_remapping,
+ .enable = hyperv_enable_irq_remapping,
+ .get_ir_irq_domain = hyperv_get_ir_irq_domain,
+};
+
+#endif
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index b94ebd4..81cf290 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -103,6 +103,9 @@ int __init irq_remapping_prepare(void)
else if (IS_ENABLED(CONFIG_AMD_IOMMU) &&
amd_iommu_irq_ops.prepare() == 0)
remap_ops = &amd_iommu_irq_ops;
+ else if (IS_ENABLED(CONFIG_HYPERV_IOMMU) &&
+ hyperv_irq_remap_ops.prepare() == 0)
+ remap_ops = &hyperv_irq_remap_ops;
else
return -ENOSYS;
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 0afef6e..f8609e9 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -64,6 +64,7 @@ struct irq_remap_ops {
extern struct irq_remap_ops intel_irq_remap_ops;
extern struct irq_remap_ops amd_iommu_irq_ops;
+extern struct irq_remap_ops hyperv_irq_remap_ops;
#else /* CONFIG_IRQ_REMAP */
--
2.7.4
* [PATCH V6 1/3] x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available
From: lantianyu1986 @ 2019-02-27 14:54 UTC (permalink / raw)
To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86, joro,
davem, mchehab+samsung, gregkh, nicolas.ferre, arnd,
michael.h.kelley, vkuznets, alex.williamson, dan.carpenter
Cc: Lan Tianyu, linux-kernel, devel, iommu, linux-hyperv
In-Reply-To: <1551279245-25888-1-git-send-email-Tianyu.Lan@microsoft.com>
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
set x2apic destination mode to physical mode when x2apic is available;
the Hyper-V IOMMU driver makes sure cpus assigned IO-APIC irqs have
8-bit APIC ids.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
Change since v5:
- Fix compile error due to x2apic_phys
Change since v2:
- Fix compile error due to x2apic_phys
- Fix comment indent
Change since v1:
- Remove redundant extern for x2apic_phys
---
arch/x86/kernel/cpu/mshyperv.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e81a2db..3fa238a 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -328,6 +328,18 @@ static void __init ms_hyperv_init_platform(void)
# ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
# endif
+
+ /*
+ * Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
+ * set x2apic destination mode to physical mode when x2apic is available;
+ * the Hyper-V IOMMU driver makes sure cpus assigned IO-APIC irqs
+ * have 8-bit APIC ids.
+ */
+# ifdef CONFIG_X86_X2APIC
+ if (x2apic_supported())
+ x2apic_phys = 1;
+# endif
+
#endif
}
--
2.7.4
* [PATCH V6 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode
From: lantianyu1986 @ 2019-02-27 14:54 UTC (permalink / raw)
To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86, joro,
davem, mchehab+samsung, gregkh, nicolas.ferre, arnd,
michael.h.kelley, vkuznets, alex.williamson, dan.carpenter
Cc: Lan Tianyu, iommu, linux-kernel, linux-hyperv
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
On bare metal, enabling X2APIC mode requires interrupt remapping,
which helps deliver irqs to cpus with 32-bit APIC IDs. Hyper-V
doesn't provide interrupt remapping so far, and the Hyper-V MSI
protocol already supports delivering interrupts to a CPU whose
virtual processor index is greater than 255. IO-APIC interrupts still
have the 8-bit APIC ID limitation.
This patchset adds a Hyper-V stub IOMMU driver in order to enable
X2APIC mode successfully in Hyper-V Linux guests. The driver returns X2APIC
interrupt remapping capability when X2APIC mode is available. X2APIC
destination mode is set to physical by PATCH 1 when X2APIC is available.
The Hyper-V IOMMU driver scans cpus 0~255 and adds a cpu to the IO-APIC MAX
cpu affinity cpumask if its APIC ID fits in 8 bits. The driver creates a
Hyper-V irq domain to limit IO-APIC interrupts' affinity and make sure cpus
assigned IO-APIC interrupts are within the IO-APIC MAX cpu affinity.
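
The scan described above can be sketched in userspace as follows. This is
an illustration only, not the driver code: struct sketch_cpumask,
sketch_build_ioapic_mask() and sketch_mask_subset() are hypothetical names,
and the APIC IDs are assumed to arrive in a plain array indexed by cpu
number rather than through cpu_physical_id().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SCAN_LIMIT 256u	/* only cpus 0..255 are scanned */

/* 256-bit mask, a simplified stand-in for cpumask_t. */
struct sketch_cpumask {
	uint64_t bits[SKETCH_SCAN_LIMIT / 64];
};

/* Add cpu i to the mask when its APIC ID fits in 8 bits. */
static void sketch_build_ioapic_mask(struct sketch_cpumask *mask,
				     const uint32_t *apic_ids,
				     unsigned int nr_cpus)
{
	unsigned int i;
	unsigned int limit = nr_cpus < SKETCH_SCAN_LIMIT ?
			     nr_cpus : SKETCH_SCAN_LIMIT;

	assert(mask != NULL && apic_ids != NULL);
	memset(mask, 0, sizeof(*mask));

	for (i = 0; i < limit; i++)
		if (apic_ids[i] < 256)
			mask->bits[i / 64] |= 1ULL << (i % 64);
}

/* Returns 1 when every cpu in want is also in allowed, else 0. */
static int sketch_mask_subset(const struct sketch_cpumask *want,
			      const struct sketch_cpumask *allowed)
{
	unsigned int w;

	for (w = 0; w < SKETCH_SCAN_LIMIT / 64; w++)
		if (want->bits[w] & ~allowed->bits[w])
			return 0;
	return 1;
}
```

A requested affinity would then be rejected whenever sketch_mask_subset()
returns 0, which mirrors the role of the cpumask_subset() check in
hyperv_ir_set_affinity().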
Lan Tianyu (3):
x86/Hyper-V: Set x2apic destination mode to physical when x2apic is
available
HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS
scope
MAINTAINERS | 1 +
arch/x86/kernel/cpu/mshyperv.c | 12 +++
drivers/iommu/Kconfig | 9 ++
drivers/iommu/Makefile | 1 +
drivers/iommu/hyperv-iommu.c | 196 +++++++++++++++++++++++++++++++++++++++++
drivers/iommu/irq_remapping.c | 3 +
drivers/iommu/irq_remapping.h | 1 +
7 files changed, 223 insertions(+)
create mode 100644 drivers/iommu/hyperv-iommu.c
--
2.7.4
* Re: [PATCH hyperv-fixes] hv_netvsc: Fix IP header checksum for coalesced packets
From: David Miller @ 2019-02-26 22:45 UTC (permalink / raw)
To: haiyangz, haiyangz
Cc: sashal, linux-hyperv, kys, sthemmin, olaf, vkuznets, netdev,
linux-kernel
In-Reply-To: <20190222182503.12160-1-haiyangz@linuxonhyperv.com>
From: Haiyang Zhang <haiyangz@linuxonhyperv.com>
Date: Fri, 22 Feb 2019 18:25:03 +0000
> From: Haiyang Zhang <haiyangz@microsoft.com>
>
> Incoming packets may have IP header checksum verified by the host.
> They may not have IP header checksum computed after coalescing.
> This patch re-computes the checksum when necessary; otherwise the
> packets may be dropped, because the Linux network stack always checks it.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Applied and queued up for -stable.
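
For reference, the IP header checksum discussed in the quoted patch is the
standard 16-bit ones' complement sum over the header. The following is a
minimal userspace sketch of that computation, not the netvsc code;
sketch_ip_header_checksum() is a hypothetical name, and the header length is
assumed to be even (IPv4 headers are a multiple of 4 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the ones' complement checksum over an IPv4 header of len
 * bytes. The checksum field (bytes 10 and 11) must be zero when
 * generating a checksum. Running the function over a header that
 * already carries a correct checksum yields 0.
 */
static uint16_t sketch_ip_header_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(hdr != NULL && len % 2 == 0);

	/* Add the header as big-endian 16-bit words. */
	for (i = 0; i < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

A receiver that always validates the header, as the Linux stack does, would
drop a packet for which this check does not come out to zero.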
* [Update PATCH V3 2/10] KVM/VMX: Fill range list in kvm_fill_hv_flush_list_func()
From: lantianyu1986 @ 2019-02-26 14:21 UTC (permalink / raw)
To: pbonzini, rkrcmar, tglx, mingo, bp, hpa, x86, michael.h.kelley,
kys, vkuznets
Cc: Lan Tianyu, kvm, linux-kernel, linux-hyperv
In-Reply-To: <20190222150637.2337-1-Tianyu.Lan@microsoft.com>
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
Populate ranges on the flush list into struct hv_guest_mapping_flush_list
when flush list is available in the struct kvm_tlb_range.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
Update:
- Add check of return value "offset" in the kvm_fill_hv_flush_list_func()
Change since v2:
- Fix calculation of flush pages in the kvm_fill_hv_flush_list_func()
---
arch/x86/include/asm/kvm_host.h | 7 +++++++
arch/x86/kvm/vmx/vmx.c | 21 +++++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 875ae7256608..9fc9dd0c92cb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -318,6 +318,12 @@ struct kvm_rmap_head {
struct kvm_mmu_page {
struct list_head link;
+
+ /*
+ * Tlb flush with range list uses struct kvm_mmu_page as list entry
+ * and all list operations should be under protection of mmu_lock.
+ */
+ struct hlist_node flush_link;
struct hlist_node hash_link;
bool unsync;
bool mmio_cached;
@@ -441,6 +447,7 @@ struct kvm_mmu {
struct kvm_tlb_range {
u64 start_gfn;
u64 pages;
+ struct hlist_head *flush_list;
};
enum pmc_type {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 77b5379e3655..197545121355 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -432,9 +432,26 @@ static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush
void *data)
{
struct kvm_tlb_range *range = data;
+ struct kvm_mmu_page *sp;
- return hyperv_fill_flush_guest_mapping_list(flush, 0, range->start_gfn,
- range->pages);
+ if (!range->flush_list) {
+ return hyperv_fill_flush_guest_mapping_list(flush,
+ 0, range->start_gfn, range->pages);
+ } else {
+ int offset = 0;
+
+ hlist_for_each_entry(sp, range->flush_list, flush_link) {
+ int pages = KVM_PAGES_PER_HPAGE(sp->role.level + 1);
+
+ offset = hyperv_fill_flush_guest_mapping_list(flush,
+ offset, sp->gfn, pages);
+ if (offset < 0)
+ return offset;
+
+ }
+
+ return offset;
+ }
}
static inline int __hv_remote_flush_tlb_with_range(struct kvm *kvm,
--
2.14.4
* [Update PATCH V3 1/10] X86/Hyper-V: Add parameter offset for hyperv_fill_flush_guest_mapping_list()
From: lantianyu1986 @ 2019-02-26 14:09 UTC (permalink / raw)
To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86,
pbonzini, rkrcmar, michael.h.kelley, vkuznets
Cc: Lan Tianyu, linux-kernel, kvm, linux-hyperv
In-Reply-To: <20190222150637.2337-2-Tianyu.Lan@microsoft.com>
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
Add an offset parameter that specifies the position in the guest address
list of struct hv_guest_mapping_flush_list at which to start adding flush
ranges.
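To illustrate how an offset lets a caller append several ranges to one list, here is a simplified user-space model. It is a sketch under assumptions, not the kernel code: MAX_ENTRIES, MAX_PAGES_PER_ENTRY, struct flush_list and fill_flush_list() are hypothetical stand-ins for the Hyper-V definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical limits; the real values come from the Hyper-V headers. */
#define MAX_ENTRIES          8
#define MAX_PAGES_PER_ENTRY  4

struct flush_entry {
	uint64_t base_gfn;          /* first guest frame covered by the entry */
	uint64_t additional_pages;  /* pages covered beyond the first one */
};

struct flush_list {
	struct flush_entry entry[MAX_ENTRIES];
};

/*
 * Append the range [start_gfn, start_gfn + pages) to the list, starting
 * at index "offset". Returns the index after the last entry written, so
 * the result can be passed as "offset" to the next call, or -ENOSPC if
 * the list is full.
 */
static int fill_flush_list(struct flush_list *flush, uint32_t offset,
			   uint64_t start_gfn, uint64_t pages)
{
	uint64_t cur = start_gfn;
	int n = (int)offset;

	assert(pages > 0);

	do {
		uint64_t chunk;

		if (n >= MAX_ENTRIES)
			return -ENOSPC;

		chunk = pages < MAX_PAGES_PER_ENTRY ? pages : MAX_PAGES_PER_ENTRY;
		flush->entry[n].base_gfn = cur;
		flush->entry[n].additional_pages = chunk - 1;

		pages -= chunk;
		cur += chunk;
		n++;
	} while (pages > 0);

	return n;
}
```

A caller that flushes several ranges would start with offset 0 and feed each return value into the next call, stopping as soon as a negative value comes back.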
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
Update the type of the "offset" parameter of hyperv_fill_flush_guest_mapping_list()
arch/x86/hyperv/nested.c | 4 ++--
arch/x86/include/asm/mshyperv.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index dd0a843f766d..d54c2276c922 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -58,11 +58,11 @@ EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
int hyperv_fill_flush_guest_mapping_list(
struct hv_guest_mapping_flush_list *flush,
- u64 start_gfn, u64 pages)
+ u32 offset, u64 start_gfn, u64 pages)
{
u64 cur = start_gfn;
u64 additional_pages;
- int gpa_n = 0;
+ int gpa_n = offset;
do {
/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index cc60e617931c..8b63ed95780e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -357,7 +357,7 @@ int hyperv_flush_guest_mapping_range(u64 as,
hyperv_fill_flush_list_func fill_func, void *data);
int hyperv_fill_flush_guest_mapping_list(
struct hv_guest_mapping_flush_list *flush,
- u64 start_gfn, u64 end_gfn);
+ u32 offset, u64 start_gfn, u64 end_gfn);
#ifdef CONFIG_X86_64
void hv_apic_init(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4950bb20e06a..77b5379e3655 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -433,7 +433,7 @@ static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush
{
struct kvm_tlb_range *range = data;
- return hyperv_fill_flush_guest_mapping_list(flush, range->start_gfn,
+ return hyperv_fill_flush_guest_mapping_list(flush, 0, range->start_gfn,
range->pages);
}
--
2.14.4
* Re: [Resend PATCH V5 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode
From: Tianyu Lan @ 2019-02-26 13:24 UTC (permalink / raw)
To: Joerg Roedel
Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Thomas Gleixner, Ingo Molnar, bp, H. Peter Anvin,
the arch/x86 maintainers, mchehab+samsung, davem,
Greg Kroah-Hartman, nicolas.ferre, Arnd Bergmann,
michael.h.kelley, Vitaly Kuznetsov, Alex Williamson,
Dan Carpenter, Lan Tianyu, devel, iommu,
linux-kernel@vger kernel org, linux-hyperv
In-Reply-To: <20190226130741.GD32526@8bytes.org>
On Tue, Feb 26, 2019 at 9:07 PM Joerg Roedel <joro@8bytes.org> wrote:
>
> On Tue, Feb 26, 2019 at 08:07:17PM +0800, lantianyu1986@gmail.com wrote:
> > Lan Tianyu (3):
> > x86/Hyper-V: Set x2apic destination mode to physical when x2apic is
> > available
> > HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
> > MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS
> > scope
>
> Applied (patch 2 with slight subject changes
> 'HYPERV/IOMMU' -> 'iommu/hyper-v'), thanks.
Great. Thanks.
--
Best regards
Tianyu Lan
* Re: [Resend PATCH V5 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode
From: Joerg Roedel @ 2019-02-26 13:07 UTC (permalink / raw)
To: lantianyu1986
Cc: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86,
mchehab+samsung, davem, gregkh, nicolas.ferre, arnd,
michael.h.kelley, vkuznets, alex.williamson, dan.carpenter,
Lan Tianyu, devel, iommu, linux-kernel, linux-hyperv
In-Reply-To: <1551182840-15069-1-git-send-email-Tianyu.Lan@microsoft.com>
On Tue, Feb 26, 2019 at 08:07:17PM +0800, lantianyu1986@gmail.com wrote:
> Lan Tianyu (3):
> x86/Hyper-V: Set x2apic destination mode to physical when x2apic is
> available
> HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
> MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS
> scope
Applied (patch 2 with slight subject changes
'HYPERV/IOMMU' -> 'iommu/hyper-v'), thanks.
* [Resend PATCH V5 1/3] x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available
From: lantianyu1986 @ 2019-02-26 12:07 UTC (permalink / raw)
To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86, joro,
mchehab+samsung, davem, gregkh, nicolas.ferre, arnd,
michael.h.kelley, vkuznets, alex.williamson, dan.carpenter
Cc: Lan Tianyu, linux-kernel, devel, iommu, linux-hyperv
In-Reply-To: <1551182840-15069-1-git-send-email-Tianyu.Lan@microsoft.com>
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
set the x2apic destination mode to physical when x2apic is available;
the Hyper-V IOMMU driver makes sure that cpus assigned IO-APIC irqs have
8-bit APIC IDs.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
Change since v2:
- Fix compile error due to x2apic_phys
- Fix comment indent
Change since v1:
- Remove redundant extern for x2apic_phys
---
arch/x86/kernel/cpu/mshyperv.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e81a2db..0c29e4e 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -328,6 +328,16 @@ static void __init ms_hyperv_init_platform(void)
# ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
# endif
+
+ /*
+ * Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
+ * set x2apic destination mode to physical mode when x2apic is available
+ * and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs
+ * have 8-bit APIC id.
+ */
+ if (IS_ENABLED(CONFIG_X86_X2APIC) && x2apic_supported())
+ x2apic_phys = 1;
+
#endif
}
--
2.7.4
* [Resend PATCH V5 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
From: lantianyu1986 @ 2019-02-26 12:07 UTC (permalink / raw)
To: joro, mchehab+samsung, davem, gregkh, nicolas.ferre, arnd,
michael.h.kelley, kys, vkuznets, alex.williamson, sashal,
dan.carpenter
Cc: Lan Tianyu, linux-kernel, iommu, linux-hyperv
In-Reply-To: <1551182840-15069-1-git-send-email-Tianyu.Lan@microsoft.com>
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
On bare metal, enabling X2APIC mode requires the interrupt remapping
function, which helps deliver irqs to cpus with 32-bit APIC IDs.
Hyper-V doesn't provide an interrupt remapping function so far, and the
Hyper-V MSI protocol already supports delivering interrupts to a CPU whose
virtual processor index is greater than 255. IO-APIC interrupts still have
the 8-bit APIC ID limitation.
This patch adds a Hyper-V stub IOMMU driver in order to enable
X2APIC mode successfully in a Hyper-V Linux guest. The driver returns X2APIC
interrupt remapping capability when X2APIC mode is available. It also
creates a Hyper-V irq domain to limit IO-APIC interrupts' affinity
and make sure that cpus assigned IO-APIC interrupts have 8-bit APIC IDs.
Define 24 IO-APIC remapping entries because Hyper-V exposes only a
single IO-APIC, and one IO-APIC has 24 pins according to the IO-APIC spec
(https://pdos.csail.mit.edu/6.828/2016/readings/ia32/ioapic.pdf).
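The affinity restriction described above can be modeled in user space. This is a minimal sketch, assuming at most 64 cpus so that a mask fits in one uint64_t; cpu_mask_t, mask_subset() and set_ioapic_affinity() are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpu_mask_t;   /* bit n set means cpu n is in the mask */

/* True if every cpu in "a" is also in "b" (models cpumask_subset()). */
static bool mask_subset(cpu_mask_t a, cpu_mask_t b)
{
	return (a & ~b) == 0;
}

/*
 * Accept a new affinity only if it stays inside the set of cpus the
 * IO-APIC can address; otherwise reject it and keep the old affinity.
 */
static int set_ioapic_affinity(cpu_mask_t *affinity, cpu_mask_t requested,
			       cpu_mask_t ioapic_max)
{
	assert(affinity != NULL);

	if (!mask_subset(requested, ioapic_max))
		return -EINVAL;

	*affinity = requested;
	return 0;
}
```

Rejecting the request up front means the parent domain is never asked to route an IO-APIC interrupt to a cpu whose APIC ID cannot be encoded in 8 bits.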
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
Change since v4:
- Fix the loop of scan cpu's APIC id
Change since v3:
- Make Hyper-V IOMMU as Hyper-V default driver
- Fix hypervisor_is_type() input parameter
- Check possible cpu numbers during scan 0~255 cpu's apic id.
Change since v2:
- Improve comment about why save IO-APIC entry in the irq chip data.
- Some code improvement.
- Improve statement in the IOMMU Kconfig.
Change since v1:
- Remove unused pr_fmt
- Make ioapic_ir_domain as static variable
- Remove unused variables cfg and entry in the hyperv_irq_remapping_alloc()
- Fix comments
---
drivers/iommu/Kconfig | 9 ++
drivers/iommu/Makefile | 1 +
drivers/iommu/hyperv-iommu.c | 194 ++++++++++++++++++++++++++++++++++++++++++
drivers/iommu/irq_remapping.c | 3 +
drivers/iommu/irq_remapping.h | 1 +
5 files changed, 208 insertions(+)
create mode 100644 drivers/iommu/hyperv-iommu.c
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 45d7021..6f07f3b 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -437,4 +437,13 @@ config QCOM_IOMMU
help
Support for IOMMU on certain Qualcomm SoCs.
+config HYPERV_IOMMU
+ bool "Hyper-V x2APIC IRQ Handling"
+ depends on HYPERV
+ select IOMMU_API
+ default HYPERV
+ help
+ Stub IOMMU driver to handle IRQs so as to allow Hyper-V Linux
+ guests to run with x2APIC mode enabled.
+
endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index a158a68..8c71a15 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
new file mode 100644
index 0000000..eb358e9
--- /dev/null
+++ b/drivers/iommu/hyperv-iommu.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Hyper-V stub IOMMU driver.
+ *
+ * Copyright (C) 2019, Microsoft, Inc.
+ *
+ * Author : Lan Tianyu <Tianyu.Lan@microsoft.com>
+ */
+
+#include <linux/types.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+
+#include <asm/hw_irq.h>
+#include <asm/io_apic.h>
+#include <asm/irq_remapping.h>
+#include <asm/hypervisor.h>
+
+#include "irq_remapping.h"
+
+#ifdef CONFIG_IRQ_REMAP
+
+/*
+ * According to the 82093AA IO-APIC spec, the IO-APIC has a 24-entry
+ * Interrupt Redirection Table. Hyper-V exposes a single IO-APIC, so
+ * define 24 IO-APIC remapping entries.
+ */
+#define IOAPIC_REMAPPING_ENTRY 24
+
+static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
+static struct irq_domain *ioapic_ir_domain;
+
+static int hyperv_ir_set_affinity(struct irq_data *data,
+ const struct cpumask *mask, bool force)
+{
+ struct irq_data *parent = data->parent_data;
+ struct irq_cfg *cfg = irqd_cfg(data);
+ struct IO_APIC_route_entry *entry;
+ int ret;
+
+ /* Reject a new affinity that is outside ioapic_max_cpumask. */
+ if (!cpumask_subset(mask, &ioapic_max_cpumask))
+ return -EINVAL;
+
+ ret = parent->chip->irq_set_affinity(parent, mask, force);
+ if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+ return ret;
+
+ entry = data->chip_data;
+ entry->dest = cfg->dest_apicid;
+ entry->vector = cfg->vector;
+ send_cleanup_vector(cfg);
+
+ return 0;
+}
+
+static struct irq_chip hyperv_ir_chip = {
+ .name = "HYPERV-IR",
+ .irq_ack = apic_ack_irq,
+ .irq_set_affinity = hyperv_ir_set_affinity,
+};
+
+static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs,
+ void *arg)
+{
+ struct irq_alloc_info *info = arg;
+ struct irq_data *irq_data;
+ struct irq_desc *desc;
+ int ret = 0;
+
+ if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
+ return -EINVAL;
+
+ ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+ if (ret < 0)
+ return ret;
+
+ irq_data = irq_domain_get_irq_data(domain, virq);
+ if (!irq_data) {
+ irq_domain_free_irqs_common(domain, virq, nr_irqs);
+ return -EINVAL;
+ }
+
+ irq_data->chip = &hyperv_ir_chip;
+
+ /*
+ * With an IOMMU that provides interrupt remapping, setting irq
+ * affinity only requires changing the IOMMU's IRTE. Hyper-V doesn't
+ * support interrupt remapping, so setting the affinity of IO-APIC
+ * interrupts still requires changing the IO-APIC registers. However,
+ * ioapic_configure_entry() ignores cfg->vector and cfg->dest_apicid
+ * when the IO-APIC's parent irq domain is not the vector domain. To
+ * get the vector and dest_apicid into the IO-APIC registers, the
+ * IO-APIC entry pointer is saved in chip_data, and
+ * hyperv_irq_remapping_activate() and hyperv_ir_set_affinity() write
+ * the vector and dest_apicid directly into the IO-APIC entry.
+ */
+ irq_data->chip_data = info->ioapic_entry;
+
+ /*
+ * Hyper-V IO-APIC irq affinity should be within the scope of
+ * ioapic_max_cpumask because there is no irq remapping support.
+ */
+ desc = irq_data_to_desc(irq_data);
+ cpumask_copy(desc->irq_common_data.affinity, &ioapic_max_cpumask);
+
+ return 0;
+}
+
+static void hyperv_irq_remapping_free(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs)
+{
+ irq_domain_free_irqs_common(domain, virq, nr_irqs);
+}
+
+static int hyperv_irq_remapping_activate(struct irq_domain *domain,
+ struct irq_data *irq_data, bool reserve)
+{
+ struct irq_cfg *cfg = irqd_cfg(irq_data);
+ struct IO_APIC_route_entry *entry = irq_data->chip_data;
+
+ entry->dest = cfg->dest_apicid;
+ entry->vector = cfg->vector;
+
+ return 0;
+}
+
+static struct irq_domain_ops hyperv_ir_domain_ops = {
+ .alloc = hyperv_irq_remapping_alloc,
+ .free = hyperv_irq_remapping_free,
+ .activate = hyperv_irq_remapping_activate,
+};
+
+static int __init hyperv_prepare_irq_remapping(void)
+{
+ struct fwnode_handle *fn;
+ int i;
+
+ if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
+ !x2apic_supported())
+ return -ENODEV;
+
+ fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
+ if (!fn)
+ return -ENOMEM;
+
+ ioapic_ir_domain =
+ irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
+ 0, IOAPIC_REMAPPING_ENTRY, fn,
+ &hyperv_ir_domain_ops, NULL);
+
+ irq_domain_free_fwnode(fn);
+
+ /*
+ * Hyper-V doesn't provide an irq remapping function for the
+ * IO-APIC, so the IO-APIC only accepts 8-bit APIC IDs.
+ * A cpu's APIC ID is read from the ACPI MADT table, and APIC IDs
+ * in the MADT table on Hyper-V are sorted in monotonically increasing
+ * order. The APIC ID reflects cpu topology. There may be some APIC ID
+ * gaps when the cpu number in a socket is not a power of two. Prepare
+ * the max cpu affinity for IO-APIC irqs. Scan cpus 0-255 and set a cpu
+ * in ioapic_max_cpumask if its APIC ID is less than 256.
+ */
+ for (i = min_t(unsigned int, num_possible_cpus() - 1, 255); i >= 0; i--)
+ if (cpu_physical_id(i) < 256)
+ cpumask_set_cpu(i, &ioapic_max_cpumask);
+
+ return 0;
+}
+
+static int __init hyperv_enable_irq_remapping(void)
+{
+ return IRQ_REMAP_X2APIC_MODE;
+}
+
+static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info)
+{
+ if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC)
+ return ioapic_ir_domain;
+ else
+ return NULL;
+}
+
+struct irq_remap_ops hyperv_irq_remap_ops = {
+ .prepare = hyperv_prepare_irq_remapping,
+ .enable = hyperv_enable_irq_remapping,
+ .get_ir_irq_domain = hyperv_get_ir_irq_domain,
+};
+
+#endif
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index b94ebd4..81cf290 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -103,6 +103,9 @@ int __init irq_remapping_prepare(void)
else if (IS_ENABLED(CONFIG_AMD_IOMMU) &&
amd_iommu_irq_ops.prepare() == 0)
remap_ops = &amd_iommu_irq_ops;
+ else if (IS_ENABLED(CONFIG_HYPERV_IOMMU) &&
+ hyperv_irq_remap_ops.prepare() == 0)
+ remap_ops = &hyperv_irq_remap_ops;
else
return -ENOSYS;
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 0afef6e..f8609e9 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -64,6 +64,7 @@ struct irq_remap_ops {
extern struct irq_remap_ops intel_irq_remap_ops;
extern struct irq_remap_ops amd_iommu_irq_ops;
+extern struct irq_remap_ops hyperv_irq_remap_ops;
#else /* CONFIG_IRQ_REMAP */
--
2.7.4
* [Resend PATCH V5 3/3] MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS scope
From: lantianyu1986 @ 2019-02-26 12:07 UTC (permalink / raw)
To: mchehab+samsung, davem, gregkh, nicolas.ferre, arnd,
michael.h.kelley, kys, vkuznets, alex.williamson, joro, sashal,
dan.carpenter
Cc: Lan Tianyu, linux-kernel, linux-hyperv
In-Reply-To: <1551182840-15069-1-git-send-email-Tianyu.Lan@microsoft.com>
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
Add the Hyper-V IOMMU driver file to the Hyper-V CORE AND DRIVERS
scope.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 9f64f8d..5fb6306 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7015,6 +7015,7 @@ F: drivers/net/hyperv/
F: drivers/scsi/storvsc_drv.c
F: drivers/uio/uio_hv_generic.c
F: drivers/video/fbdev/hyperv_fb.c
+F: drivers/iommu/hyperv-iommu.c
F: net/vmw_vsock/hyperv_transport.c
F: include/linux/hyperv.h
F: include/uapi/linux/hyperv.h
--
2.7.4
* [Resend PATCH V5 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode
From: lantianyu1986 @ 2019-02-26 12:07 UTC (permalink / raw)
To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86, joro,
mchehab+samsung, davem, gregkh, nicolas.ferre, arnd,
michael.h.kelley, vkuznets, alex.williamson, dan.carpenter
Cc: Lan Tianyu, devel, iommu, linux-kernel, linux-hyperv
From: Lan Tianyu <Tianyu.Lan@microsoft.com>
On bare metal, enabling X2APIC mode requires the interrupt remapping
function, which helps deliver irqs to cpus with 32-bit APIC IDs.
Hyper-V doesn't provide an interrupt remapping function so far, and the
Hyper-V MSI protocol already supports delivering interrupts to a CPU whose
virtual processor index is greater than 255. IO-APIC interrupts still have
the 8-bit APIC ID limitation.
This patchset adds a Hyper-V stub IOMMU driver in order to enable
X2APIC mode successfully in a Hyper-V Linux guest. The driver returns X2APIC
interrupt remapping capability when X2APIC mode is available. The X2APIC
destination mode is set to physical by PATCH 1 when X2APIC is available.
The Hyper-V IOMMU driver scans cpus 0~255 and adds a cpu to the IO-APIC MAX
cpu affinity cpumask if its APIC ID is 8-bit. The driver creates a Hyper-V
irq domain to limit IO-APIC interrupts' affinity and make sure that cpus
assigned IO-APIC interrupts are within the IO-APIC MAX cpu affinity.
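The cpu scan can be sketched in user space as follows. This is an illustrative model only: apic_id[] is a hypothetical table standing in for cpu_physical_id(), and build_ioapic_mask() is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCAN_LIMIT 256   /* only cpus 0-255 are scanned */

/*
 * Mark every scanned cpu whose APIC ID fits in 8 bits. "apic_id" is a
 * hypothetical table standing in for cpu_physical_id(); "allowed" must
 * have room for at least min(nr_cpus, SCAN_LIMIT) entries. Returns the
 * number of cpus marked.
 */
static unsigned int build_ioapic_mask(const uint32_t *apic_id,
				      unsigned int nr_cpus,
				      uint8_t *allowed)
{
	unsigned int limit = nr_cpus < SCAN_LIMIT ? nr_cpus : SCAN_LIMIT;
	unsigned int cpu, count = 0;

	assert(apic_id != NULL && allowed != NULL);

	for (cpu = 0; cpu < limit; cpu++) {
		allowed[cpu] = apic_id[cpu] < 256;
		count += allowed[cpu];
	}

	return count;
}
```

Because APIC IDs can have gaps, a cpu with a small index may still carry an APIC ID of 256 or more, which is why the check is on the APIC ID rather than on the cpu number.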
Lan Tianyu (3):
x86/Hyper-V: Set x2apic destination mode to physical when x2apic is
available
HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS
scope
MAINTAINERS | 1 +
arch/x86/kernel/cpu/mshyperv.c | 10 +++
drivers/iommu/Kconfig | 9 ++
drivers/iommu/Makefile | 1 +
drivers/iommu/hyperv-iommu.c | 194 +++++++++++++++++++++++++++++++++++++++++
drivers/iommu/irq_remapping.c | 3 +
drivers/iommu/irq_remapping.h | 1 +
7 files changed, 219 insertions(+)
create mode 100644 drivers/iommu/hyperv-iommu.c
--
2.7.4
* Re: [PATCH V5 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode
From: joro @ 2019-02-26 11:01 UTC (permalink / raw)
To: Michael Kelley
Cc: Tianyu Lan, arnd@arndb.de, bp@alien8.de, davem@davemloft.net,
devel@linuxdriverproject.org, gregkh@linuxfoundation.org,
Haiyang Zhang, hpa@zytor.com, iommu@lists.linux-foundation.org,
KY Srinivasan, linux-kernel@vger.kernel.org,
mchehab+samsung@kernel.org, mingo@redhat.com,
nicolas.ferre@microchip.com, sashal@kernel.org, Stephen Hemminger,
tglx@linutronix.de, x86@kernel.org, vkuznets,
alex.williamson@redhat.com, dan.carpenter@oracle.com,
linux-hyperv@vger.kernel.org
In-Reply-To: <DM5PR2101MB0918EF12A3B75142EC54D754D77A0@DM5PR2101MB0918.namprd21.prod.outlook.com>
On Mon, Feb 25, 2019 at 08:51:22PM +0000, Michael Kelley wrote:
> Joerg -- What's your take on this patch set now that it has settled down? If
> you are good with it, from the Microsoft standpoint we're hoping that it
> can get into linux-next this week (given the extra week due to 5.0-rc8).
I can't find v5 of the patch-set in my inbox or my iommu-list folder.
Please re-send to me with all Acks and Reviewed-by's added.
Thanks,
Joerg
* Re: [PATCH v3] Drivers: hv: vmbus: Expose monitor data when channel uses monitor pages
From: Greg KH @ 2019-02-26 8:18 UTC (permalink / raw)
To: Kimberly Brown
Cc: Michael Kelley, Long Li, Sasha Levin, Stephen Hemminger,
Dexuan Cui, K. Y. Srinivasan, Haiyang Zhang, linux-hyperv,
linux-kernel
In-Reply-To: <20190226053530.GA2897@ubu-Virtual-Machine>
On Tue, Feb 26, 2019 at 12:35:30AM -0500, Kimberly Brown wrote:
> There are two methods for signaling the host: the monitor page mechanism
> and hypercalls. The monitor page mechanism is used by performance
> critical channels (storage, networking, etc.) because it provides
> improved throughput. However, latency is increased. Monitor pages are
> allocated to these channels.
>
> Monitor pages are not allocated to channels that do not use the monitor
> page mechanism. Therefore, these channels do not have a valid monitor id
> or valid monitor page data. In these cases, some of the "_show"
> functions return incorrect data. They return an invalid monitor id and
> data that is beyond the bounds of the hv_monitor_page array fields.
>
> The "channel->offermsg.monitor_allocated" value can be used to determine
> whether monitor pages have been allocated to a channel.
>
> Move the device-level monitor page attributes to a separate
> attribute_group struct. If the channel uses the monitor page mechanism,
> set up the sysfs files for these attributes in vmbus_device_register().
>
> Move the channel-level monitor page attributes to a separate
> attribute_group struct. If the channel uses the monitor page mechanism,
> set up the sysfs files for these attributes in vmbus_add_channel_kobj().
>
> Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
> ---
> Changes in v3:
> The monitor "_show" functions no longer return an error when a channel
> does not use the monitor page mechanism. Instead, the monitor page sysfs
> files are created only when a channel uses the monitor page mechanism.
> This change was suggested by G. Kroah-Hartman.
>
> Note: this patch was originally patch 2/2 in a patchset. Patch 1/2 has
> already been added to char-misc-testing, so I'm not resending it.
>
> Changes in v2:
> - Changed the return value for cases where monitor_allocated is not set
> to "-EINVAL".
> - Updated the commit message to provide more details about the monitor
> page mechanism.
> - Updated the sysfs documentation to describe the new return value.
>
> Documentation/ABI/stable/sysfs-bus-vmbus | 12 ++++--
> drivers/hv/vmbus_drv.c | 52 +++++++++++++++++++-----
> 2 files changed, 51 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus
> index 826689dcc2e6..6d5cb195b119 100644
> --- a/Documentation/ABI/stable/sysfs-bus-vmbus
> +++ b/Documentation/ABI/stable/sysfs-bus-vmbus
> @@ -81,7 +81,9 @@ What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/latency
> Date: September. 2017
> KernelVersion: 4.14
> Contact: Stephen Hemminger <sthemmin@microsoft.com>
> -Description: Channel signaling latency
> +Description: Channel signaling latency. This file is available only for
> + performance critical channels (storage, network, etc.) that use
> + the monitor page mechanism.
> Users: Debugging tools
>
> What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/out_mask
> @@ -95,7 +97,9 @@ What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/pending
> Date: September. 2017
> KernelVersion: 4.14
> Contact: Stephen Hemminger <sthemmin@microsoft.com>
> -Description: Channel interrupt pending state
> +Description: Channel interrupt pending state. This file is available only for
> + performance critical channels (storage, network, etc.) that use
> + the monitor page mechanism.
> Users: Debugging tools
>
> What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/read_avail
> @@ -137,7 +141,9 @@ What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/monitor_id
> Date: January. 2018
> KernelVersion: 4.16
> Contact: Stephen Hemminger <sthemmin@microsoft.com>
> -Description: Monitor bit associated with channel
> +Description: Monitor bit associated with channel. This file is available only
> + for performance critical channels (storage, network, etc.) that
> + use the monitor page mechanism.
> Users: Debugging tools and userspace drivers
>
> What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/ring
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 000b53e5a17a..ede858b0ee46 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -601,19 +601,12 @@ static DEVICE_ATTR_RW(driver_override);
> static struct attribute *vmbus_dev_attrs[] = {
> &dev_attr_id.attr,
> &dev_attr_state.attr,
> - &dev_attr_monitor_id.attr,
> &dev_attr_class_id.attr,
> &dev_attr_device_id.attr,
> &dev_attr_modalias.attr,
> #ifdef CONFIG_NUMA
> &dev_attr_numa_node.attr,
> #endif
> - &dev_attr_server_monitor_pending.attr,
> - &dev_attr_client_monitor_pending.attr,
> - &dev_attr_server_monitor_latency.attr,
> - &dev_attr_client_monitor_latency.attr,
> - &dev_attr_server_monitor_conn_id.attr,
> - &dev_attr_client_monitor_conn_id.attr,
> &dev_attr_out_intr_mask.attr,
> &dev_attr_out_read_index.attr,
> &dev_attr_out_write_index.attr,
> @@ -632,6 +625,22 @@ static struct attribute *vmbus_dev_attrs[] = {
> };
> ATTRIBUTE_GROUPS(vmbus_dev);
>
> +/*
> + * Set up per device monitor page attributes. These attributes are used only by
> + * channels that use the monitor page mechanism.
> + */
> +static struct attribute *vmbus_dev_monitor_attrs[] = {
> + &dev_attr_monitor_id.attr,
> + &dev_attr_server_monitor_pending.attr,
> + &dev_attr_client_monitor_pending.attr,
> + &dev_attr_server_monitor_latency.attr,
> + &dev_attr_client_monitor_latency.attr,
> + &dev_attr_server_monitor_conn_id.attr,
> + &dev_attr_client_monitor_conn_id.attr,
> + NULL
> +};
> +ATTRIBUTE_GROUPS(vmbus_dev_monitor);
No need to create a special group for this and then manually add it if
you need it. Just use the is_visible() callback in the attribute instead, as
that is what it is there for.
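For readers unfamiliar with the pattern, the idea of a visibility callback can be modeled in user space as below. This is a sketch under assumptions, not the sysfs API: struct device, struct attribute, is_visible_fn and count_visible() here are simplified stand-ins, and matching attributes by name is a shortcut chosen for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct device {
	bool monitor_allocated;   /* models channel->offermsg.monitor_allocated */
};

struct attribute {
	const char *name;
	unsigned int mode;        /* file permissions, e.g. 0444 */
};

/* Models the is_visible() hook: return the mode to create, 0 to hide. */
typedef unsigned int (*is_visible_fn)(const struct device *dev,
				      const struct attribute *attr);

static unsigned int monitor_attr_is_visible(const struct device *dev,
					    const struct attribute *attr)
{
	/* Hide monitor-related files when no monitor page is allocated. */
	if (!dev->monitor_allocated && strstr(attr->name, "monitor") != NULL)
		return 0;

	return attr->mode;
}

/* Count the files that would be created for "dev" from one attribute list. */
static size_t count_visible(const struct device *dev,
			    const struct attribute *attrs, size_t n,
			    is_visible_fn is_visible)
{
	size_t i, visible = 0;

	assert(dev != NULL && attrs != NULL && is_visible != NULL);

	for (i = 0; i < n; i++)
		if (is_visible(dev, &attrs[i]) != 0)
			visible++;

	return visible;
}
```

With this shape, all attributes stay in one list and the per-device decision is made in a single callback, so no second group has to be registered by hand.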
Thanks,
greg k-h
* Re: [PATCH v2 2/2] Drivers: hv: vmbus: Add a channel ring buffer mutex lock
From: Kimberly Brown @ 2019-02-26 6:24 UTC (permalink / raw)
To: Michael Kelley
Cc: Long Li, Sasha Levin, Stephen Hemminger, Dexuan Cui,
KY Srinivasan, Haiyang Zhang, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <DM5PR2101MB0918921A96F16D862BA2AF0FD7790@DM5PR2101MB0918.namprd21.prod.outlook.com>
On Sun, Feb 24, 2019 at 04:53:03PM +0000, Michael Kelley wrote:
> From: Kimberly Brown <kimbrownkd@gmail.com> Sent: Thursday, February 21, 2019 7:47 PM
> >
> > The "_show" functions that access channel ring buffer data are
> > vulnerable to a race condition that can result in a NULL pointer
> > dereference. This problem was discussed here:
> > https://lkml.org/lkml/2018/10/18/779
> >
> > To prevent this from occurring, add a new mutex lock,
> > "ring_buffer_mutex", to the vmbus_channel struct.
> >
> > Acquire/release "ring_buffer_mutex" in the functions that can set the
> > ring buffer pointer to NULL: vmbus_free_ring() and __vmbus_open().
> >
> > Acquire/release "ring_buffer_mutex" in the four channel-level "_show"
> > functions that access ring buffer data. Remove the "const" qualifier
> > from the "struct vmbus_channel *chan" parameter of the channel-level
> > "_show" functions so that "ring_buffer_mutex" can be acquired/released
> > in these functions.
> >
> > Acquire/release "ring_buffer_mutex" in hv_ringbuffer_get_debuginfo().
> > Pass the channel pointer to hv_ringbuffer_get_debuginfo() so that
> > "ring_buffer_mutex" can be accessed in this function.
> >
> > Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
>
> I've reviewed the code. I believe it is correct and fixes the race
> condition. Unfortunately, the code ended up being messier than I
> had hoped, and in particular, the need to pass the channel pointer
> into the ring buffer functions is distasteful. An alternate idea is to
> put the new mutex into the hv_ring_buffer_info structure. This results
> in two mutex's since there's a separate hv_ring_buffer_info structure for
> the "in" ring and the "out" ring. But it makes the ring buffer functions
> more self-contained and able to operate without knowledge of the
> channel. The mutex can be obtained in hv_ringbuffer_cleanup() instead
> of in the vmbus functions, and hv_ringbuffer_get_debuginfo() doesn't
> need the channel pointer.
>
> The "const" still has to be dropped from the channel pointer because
> the hv_ring_buffer_info structures are inline in the channel structure,
> but that's less objectionable. The extra memory for two mutex's isn't
> really a problem, and none of the code paths are performance
> sensitive.
>
> It's a tradeoff. I think I slightly prefer moving the mutex to the
> hv_ring_buffer_info structure, but could also be persuaded to
> take it like it is.
>
Thanks for the feedback! I don't have a compelling reason to keep the
lock in the vmbus_channel struct. I chose this approach because only one
lock would be required, rather than two. But, as you noted, using one
lock requires some tradeoffs.
I've looked through the changes that would be required to use two locks,
and I agree with you; I prefer using two locks. I'll submit a v3 for this
patch.
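A rough user-space sketch of the two-lock layout, one mutex per ring, might look like the following. It assumes POSIX threads in place of kernel mutexes; struct ring_info, ring_get_debuginfo() and ring_cleanup() are hypothetical names and only loosely mirror the kernel structures discussed here.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct ring_buffer {
	uint32_t read_index;
	uint32_t write_index;
};

/* Hypothetical stand-in for hv_ring_buffer_info with its own lock. */
struct ring_info {
	pthread_mutex_t lock;      /* protects "buf" against concurrent cleanup */
	struct ring_buffer *buf;   /* NULL once the ring has been cleaned up */
};

/* Copy the indices out under the lock; fail if the ring is already gone. */
static int ring_get_debuginfo(struct ring_info *ri,
			      uint32_t *read_index, uint32_t *write_index)
{
	int ret = -EINVAL;

	assert(ri != NULL && read_index != NULL && write_index != NULL);

	pthread_mutex_lock(&ri->lock);
	if (ri->buf != NULL) {
		*read_index = ri->buf->read_index;
		*write_index = ri->buf->write_index;
		ret = 0;
	}
	pthread_mutex_unlock(&ri->lock);

	return ret;
}

/* Drop the buffer pointer under the same lock that readers take. */
static void ring_cleanup(struct ring_info *ri)
{
	pthread_mutex_lock(&ri->lock);
	ri->buf = NULL;
	pthread_mutex_unlock(&ri->lock);
}
```

Since the lock lives next to the pointer it protects, neither function needs to know about the channel that owns the ring.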
Thanks,
Kim
> Thoughts?
>
> Michael
>
* [PATCH v3] Drivers: hv: vmbus: Expose monitor data when channel uses monitor pages
From: Kimberly Brown @ 2019-02-26 5:35 UTC (permalink / raw)
To: Michael Kelley, Long Li, Sasha Levin, Stephen Hemminger,
Dexuan Cui, gregkh
Cc: K. Y. Srinivasan, Haiyang Zhang, linux-hyperv, linux-kernel
In-Reply-To: <7481d15f52427917a5f620e29308c1aa5c63f3eb.1550554279.git.kimbrownkd@gmail.com>
There are two methods for signaling the host: the monitor page mechanism
and hypercalls. The monitor page mechanism is used by performance
critical channels (storage, networking, etc.) because it provides
improved throughput. However, latency is increased. Monitor pages are
allocated to these channels.
Monitor pages are not allocated to channels that do not use the monitor
page mechanism. Therefore, these channels do not have a valid monitor id
or valid monitor page data. In these cases, some of the "_show"
functions return incorrect data. They return an invalid monitor id and
data that is beyond the bounds of the hv_monitor_page array fields.
The "channel->offermsg.monitor_allocated" value can be used to determine
whether monitor pages have been allocated to a channel.
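The out-of-bounds hazard can be illustrated with a small user-space model. It is a sketch only: the layout of 4 groups with 32 bits each is an assumption made for the example, and struct channel, struct monitor_page and channel_latency() are hypothetical. It shows the guard on the allocation flag; this patch avoids the problem differently, by not creating the sysfs files at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 4 trigger groups of 32 monitor bits each. */
#define MONITOR_GROUPS      4
#define MONITOR_GROUP_BITS  32

struct channel {
	bool monitor_allocated;   /* models offermsg.monitor_allocated */
	uint8_t monitor_id;       /* only meaningful when monitor_allocated */
};

struct monitor_page {
	uint16_t latency[MONITOR_GROUPS][MONITOR_GROUP_BITS];
};

/* Look up the latency slot for a channel, refusing invalid monitor ids. */
static int channel_latency(const struct channel *ch,
			   const struct monitor_page *page, uint16_t *out)
{
	unsigned int group, bit;

	assert(ch != NULL && page != NULL && out != NULL);

	if (!ch->monitor_allocated)
		return -EINVAL;

	group = ch->monitor_id / MONITOR_GROUP_BITS;
	bit = ch->monitor_id % MONITOR_GROUP_BITS;
	if (group >= MONITOR_GROUPS)
		return -ERANGE;

	*out = page->latency[group][bit];
	return 0;
}
```

Without the first check, a channel carrying a monitor id of 255 would compute group 7 and index past the end of a 4-group array.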
Move the device-level monitor page attributes to a separate
attribute_group struct. If the channel uses the monitor page mechanism,
set up the sysfs files for these attributes in vmbus_device_register().
Move the channel-level monitor page attributes to a separate
attribute_group struct. If the channel uses the monitor page mechanism,
set up the sysfs files for these attributes in vmbus_add_channel_kobj().
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
---
Changes in v3:
The monitor "_show" functions no longer return an error when a channel
does not use the monitor page mechanism. Instead, the monitor page sysfs
files are created only when a channel uses the monitor page mechanism.
This change was suggested by G. Kroah-Hartman.
Note: this patch was originally patch 2/2 in a patchset. Patch 1/2 has
already been added to char-misc-testing, so I'm not resending it.
Changes in v2:
- Changed the return value for cases where monitor_allocated is not set
to "-EINVAL".
- Updated the commit message to provide more details about the monitor
page mechanism.
- Updated the sysfs documentation to describe the new return value.
Documentation/ABI/stable/sysfs-bus-vmbus | 12 ++++--
drivers/hv/vmbus_drv.c | 52 +++++++++++++++++++-----
2 files changed, 51 insertions(+), 13 deletions(-)
diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus
index 826689dcc2e6..6d5cb195b119 100644
--- a/Documentation/ABI/stable/sysfs-bus-vmbus
+++ b/Documentation/ABI/stable/sysfs-bus-vmbus
@@ -81,7 +81,9 @@ What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/latency
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
-Description: Channel signaling latency
+Description: Channel signaling latency. This file is available only for
+ performance critical channels (storage, network, etc.) that use
+ the monitor page mechanism.
Users: Debugging tools
What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/out_mask
@@ -95,7 +97,9 @@ What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/pending
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
-Description: Channel interrupt pending state
+Description: Channel interrupt pending state. This file is available only for
+ performance critical channels (storage, network, etc.) that use
+ the monitor page mechanism.
Users: Debugging tools
What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/read_avail
@@ -137,7 +141,9 @@ What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/monitor_id
Date: January. 2018
KernelVersion: 4.16
Contact: Stephen Hemminger <sthemmin@microsoft.com>
-Description: Monitor bit associated with channel
+Description: Monitor bit associated with channel. This file is available only
+ for performance critical channels (storage, network, etc.) that
+ use the monitor page mechanism.
Users: Debugging tools and userspace drivers
What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/ring
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 000b53e5a17a..ede858b0ee46 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -601,19 +601,12 @@ static DEVICE_ATTR_RW(driver_override);
static struct attribute *vmbus_dev_attrs[] = {
&dev_attr_id.attr,
&dev_attr_state.attr,
- &dev_attr_monitor_id.attr,
&dev_attr_class_id.attr,
&dev_attr_device_id.attr,
&dev_attr_modalias.attr,
#ifdef CONFIG_NUMA
&dev_attr_numa_node.attr,
#endif
- &dev_attr_server_monitor_pending.attr,
- &dev_attr_client_monitor_pending.attr,
- &dev_attr_server_monitor_latency.attr,
- &dev_attr_client_monitor_latency.attr,
- &dev_attr_server_monitor_conn_id.attr,
- &dev_attr_client_monitor_conn_id.attr,
&dev_attr_out_intr_mask.attr,
&dev_attr_out_read_index.attr,
&dev_attr_out_write_index.attr,
@@ -632,6 +625,22 @@ static struct attribute *vmbus_dev_attrs[] = {
};
ATTRIBUTE_GROUPS(vmbus_dev);
+/*
+ * Set up per device monitor page attributes. These attributes are used only by
+ * channels that use the monitor page mechanism.
+ */
+static struct attribute *vmbus_dev_monitor_attrs[] = {
+ &dev_attr_monitor_id.attr,
+ &dev_attr_server_monitor_pending.attr,
+ &dev_attr_client_monitor_pending.attr,
+ &dev_attr_server_monitor_latency.attr,
+ &dev_attr_client_monitor_latency.attr,
+ &dev_attr_server_monitor_conn_id.attr,
+ &dev_attr_client_monitor_conn_id.attr,
+ NULL
+};
+ATTRIBUTE_GROUPS(vmbus_dev_monitor);
+
/*
* vmbus_uevent - add uevent for our device
*
@@ -1537,19 +1546,30 @@ static struct attribute *vmbus_chan_attrs[] = {
&chan_attr_read_avail.attr,
&chan_attr_write_avail.attr,
&chan_attr_cpu.attr,
- &chan_attr_pending.attr,
- &chan_attr_latency.attr,
&chan_attr_interrupts.attr,
&chan_attr_events.attr,
&chan_attr_intr_in_full.attr,
&chan_attr_intr_out_empty.attr,
&chan_attr_out_full_first.attr,
&chan_attr_out_full_total.attr,
- &chan_attr_monitor_id.attr,
&chan_attr_subchannel_id.attr,
NULL
};
+/* Set up per channel monitor page attributes. These attributes are used only by
+ * channels that use the monitor page mechanism.
+ */
+static struct attribute *vmbus_chan_monitor_attrs[] = {
+ &chan_attr_pending.attr,
+ &chan_attr_latency.attr,
+ &chan_attr_monitor_id.attr,
+ NULL
+};
+
+static struct attribute_group vmbus_chan_monitor_attr_group = {
+ .attrs = vmbus_chan_monitor_attrs,
+};
+
static struct kobj_type vmbus_chan_ktype = {
.sysfs_ops = &vmbus_chan_sysfs_ops,
.release = vmbus_chan_release,
@@ -1571,6 +1591,15 @@ int vmbus_add_channel_kobj(struct hv_device *dev, struct vmbus_channel *channel)
if (ret)
return ret;
+ if (channel->offermsg.monitor_allocated) {
+ ret = sysfs_create_group(kobj, &vmbus_chan_monitor_attr_group);
+ if (ret) {
+ pr_err("Unable to set up monitor page sysfs files\n");
+ kobject_put(kobj);
+ return ret;
+ }
+ }
+
kobject_uevent(kobj, KOBJ_ADD);
return 0;
@@ -1615,6 +1644,9 @@ int vmbus_device_register(struct hv_device *child_device_obj)
child_device_obj->device.parent = &hv_acpi_dev->dev;
child_device_obj->device.release = vmbus_device_release;
+ if (child_device_obj->channel->offermsg.monitor_allocated)
+ child_device_obj->device.groups = vmbus_dev_monitor_groups;
+
/*
* Register with the LDM. This will kick off the driver/device
* binding...which will eventually call vmbus_match() and vmbus_probe()
--
2.17.1
* RE: [PATCH V5 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode
From: Michael Kelley @ 2019-02-25 20:51 UTC (permalink / raw)
To: joro@8bytes.org
Cc: Tianyu Lan, arnd@arndb.de, bp@alien8.de, davem@davemloft.net,
devel@linuxdriverproject.org, gregkh@linuxfoundation.org,
Haiyang Zhang, hpa@zytor.com, iommu@lists.linux-foundation.org,
KY Srinivasan, linux-kernel@vger.kernel.org,
mchehab+samsung@kernel.org, mingo@redhat.com,
nicolas.ferre@microchip.com, sashal@kernel.org, Stephen Hemminger,
tglx@linutronix.de, x86@kernel.org, vkuznets,
alex.williamson@redhat.com, dan.carpenter@oracle.com,
linux-hyperv@vger.kernel.org
In-Reply-To: <1550837545-10760-1-git-send-email-Tianyu.Lan@microsoft.com>
From: Tianyu Lan <Tianyu.Lan@microsoft.com> Sent: Friday, February 22, 2019 4:12 AM
>
> On bare metal, enabling X2APIC mode requires the interrupt remapping
> function, which helps deliver IRQs to CPUs with 32-bit APIC IDs.
> Hyper-V does not provide an interrupt remapping function so far, and the
> Hyper-V MSI protocol already supports delivering interrupts to CPUs whose
> virtual processor index is greater than 255. IO-APIC interrupts still
> have the 8-bit APIC ID limitation.
>
> This patchset adds a Hyper-V stub IOMMU driver in order to enable
> X2APIC mode successfully in a Hyper-V Linux guest. The driver returns the
> X2APIC interrupt remapping capability when X2APIC mode is available. The
> X2APIC destination mode is set to physical by PATCH 1 when X2APIC is
> available. The Hyper-V IOMMU driver scans CPUs 0~255 and adds a CPU to the
> IO-APIC MAX cpu affinity cpumask if its APIC ID is 8-bit. The driver
> creates a Hyper-V irq domain to limit the affinity of IO-APIC interrupts
> and make sure that CPUs assigned IO-APIC interrupts are within the
> IO-APIC MAX cpu affinity.
>
> Lan Tianyu (3):
> x86/Hyper-V: Set x2apic destination mode to physical when x2apic is
> available
> HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
> MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS
> scope
>
Joerg -- What's your take on this patch set now that it has settled down? If
you are good with it, from the Microsoft standpoint we're hoping that it
can get into linux-next this week (given the extra week due to 5.0-rc8).
Thanks,
Michael Kelley
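
The CPU scan described in the quoted cover letter (walk CPUs 0~255 and keep those whose APIC ID fits in 8 bits) can be sketched in plain C as below. This is only an illustration under stated assumptions: `build_ioapic_mask()`, the `apic_ids[]` lookup table and the fixed-size bitmask are hypothetical stand-ins, not the driver's actual cpumask helpers.

```c
#include <stddef.h>
#include <stdint.h>

#define SCAN_LIMIT 256			/* the cover letter scans CPUs 0~255 */
#define MASK_WORDS (SCAN_LIMIT / 64)	/* 64 CPUs per mask word */

/*
 * Hypothetical helper: apic_ids[cpu] stands in for whatever per-CPU
 * APIC ID lookup the real driver uses. Bit `cpu` is set in mask[] when
 * that CPU's APIC ID fits in 8 bits, i.e. it can be an IO-APIC target.
 * Returns the number of CPUs placed in the mask.
 */
size_t build_ioapic_mask(const uint32_t *apic_ids, size_t ncpus,
			 uint64_t mask[MASK_WORDS])
{
	size_t limit = ncpus < SCAN_LIMIT ? ncpus : SCAN_LIMIT;
	size_t count = 0;

	for (size_t w = 0; w < MASK_WORDS; w++)
		mask[w] = 0;

	for (size_t cpu = 0; cpu < limit; cpu++) {
		if (apic_ids[cpu] <= 0xff) {
			mask[cpu / 64] |= UINT64_C(1) << (cpu % 64);
			count++;
		}
	}
	return count;
}
```

An irq domain that restricts IO-APIC interrupt affinity would then intersect any requested affinity with this mask before programming the IO-APIC.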