Linux-HyperV List


* Re: [PATCH 10/13] megaraid_sas: set virt_boundary_mask in the scsi host
From: Hannes Reinecke @ 2019-06-06  6:02 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-11-hch@lst.de>

On 6/5/19 9:08 PM, Christoph Hellwig wrote:
> This ensures all proper DMA layer handling is taken care of by the
> SCSI midlayer.  Note that the effect is global, as the IOMMU merging
> is based off parameters in struct device.  We could still turn it off
> if no PCIe devices are present, but I don't know how to find that out.
> 
> Also remove the bogus nomerges flag, merges do take the virt_boundary
> into account.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 46 +++++----------------
>  drivers/scsi/megaraid/megaraid_sas_fusion.c |  7 ++++
>  2 files changed, 18 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 3dd1df472dc6..20b3b3f8bc16 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -1870,39 +1870,6 @@ void megasas_set_dynamic_target_properties(struct scsi_device *sdev,
>  	}
>  }
>  
> -/*
> - * megasas_set_nvme_device_properties -
> - * set nomerges=2
> - * set virtual page boundary = 4K (current mr_nvme_pg_size is 4K).
> - * set maximum io transfer = MDTS of NVME device provided by MR firmware.
> - *
> - * MR firmware provides value in KB. Caller of this function converts
> - * kb into bytes.
> - *
> - * e.a MDTS=5 means 2^5 * nvme page size. (In case of 4K page size,
> - * MR firmware provides value 128 as (32 * 4K) = 128K.
> - *
> - * @sdev:				scsi device
> - * @max_io_size:				maximum io transfer size
> - *
> - */
> -static inline void
> -megasas_set_nvme_device_properties(struct scsi_device *sdev, u32 max_io_size)
> -{
> -	struct megasas_instance *instance;
> -	u32 mr_nvme_pg_size;
> -
> -	instance = (struct megasas_instance *)sdev->host->hostdata;
> -	mr_nvme_pg_size = max_t(u32, instance->nvme_page_size,
> -				MR_DEFAULT_NVME_PAGE_SIZE);
> -
> -	blk_queue_max_hw_sectors(sdev->request_queue, (max_io_size / 512));
> -
> -	blk_queue_flag_set(QUEUE_FLAG_NOMERGES, sdev->request_queue);
> -	blk_queue_virt_boundary(sdev->request_queue, mr_nvme_pg_size - 1);
> -}
> -
> -
>  /*
>   * megasas_set_static_target_properties -
>   * Device property set by driver are static and it is not required to be
> @@ -1961,8 +1928,10 @@ static void megasas_set_static_target_properties(struct scsi_device *sdev,
>  		max_io_size_kb = le32_to_cpu(instance->tgt_prop->max_io_size_kb);
>  	}
>  
> -	if (instance->nvme_page_size && max_io_size_kb)
> -		megasas_set_nvme_device_properties(sdev, (max_io_size_kb << 10));
> +	if (instance->nvme_page_size && max_io_size_kb) {
> +		blk_queue_max_hw_sectors(sdev->request_queue,
> +				(max_io_size_kb << 10) / 512);
> +	}
>  
>  	scsi_change_queue_depth(sdev, device_qd);
>  
What happened to the NOMERGES queue flag?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


* Re: [PATCH 08/13] IB/iser: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-06  6:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jens Axboe, Sebastian Ott, Sagi Grimberg,
	Max Gurtovoy, Bart Van Assche, Ulf Hansson, Alan Stern,
	Oliver Neukum, linux-block, linux-rdma, linux-mmc, linux-nvme,
	linux-scsi, megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
	linux-usb, usb-storage, linux-kernel
In-Reply-To: <20190605202235.GC3273@ziepe.ca>

On Wed, Jun 05, 2019 at 05:22:35PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 05, 2019 at 09:08:31PM +0200, Christoph Hellwig wrote:
> > This ensures all proper DMA layer handling is taken care of by the
> > SCSI midlayer.
> 
> Maybe not entirely related to this series, but it looks like the SCSI
> layer is changing the device global dma_set_max_seg_size() - at least
> in RDMA the dma device is being shared between many users, so we
> really don't want SCSI to make this value smaller.
> 
> Can we do something about this?

We could do something about it as outlined in my mail - pass the
dma_params explicitly to the dma_map_sg call.  But that isn't really
suitable for a short term fix and will take a little more time.

Until we've sorted that out the device parameter needs to be set to
the smallest value supported.


* Re: [PATCH 10/13] megaraid_sas: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-06  6:41 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Jens Axboe, Sebastian Ott, Sagi Grimberg,
	Max Gurtovoy, Bart Van Assche, Ulf Hansson, Alan Stern,
	Oliver Neukum, linux-block, linux-rdma, linux-mmc, linux-nvme,
	linux-scsi, megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
	linux-usb, usb-storage, linux-kernel
In-Reply-To: <345c3931-0940-7d59-ebc6-fa1ea56c60ac@suse.de>

On Thu, Jun 06, 2019 at 08:02:07AM +0200, Hannes Reinecke wrote:
> >  	scsi_change_queue_depth(sdev, device_qd);
> >  
> What happened to the NOMERGES queue flag?

Quote from the patch description:

"Also remove the bogus nomerges flag, merges do take the virt_boundary
 into account."
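
As a rough illustration of why the nomerges flag is redundant: a merge is
refused whenever joining two chunks would leave a gap relative to the virt
boundary. The userspace sketch below is modeled on that gap check; struct seg
and seg_gap_to_prev() are made-up names for illustration, not the block
layer's own code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one data chunk: byte offset and length. */
struct seg {
	uint32_t offset;
	uint32_t len;
};

/*
 * Return true if placing @next directly after @prev would leave a gap
 * with respect to the virt boundary @mask (e.g. 4096 - 1 for 4K).
 * A mask of 0 means no virt boundary is configured.
 */
static bool seg_gap_to_prev(const struct seg *prev, const struct seg *next,
			    uint32_t mask)
{
	if (!mask)
		return false;

	/* @next must start on the boundary and @prev must end on it. */
	return (next->offset & mask) || ((prev->offset + prev->len) & mask);
}
```

With a 4K boundary, two full 4K chunks can be joined, while a 512-byte chunk
followed by anything else cannot, so the merge path already honours the
limit without a separate flag.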


* [PATCH v2 0/5] hv: Remove dependencies on guest page size
From: Maya Nakamura @ 2019-06-06  8:00 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel

The Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86. Hyper-V
code mixes up the two, so this patchset begins to address that by
creating and using a set of Hyper-V specific page definitions.

A major benefit of those new definitions is that they support non-x86
architectures, such as ARM64, that use different page sizes. On ARM64,
the guest page size may not be 4096, whereas Hyper-V always runs with a
page size of 4096.
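
To make the distinction concrete, here is a small userspace sketch. The three
HV_HYP_PAGE_* macros mirror the ones this series adds (with BIT() spelled out
so it builds outside the kernel); the two helper functions are hypothetical
and exist only to show the arithmetic.

```c
#include <stdint.h>

/* Hyper-V page size is fixed at 4096, whatever the guest page size is. */
#define HV_HYP_PAGE_SHIFT	12
#define HV_HYP_PAGE_SIZE	(1UL << HV_HYP_PAGE_SHIFT)
#define HV_HYP_PAGE_MASK	(~(HV_HYP_PAGE_SIZE - 1))

/*
 * Hypothetical helper: number of Hyper-V pages that cover one guest page.
 * Assumes guest_page_shift >= HV_HYP_PAGE_SHIFT.
 */
static unsigned long hv_pages_per_guest_page(unsigned int guest_page_shift)
{
	return 1UL << (guest_page_shift - HV_HYP_PAGE_SHIFT);
}

/* Hypothetical helper: Hyper-V page frame number of a guest physical address. */
static uint64_t hv_gpa_to_hvpfn(uint64_t gpa)
{
	return gpa >> HV_HYP_PAGE_SHIFT;
}
```

On x86 (4K guest pages) one guest page is one Hyper-V page; with 64K guest
pages, as ARM64 may be configured, one guest page spans 16 Hyper-V pages.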

In this patchset, the first two patches lay the foundation for the
others, creating definitions and preparing for allocation of memory with
the size and alignment that Hyper-V expects as a page. Patch 3 applies
the page size definition where the guest VM and Hyper-V communicate, and
where the code intends to use the Hyper-V page size. The last two
patches set the ring buffer size to a fixed value, removing the
dependency on the guest page size.

This is the initial set of changes to the Hyper-V code. Future patches
will make additional changes on the same foundation, for example
replacing __vmalloc() and related functions where Hyper-V pages are
intended.

Changes in v2:
- [PATCH 2/5] Replace with a new patch.

Maya Nakamura (5):
  x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
  x86: hv: hv_init.c: Add functions to allocate/deallocate page for
    Hyper-V
  hv: vmbus: Replace page definition with Hyper-V specific one
  HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
  Input: hv: Remove dependencies on PAGE_SIZE for ring buffer

 arch/x86/hyperv/hv_init.c             | 14 ++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h    | 12 +++++++++++-
 drivers/hid/hid-hyperv.c              |  4 ++--
 drivers/hv/hyperv_vmbus.h             |  8 ++++----
 drivers/input/serio/hyperv-keyboard.c |  4 ++--
 5 files changed, 33 insertions(+), 9 deletions(-)

-- 
2.17.1


* [PATCH v2 1/5] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
From: Maya Nakamura @ 2019-06-06  8:03 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1559807514.git.m.maya.nakamura@gmail.com>

Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because
the Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86.

Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index cdf44aa9a501..44bd68aefd00 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -12,6 +12,16 @@
 #include <linux/types.h>
 #include <asm/page.h>
 
+/*
+ * While not explicitly listed in the TLFS, Hyper-V always runs with a page size
+ * of 4096. These definitions are used when communicating with Hyper-V using
+ * guest physical pages and guest physical page addresses, since the guest page
+ * size may not be 4096 on all architectures.
+ */
+#define HV_HYP_PAGE_SHIFT	12
+#define HV_HYP_PAGE_SIZE	BIT(HV_HYP_PAGE_SHIFT)
+#define HV_HYP_PAGE_MASK	(~(HV_HYP_PAGE_SIZE - 1))
+
 /*
  * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
  * is set by CPUID(HvCpuIdFunctionVersionAndFeatures).
@@ -841,7 +851,7 @@ union hv_gpa_page_range {
  * count is equal with how many entries of union hv_gpa_page_range can
  * be populated into the input parameter page.
  */
-#define HV_MAX_FLUSH_REP_COUNT ((PAGE_SIZE - 2 * sizeof(u64)) /	\
+#define HV_MAX_FLUSH_REP_COUNT ((HV_HYP_PAGE_SIZE - 2 * sizeof(u64)) /	\
 				sizeof(union hv_gpa_page_range))
 
 struct hv_guest_mapping_flush_list {
-- 
2.17.1



* [PATCH v2 2/5] x86: hv: hv_init.c: Add functions to allocate/deallocate page for Hyper-V
From: Maya Nakamura @ 2019-06-06  8:05 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1559807514.git.m.maya.nakamura@gmail.com>

Introduce two new functions, hv_alloc_hyperv_page() and
hv_free_hyperv_page(), to allocate/deallocate memory with the size and
alignment that Hyper-V expects as a page. Although currently they are
not used, they are ready to be used to allocate/deallocate memory on x86
when their ARM64 counterparts are implemented, keeping symmetry between
architectures with potentially different guest page sizes.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 arch/x86/hyperv/hv_init.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e4ba467a9fc6..84baf0e9a2d4 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -98,6 +98,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
 u32 hv_max_vp_index;
 EXPORT_SYMBOL_GPL(hv_max_vp_index);
 
+void *hv_alloc_hyperv_page(void)
+{
+	BUILD_BUG_ON(!(PAGE_SIZE == HV_HYP_PAGE_SIZE));
+
+	return (void *)__get_free_page(GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(hv_alloc_hyperv_page);
+
+void hv_free_hyperv_page(unsigned long addr)
+{
+	free_page(addr);
+}
+EXPORT_SYMBOL_GPL(hv_free_hyperv_page);
+
 static int hv_cpu_init(unsigned int cpu)
 {
 	u64 msr_vp_index;
-- 
2.17.1



* [PATCH v2 3/5] hv: vmbus: Replace page definition with Hyper-V specific one
From: Maya Nakamura @ 2019-06-06  8:06 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1559807514.git.m.maya.nakamura@gmail.com>

Replace PAGE_SIZE with HV_HYP_PAGE_SIZE because the guest page size may
not be 4096 on all architectures and Hyper-V always runs with a page
size of 4096.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 drivers/hv/hyperv_vmbus.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index e5467b821f41..5489b061d261 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -208,11 +208,11 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
 		       u64 *requestid, bool raw);
 
 /*
- * Maximum channels is determined by the size of the interrupt page
- * which is PAGE_SIZE. 1/2 of PAGE_SIZE is for send endpoint interrupt
- * and the other is receive endpoint interrupt
+ * Maximum channels, 16384, is determined by the size of the interrupt page,
+ * which is HV_HYP_PAGE_SIZE. Half of HV_HYP_PAGE_SIZE is for the send endpoint
+ * interrupt, and the other half is for the receive endpoint interrupt.
  */
-#define MAX_NUM_CHANNELS	((PAGE_SIZE >> 1) << 3)	/* 16348 channels */
+#define MAX_NUM_CHANNELS	((HV_HYP_PAGE_SIZE >> 1) << 3)
 
 /* The value here must be in multiple of 32 */
 /* TODO: Need to make this configurable */
-- 
2.17.1
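
For reference, the arithmetic behind MAX_NUM_CHANNELS can be checked in a few
lines of standalone C. Half of the 4096-byte interrupt page is 2048 bytes, and
at one bit per channel that gives 2048 * 8 = 16384. The macros are copied from
the patches, with BIT() expanded so the snippet builds outside the kernel.

```c
#define HV_HYP_PAGE_SHIFT	12
#define HV_HYP_PAGE_SIZE	(1UL << HV_HYP_PAGE_SHIFT)

/* Half of the interrupt page in bytes (>> 1), times 8 bits per byte (<< 3). */
#define MAX_NUM_CHANNELS	((HV_HYP_PAGE_SIZE >> 1) << 3)

/* Fails to compile if the arithmetic above is wrong. */
_Static_assert(MAX_NUM_CHANNELS == 16384,
	       "one bit per channel in half of a 4096-byte page");
```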



* [PATCH v2 4/5] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Maya Nakamura @ 2019-06-06  8:07 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1559807514.git.m.maya.nakamura@gmail.com>

Define the ring buffer size as a constant expression because it should
not depend on the guest page size.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 drivers/hid/hid-hyperv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index d3311d714d35..e8b154fa38e2 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -112,8 +112,8 @@ struct synthhid_input_report {
 
 #pragma pack(pop)
 
-#define INPUTVSC_SEND_RING_BUFFER_SIZE		(10*PAGE_SIZE)
-#define INPUTVSC_RECV_RING_BUFFER_SIZE		(10*PAGE_SIZE)
+#define INPUTVSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
+#define INPUTVSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
 
 
 enum pipe_prot_msg_type {
-- 
2.17.1
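
A quick sketch of what the change buys: with the old definition the ring
grows with the guest page size, while the new one stays at 40 KiB, which is
ten Hyper-V pages. Both function names below are invented for the comparison
and do not appear in the driver.

```c
#define HV_HYP_PAGE_SIZE	4096UL

/* Old definition: the ring size scales with the guest page size. */
static unsigned long ring_size_page_based(unsigned long guest_page_size)
{
	return 10 * guest_page_size;
}

/* New definition: a fixed size, independent of the guest page size. */
static unsigned long ring_size_fixed(void)
{
	return 40 * 1024;
}
```

With 4K guest pages the two agree; with 64K guest pages the old definition
would ask for 640 KiB per ring.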



* [PATCH v2 5/5] Input: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Maya Nakamura @ 2019-06-06  8:09 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1559807514.git.m.maya.nakamura@gmail.com>

Define the ring buffer size as a constant expression because it should
not depend on the guest page size.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 drivers/input/serio/hyperv-keyboard.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c
index 7935e52b5435..a3480dbfadd2 100644
--- a/drivers/input/serio/hyperv-keyboard.c
+++ b/drivers/input/serio/hyperv-keyboard.c
@@ -83,8 +83,8 @@ struct synth_kbd_keystroke {
 
 #define HK_MAXIMUM_MESSAGE_SIZE 256
 
-#define KBD_VSC_SEND_RING_BUFFER_SIZE		(10 * PAGE_SIZE)
-#define KBD_VSC_RECV_RING_BUFFER_SIZE		(10 * PAGE_SIZE)
+#define KBD_VSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
+#define KBD_VSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
 
 #define XTKBD_EMUL0     0xe0
 #define XTKBD_EMUL1     0xe1
-- 
2.17.1



* Re: [PATCH 08/13] IB/iser: set virt_boundary_mask in the scsi host
From: Jason Gunthorpe @ 2019-06-06 12:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sebastian Ott, Sagi Grimberg, Max Gurtovoy,
	Bart Van Assche, Ulf Hansson, Alan Stern, Oliver Neukum,
	linux-block, linux-rdma, linux-mmc, linux-nvme, linux-scsi,
	megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv, linux-usb,
	usb-storage, linux-kernel
In-Reply-To: <20190606062441.GB26745@lst.de>

On Thu, Jun 06, 2019 at 08:24:41AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 05, 2019 at 05:22:35PM -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 05, 2019 at 09:08:31PM +0200, Christoph Hellwig wrote:
> > > This ensures all proper DMA layer handling is taken care of by the
> > > SCSI midlayer.
> > 
> > Maybe not entirely related to this series, but it looks like the SCSI
> > layer is changing the device global dma_set_max_seg_size() - at least
> > in RDMA the dma device is being shared between many users, so we
> > really don't want SCSI to make this value smaller.
> > 
> > Can we do something about this?
> 
> We could do something about it as outlined in my mail - pass the
> dma_params explicitly to the dma_map_sg call.  But that isn't really
> suitable for a short term fix and will take a little more time.

Sounds good to me, having every dma mapping specify its restrictions
makes a lot more sense than a device global setting, IMHO.

In RDMA the restrictions to build a SGL, create a device queue or
build a MR are all a little different.

i.e. for MRs, alignment of the post-IOMMU DMA address is very important
for performance, as the MR logic can only build device huge pages out
of properly aligned DMA addresses. For SGLs we don't care about this;
instead, SGLs usually have a 32-bit per-element length limit in the HW
that MRs do not.

> Until we've sorted that out the device parameter needs to be set to
> the smallest value supported.

smallest? largest? We've been setting it to the largest value the
device can handle (ie 2G)

Jason


* Re: [PATCH 08/13] IB/iser: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-06 14:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jens Axboe, Sebastian Ott, Sagi Grimberg,
	Max Gurtovoy, Bart Van Assche, Ulf Hansson, Alan Stern,
	Oliver Neukum, linux-block, linux-rdma, linux-mmc, linux-nvme,
	linux-scsi, megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
	linux-usb, usb-storage, linux-kernel
In-Reply-To: <20190606125935.GA17373@ziepe.ca>

On Thu, Jun 06, 2019 at 09:59:35AM -0300, Jason Gunthorpe wrote:
> > Until we've sorted that out the device parameter needs to be set to
> > the smallest value supported.
> 
> smallest? largest? We've been setting it to the largest value the
> device can handle (ie 2G)

Well, in general we need the smallest value supported by any ULP,
because if any ULP can't support a larger segment size, we must not
allow the IOMMU to merge it to that size.  That being said I can't
really see why any RDMA ULP should limit the size given how the MRs
work.
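
The rule can be sketched as taking the minimum over every ULP that shares the
DMA device. This is only an illustration in userspace C; the helper name and
the idea of collecting the limits in an array are assumptions, not an existing
kernel interface.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the largest segment each ULP sharing a DMA
 * device can handle, return the value that is safe for all of them.
 * Returns 0 if no limits were supplied.
 */
static unsigned int common_max_seg_size(const unsigned int *ulp_limits,
					size_t n)
{
	unsigned int min;
	size_t i;

	if (n == 0)
		return 0;

	min = ulp_limits[0];
	for (i = 1; i < n; i++) {
		if (ulp_limits[i] < min)
			min = ulp_limits[i];
	}
	return min;
}
```

If one user can take 2G segments and another only 64K, the shared device
parameter has to be 64K, otherwise the IOMMU may merge entries into a segment
the second user cannot handle.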


* RE: [PATCH 10/13] megaraid_sas: set virt_boundary_mask in the scsi host
From: Kashyap Desai @ 2019-06-06 15:37 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, PDL,MEGARAIDLINUX,
	PDL-MPT-FUSIONLINUX, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-11-hch@lst.de>

>
> This ensures all proper DMA layer handling is taken care of by the SCSI
> midlayer.  Note that the effect is global, as the IOMMU merging is based
> off parameters in struct device.  We could still turn it off if no PCIe
> devices are present, but I don't know how to find that out.
>
> Also remove the bogus nomerges flag, merges do take the virt_boundary into
> account.

Hi Christoph, the changes for <megaraid_sas> and <mpt3sas> look good. We want
to run a few sanity checks before we ACK. BTW, what benefit will we see from
moving the virt_boundary setting to the SCSI midlayer? Is it just a more
modular approach, or is there a functional fix?

Kashyap


* Re: [PATCH v3 0/2] Drivers: hv: Move Hyper-V clock/timer code to separate clocksource driver
From: Sasha Levin @ 2019-06-06 16:37 UTC (permalink / raw)
  To: Michael Kelley
  Cc: will.deacon@arm.com, marc.zyngier@arm.com,
	linux-arm-kernel@lists.infradead.org, gregkh@linuxfoundation.org,
	linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	olaf@aepfle.de, apw@canonical.com, vkuznets, jasowang@redhat.com,
	marcelo.cerri@canonical.com, Sunil Muthuswamy, KY Srinivasan,
	catalin.marinas@arm.com, mark.rutland@arm.com
In-Reply-To: <1558969089-13204-1-git-send-email-mikelley@microsoft.com>

On Mon, May 27, 2019 at 02:59:07PM +0000, Michael Kelley wrote:
>This patch series moves Hyper-V clock/timer code to a separate Hyper-V
>clocksource driver. Previously, Hyper-V clock/timer code and data
>structures were mixed in with other Hyper-V code in the ISA independent
>drivers/hv code as well as in arch dependent code. The new Hyper-V
>clocksource driver is ISA independent, with just a few dependencies on
>arch specific functions. The patch series does not change any behavior
>or functionality -- it only reorganizes the existing code and fixes up
>the linkages. A few places outside of Hyper-V code are fixed up to use
>the new #include file structure.
>
>This restructuring is in response to Marc Zyngier's review comments
>on supporting Hyper-V running on ARM64, and is a good idea in general.
>It increases the amount of code shared between the x86 and ARM64
>architectures, and reduces the size of the new code for supporting
>Hyper-V on ARM64. A new version of the Hyper-V on ARM64 patches will
>follow once this clocksource restructuring is accepted.
>
>The code is diff'ed against Linux 5.2.0-rc1-next-20190524.
>
>Changes in v3:
>* Removed boolean argument to hv_init_clocksource(). Always call
>sched_clock_register, which is needed on ARM64 but a no-op on x86.
>* Removed separate cpuhp setup in hv_stimer_alloc() and instead
>directly call hv_stimer_init() and hv_stimer_cleanup() from
>corresponding VMbus functions.  This more closely matches original
>code and avoids clocksource stop/restart problems on ARM64 when
>VMbus code denies CPU offlining request.
>
>Changes in v2:
>* Revised commit short descriptions so the distinction between
>the first and second patches is clearer [GregKH]
>* Renamed new clocksource driver files and functions to use
>existing "timer" and "stimer" names instead of introducing
>"syntimer". [Vitaly Kuznetsov]
>* Introduced CONFIG_HYPER_TIMER to fix build problem when
>CONFIG_HYPERV=m [Vitaly Kuznetsov]
>* Added "Suggested-by: Marc Zyngier"
>
>Michael Kelley (2):
>  Drivers: hv: Create Hyper-V clocksource driver from existing
>    clockevents code
>  Drivers: hv: Move Hyper-V clocksource code to new clocksource driver
>
> MAINTAINERS                          |   2 +
> arch/x86/entry/vdso/vclock_gettime.c |   1 +
> arch/x86/entry/vdso/vma.c            |   2 +-
> arch/x86/hyperv/hv_init.c            |  91 +---------
> arch/x86/include/asm/hyperv-tlfs.h   |   6 +
> arch/x86/include/asm/mshyperv.h      |  81 ++-------
> arch/x86/kernel/cpu/mshyperv.c       |   2 +
> arch/x86/kvm/x86.c                   |   1 +
> drivers/clocksource/Makefile         |   1 +
> drivers/clocksource/hyperv_timer.c   | 321 +++++++++++++++++++++++++++++++++++
> drivers/hv/Kconfig                   |   3 +
> drivers/hv/hv.c                      | 156 +----------------
> drivers/hv/hv_util.c                 |   1 +
> drivers/hv/hyperv_vmbus.h            |   3 -
> drivers/hv/vmbus_drv.c               |  42 ++---
> include/clocksource/hyperv_timer.h   | 105 ++++++++++++
> 16 files changed, 484 insertions(+), 334 deletions(-)
> create mode 100644 drivers/clocksource/hyperv_timer.c
> create mode 100644 include/clocksource/hyperv_timer.h

Queued for hyperv-next, thanks!

--
Thanks,
Sasha


* Re: [PATCH v2 1/1] Drivers: hv: vmbus: Break out ISA independent parts of mshyperv.h
From: Sasha Levin @ 2019-06-06 16:50 UTC (permalink / raw)
  To: Michael Kelley
  Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	olaf@aepfle.de, apw@canonical.com, vkuznets, jasowang@redhat.com,
	marcelo.cerri@canonical.com, Sunil Muthuswamy, KY Srinivasan
In-Reply-To: <1559175219-17823-1-git-send-email-mikelley@microsoft.com>

On Thu, May 30, 2019 at 12:14:00AM +0000, Michael Kelley wrote:
>Break out parts of mshyperv.h that are ISA independent into a
>separate file in include/asm-generic. This move facilitates
>ARM64 code reusing these definitions and avoids code
>duplication. No functionality or behavior is changed.
>
>Signed-off-by: Michael Kelley <mikelley@microsoft.com>
>Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>---
>Changes in v2:
>* Removed unneeded #includes in asm-generic/mshyperv.h;
>added two #includes that are needed. [Vitaly Kuznetsov]

Queued for hyperv-next, thanks!

--
Thanks,
Sasha


* Re: [PATCH] revert async probing of VMBus network devices.
From: Stephen Hemminger @ 2019-06-06 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-hyperv, Stephen Hemminger
In-Reply-To: <20190605185114.12456-1-sthemmin@microsoft.com>

On Wed,  5 Jun 2019 11:51:14 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> Doing asynchronous probing can lead to reordered network device names.
> And because udev doesn't have any useful information to construct a
> persistent name, this causes VM's to sporadically boot with reordered
> device names and no connectivity.
> 
> This shows up on the Ubuntu image on larger VM's where 30% of the
> time eth0 and eth1 get swapped.
> 
> Note: udev MAC address policy is disabled on Azure images
> because the netvsc and PCI VF will have the same mac address.
> 
> Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---
>  drivers/net/hyperv/netvsc_drv.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 06393b215102..1a2c32111106 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -2411,9 +2411,6 @@ static struct  hv_driver netvsc_drv = {
>  	.id_table = id_table,
>  	.probe = netvsc_probe,
>  	.remove = netvsc_remove,
> -	.driver = {
> -		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
> -	},
>  };
>  
>  /*

Even though storage can handle out-of-order devices, networking cannot.
The network devices in Hyper-V do not have any persistent properties that will
work with existing udev. The current kernel is breaking current distributions.
This patch fixes it; why did you reject it?


* Re: properly communicate queue limits to the DMA layer
From: Jens Axboe @ 2019-06-07  5:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605192405.GA18243@lst.de>

On 6/5/19 1:24 PM, Christoph Hellwig wrote:
> On Wed, Jun 05, 2019 at 01:17:15PM -0600, Jens Axboe wrote:
>> Since I'm heading out shortly and since I think this should make
>> the next -rc, I'll tentatively queue this up.
> 
> The SCSI bits will need a bit more review, and possibly tweaking
> for megaraid and mpt3sas.  But they are really independent of the
> other patches, so maybe skip them for now and leave them for Martin
> to deal with.

I dropped the SCSI bits.

-- 
Jens Axboe



* Re: properly communicate queue limits to the DMA layer
From: Martin K. Petersen @ 2019-06-07 17:30 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Sebastian Ott, Sagi Grimberg, Max Gurtovoy,
	Bart Van Assche, Ulf Hansson, Alan Stern, Oliver Neukum,
	linux-block, linux-rdma, linux-mmc, linux-nvme, linux-scsi,
	megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv, linux-usb,
	usb-storage, linux-kernel
In-Reply-To: <f07d0abf-b3eb-f530-37b9-e66454740b3f@kernel.dk>


Jens,

>> The SCSI bits will need a bit more review, and possibly tweaking
>> for megaraid and mpt3sas.  But they are really independent of the
>> other patches, so maybe skip them for now and leave them for Martin
>> to deal with.
>
> I dropped the SCSI bits.

I'll monitor and merge them.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: properly communicate queue limits to the DMA layer
From: Jens Axboe @ 2019-06-08  8:10 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Sebastian Ott, Sagi Grimberg, Max Gurtovoy,
	Bart Van Assche, Ulf Hansson, Alan Stern, Oliver Neukum,
	linux-block, linux-rdma, linux-mmc, linux-nvme, linux-scsi,
	megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv, linux-usb,
	usb-storage, linux-kernel
In-Reply-To: <yq1o939i9qh.fsf@oracle.com>

On 6/7/19 11:30 AM, Martin K. Petersen wrote:
> 
> Jens,
> 
>>> The SCSI bits will need a bit more review, and possibly tweaking
>>> for megaraid and mpt3sas.  But they are really independent of the
>>> other patches, so maybe skip them for now and leave them for Martin
>>> to deal with.
>>
>> I dropped the SCSI bits.
> 
> I'll monitor and merge them.

Great, thanks Martin.

-- 
Jens Axboe



* Re: [PATCH 10/13] megaraid_sas: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-08  8:14 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Christoph Hellwig, Jens Axboe, Sebastian Ott, Sagi Grimberg,
	Max Gurtovoy, Bart Van Assche, Ulf Hansson, Alan Stern,
	Oliver Neukum, linux-block, linux-rdma, linux-mmc, linux-nvme,
	linux-scsi, PDL,MEGARAIDLINUX, PDL-MPT-FUSIONLINUX, linux-hyperv,
	linux-usb, usb-storage, linux-kernel
In-Reply-To: <cd713506efb9579d1f69a719d831c28d@mail.gmail.com>

On Thu, Jun 06, 2019 at 09:07:27PM +0530, Kashyap Desai wrote:
> Hi Christoph, the changes for <megaraid_sas> and <mpt3sas> look good. We want
> to run a few sanity checks before we ACK. BTW, what benefit will we see from
> moving the virt_boundary setting to the SCSI midlayer? Is it just a more
> modular approach, or is there a functional fix?

The big difference is that virt_boundary now also changes the
max_segment_size, and this ensures that this limit is also communicated
to the DMA mapping layer.
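
A simplified userspace model of that flow, for readers following along. All
struct and function names here are hypothetical, and the choice of UINT_MAX as
the resulting segment size is an assumption about how the series behaves, not
something stated in this mail.

```c
#include <limits.h>

/* Hypothetical, cut-down model of the queue limits discussed here. */
struct model_limits {
	unsigned long virt_boundary_mask;
	unsigned int max_segment_size;
};

/* Hypothetical stand-in for the per-device DMA parameters. */
struct model_dma_params {
	unsigned int max_segment_size;
};

/* Setting a virt boundary also adjusts the segment size limit. */
static void model_set_virt_boundary(struct model_limits *lim,
				    unsigned long mask)
{
	lim->virt_boundary_mask = mask;
	/* Assumption: with a virt boundary, segments are no longer size-limited. */
	if (mask)
		lim->max_segment_size = UINT_MAX;
}

/* The midlayer then hands the resulting limit to the DMA layer. */
static void model_publish_to_dma(const struct model_limits *lim,
				 struct model_dma_params *dma)
{
	dma->max_segment_size = lim->max_segment_size;
}
```

The point of doing this in the midlayer is that both steps happen in one
place, so a driver cannot set the boundary and forget to tell the DMA layer.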


* [PATCH hyperv-fixes] hv_netvsc: Set probe mode to sync
From: Haiyang Zhang @ 2019-06-10 20:49 UTC (permalink / raw)
  To: sashal@kernel.org, linux-hyperv@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger

For better consistency of synthetic NIC names, set the probe mode to
PROBE_FORCE_SYNCHRONOUS so that the names align with the vmbus
channel offer sequence.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 03ea5a7..afdcc56 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2407,7 +2407,7 @@ static int netvsc_remove(struct hv_device *dev)
 	.probe = netvsc_probe,
 	.remove = netvsc_remove,
 	.driver = {
-		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
+		.probe_type = PROBE_FORCE_SYNCHRONOUS,
 	},
 };
 
-- 
1.8.3.1



* [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Dmitry Safonov @ 2019-06-11 21:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Prasanna Panchamukhi, Andy Lutomirski,
	Borislav Petkov, Cathy Avery, Haiyang Zhang, H. Peter Anvin,
	Ingo Molnar, K. Y. Srinivasan, Michael Kelley (EOSG),
	Mohammed Gamal, Paolo Bonzini, Radim Krčmář,
	Roman Kagan, Sasha Levin, Stephen Hemminger, Thomas Gleixner,
	Vitaly Kuznetsov, devel, kvm, linux-hyperv, x86

KVM support may be compiled as a dynamic module, which triggers the
following splat on modprobe:

 KVM: vmx: using Hyper-V Enlightened VMCS
 BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466
 caller is debug_smp_processor_id+0x17/0x19
 CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
 Call Trace:
  dump_stack+0x61/0x7e
  check_preemption_disabled+0xd4/0xe6
  debug_smp_processor_id+0x17/0x19
  set_hv_tscchange_cb+0x1b/0x89
  kvm_arch_init+0x14a/0x163 [kvm]
  kvm_init+0x30/0x259 [kvm]
  vmx_init+0xed/0x3db [kvm_intel]
  do_one_initcall+0x89/0x1bc
  do_init_module+0x5f/0x207
  load_module+0x1b34/0x209b
  __ia32_sys_init_module+0x17/0x19
  do_fast_syscall_32+0x121/0x1fa
  entry_SYSENTER_compat+0x7f/0x91

The easiest solution seems to be disabling preemption while setting up
reenlightenment MSRs. While at it, fix hv_cpu_*() callbacks.

Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications support")

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>

Cc: devel@linuxdriverproject.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/hyperv/hv_init.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1608050e9df9..0bdd79ecbff8 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
 static int hv_cpu_init(unsigned int cpu)
 {
 	u64 msr_vp_index;
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
 	void **input_arg;
 	struct page *pg;
 
@@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
 
 	hv_get_vp_index(msr_vp_index);
 
-	hv_vp_index[smp_processor_id()] = msr_vp_index;
+	hv_vp_index[cpu] = msr_vp_index;
 
 	if (msr_vp_index > hv_max_vp_index)
 		hv_max_vp_index = msr_vp_index;
@@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
 	struct hv_reenlightenment_control re_ctrl = {
 		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
 		.enabled = 1,
-		.target_vp = hv_vp_index[smp_processor_id()]
 	};
 	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
 
@@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
 	/* Make sure callback is registered before we write to MSRs */
 	wmb();
 
+	preempt_disable();
+	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
 	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+	preempt_enable();
+
 	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
 }
 EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Peter Zijlstra @ 2019-06-12  9:35 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Prasanna Panchamukhi, Andy Lutomirski,
	Borislav Petkov, Cathy Avery, Haiyang Zhang, H. Peter Anvin,
	Ingo Molnar, K. Y. Srinivasan, Michael Kelley (EOSG),
	Mohammed Gamal, Paolo Bonzini, Radim Krčmář,
	Roman Kagan, Sasha Levin, Stephen Hemminger, Thomas Gleixner,
	Vitaly Kuznetsov, devel, kvm, linux-hyperv, x86
In-Reply-To: <20190611212003.26382-1-dima@arista.com>

On Tue, Jun 11, 2019 at 10:20:03PM +0100, Dmitry Safonov wrote:
> KVM support may be compiled as dynamic module, which triggers the
> following splat on modprobe:
> 
>  KVM: vmx: using Hyper-V Enlightened VMCS
>  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
>  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>  Call Trace:
>   dump_stack+0x61/0x7e
>   check_preemption_disabled+0xd4/0xe6
>   debug_smp_processor_id+0x17/0x19
>   set_hv_tscchange_cb+0x1b/0x89
>   kvm_arch_init+0x14a/0x163 [kvm]
>   kvm_init+0x30/0x259 [kvm]
>   vmx_init+0xed/0x3db [kvm_intel]
>   do_one_initcall+0x89/0x1bc
>   do_init_module+0x5f/0x207
>   load_module+0x1b34/0x209b
>   __ia32_sys_init_module+0x17/0x19
>   do_fast_syscall_32+0x121/0x1fa
>   entry_SYSENTER_compat+0x7f/0x91
> 
> The easiest solution seems to be disabling preemption while setting up
> reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.
> 
> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
> support")
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Cathy Avery <cavery@redhat.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
> Cc: Mohammed Gamal <mmorsy@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> Cc: devel@linuxdriverproject.org
> Cc: kvm@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: x86@kernel.org
> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1608050e9df9..0bdd79ecbff8 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	u64 msr_vp_index;
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  	void **input_arg;
>  	struct page *pg;
>  
> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>  
>  	hv_get_vp_index(msr_vp_index);
>  
> -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> +	hv_vp_index[cpu] = msr_vp_index;
>  
>  	if (msr_vp_index > hv_max_vp_index)
>  		hv_max_vp_index = msr_vp_index;
> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	struct hv_reenlightenment_control re_ctrl = {
>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>  		.enabled = 1,
> -		.target_vp = hv_vp_index[smp_processor_id()]
>  	};
>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>  
> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	/* Make sure callback is registered before we write to MSRs */
>  	wmb();
>  
> +	preempt_disable();
> +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> +	preempt_enable();
> +
>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>  }
>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);

This looks bogus, MSRs are a per-cpu resource, you had better know what
CPUs you're on and be stuck to it when you do wrmsr. This just fudges
the code to make the warning go away and doesn't fix the actual problem
afaict.

^ permalink raw reply

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Vitaly Kuznetsov @ 2019-06-12 10:17 UTC (permalink / raw)
  To: Dmitry Safonov, linux-kernel
  Cc: Prasanna Panchamukhi, Andy Lutomirski, Borislav Petkov,
	Cathy Avery, Haiyang Zhang, H. Peter Anvin, Ingo Molnar,
	K. Y. Srinivasan, Michael Kelley (EOSG), Mohammed Gamal,
	Paolo Bonzini, Radim Krčmář, Roman Kagan,
	Sasha Levin, Stephen Hemminger, Thomas Gleixner, devel, kvm,
	linux-hyperv, x86
In-Reply-To: <20190611212003.26382-1-dima@arista.com>

Dmitry Safonov <dima@arista.com> writes:

> KVM support may be compiled as dynamic module, which triggers the
> following splat on modprobe:
>
>  KVM: vmx: using Hyper-V Enlightened VMCS
>  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
>  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>  Call Trace:
>   dump_stack+0x61/0x7e
>   check_preemption_disabled+0xd4/0xe6
>   debug_smp_processor_id+0x17/0x19
>   set_hv_tscchange_cb+0x1b/0x89
>   kvm_arch_init+0x14a/0x163 [kvm]
>   kvm_init+0x30/0x259 [kvm]
>   vmx_init+0xed/0x3db [kvm_intel]
>   do_one_initcall+0x89/0x1bc
>   do_init_module+0x5f/0x207
>   load_module+0x1b34/0x209b
>   __ia32_sys_init_module+0x17/0x19
>   do_fast_syscall_32+0x121/0x1fa
>   entry_SYSENTER_compat+0x7f/0x91

Hm, I never noticed this one, you probably need something like
CONFIG_PREEMPT enabled to see it.

>
> The easiest solution seems to be disabling preemption while setting up
> reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.
>
> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
> support")
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Cathy Avery <cavery@redhat.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
> Cc: Mohammed Gamal <mmorsy@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> Cc: devel@linuxdriverproject.org
> Cc: kvm@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: x86@kernel.org
> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1608050e9df9..0bdd79ecbff8 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	u64 msr_vp_index;
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  	void **input_arg;
>  	struct page *pg;
>  
> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>  
>  	hv_get_vp_index(msr_vp_index);
>  
> -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> +	hv_vp_index[cpu] = msr_vp_index;
>  
>  	if (msr_vp_index > hv_max_vp_index)
>  		hv_max_vp_index = msr_vp_index;

The above is unrelated cleanup (as cpu == smp_processor_id() for
CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can't be
preempted.

> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	struct hv_reenlightenment_control re_ctrl = {
>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>  		.enabled = 1,
> -		.target_vp = hv_vp_index[smp_processor_id()]
>  	};
>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>  
> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	/* Make sure callback is registered before we write to MSRs */
>  	wmb();
>  
> +	preempt_disable();
> +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> +	preempt_enable();
> +

My personal preference would be to do something like
   int cpu = get_cpu();

   ... set things up ...

   put_cpu();

instead, there are no long-running things in the whole function. But
what you've done should work too, so

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>  }
>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);

-- 
Vitaly

^ permalink raw reply
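
The get_cpu()/put_cpu() pairing suggested in the review above can be
sketched in userspace as follows. This is only a model, not kernel code:
get_cpu() and put_cpu() are mocked with a counter standing in for the
preemption count, and example_current_cpu, example_vp_index,
example_write_reenlightenment() and example_set_target() are hypothetical
names invented for illustration. In the kernel, get_cpu() disables
preemption and returns the current CPU number, and put_cpu() re-enables
preemption.

```c
#include <stdint.h>

/* Userspace mocks: stand-ins for kernel state, invented for this sketch. */
static int example_preempt_count;              /* models the preemption count */
static int example_current_cpu = 2;            /* CPU the "task" runs on */
static uint32_t example_vp_index[4] = { 10, 11, 12, 13 };
static uint32_t example_written_target_vp;     /* records the mocked MSR write */

/* Mock of get_cpu(): "disable preemption" and return the current CPU. */
static int get_cpu(void)
{
	example_preempt_count++;
	return example_current_cpu;
}

/* Mock of put_cpu(): "re-enable preemption". */
static void put_cpu(void)
{
	example_preempt_count--;
}

/* Hypothetical stand-in for the write to the reenlightenment control MSR. */
static void example_write_reenlightenment(uint32_t target_vp)
{
	example_written_target_vp = target_vp;
}

/* The CPU lookup and the write both happen inside the pinned region. */
static void example_set_target(void)
{
	int cpu = get_cpu();

	example_write_reenlightenment(example_vp_index[cpu]);

	put_cpu();
}
```

Calling example_set_target() leaves the counter balanced and records the
VP index of the CPU that was current while the region was held.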

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Vitaly Kuznetsov @ 2019-06-12 10:25 UTC (permalink / raw)
  To: Peter Zijlstra, Dmitry Safonov
  Cc: linux-kernel, Prasanna Panchamukhi, Andy Lutomirski,
	Borislav Petkov, Cathy Avery, Haiyang Zhang, H. Peter Anvin,
	Ingo Molnar, K. Y. Srinivasan, Michael Kelley (EOSG),
	Mohammed Gamal, Paolo Bonzini, Radim Krčmář,
	Roman Kagan, Sasha Levin, Stephen Hemminger, Thomas Gleixner,
	devel, kvm, linux-hyperv, x86
In-Reply-To: <20190612093506.GH3436@hirez.programming.kicks-ass.net>

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Jun 11, 2019 at 10:20:03PM +0100, Dmitry Safonov wrote:
>> KVM support may be compiled as dynamic module, which triggers the
>> following splat on modprobe:
>> 
>>  KVM: vmx: using Hyper-V Enlightened VMCS
>>  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
>>  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>>  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>>  Call Trace:
>>   dump_stack+0x61/0x7e
>>   check_preemption_disabled+0xd4/0xe6
>>   debug_smp_processor_id+0x17/0x19
>>   set_hv_tscchange_cb+0x1b/0x89
>>   kvm_arch_init+0x14a/0x163 [kvm]
>>   kvm_init+0x30/0x259 [kvm]
>>   vmx_init+0xed/0x3db [kvm_intel]
>>   do_one_initcall+0x89/0x1bc
>>   do_init_module+0x5f/0x207
>>   load_module+0x1b34/0x209b
>>   __ia32_sys_init_module+0x17/0x19
>>   do_fast_syscall_32+0x121/0x1fa
>>   entry_SYSENTER_compat+0x7f/0x91
>> 
>> The easiest solution seems to be disabling preemption while setting up
>> reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.
>> 
>> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
>> support")
>> 
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Cathy Avery <cavery@redhat.com>
>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
>> Cc: Mohammed Gamal <mmorsy@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Roman Kagan <rkagan@virtuozzo.com>
>> Cc: Sasha Levin <sashal@kernel.org>
>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> 
>> Cc: devel@linuxdriverproject.org
>> Cc: kvm@vger.kernel.org
>> Cc: linux-hyperv@vger.kernel.org
>> Cc: x86@kernel.org
>> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
>> Signed-off-by: Dmitry Safonov <dima@arista.com>
>> ---
>>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 1608050e9df9..0bdd79ecbff8 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>>  static int hv_cpu_init(unsigned int cpu)
>>  {
>>  	u64 msr_vp_index;
>> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>>  	void **input_arg;
>>  	struct page *pg;
>>  
>> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>>  
>>  	hv_get_vp_index(msr_vp_index);
>>  
>> -	hv_vp_index[smp_processor_id()] = msr_vp_index;
>> +	hv_vp_index[cpu] = msr_vp_index;
>>  
>>  	if (msr_vp_index > hv_max_vp_index)
>>  		hv_max_vp_index = msr_vp_index;
>> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>  	struct hv_reenlightenment_control re_ctrl = {
>>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>  		.enabled = 1,
>> -		.target_vp = hv_vp_index[smp_processor_id()]
>>  	};
>>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>  
>> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>  	/* Make sure callback is registered before we write to MSRs */
>>  	wmb();
>>  
>> +	preempt_disable();
>> +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
>> +	preempt_enable();
>> +
>>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>>  }
>>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
>
> This looks bogus, MSRs are a per-cpu resource, you had better know what
> CPUs you're on and be stuck to it when you do wrmsr. This just fudges
> the code to make the warning go away and doesn't fix the actual problem
> afaict.

Actually, we don't care which CPU will receive the reenlightenment
notification and TSC Emulation in Hyper-V is, of course, global. We have
code which re-assigns the notification to some other CPU in case the
one it's currently assigned to goes away (see hv_cpu_die()).

-- 
Vitaly

^ permalink raw reply
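
The re-assignment described in the reply above can be modelled in
userspace roughly like this. It is a simplified sketch, not the code in
hv_cpu_die(): struct example_state, example_any_online_but() and
example_cpu_die() are hypothetical names, and picking the lowest-numbered
remaining online CPU is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_NR_CPUS 4

/* Hypothetical stand-in for kernel state; not the real data structures. */
struct example_state {
	bool     online[EXAMPLE_NR_CPUS];   /* which CPUs are online */
	uint32_t vp_index[EXAMPLE_NR_CPUS]; /* CPU number -> Hyper-V VP index */
	uint32_t target_vp;                 /* VP that receives the notification */
	bool     enabled;                   /* notifications enabled at all? */
};

/* Return the first online CPU other than 'cpu', or -1 if there is none. */
static int example_any_online_but(const struct example_state *s, int cpu)
{
	for (int i = 0; i < EXAMPLE_NR_CPUS; i++)
		if (i != cpu && s->online[i])
			return i;
	return -1;
}

/*
 * Model of a CPU going offline: if it is the current notification target,
 * move the notification to another online CPU before marking it offline.
 */
static void example_cpu_die(struct example_state *s, int cpu)
{
	if (s->enabled && s->target_vp == s->vp_index[cpu]) {
		int new_cpu = example_any_online_but(s, cpu);

		if (new_cpu >= 0)
			s->target_vp = s->vp_index[new_cpu];
	}
	s->online[cpu] = false;
}
```

Because the target can be moved at any time in this way, the model
illustrates why the CPU chosen at registration time does not have to stay
fixed.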

* Re: [PATCH v2 1/5] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
From: Vitaly Kuznetsov @ 2019-06-12 10:36 UTC (permalink / raw)
  To: Maya Nakamura
  Cc: x86, linux-hyperv, linux-kernel, mikelley, kys, haiyangz,
	sthemmin, sashal
In-Reply-To: <67be3e283c0f28326f9c31a64f399fe659ad5690.1559807514.git.m.maya.nakamura@gmail.com>

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because
> the Linux guest page size and hypervisor page size concepts are
> different, even though they happen to be the same value on x86.
>
> Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE.
>
> Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index cdf44aa9a501..44bd68aefd00 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -12,6 +12,16 @@
>  #include <linux/types.h>
>  #include <asm/page.h>
>  
> +/*
> + * While not explicitly listed in the TLFS, Hyper-V always runs with a page size
> + * of 4096. These definitions are used when communicating with Hyper-V using
> + * guest physical pages and guest physical page addresses, since the guest page
> + * size may not be 4096 on all architectures.
> + */
> +#define HV_HYP_PAGE_SHIFT	12
> +#define HV_HYP_PAGE_SIZE	BIT(HV_HYP_PAGE_SHIFT)
> +#define HV_HYP_PAGE_MASK	(~(HV_HYP_PAGE_SIZE - 1))
> +
>  /*
>   * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
>   * is set by CPUID(HvCpuIdFunctionVersionAndFeatures).
> @@ -841,7 +851,7 @@ union hv_gpa_page_range {
>   * count is equal with how many entries of union hv_gpa_page_range can
>   * be populated into the input parameter page.
>   */
> -#define HV_MAX_FLUSH_REP_COUNT ((PAGE_SIZE - 2 * sizeof(u64)) /	\
> +#define HV_MAX_FLUSH_REP_COUNT ((HV_HYP_PAGE_SIZE - 2 * sizeof(u64)) /	\
>  				sizeof(union hv_gpa_page_range))
>  
>  struct hv_guest_mapping_flush_list {

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

^ permalink raw reply
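
A small userspace sketch of how the definitions in the patch above behave.
The three HV_HYP_PAGE_* values are copied from the patch; everything
prefixed with example_ or EXAMPLE_ is hypothetical and exists only for
illustration. BIT() is redefined locally with a 64-bit constant, and one
flush range entry is assumed to be a single 64-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro (64-bit for this sketch). */
#define BIT(n)			(1ULL << (n))

/* Values copied from the patch. */
#define HV_HYP_PAGE_SHIFT	12
#define HV_HYP_PAGE_SIZE	BIT(HV_HYP_PAGE_SHIFT)
#define HV_HYP_PAGE_MASK	(~(HV_HYP_PAGE_SIZE - 1))

/* Assumption: one flush range entry occupies a single 64-bit word. */
typedef uint64_t example_gpa_page_range;

/* Same arithmetic as HV_MAX_FLUSH_REP_COUNT, using the stand-in type. */
#define EXAMPLE_MAX_FLUSH_REP_COUNT \
	((HV_HYP_PAGE_SIZE - 2 * sizeof(uint64_t)) / sizeof(example_gpa_page_range))

/* Hypothetical helper: hypervisor page frame number of a guest physical address. */
static inline uint64_t example_gpa_to_hv_pfn(uint64_t gpa)
{
	return gpa >> HV_HYP_PAGE_SHIFT;
}

/* Hypothetical helper: round an address down to a hypervisor page boundary. */
static inline uint64_t example_hv_page_base(uint64_t gpa)
{
	return gpa & HV_HYP_PAGE_MASK;
}

/* Hypothetical helper: hypervisor pages contained in one guest page. */
static inline uint64_t example_hv_pages_per_guest_page(uint64_t guest_page_size)
{
	return guest_page_size / HV_HYP_PAGE_SIZE;
}
```

With a 4096-byte hypervisor page and 8-byte entries the repeat count
comes out as (4096 - 16) / 8 = 510, independent of the guest PAGE_SIZE.
On a guest with 64 KiB pages, example_hv_pages_per_guest_page(65536)
gives 16, which is the case the separate definitions are meant to cover.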
