* [PATCH v3 3/5] hv: vmbus: Replace page definition with Hyper-V specific one
From: Maya Nakamura @ 2019-06-18 6:14 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1560837096.git.m.maya.nakamura@gmail.com>
Replace PAGE_SIZE with HV_HYP_PAGE_SIZE because the guest page size may
not be 4096 on all architectures, whereas Hyper-V always runs with a page
size of 4096.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
drivers/hv/hyperv_vmbus.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 362e70e9d145..019469c3cbca 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -192,11 +192,11 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
u64 *requestid, bool raw);
/*
- * Maximum channels is determined by the size of the interrupt page
- * which is PAGE_SIZE. 1/2 of PAGE_SIZE is for send endpoint interrupt
- * and the other is receive endpoint interrupt
+ * Maximum channels, 16384, is determined by the size of the interrupt page,
+ * which is HV_HYP_PAGE_SIZE. Half of HV_HYP_PAGE_SIZE is for the send endpoint
+ * interrupt, and the other half is for the receive endpoint interrupt.
*/
-#define MAX_NUM_CHANNELS ((PAGE_SIZE >> 1) << 3) /* 16348 channels */
+#define MAX_NUM_CHANNELS ((HV_HYP_PAGE_SIZE >> 1) << 3)
/* The value here must be in multiple of 32 */
/* TODO: Need to make this configurable */
--
2.17.1
^ permalink raw reply related
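The arithmetic behind MAX_NUM_CHANNELS in patch 3/5 can be checked with a small user-space sketch. The 4096-byte value for HV_HYP_PAGE_SIZE is taken from the commit message; the helper name max_num_channels() is hypothetical and exists only for illustration.

```c
#include <assert.h>

/* Assumption: Hyper-V page size of 4096 bytes, as stated in the commit message. */
#define HV_HYP_PAGE_SIZE 4096UL

/*
 * Hypothetical helper mirroring the macro in the patch: half of the
 * interrupt page is used per direction (>> 1), and every byte of that
 * half carries 8 per-channel bits (<< 3).
 */
static unsigned long max_num_channels(unsigned long page_size)
{
	assert(page_size >= 2);
	return (page_size >> 1) << 3;
}
```

With a 4096-byte page this evaluates to 16384. Fed a 64 KiB guest page instead, the same expression would give 262144, which illustrates why the macro should follow the Hyper-V page size rather than PAGE_SIZE.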
* [PATCH v3 0/5] hv: Remove dependencies on guest page size
From: Maya Nakamura @ 2019-06-18 6:09 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
The Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86. Hyper-V
code mixes up the two, so this patchset begins to address that by
creating and using a set of Hyper-V specific page definitions.
A major benefit of those new definitions is that they support non-x86
architectures, such as ARM64, that use different page sizes. On ARM64,
the guest page size may not be 4096, whereas Hyper-V always runs with a
page size of 4096.
In this patchset, the first two patches lay the foundation for the
others, creating definitions and preparing for allocation of memory with
the size and alignment that Hyper-V expects as a page. Patch 3 applies
the page size definition where the guest VM and Hyper-V communicate, and
where the code intends to use the Hyper-V page size. The last two
patches set the ring buffer size to a fixed value, removing the
dependency on the guest page size.
This is the initial set of changes to the Hyper-V code, and future
patches will make additional changes using the same foundation, for
example, replacing __vmalloc() and related functions where Hyper-V pages
are intended.
Changes in v3:
- [PATCH v2 2/5] Simplify expression for BUILD_BUG_ON().
- Add Link and Reviewed-by tags.
Change in v2:
- [PATCH 2/5] Replace with a new patch.
Maya Nakamura (5):
x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
x86: hv: hv_init.c: Add functions to allocate/deallocate page for
Hyper-V
hv: vmbus: Replace page definition with Hyper-V specific one
HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
Input: hv: Remove dependencies on PAGE_SIZE for ring buffer
arch/x86/hyperv/hv_init.c | 14 ++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 12 +++++++++++-
drivers/hid/hid-hyperv.c | 4 ++--
drivers/hv/hyperv_vmbus.h | 8 ++++----
drivers/input/serio/hyperv-keyboard.c | 4 ++--
5 files changed, 33 insertions(+), 9 deletions(-)
--
2.17.1
^ permalink raw reply
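The distinction the cover letter draws between the guest page size and the Hyper-V page size can be made concrete with a short sketch. The definitions below are assumptions chosen to match the 4096-byte Hyper-V page described above (patch 1/5 itself is not shown in this thread excerpt), and hv_pages_per_guest_page() is a hypothetical helper.

```c
#include <assert.h>

/* Assumption: a 4096-byte Hyper-V page, i.e. a shift of 12. */
#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1UL << HV_HYP_PAGE_SHIFT)

/* Hypothetical helper: number of Hyper-V pages covered by one guest page. */
static unsigned long hv_pages_per_guest_page(unsigned long guest_page_size)
{
	/* The guest page must be a whole multiple of the Hyper-V page. */
	assert(guest_page_size >= HV_HYP_PAGE_SIZE);
	assert(guest_page_size % HV_HYP_PAGE_SIZE == 0);

	return guest_page_size >> HV_HYP_PAGE_SHIFT;
}
```

On x86 the result is 1, which is why the two concepts could be mixed up without visible effect. With 16 KiB or 64 KiB guest pages the result is 4 or 16, so any interface that counts pages toward the hypervisor has to state which kind of page it means.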
* [PATCH v3 5/5] Input: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Maya Nakamura @ 2019-06-18 6:17 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1560837096.git.m.maya.nakamura@gmail.com>
Define the ring buffer size as a constant expression because it should
not depend on the guest page size.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
drivers/input/serio/hyperv-keyboard.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c
index 8e457e50f837..88ae7c2ac3c8 100644
--- a/drivers/input/serio/hyperv-keyboard.c
+++ b/drivers/input/serio/hyperv-keyboard.c
@@ -75,8 +75,8 @@ struct synth_kbd_keystroke {
#define HK_MAXIMUM_MESSAGE_SIZE 256
-#define KBD_VSC_SEND_RING_BUFFER_SIZE (10 * PAGE_SIZE)
-#define KBD_VSC_RECV_RING_BUFFER_SIZE (10 * PAGE_SIZE)
+#define KBD_VSC_SEND_RING_BUFFER_SIZE (40 * 1024)
+#define KBD_VSC_RECV_RING_BUFFER_SIZE (40 * 1024)
#define XTKBD_EMUL0 0xe0
#define XTKBD_EMUL1 0xe1
--
2.17.1
^ permalink raw reply related
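To see what the change in patch 5/5 avoids, the old and new ring buffer expressions can be compared for different guest page sizes. This is an illustrative sketch only; old_ring_buffer_size() is a hypothetical helper that reproduces the removed expression, and the single KBD_VSC_RING_BUFFER_SIZE name stands in for the two identical send/receive macros.

```c
#include <assert.h>

/* Fixed size from the patch: independent of the guest page size. */
#define KBD_VSC_RING_BUFFER_SIZE (40 * 1024)

/* Hypothetical helper reproducing the removed expression, 10 * PAGE_SIZE. */
static unsigned long old_ring_buffer_size(unsigned long guest_page_size)
{
	assert(guest_page_size > 0);
	return 10 * guest_page_size;
}
```

For a 4096-byte guest page both expressions give 40960 bytes, so x86 behavior is unchanged. With a 64 KiB guest page the old expression would have grown to 655360 bytes.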
* [PATCH v3 2/5] x86: hv: hv_init.c: Add functions to allocate/deallocate page for Hyper-V
From: Maya Nakamura @ 2019-06-18 6:13 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1560837096.git.m.maya.nakamura@gmail.com>
Introduce two new functions, hv_alloc_hyperv_page() and
hv_free_hyperv_page(), to allocate/deallocate memory with the size and
alignment that Hyper-V expects as a page. Although currently they are
not used, they are ready to be used to allocate/deallocate memory on x86
when their ARM64 counterparts are implemented, keeping symmetry between
architectures with potentially different guest page sizes.
Link: https://lore.kernel.org/lkml/87muindr9c.fsf@vitty.brq.redhat.com/
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
arch/x86/hyperv/hv_init.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0e033ef11a9f..e8960a83add7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -37,6 +37,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
u32 hv_max_vp_index;
EXPORT_SYMBOL_GPL(hv_max_vp_index);
+void *hv_alloc_hyperv_page(void)
+{
+ BUILD_BUG_ON(PAGE_SIZE != HV_HYP_PAGE_SIZE);
+
+ return (void *)__get_free_page(GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(hv_alloc_hyperv_page);
+
+void hv_free_hyperv_page(unsigned long addr)
+{
+ free_page(addr);
+}
+EXPORT_SYMBOL_GPL(hv_free_hyperv_page);
+
static int hv_cpu_init(unsigned int cpu)
{
u64 msr_vp_index;
--
2.17.1
^ permalink raw reply related
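A user-space analogue of the two functions in patch 2/5 may help show the intent: a compile-time check that the "guest" page size equals the Hyper-V page size, followed by an allocation with that size and alignment. This is only a sketch under stated assumptions. GUEST_PAGE_SIZE stands in for PAGE_SIZE, C11 _Static_assert stands in for BUILD_BUG_ON(), aligned_alloc() stands in for __get_free_page(), and the *_sketch names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HV_HYP_PAGE_SIZE 4096UL
#define GUEST_PAGE_SIZE  4096UL /* assumption: stand-in for PAGE_SIZE on x86 */

/* Compile-time guard, playing the role of BUILD_BUG_ON() in the patch. */
_Static_assert(GUEST_PAGE_SIZE == HV_HYP_PAGE_SIZE,
	       "guest page size must match the Hyper-V page size");

/* Allocate one block with the size and alignment of a Hyper-V page. */
static void *alloc_hyperv_page_sketch(void)
{
	return aligned_alloc(HV_HYP_PAGE_SIZE, HV_HYP_PAGE_SIZE);
}

/* Release a block obtained from alloc_hyperv_page_sketch(). */
static void free_hyperv_page_sketch(void *page)
{
	free(page);
}
```

If GUEST_PAGE_SIZE were changed to 65536, the build would fail at the static assertion, which is the signal that an architecture with larger guest pages needs its own implementation.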
* [PATCH v3 4/5] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Maya Nakamura @ 2019-06-18 6:15 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1560837096.git.m.maya.nakamura@gmail.com>
Define the ring buffer size as a constant expression because it should
not depend on the guest page size.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
drivers/hid/hid-hyperv.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index 7795831d37c2..cc5b09b87ab0 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -104,8 +104,8 @@ struct synthhid_input_report {
#pragma pack(pop)
-#define INPUTVSC_SEND_RING_BUFFER_SIZE (10*PAGE_SIZE)
-#define INPUTVSC_RECV_RING_BUFFER_SIZE (10*PAGE_SIZE)
+#define INPUTVSC_SEND_RING_BUFFER_SIZE (40 * 1024)
+#define INPUTVSC_RECV_RING_BUFFER_SIZE (40 * 1024)
enum pipe_prot_msg_type {
--
2.17.1
^ permalink raw reply related
* Re: [PATCH 7/8] mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs
From: Ming Lei @ 2019-06-18 0:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K . Petersen, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
linux-rdma, Linux SCSI List, megaraidlinux.pdl,
MPT-FusionLinux.pdl, linux-hyperv, Linux Kernel Mailing List
In-Reply-To: <20190617122000.22181-8-hch@lst.de>
On Mon, Jun 17, 2019 at 8:21 PM Christoph Hellwig <hch@lst.de> wrote:
>
> When using a virt_boundary_mask, as done for NVMe devices attached to
> mpt3sas controllers we require an unlimited max_segment_size, as the
> virt boundary merging code assumes that. But we also need to propagate
> that to the DMA mapping layer to make dma-debug happy. The SCSI layer
> takes care of that when using the per-host virt_boundary setting, but
> given that mpt3sas only wants to set the virt_boundary for actual
> NVMe devices we can't rely on that. The DMA layer maximum segment
> is global to the HBA however, so we have to set it explicitly. This
> patch assumes that mpt3sas does not have a segment size limitation,
> which seems true based on the SGL format, but will need to be verified.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 1ccfbc7eebe0..c719b807f6d8 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -10222,6 +10222,7 @@ static struct scsi_host_template mpt3sas_driver_template = {
> .this_id = -1,
> .sg_tablesize = MPT3SAS_SG_DEPTH,
> .max_sectors = 32767,
> + .max_segment_size = 0xffffffff,
.max_segment_size should be aligned, either by setting it correctly here or
by forcing it to be aligned in scsi-core.
Thanks,
Ming Lei
^ permalink raw reply
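The alignment concern raised in this reply can be illustrated with a small sketch: 0xffffffff is not a multiple of any block size, so it would have to be rounded down. Which block size is appropriate, and whether the rounding belongs in the driver or in scsi-core, is left open in the thread; align_down_pow2() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round `size` down to a multiple of `block_size`,
 * which must be a non-zero power of two.
 */
static uint32_t align_down_pow2(uint32_t size, uint32_t block_size)
{
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
	return size & ~(block_size - 1);
}
```

For a 512-byte block size, 0xffffffff rounds down to 0xfffffe00; for 4096 bytes it rounds down to 0xfffff000.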
* Re: [PATCH 1/8] scsi: add a host / host template field for the virt boundary
From: Ming Lei @ 2019-06-18 0:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K . Petersen, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
linux-rdma, Linux SCSI List, megaraidlinux.pdl,
MPT-FusionLinux.pdl, linux-hyperv, Linux Kernel Mailing List
In-Reply-To: <20190617122000.22181-2-hch@lst.de>
On Mon, Jun 17, 2019 at 8:21 PM Christoph Hellwig <hch@lst.de> wrote:
>
> This allows drivers setting it up easily instead of branching out to
> block layer calls in slave_alloc, and ensures the upgraded
> max_segment_size setting gets picked up by the DMA layer.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/hosts.c | 3 +++
> drivers/scsi/scsi_lib.c | 3 ++-
> include/scsi/scsi_host.h | 3 +++
> 3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index ff0d8c6a8d0c..55522b7162d3 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -462,6 +462,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
> else
> shost->dma_boundary = 0xffffffff;
>
> + if (sht->virt_boundary_mask)
> + shost->virt_boundary_mask = sht->virt_boundary_mask;
> +
> device_initialize(&shost->shost_gendev);
> dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
> shost->shost_gendev.bus = &scsi_bus_type;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 65d0a10c76ad..d333bb6b1c59 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1775,7 +1775,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> dma_set_seg_boundary(dev, shost->dma_boundary);
>
> blk_queue_max_segment_size(q, shost->max_segment_size);
> - dma_set_max_seg_size(dev, shost->max_segment_size);
> + blk_queue_virt_boundary(q, shost->virt_boundary_mask);
> + dma_set_max_seg_size(dev, queue_max_segment_size(q));
The patch looks fine. I also suggest making sure that max_segment_size
is block-size aligned; an unaligned max segment size has caused trouble
on mmc.
Thanks,
Ming Lei
^ permalink raw reply
* Re: [PATCH 6/8] IB/srp: set virt_boundary_mask in the scsi host
From: Bart Van Assche @ 2019-06-17 21:01 UTC (permalink / raw)
To: Christoph Hellwig, Martin K . Petersen
Cc: Sagi Grimberg, Max Gurtovoy, linux-rdma, linux-scsi,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-7-hch@lst.de>
On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> This ensures all proper DMA layer handling is taken care of by the
> SCSI midlayer.
Acked-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply
* Re: [PATCH 4/8] storvsc: set virt_boundary_mask in the scsi host template
From: Bart Van Assche @ 2019-06-17 20:59 UTC (permalink / raw)
To: Christoph Hellwig, Martin K . Petersen
Cc: Sagi Grimberg, Max Gurtovoy, linux-rdma, linux-scsi,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-5-hch@lst.de>
On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> This ensures all proper DMA layer handling is taken care of by the
> SCSI midlayer.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/storvsc_drv.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index b89269120a2d..7ed6f2fc1446 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1422,9 +1422,6 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
> {
> blk_queue_rq_timeout(sdevice->request_queue, (storvsc_timeout * HZ));
>
> - /* Ensure there are no gaps in presented sgls */
> - blk_queue_virt_boundary(sdevice->request_queue, PAGE_SIZE - 1);
> -
> sdevice->no_write_same = 1;
>
> /*
> @@ -1697,6 +1694,8 @@ static struct scsi_host_template scsi_driver = {
> .this_id = -1,
> /* Make sure we dont get a sg segment crosses a page boundary */
> .dma_boundary = PAGE_SIZE-1,
> + /* Ensure there are no gaps in presented sgls */
> + .virt_boundary_mask = PAGE_SIZE-1,
> .no_write_same = 1,
> .track_queue_depth = 1,
> };
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply
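The "no gaps in presented sgls" comment in the storvsc patch refers to the virt boundary rule. A simplified model of that rule, not the block layer's actual code, is sketched below: with a mask of PAGE_SIZE - 1, every segment except the last must end on a boundary and every segment except the first must start on one. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scatter-gather segment. */
struct seg {
	uint64_t addr;
	uint32_t len;
};

/*
 * Return true if the list violates the virt boundary rule, i.e. if some
 * segment ends off-boundary while another segment follows it, or if a
 * segment other than the first starts off-boundary.
 */
static bool sgl_has_gap(const struct seg *segs, size_t n, uint64_t mask)
{
	for (size_t i = 1; i < n; i++) {
		uint64_t prev_end = segs[i - 1].addr + segs[i - 1].len;

		if ((prev_end & mask) || (segs[i].addr & mask))
			return true;
	}
	return false;
}
```

With a mask of 4095, a full 4096-byte segment followed by a 512-byte segment at a page-aligned address is acceptable, while two 512-byte segments on different pages are not, because the first one ends in the middle of a page.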
* Re: [PATCH 3/8] ufshcd: set max_segment_size in the scsi host template
From: Bart Van Assche @ 2019-06-17 20:58 UTC (permalink / raw)
To: Christoph Hellwig, Martin K . Petersen
Cc: Sagi Grimberg, Max Gurtovoy, linux-rdma, linux-scsi,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-4-hch@lst.de>
On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> We need to also mirror the value to the device to ensure IOMMU merging
> doesn't undo it, and the SCSI host level parameter will ensure that.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/ufs/ufshcd.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 3fe3029617a8..505d625ed28d 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -4587,8 +4587,6 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
> struct request_queue *q = sdev->request_queue;
>
> blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
> - blk_queue_max_segment_size(q, PRDT_DATA_BYTE_COUNT_MAX);
> -
> return 0;
> }
>
> @@ -6991,6 +6989,7 @@ static struct scsi_host_template ufshcd_driver_template = {
> .sg_tablesize = SG_ALL,
> .cmd_per_lun = UFSHCD_CMD_PER_LUN,
> .can_queue = UFSHCD_CAN_QUEUE,
> + .max_segment_size = PRDT_DATA_BYTE_COUNT_MAX,
> .max_host_blocked = 1,
> .track_queue_depth = 1,
> .sdev_groups = ufshcd_driver_groups,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply
* Re: [PATCH 2/8] scsi: take the DMA max mapping size into account
From: Bart Van Assche @ 2019-06-17 20:56 UTC (permalink / raw)
To: Christoph Hellwig, Martin K . Petersen
Cc: Sagi Grimberg, Max Gurtovoy, linux-rdma, linux-scsi,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-3-hch@lst.de>
On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> We need to limit the devices max_sectors to what the DMA mapping
> implementation can support. If not we risk running out of swiotlb
> buffers easily.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/scsi_lib.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index d333bb6b1c59..f233bfd84cd7 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
> }
>
> + shost->max_sectors = min_t(unsigned int, shost->max_sectors,
> + dma_max_mapping_size(dev) << SECTOR_SHIFT);
> blk_queue_max_hw_sectors(q, shost->max_sectors);
> if (shost->unchecked_isa_dma)
> blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
Does dma_max_mapping_size() return a value in bytes? Is
shost->max_sectors a number of sectors? If so, are you sure that "<<
SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
SECTOR_SHIFT" instead?
Additionally, how about adding a comment above dma_max_mapping_size()
that documents the unit of the returned number?
Thanks,
Bart.
^ permalink raw reply
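The unit question in this reply can be checked with a short sketch. It assumes, as the question does, that dma_max_mapping_size() reports bytes and that max_sectors counts 512-byte sectors (SECTOR_SHIFT of 9); the two helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/* Bytes to sectors: divide by 512, i.e. shift right. */
static size_t bytes_to_sectors(size_t bytes)
{
	return bytes >> SECTOR_SHIFT;
}

/* Sectors to bytes: multiply by 512, i.e. shift left. */
static size_t sectors_to_bytes(size_t sectors)
{
	return sectors << SECTOR_SHIFT;
}
```

Under those assumptions, a 256 KiB mapping limit corresponds to 512 sectors. Shifting the byte count left instead would produce a value 262144 times larger than intended, so the min_t() in the quoted hunk would almost never clamp anything.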
* Re: [PATCH 1/8] scsi: add a host / host template field for the virt boundary
From: Bart Van Assche @ 2019-06-17 20:51 UTC (permalink / raw)
To: Christoph Hellwig, Martin K . Petersen
Cc: Sagi Grimberg, Max Gurtovoy, linux-rdma, linux-scsi,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-2-hch@lst.de>
On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 65d0a10c76ad..d333bb6b1c59 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1775,7 +1775,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> dma_set_seg_boundary(dev, shost->dma_boundary);
>
> blk_queue_max_segment_size(q, shost->max_segment_size);
> - dma_set_max_seg_size(dev, shost->max_segment_size);
> + blk_queue_virt_boundary(q, shost->virt_boundary_mask);
> + dma_set_max_seg_size(dev, queue_max_segment_size(q));
Although this looks fine to me for LLDs that own a PCIe device, I doubt
this is correct for SCSI LLDs that share a PCIe device with other ULP
drivers. From the RDMA core:
/* Setup default max segment size for all IB devices */
dma_set_max_seg_size(device->dma_device, SZ_2G);
Will instantiating a SCSI host (iSER or SRP) for an RDMA adapter cause
the maximum segment size to be modified for all ULP drivers associated
with that HCA?
Thanks,
Bart.
^ permalink raw reply
* Re: [PATCH net] hvsock: fix epollout hang from race condition
From: David Miller @ 2019-06-17 20:04 UTC (permalink / raw)
To: sunilmut
Cc: decui, kys, haiyangz, sthemmin, sashal, mikelley, netdev,
linux-hyperv, linux-kernel
In-Reply-To: <MW2PR2101MB11168BA3D46BEC843D694E04C0EB0@MW2PR2101MB1116.namprd21.prod.outlook.com>
From: Sunil Muthuswamy <sunilmut@microsoft.com>
Date: Mon, 17 Jun 2019 19:27:45 +0000
> The patch does not change at all. So, I was hoping we could reapply
> it. But, I have resubmitted the patch. Thanks.
It's easy for me to track things if you just resubmit the patch.
That's why I ask for things to be done this way, it helps my workflow
a lot.
Thank you.
^ permalink raw reply
* RE: [PATCH net] hvsock: fix epollout hang from race condition
From: Sunil Muthuswamy @ 2019-06-17 19:27 UTC (permalink / raw)
To: David Miller
Cc: Dexuan Cui, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
sashal@kernel.org, Michael Kelley, netdev@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190617.115615.91633577273679753.davem@davemloft.net>
> -----Original Message-----
> From: David Miller <davem@davemloft.net>
> Sent: Monday, June 17, 2019 11:56 AM
> To: Sunil Muthuswamy <sunilmut@microsoft.com>
> Cc: Dexuan Cui <decui@microsoft.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; sashal@kernel.org; Michael Kelley <mikelley@microsoft.com>; netdev@vger.kernel.org;
> linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net] hvsock: fix epollout hang from race condition
>
> From: Sunil Muthuswamy <sunilmut@microsoft.com>
> Date: Mon, 17 Jun 2019 18:47:08 +0000
>
> >
> >
> >> -----Original Message-----
> >> From: linux-hyperv-owner@vger.kernel.org <linux-hyperv-owner@vger.kernel.org> On Behalf Of David Miller
> >> Sent: Sunday, June 16, 2019 1:55 PM
> >> To: Dexuan Cui <decui@microsoft.com>
> >> Cc: Sunil Muthuswamy <sunilmut@microsoft.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>;
> >> Stephen Hemminger <sthemmin@microsoft.com>; sashal@kernel.org; Michael Kelley <mikelley@microsoft.com>;
> >> netdev@vger.kernel.org; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH net] hvsock: fix epollout hang from race condition
> >>
> >> From: Dexuan Cui <decui@microsoft.com>
> >> Date: Sat, 15 Jun 2019 03:22:32 +0000
> >>
> >> > These warnings are not introduced by this patch from Sunil.
> >> >
> >> > I'm not sure why I didn't notice these warnings before.
> >> > Probably my gcc version is not new enough?
> >> >
> >> > Actually these warnings are bogus, as I checked the related functions,
> >> > which may confuse the compiler's static analysis.
> >> >
> >> > I'm going to make a patch to initialize the pointers to NULL to suppress
> >> > the warnings. My patch will be based on the latest net.git + this patch
> >> > from Sunil.
> >>
> >> Sunil should then resubmit his patch against something that has the
> >> warning suppression patch applied.
> >
> > David, Dexuan's patch to suppress the warnings seems to be applied now
> > to the 'net' branch. Can we please get this patch applied as well?
>
> I don't know how else to say "Sunil should then resubmit his patch"
>
> Please just resubmit it!
The patch does not change at all. So, I was hoping we could reapply it. But, I have
resubmitted the patch. Thanks.
^ permalink raw reply
* [PATCH net v2] hvsock: fix epollout hang from race condition
From: Sunil Muthuswamy @ 2019-06-17 19:26 UTC (permalink / raw)
To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
David S. Miller, Dexuan Cui, Michael Kelley
Cc: netdev@vger.kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org
Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
not return even when the hvsock socket is writable, under some race
condition. This can happen under the following sequence:
- fd = socket(hvsocket)
- fd_out = dup(fd)
- fd_in = dup(fd)
- start a writer thread that writes data to fd_out with a combination of
epoll_wait(fd_out, EPOLLOUT) and
- start a reader thread that reads data from fd_in with a combination of
epoll_wait(fd_in, EPOLLIN)
- On the host, there are two threads that are reading/writing data to the
hvsocket
stack:
hvs_stream_has_space
hvs_notify_poll_out
vsock_poll
sock_poll
ep_poll
Race condition:
check for epollout from ep_poll():
assume no writable space in the socket
hvs_stream_has_space() returns 0
check for epollin from ep_poll():
assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE)
hvs_stream_has_space() will clear the channel pending send size
host will not notify the guest because the pending send size has
been cleared and so the hvsocket will never mark the
socket writable
Now, the EPOLLOUT will never return even if the socket write buffer is
empty.
The fix is to set the pending size to the default size and never change it.
This way the host will always notify the guest whenever the writable space
is bigger than the pending size. The host is already optimized to *only*
notify the guest when the pending size threshold boundary is crossed and
not every time.
This change also reduces the CPU usage somewhat since hvs_stream_has_space()
is in the hot path of send:
vsock_stream_sendmsg()->hvs_stream_has_space()
Earlier, hvs_stream_has_space() was setting/clearing the pending size on every
call.
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
---
- Resubmitting the patch after taking care of some spurious warnings.
net/vmw_vsock/hyperv_transport.c | 39 ++++++++-------------------------------
1 file changed, 8 insertions(+), 31 deletions(-)
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index e4801c7..cd3f47f 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -220,18 +220,6 @@ static void hvs_set_channel_pending_send_size(struct vmbus_channel *chan)
set_channel_pending_send_size(chan,
HVS_PKT_LEN(HVS_SEND_BUF_SIZE));
- /* See hvs_stream_has_space(): we must make sure the host has seen
- * the new pending send size, before we can re-check the writable
- * bytes.
- */
- virt_mb();
-}
-
-static void hvs_clear_channel_pending_send_size(struct vmbus_channel *chan)
-{
- set_channel_pending_send_size(chan, 0);
-
- /* Ditto */
virt_mb();
}
@@ -301,9 +289,6 @@ static void hvs_channel_cb(void *ctx)
if (hvs_channel_readable(chan))
sk->sk_data_ready(sk);
- /* See hvs_stream_has_space(): when we reach here, the writable bytes
- * may be already less than HVS_PKT_LEN(HVS_SEND_BUF_SIZE).
- */
if (hv_get_bytes_to_write(&chan->outbound) > 0)
sk->sk_write_space(sk);
}
@@ -404,6 +389,13 @@ static void hvs_open_connection(struct vmbus_channel *chan)
set_per_channel_state(chan, conn_from_host ? new : sk);
vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
+ /* Set the pending send size to max packet size to always get
+ * notifications from the host when there is enough writable space.
+ * The host is optimized to send notifications only when the pending
+ * size boundary is crossed, and not always.
+ */
+ hvs_set_channel_pending_send_size(chan);
+
if (conn_from_host) {
new->sk_state = TCP_ESTABLISHED;
sk->sk_ack_backlog++;
@@ -697,23 +689,8 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk)
static s64 hvs_stream_has_space(struct vsock_sock *vsk)
{
struct hvsock *hvs = vsk->trans;
- struct vmbus_channel *chan = hvs->chan;
- s64 ret;
-
- ret = hvs_channel_writable_bytes(chan);
- if (ret > 0) {
- hvs_clear_channel_pending_send_size(chan);
- } else {
- /* See hvs_channel_cb() */
- hvs_set_channel_pending_send_size(chan);
-
- /* Re-check the writable bytes to avoid race */
- ret = hvs_channel_writable_bytes(chan);
- if (ret > 0)
- hvs_clear_channel_pending_send_size(chan);
- }
- return ret;
+ return hvs_channel_writable_bytes(hvs->chan);
}
static u64 hvs_stream_rcvhiwat(struct vsock_sock *vsk)
--
2.7.4
^ permalink raw reply related
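The race described in the commit message can be replayed with a single-threaded toy model. This is a sketch under loud assumptions, not the hvsock or VMBus code: the host is modeled as notifying only when free space crosses a non-zero pending send size, PENDING_THRESHOLD stands in for HVS_PKT_LEN(HVS_SEND_BUF_SIZE), writable bytes are taken to equal free bytes, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PENDING_THRESHOLD 4096u /* stand-in for HVS_PKT_LEN(HVS_SEND_BUF_SIZE) */

struct chan_model {
	unsigned int free_bytes;      /* writable space in the send ring */
	unsigned int pending_send_sz; /* 0 means "do not notify the guest" */
};

/* Host frees n bytes; returns true if it would notify the guest. */
static bool host_frees_space(struct chan_model *c, unsigned int n)
{
	unsigned int before = c->free_bytes;

	c->free_bytes += n;
	return c->pending_send_sz != 0 &&
	       before < c->pending_send_sz &&
	       c->free_bytes >= c->pending_send_sz;
}

/* Old behaviour: every poll sets or clears the pending send size. */
static unsigned int old_has_space(struct chan_model *c)
{
	if (c->free_bytes > 0)
		c->pending_send_sz = 0;
	else
		c->pending_send_sz = PENDING_THRESHOLD;
	return c->free_bytes;
}

/* New behaviour: the pending send size is set once and left alone. */
static unsigned int new_has_space(const struct chan_model *c)
{
	return c->free_bytes;
}

/* Replay the sequence from the commit message; true if the writer is woken. */
static bool writer_notified(bool use_old)
{
	struct chan_model c = { 0, PENDING_THRESHOLD };
	bool notified = false;

	/* EPOLLOUT poll: no space, so the writer goes to sleep. */
	if (use_old)
		old_has_space(&c);
	else
		new_has_space(&c);

	/* Host drains a little: still below the threshold, no notification. */
	notified |= host_frees_space(&c, 100);

	/* EPOLLIN poll on the other fd also ends up in has_space(). */
	if (use_old)
		old_has_space(&c);
	else
		new_has_space(&c);

	/* Host drains the rest of the buffer. */
	notified |= host_frees_space(&c, 2 * PENDING_THRESHOLD);
	return notified;
}
```

In this model the old logic clears the pending send size during the EPOLLIN poll, so the final drain produces no notification and the sleeping writer is never woken. With the pending size left fixed, the same sequence crosses the threshold and the notification is delivered.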
* Re: [PATCH net] hvsock: fix epollout hang from race condition
From: David Miller @ 2019-06-17 18:56 UTC (permalink / raw)
To: sunilmut
Cc: decui, kys, haiyangz, sthemmin, sashal, mikelley, netdev,
linux-hyperv, linux-kernel
In-Reply-To: <MW2PR2101MB111697FDA0BEDA81237FECB3C0EB0@MW2PR2101MB1116.namprd21.prod.outlook.com>
From: Sunil Muthuswamy <sunilmut@microsoft.com>
Date: Mon, 17 Jun 2019 18:47:08 +0000
>
>
>> -----Original Message-----
>> From: linux-hyperv-owner@vger.kernel.org <linux-hyperv-owner@vger.kernel.org> On Behalf Of David Miller
>> Sent: Sunday, June 16, 2019 1:55 PM
>> To: Dexuan Cui <decui@microsoft.com>
>> Cc: Sunil Muthuswamy <sunilmut@microsoft.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
>> Stephen Hemminger <sthemmin@microsoft.com>; sashal@kernel.org; Michael Kelley <mikelley@microsoft.com>;
>> netdev@vger.kernel.org; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH net] hvsock: fix epollout hang from race condition
>>
>> From: Dexuan Cui <decui@microsoft.com>
>> Date: Sat, 15 Jun 2019 03:22:32 +0000
>>
>> > These warnings are not introduced by this patch from Sunil.
>> >
>> > I'm not sure why I didn't notice these warnings before.
>> > Probably my gcc version is not new enough?
>> >
>> > Actually these warnings are bogus, as I checked the related functions,
>> > which may confuse the compiler's static analysis.
>> >
>> > I'm going to make a patch to initialize the pointers to NULL to suppress
>> > the warnings. My patch will be based on the latest net.git + this patch
>> > from Sunil.
>>
>> Sunil should then resubmit his patch against something that has the
>> warning suppression patch applied.
>
> David, Dexuan's patch to suppress the warnings seems to be applied now
> to the 'net' branch. Can we please get this patch applied as well?
I don't know how else to say "Sunil should then resubmit his patch"
Please just resubmit it!
^ permalink raw reply
* RE: [PATCH net] hvsock: fix epollout hang from race condition
From: Sunil Muthuswamy @ 2019-06-17 18:47 UTC (permalink / raw)
To: David Miller, Dexuan Cui
Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
sashal@kernel.org, Michael Kelley, netdev@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190616.135445.822152500838073831.davem@davemloft.net>
> -----Original Message-----
> From: linux-hyperv-owner@vger.kernel.org <linux-hyperv-owner@vger.kernel.org> On Behalf Of David Miller
> Sent: Sunday, June 16, 2019 1:55 PM
> To: Dexuan Cui <decui@microsoft.com>
> Cc: Sunil Muthuswamy <sunilmut@microsoft.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; sashal@kernel.org; Michael Kelley <mikelley@microsoft.com>;
> netdev@vger.kernel.org; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net] hvsock: fix epollout hang from race condition
>
> From: Dexuan Cui <decui@microsoft.com>
> Date: Sat, 15 Jun 2019 03:22:32 +0000
>
> > These warnings are not introduced by this patch from Sunil.
> >
> > I'm not sure why I didn't notice these warnings before.
> > Probably my gcc version is not new enough?
> >
> > Actually these warnings are bogus, as I checked the related functions,
> > which may confuse the compiler's static analysis.
> >
> > I'm going to make a patch to initialize the pointers to NULL to suppress
> > the warnings. My patch will be based on the latest net.git + this patch
> > from Sunil.
>
> Sunil should then resubmit his patch against something that has the
> warning suppression patch applied.
David, Dexuan's patch to suppress the warnings seems to be applied now
to the 'net' branch. Can we please get this patch applied as well?
^ permalink raw reply
* Re: [PATCH 6/8] IB/srp: set virt_boundary_mask in the scsi host
From: Sagi Grimberg @ 2019-06-17 17:35 UTC (permalink / raw)
To: Christoph Hellwig, Martin K . Petersen
Cc: Max Gurtovoy, Bart Van Assche, linux-rdma, linux-scsi,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-7-hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply
* Re: [PATCH 5/8] IB/iser: set virt_boundary_mask in the scsi host
From: Sagi Grimberg @ 2019-06-17 17:34 UTC (permalink / raw)
To: Christoph Hellwig, Martin K . Petersen
Cc: Max Gurtovoy, Bart Van Assche, linux-rdma, linux-scsi,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-6-hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply
* RE: [PATCH] scsi: storvsc: Add ability to change scsi queue depth
From: KY Srinivasan @ 2019-06-17 17:08 UTC (permalink / raw)
To: Michael Kelley, brandonbonaby94, Haiyang Zhang, Stephen Hemminger,
sashal@kernel.org, jejb@linux.ibm.com, martin.petersen@oracle.com
Cc: brandonbonaby94, linux-hyperv@vger.kernel.org,
linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <BL0PR2101MB13482F73342F77866D6A6AABD7E90@BL0PR2101MB1348.namprd21.prod.outlook.com>
> -----Original Message-----
> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Saturday, June 15, 2019 11:19 AM
> To: brandonbonaby94 <brandonbonaby94@gmail.com>; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; sashal@kernel.org;
> jejb@linux.ibm.com; martin.petersen@oracle.com
> Cc: brandonbonaby94 <brandonbonaby94@gmail.com>; linux-
> hyperv@vger.kernel.org; linux-scsi@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: RE: [PATCH] scsi: storvsc: Add ability to change scsi queue depth
>
> From: Branden Bonaby <brandonbonaby94@gmail.com> Sent: Friday, June
> 14, 2019 4:48 PM
> >
> > Adding functionality to allow the SCSI queue depth to be changed,
> > by utilizing the "scsi_change_queue_depth" function.
> >
> > Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
> > ---
> > drivers/scsi/storvsc_drv.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > index 8472de1007ff..719ca9906fc2 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -387,6 +387,7 @@ enum storvsc_request_type {
> >
> > static int storvsc_ringbuffer_size = (128 * 1024);
> > static u32 max_outstanding_req_per_channel;
> > +static int storvsc_change_queue_depth(struct scsi_device *sdev, int
> queue_depth);
> >
> > static int storvsc_vcpus_per_sub_channel = 4;
> >
> > @@ -1711,6 +1712,7 @@ static struct scsi_host_template scsi_driver = {
> > .dma_boundary = PAGE_SIZE-1,
> > .no_write_same = 1,
> > .track_queue_depth = 1,
> > + .change_queue_depth = storvsc_change_queue_depth,
> > };
> >
> > enum {
> > @@ -1917,6 +1919,15 @@ static int storvsc_probe(struct hv_device
> *device,
> > return ret;
> > }
> >
> > +/* Change a scsi target's queue depth */
> > +static int storvsc_change_queue_depth(struct scsi_device *sdev, int
> queue_depth)
> > +{
> > + if (queue_depth > scsi_driver.can_queue){
> > + queue_depth = scsi_driver.can_queue;
> > + }
> > + return scsi_change_queue_depth(sdev, queue_depth);
> > +}
> > +
> > static int storvsc_remove(struct hv_device *dev)
> > {
> > struct storvsc_device *stor_device = hv_get_drvdata(dev);
> > --
> > 2.17.1
>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
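
For readers following the thread, the clamping done by the proposed
storvsc_change_queue_depth() can be modelled outside the kernel. This is
only a minimal userspace sketch: struct host_limits and
clamp_queue_depth() are hypothetical names invented for illustration and
are not part of the driver or the SCSI midlayer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the host template's can_queue limit. */
struct host_limits {
	int can_queue;
};

/*
 * Clamp a requested per-device queue depth to the host-wide limit,
 * mirroring the check the patch performs before it calls
 * scsi_change_queue_depth().
 */
static int clamp_queue_depth(const struct host_limits *host, int requested)
{
	assert(host != NULL);

	if (requested > host->can_queue)
		requested = host->can_queue;
	return requested;
}
```

With can_queue set to 64, a request for 128 is reduced to 64 and a
request for 32 passes through unchanged.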
^ permalink raw reply
* [PATCHv2] x86/hyperv: Hold cpus_read_lock() on assigning reenlightenment vector
From: Dmitry Safonov @ 2019-06-17 16:39 UTC (permalink / raw)
To: linux-kernel
Cc: Dmitry Safonov, Dmitry Safonov, Prasanna Panchamukhi,
Andy Lutomirski, Borislav Petkov, Cathy Avery, Haiyang Zhang,
H. Peter Anvin, Ingo Molnar, K. Y. Srinivasan,
Michael Kelley (EOSG), Mohammed Gamal, Paolo Bonzini,
Peter Zijlstra, Radim Krčmář, Roman Kagan,
Sasha Levin, Stephen Hemminger, Thomas Gleixner, Vitaly Kuznetsov,
devel, kvm, linux-hyperv, x86
KVM support may be compiled as a dynamic module, which triggers the
following splat on modprobe (under CONFIG_DEBUG_PREEMPT):
KVM: vmx: using Hyper-V Enlightened VMCS
BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466
caller is debug_smp_processor_id+0x17/0x19
CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
Call Trace:
dump_stack+0x61/0x7e
check_preemption_disabled+0xd4/0xe6
debug_smp_processor_id+0x17/0x19
set_hv_tscchange_cb+0x1b/0x89
kvm_arch_init+0x14a/0x163 [kvm]
kvm_init+0x30/0x259 [kvm]
vmx_init+0xed/0x3db [kvm_intel]
do_one_initcall+0x89/0x1bc
do_init_module+0x5f/0x207
load_module+0x1b34/0x209b
__ia32_sys_init_module+0x17/0x19
do_fast_syscall_32+0x121/0x1fa
entry_SYSENTER_compat+0x7f/0x91
Hold cpus_read_lock() so that the MSR will be written for an online CPU,
even if set_hv_tscchange_cb() gets preempted.
While at it, clean up the smp_processor_id() calls in hv_cpu_init() and
add a lockdep assert into hv_cpu_die().
Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications support")
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
v1 link: lkml.kernel.org/r/20190611212003.26382-1-dima@arista.com
NOTE that I haven't had a chance to test v2 on a Hyper-V machine so far,
ONLY BUILD TESTED. (In the hope that the patch still makes sense and the
Kbuild bot will report any issue.)
arch/x86/hyperv/hv_init.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1608050e9df9..ec7fd7d6c125 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
#include <linux/clockchips.h>
#include <linux/hyperv.h>
#include <linux/slab.h>
+#include <linux/cpu.h>
#include <linux/cpuhotplug.h>
#ifdef CONFIG_HYPERV_TSCPAGE
@@ -91,7 +92,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
static int hv_cpu_init(unsigned int cpu)
{
u64 msr_vp_index;
- struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+ struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
void **input_arg;
struct page *pg;
@@ -103,7 +104,7 @@ static int hv_cpu_init(unsigned int cpu)
hv_get_vp_index(msr_vp_index);
- hv_vp_index[smp_processor_id()] = msr_vp_index;
+ hv_vp_index[cpu] = msr_vp_index;
if (msr_vp_index > hv_max_vp_index)
hv_max_vp_index = msr_vp_index;
@@ -182,7 +183,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
struct hv_reenlightenment_control re_ctrl = {
.vector = HYPERV_REENLIGHTENMENT_VECTOR,
.enabled = 1,
- .target_vp = hv_vp_index[smp_processor_id()]
};
struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
@@ -196,7 +196,16 @@ void set_hv_tscchange_cb(void (*cb)(void))
/* Make sure callback is registered before we write to MSRs */
wmb();
+ /*
+ * As reenlightenment vector is global, there is no difference which
+ * CPU will register MSR, though it should be an online CPU.
+ * hv_cpu_die() callback guarantees that on CPU teardown
+ * another CPU will re-register MSR back.
+ */
+ cpus_read_lock();
+ re_ctrl.target_vp = hv_vp_index[raw_smp_processor_id()];
wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+ cpus_read_unlock();
wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
}
EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
@@ -239,6 +248,7 @@ static int hv_cpu_die(unsigned int cpu)
rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
if (re_ctrl.target_vp == hv_vp_index[cpu]) {
+ lockdep_assert_cpus_held();
/* Reassign to some other online CPU */
new_cpu = cpumask_any_but(cpu_online_mask, cpu);
--
2.22.0
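
The reassignment step in hv_cpu_die() ("Reassign to some other online
CPU") can be illustrated with a small userspace model. This is a sketch
under stated assumptions, not kernel code: any_online_cpu_but() and
MODEL_NR_CPUS are hypothetical names, the online mask is a plain 64-bit
integer, and the model always picks the lowest-numbered candidate,
whereas the kernel helper is free to pick any. Returning MODEL_NR_CPUS
when no other CPU is online is a choice made for this model.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64u

/*
 * Return the lowest-numbered CPU that is set in online_mask and is not
 * `cpu`, or MODEL_NR_CPUS if no such CPU exists.
 */
static unsigned int any_online_cpu_but(uint64_t online_mask, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);

	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++) {
		if (i != cpu && (online_mask & (UINT64_C(1) << i)))
			return i;
	}
	return MODEL_NR_CPUS;
}
```

The point of holding cpus_read_lock() around the MSR write is that the
online mask cannot change underneath the writer, so the CPU chosen as
target stays online until the write has completed.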
^ permalink raw reply related
* Re: [PATCH] ACPI: PM: Export the function acpi_sleep_state_supported()
From: Lorenzo Pieralisi @ 2019-06-17 16:14 UTC (permalink / raw)
To: Dexuan Cui
Cc: Michael Kelley, linux-acpi@vger.kernel.org, rjw@rjwysocki.net,
lenb@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
Russell King, Russ Dill, Sebastian Capella, Pavel Machek,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, Haiyang Zhang, Sasha Levin,
olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB01699020B5BC4287C58F5335BFEE0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
On Fri, Jun 14, 2019 at 10:19:02PM +0000, Dexuan Cui wrote:
> > -----Original Message-----
> > From: Michael Kelley <mikelley@microsoft.com>
> > Sent: Friday, June 14, 2019 1:48 PM
> > To: Dexuan Cui <decui@microsoft.com>; linux-acpi@vger.kernel.org;
> > rjw@rjwysocki.net; lenb@kernel.org; robert.moore@intel.com;
> > erik.schmauss@intel.com
> > Cc: linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; KY Srinivasan
> > <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> > Haiyang Zhang <haiyangz@microsoft.com>; Sasha Levin
> > <Alexander.Levin@microsoft.com>; olaf@aepfle.de; apw@canonical.com;
> > jasowang@redhat.com; vkuznets <vkuznets@redhat.com>;
> > marcelo.cerri@canonical.com
> > Subject: RE: [PATCH] ACPI: PM: Export the function
> > acpi_sleep_state_supported()
> >
> > From: Dexuan Cui <decui@microsoft.com> Sent: Friday, June 14, 2019 11:19
> > AM
> > >
> > > In a Linux VM running on Hyper-V, when ACPI S4 is enabled, the balloon
> > > driver (drivers/hv/hv_balloon.c) needs to ask the host not to do memory
> > > hot-add/remove.
> > >
> > > So let's export acpi_sleep_state_supported() for the hv_balloon driver.
> > > This might also be useful to the other drivers in the future.
> > >
> > > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > > ---
> > > drivers/acpi/sleep.c | 3 ++-
> > > include/acpi/acpi_bus.h | 2 ++
> > > 2 files changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
> > > index a34deccd7317..69755411e008 100644
> > > --- a/drivers/acpi/sleep.c
> > > +++ b/drivers/acpi/sleep.c
> > > @@ -79,7 +79,7 @@ static int acpi_sleep_prepare(u32 acpi_state)
> > > return 0;
> > > }
> > >
> > > -static bool acpi_sleep_state_supported(u8 sleep_state)
> > > +bool acpi_sleep_state_supported(u8 sleep_state)
> > > {
> > > acpi_status status;
> > > u8 type_a, type_b;
> > > @@ -89,6 +89,7 @@ static bool acpi_sleep_state_supported(u8 sleep_state)
> > > || (acpi_gbl_FADT.sleep_control.address
> > > && acpi_gbl_FADT.sleep_status.address));
> > > }
> > > +EXPORT_SYMBOL_GPL(acpi_sleep_state_supported);
> > >
> > > #ifdef CONFIG_ACPI_SLEEP
> > > static u32 acpi_target_sleep_state = ACPI_STATE_S0;
> > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > > index 31b6c87d6240..5b102e7bbf25 100644
> > > --- a/include/acpi/acpi_bus.h
> > > +++ b/include/acpi/acpi_bus.h
> > > @@ -651,6 +651,8 @@ static inline int acpi_pm_set_bridge_wakeup(struct
> > device *dev,
> > > bool enable)
> > > }
> > > #endif
> > >
> > > +bool acpi_sleep_state_supported(u8 sleep_state);
> > > +
> > > #ifdef CONFIG_ACPI_SLEEP
> > > u32 acpi_target_system_state(void);
> > > #else
> > > --
> > > 2.19.1
> >
> > It seems that sleep.c isn't built when on the ARM64 architecture. Using
> > acpi_sleep_state_supported() directly in hv_balloon.c will be problematic
> > since hv_balloon.c needs to be architecture independent when the
> > Hyper-V ARM64 support is added. If that doesn't change, a per-architecture
> > wrapper will be needed to give hv_balloon.c the correct information. This
> > may affect whether acpi_sleep_state_supported() needs to be exported vs.
> > just removing the "static". I'm not sure what the best approach is.
> >
> > Michael
>
> + some ARM experts who worked on arch/arm/kernel/hibernate.c.
>
> drivers/acpi/sleep.c is only built if ACPI_SYSTEM_POWER_STATES_SUPPORT
> is defined, but it looks this option is not defined on ARM.
>
> It looks ARM does not support the ACPI S4 state, then how do we know
> if an ARM host supports hibernation or not?
To answer your question, maybe we should start by understanding why you
need to know whether hibernation is possible.
On ARM64 platforms, system states are entered through the PSCI firmware
interface, which works for ACPI and device tree alike.
Lorenzo
^ permalink raw reply
* Re: [PATCH] ACPI: PM: Export the function acpi_sleep_state_supported()
From: Pavel Machek @ 2019-06-17 13:09 UTC (permalink / raw)
To: Dexuan Cui
Cc: Michael Kelley, linux-acpi@vger.kernel.org, rjw@rjwysocki.net,
lenb@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
Russell King, Russ Dill, Sebastian Capella, Lorenzo Pieralisi,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, Haiyang Zhang, Sasha Levin,
olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB01699020B5BC4287C58F5335BFEE0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
> > It seems that sleep.c isn't built when on the ARM64 architecture. Using
> > acpi_sleep_state_supported() directly in hv_balloon.c will be problematic
> > since hv_balloon.c needs to be architecture independent when the
> > Hyper-V ARM64 support is added. If that doesn't change, a per-architecture
> > wrapper will be needed to give hv_balloon.c the correct information. This
> > may affect whether acpi_sleep_state_supported() needs to be exported vs.
> > just removing the "static". I'm not sure what the best approach is.
> >
> > Michael
>
> + some ARM experts who worked on arch/arm/kernel/hibernate.c.
>
> drivers/acpi/sleep.c is only built if ACPI_SYSTEM_POWER_STATES_SUPPORT
> is defined, but it looks this option is not defined on ARM.
>
> It looks ARM does not support the ACPI S4 state, then how do we know
> if an ARM host supports hibernation or not?
But actually... I remember an ELCE talk about hibernation on ARM32. Not
sure if the patches are in mainline, but someone was working on that.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply
* Re: [PATCH] ACPI: PM: Export the function acpi_sleep_state_supported()
From: Pavel Machek @ 2019-06-17 13:08 UTC (permalink / raw)
To: Dexuan Cui
Cc: Michael Kelley, linux-acpi@vger.kernel.org, rjw@rjwysocki.net,
lenb@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
Russell King, Russ Dill, Sebastian Capella, Lorenzo Pieralisi,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, Haiyang Zhang, Sasha Levin,
olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB01699020B5BC4287C58F5335BFEE0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
Hi!
> > > In a Linux VM running on Hyper-V, when ACPI S4 is enabled, the balloon
> > > driver (drivers/hv/hv_balloon.c) needs to ask the host not to do memory
> > > hot-add/remove.
> > >
> > > So let's export acpi_sleep_state_supported() for the hv_balloon driver.
> > > This might also be useful to the other drivers in the future.
> > >
> > > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> >
> > It seems that sleep.c isn't built when on the ARM64 architecture. Using
> > acpi_sleep_state_supported() directly in hv_balloon.c will be problematic
> > since hv_balloon.c needs to be architecture independent when the
> > Hyper-V ARM64 support is added. If that doesn't change, a per-architecture
> > wrapper will be needed to give hv_balloon.c the correct information. This
> > may affect whether acpi_sleep_state_supported() needs to be exported vs.
> > just removing the "static". I'm not sure what the best approach is.
> >
> > Michael
>
> + some ARM experts who worked on arch/arm/kernel/hibernate.c.
>
> drivers/acpi/sleep.c is only built if ACPI_SYSTEM_POWER_STATES_SUPPORT
> is defined, but it looks this option is not defined on ARM.
>
> It looks ARM does not support the ACPI S4 state, then how do we know
> if an ARM host supports hibernation or not?
You should be able to do hibernation without ACPI S4 support. All you
need is the ability to power down...
It is quite possible that no one has tested hibernation on ARM. People
usually do suspend-to-RAM there.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply
* [PATCH 2/8] scsi: take the DMA max mapping size into account
From: Christoph Hellwig @ 2019-06-17 12:19 UTC (permalink / raw)
To: Martin K . Petersen
Cc: Sagi Grimberg, Max Gurtovoy, Bart Van Assche, linux-rdma,
linux-scsi, megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-1-hch@lst.de>
We need to limit the device's max_sectors to what the DMA mapping
implementation can support. Otherwise we risk running out of swiotlb
buffers easily.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/scsi/scsi_lib.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d333bb6b1c59..f233bfd84cd7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
}
+ shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+ dma_max_mapping_size(dev) >> SECTOR_SHIFT);
blk_queue_max_hw_sectors(q, shost->max_sectors);
if (shost->unchecked_isa_dma)
blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
--
2.20.1
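
The limit described in the commit message can be worked through in
userspace. This sketch assumes that the DMA layer reports its maximum
mapping size in bytes and that a sector is 512 bytes, so converting the
byte limit to sectors is a right shift by 9. limit_max_sectors() and
MODEL_SECTOR_SHIFT are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SECTOR_SHIFT 9 /* 512-byte sectors */

/*
 * Return the smaller of the host's max_sectors and the DMA mapping
 * limit expressed in sectors. The comparison is done in size_t so a
 * very large byte limit is not truncated before it is compared.
 */
static unsigned int limit_max_sectors(unsigned int max_sectors,
				      size_t max_mapping_bytes)
{
	size_t mapping_sectors = max_mapping_bytes >> MODEL_SECTOR_SHIFT;
	unsigned int result = max_sectors;

	if (mapping_sectors < max_sectors)
		result = (unsigned int)mapping_sectors;

	assert(result <= max_sectors);
	return result;
}
```

For example, a 256 KiB mapping limit corresponds to 512 sectors, so a
host that advertises 2048 sectors would be reduced to 512, while a host
that advertises 128 sectors would be left alone.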
^ permalink raw reply related