* Re: [PATCH 4/8] storvsc: set virt_boundary_mask in the scsi host template
From: Bart Van Assche @ 2019-06-17 20:59 UTC (permalink / raw)
To: Christoph Hellwig, Martin K . Petersen
Cc: Sagi Grimberg, Max Gurtovoy, linux-rdma, linux-scsi,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-5-hch@lst.de>
On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> This ensures all proper DMA layer handling is taken care of by the
> SCSI midlayer.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/storvsc_drv.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index b89269120a2d..7ed6f2fc1446 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1422,9 +1422,6 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
> {
> blk_queue_rq_timeout(sdevice->request_queue, (storvsc_timeout * HZ));
>
> - /* Ensure there are no gaps in presented sgls */
> - blk_queue_virt_boundary(sdevice->request_queue, PAGE_SIZE - 1);
> -
> sdevice->no_write_same = 1;
>
> /*
> @@ -1697,6 +1694,8 @@ static struct scsi_host_template scsi_driver = {
> .this_id = -1,
> /* Make sure we dont get a sg segment crosses a page boundary */
> .dma_boundary = PAGE_SIZE-1,
> + /* Ensure there are no gaps in presented sgls */
> + .virt_boundary_mask = PAGE_SIZE-1,
> .no_write_same = 1,
> .track_queue_depth = 1,
> };
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply
* Re: [PATCH 6/8] IB/srp: set virt_boundary_mask in the scsi host
From: Bart Van Assche @ 2019-06-17 21:01 UTC (permalink / raw)
To: Christoph Hellwig, Martin K . Petersen
Cc: Sagi Grimberg, Max Gurtovoy, linux-rdma, linux-scsi,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
linux-kernel
In-Reply-To: <20190617122000.22181-7-hch@lst.de>
On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> This ensures all proper DMA layer handling is taken care of by the
> SCSI midlayer.
Acked-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply
* Re: [PATCH 1/8] scsi: add a host / host template field for the virt boundary
From: Ming Lei @ 2019-06-18 0:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K . Petersen, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
linux-rdma, Linux SCSI List, megaraidlinux.pdl,
MPT-FusionLinux.pdl, linux-hyperv, Linux Kernel Mailing List
In-Reply-To: <20190617122000.22181-2-hch@lst.de>
On Mon, Jun 17, 2019 at 8:21 PM Christoph Hellwig <hch@lst.de> wrote:
>
> This allows drivers setting it up easily instead of branching out to
> block layer calls in slave_alloc, and ensures the upgraded
> max_segment_size setting gets picked up by the DMA layer.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/hosts.c | 3 +++
> drivers/scsi/scsi_lib.c | 3 ++-
> include/scsi/scsi_host.h | 3 +++
> 3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index ff0d8c6a8d0c..55522b7162d3 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -462,6 +462,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
> else
> shost->dma_boundary = 0xffffffff;
>
> + if (sht->virt_boundary_mask)
> + shost->virt_boundary_mask = sht->virt_boundary_mask;
> +
> device_initialize(&shost->shost_gendev);
> dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
> shost->shost_gendev.bus = &scsi_bus_type;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 65d0a10c76ad..d333bb6b1c59 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1775,7 +1775,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> dma_set_seg_boundary(dev, shost->dma_boundary);
>
> blk_queue_max_segment_size(q, shost->max_segment_size);
> - dma_set_max_seg_size(dev, shost->max_segment_size);
> + blk_queue_virt_boundary(q, shost->virt_boundary_mask);
> + dma_set_max_seg_size(dev, queue_max_segment_size(q));
The patch looks fine, also suggest to make sure that max_segment_size
is block-size aligned, and un-aligned max segment size has caused trouble
on mmc.
Thanks,
Ming Lei
^ permalink raw reply
* Re: [PATCH 7/8] mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs
From: Ming Lei @ 2019-06-18 0:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K . Petersen, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
linux-rdma, Linux SCSI List, megaraidlinux.pdl,
MPT-FusionLinux.pdl, linux-hyperv, Linux Kernel Mailing List
In-Reply-To: <20190617122000.22181-8-hch@lst.de>
On Mon, Jun 17, 2019 at 8:21 PM Christoph Hellwig <hch@lst.de> wrote:
>
> When using a virt_boundary_mask, as done for NVMe devices attached to
> mpt3sas controllers we require an unlimited max_segment_size, as the
> virt boundary merging code assumes that. But we also need to propagate
> that to the DMA mapping layer to make dma-debug happy. The SCSI layer
> takes care of that when using the per-host virt_boundary setting, but
> given that mpt3sas only wants to set the virt_boundary for actual
> NVMe devices we can't rely on that. The DMA layer maximum segment
> is global to the HBA however, so we have to set it explicitly. This
> patch assumes that mpt3sas does not have a segment size limitation,
> which seems true based on the SGL format, but will need to be verified.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 1ccfbc7eebe0..c719b807f6d8 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -10222,6 +10222,7 @@ static struct scsi_host_template mpt3sas_driver_template = {
> .this_id = -1,
> .sg_tablesize = MPT3SAS_SG_DEPTH,
> .max_sectors = 32767,
> + .max_segment_size = 0xffffffff,
.max_segment_size should be aligned, either setting it here correctly or
forcing to make it aligned in scsi-core.
Thanks,
Ming Lei
^ permalink raw reply
* [PATCH v3 4/5] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Maya Nakamura @ 2019-06-18 6:15 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1560837096.git.m.maya.nakamura@gmail.com>
Define the ring buffer size as a constant expression because it should
not depend on the guest page size.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
drivers/hid/hid-hyperv.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index 7795831d37c2..cc5b09b87ab0 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -104,8 +104,8 @@ struct synthhid_input_report {
#pragma pack(pop)
-#define INPUTVSC_SEND_RING_BUFFER_SIZE (10*PAGE_SIZE)
-#define INPUTVSC_RECV_RING_BUFFER_SIZE (10*PAGE_SIZE)
+#define INPUTVSC_SEND_RING_BUFFER_SIZE (40 * 1024)
+#define INPUTVSC_RECV_RING_BUFFER_SIZE (40 * 1024)
enum pipe_prot_msg_type {
--
2.17.1
^ permalink raw reply related
* [PATCH v3 2/5] x86: hv: hv_init.c: Add functions to allocate/deallocate page for Hyper-V
From: Maya Nakamura @ 2019-06-18 6:13 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1560837096.git.m.maya.nakamura@gmail.com>
Introduce two new functions, hv_alloc_hyperv_page() and
hv_free_hyperv_page(), to allocate/deallocate memory with the size and
alignment that Hyper-V expects as a page. Although currently they are
not used, they are ready to be used to allocate/deallocate memory on x86
when their ARM64 counterparts are implemented, keeping symmetry between
architectures with potentially different guest page sizes.
Link: https://lore.kernel.org/lkml/87muindr9c.fsf@vitty.brq.redhat.com/
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
arch/x86/hyperv/hv_init.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0e033ef11a9f..e8960a83add7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -37,6 +37,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
u32 hv_max_vp_index;
EXPORT_SYMBOL_GPL(hv_max_vp_index);
+void *hv_alloc_hyperv_page(void)
+{
+ BUILD_BUG_ON(PAGE_SIZE != HV_HYP_PAGE_SIZE);
+
+ return (void *)__get_free_page(GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(hv_alloc_hyperv_page);
+
+void hv_free_hyperv_page(unsigned long addr)
+{
+ free_page(addr);
+}
+EXPORT_SYMBOL_GPL(hv_free_hyperv_page);
+
static int hv_cpu_init(unsigned int cpu)
{
u64 msr_vp_index;
--
2.17.1
^ permalink raw reply related
* [PATCH v3 5/5] Input: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Maya Nakamura @ 2019-06-18 6:17 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1560837096.git.m.maya.nakamura@gmail.com>
Define the ring buffer size as a constant expression because it should
not depend on the guest page size.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
drivers/input/serio/hyperv-keyboard.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c
index 8e457e50f837..88ae7c2ac3c8 100644
--- a/drivers/input/serio/hyperv-keyboard.c
+++ b/drivers/input/serio/hyperv-keyboard.c
@@ -75,8 +75,8 @@ struct synth_kbd_keystroke {
#define HK_MAXIMUM_MESSAGE_SIZE 256
-#define KBD_VSC_SEND_RING_BUFFER_SIZE (10 * PAGE_SIZE)
-#define KBD_VSC_RECV_RING_BUFFER_SIZE (10 * PAGE_SIZE)
+#define KBD_VSC_SEND_RING_BUFFER_SIZE (40 * 1024)
+#define KBD_VSC_RECV_RING_BUFFER_SIZE (40 * 1024)
#define XTKBD_EMUL0 0xe0
#define XTKBD_EMUL1 0xe1
--
2.17.1
^ permalink raw reply related
* [PATCH v3 0/5] hv: Remove dependencies on guest page size
From: Maya Nakamura @ 2019-06-18 6:09 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
The Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86. Hyper-V
code mixes up the two, so this patchset begins to address that by
creating and using a set of Hyper-V specific page definitions.
A major benefit of those new definitions is that they support non-x86
architectures, such as ARM64, that use different page sizes. On ARM64,
the guest page size may not be 4096, and Hyper-V always runs with a
page size of 4096.
In this patchset, the first two patches lay the foundation for the
others, creating definitions and preparing for allocation of memory with
the size and alignment that Hyper-V expects as a page. Patch 3 applies
the page size definition where the guest VM and Hyper-V communicate, and
where the code intends to use the Hyper-V page size. The last two
patches set the ring buffer size to a fixed value, removing the
dependency on the guest page size.
This is the initial set of changes to the Hyper-V code, and future
patches will make additional changes using the same foundation, for
example, replace __vmalloc() and related functions when Hyper-V pages
are intended.
Changes in v3:
- [PATCH v2 2/5] Simplify expression for BUILD_BUG_ON().
- Add Link and Reviewed-by tags.
Change in v2:
- [PATCH 2/5] Replace with a new patch.
Maya Nakamura (5):
x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
x86: hv: hv_init.c: Add functions to allocate/deallocate page for
Hyper-V
hv: vmbus: Replace page definition with Hyper-V specific one
HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
Input: hv: Remove dependencies on PAGE_SIZE for ring buffer
arch/x86/hyperv/hv_init.c | 14 ++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 12 +++++++++++-
drivers/hid/hid-hyperv.c | 4 ++--
drivers/hv/hyperv_vmbus.h | 8 ++++----
drivers/input/serio/hyperv-keyboard.c | 4 ++--
5 files changed, 33 insertions(+), 9 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH v3 3/5] hv: vmbus: Replace page definition with Hyper-V specific one
From: Maya Nakamura @ 2019-06-18 6:14 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1560837096.git.m.maya.nakamura@gmail.com>
Replace PAGE_SIZE with HV_HYP_PAGE_SIZE because the guest page size may
not be 4096 on all architectures and Hyper-V always runs with a page
size of 4096.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
drivers/hv/hyperv_vmbus.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 362e70e9d145..019469c3cbca 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -192,11 +192,11 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
u64 *requestid, bool raw);
/*
- * Maximum channels is determined by the size of the interrupt page
- * which is PAGE_SIZE. 1/2 of PAGE_SIZE is for send endpoint interrupt
- * and the other is receive endpoint interrupt
+ * Maximum channels, 16348, is determined by the size of the interrupt page,
+ * which is HV_HYP_PAGE_SIZE. 1/2 of HV_HYP_PAGE_SIZE is to send endpoint
+ * interrupt, and the other is to receive endpoint interrupt.
*/
-#define MAX_NUM_CHANNELS ((PAGE_SIZE >> 1) << 3) /* 16348 channels */
+#define MAX_NUM_CHANNELS ((HV_HYP_PAGE_SIZE >> 1) << 3)
/* The value here must be in multiple of 32 */
/* TODO: Need to make this configurable */
--
2.17.1
^ permalink raw reply related
* [PATCH v3 1/5] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
From: Maya Nakamura @ 2019-06-18 6:11 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1560837096.git.m.maya.nakamura@gmail.com>
Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because
the Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86.
Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
arch/x86/include/asm/hyperv-tlfs.h | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index af78cd72b8f3..4097edf552b3 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -12,6 +12,16 @@
#include <linux/types.h>
#include <asm/page.h>
+/*
+ * While not explicitly listed in the TLFS, Hyper-V always runs with a page size
+ * of 4096. These definitions are used when communicating with Hyper-V using
+ * guest physical pages and guest physical page addresses, since the guest page
+ * size may not be 4096 on all architectures.
+ */
+#define HV_HYP_PAGE_SHIFT 12
+#define HV_HYP_PAGE_SIZE BIT(HV_HYP_PAGE_SHIFT)
+#define HV_HYP_PAGE_MASK (~(HV_HYP_PAGE_SIZE - 1))
+
/*
* The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
* is set by CPUID(HvCpuIdFunctionVersionAndFeatures).
@@ -847,7 +857,7 @@ union hv_gpa_page_range {
* count is equal with how many entries of union hv_gpa_page_range can
* be populated into the input parameter page.
*/
-#define HV_MAX_FLUSH_REP_COUNT ((PAGE_SIZE - 2 * sizeof(u64)) / \
+#define HV_MAX_FLUSH_REP_COUNT ((HV_HYP_PAGE_SIZE - 2 * sizeof(u64)) / \
sizeof(union hv_gpa_page_range))
struct hv_guest_mapping_flush_list {
--
2.17.1
^ permalink raw reply related
* Re: [PATCH] scsi: storvsc: Add ability to change scsi queue depth
From: Martin K. Petersen @ 2019-06-19 1:08 UTC (permalink / raw)
To: Branden Bonaby
Cc: kys, haiyangz, sthemmin, sashal, jejb, martin.petersen,
linux-hyperv, linux-scsi, linux-kernel
In-Reply-To: <20190614234822.5193-1-brandonbonaby94@gmail.com>
Branden,
> Adding functionality to allow the SCSI queue depth to be changed, by
> utilizing the "scsi_change_queue_depth" function.
Applied to 5.3/scsi-queue. Please run checkpatch before submission. I
fixed it up this time.
Thanks!
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply
* Re: [PATCH net v2] hvsock: fix epollout hang from race condition
From: David Miller @ 2019-06-19 1:42 UTC (permalink / raw)
To: sunilmut
Cc: kys, haiyangz, sthemmin, sashal, decui, mikelley, netdev,
linux-hyperv, linux-kernel
In-Reply-To: <MW2PR2101MB111670139C7F5DC567129DD3C0EB0@MW2PR2101MB1116.namprd21.prod.outlook.com>
From: Sunil Muthuswamy <sunilmut@microsoft.com>
Date: Mon, 17 Jun 2019 19:26:25 +0000
> Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
> not return even when the hvsock socket is writable, under some race
> condition. This can happen under the following sequence:
...
> Now, the EPOLLOUT will never return even if the socket write buffer is
> empty.
>
> The fix is to set the pending size to the default size and never change it.
> This way the host will always notify the guest whenever the writable space
> is bigger than the pending size. The host is already optimized to *only*
> notify the guest when the pending size threshold boundary is crossed and
> not everytime.
>
> This change also reduces the cpu usage somewhat since hv_stream_has_space()
> is in the hotpath of send:
> vsock_stream_sendmsg()->hv_stream_has_space()
> Earlier hv_stream_has_space was setting/clearing the pending size on every
> call.
>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
> ---
> - Resubmitting the patch after taking care of some spurious warnings.
Applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH] scsi: storvsc: Add ability to change scsi queue depth
From: Branden Bonaby @ 2019-06-19 1:56 UTC (permalink / raw)
To: Martin K. Petersen
Cc: kys, haiyangz, sthemmin, sashal, jejb, linux-hyperv, linux-scsi,
linux-kernel
In-Reply-To: <yq1fto6xtxw.fsf@oracle.com>
On Tue, Jun 18, 2019 at 09:08:59PM -0400, Martin K. Petersen wrote:
>
> Branden,
>
> > Adding functionality to allow the SCSI queue depth to be changed, by
> > utilizing the "scsi_change_queue_depth" function.
>
> Applied to 5.3/scsi-queue. Please run checkpatch before submission. I
> fixed it up this time.
>
> Thanks!
Oh I see what you mean about the brackets, thanks will do next time.
^ permalink raw reply
* RE: [PATCH] ACPI: PM: Export the function acpi_sleep_state_supported()
From: Dexuan Cui @ 2019-06-19 19:54 UTC (permalink / raw)
To: Lorenzo Pieralisi, linux-acpi@vger.kernel.org, rjw@rjwysocki.net,
lenb@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
Russell King, Russ Dill, Sebastian Capella, Pavel Machek
Cc: Michael Kelley, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
Haiyang Zhang, Sasha Levin, olaf@aepfle.de, apw@canonical.com,
jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <20190617161454.GB27113@e121166-lin.cambridge.arm.com>
> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Lorenzo Pieralisi
> Sent: Monday, June 17, 2019 9:15 AM
> > ...
> > + some ARM experts who worked on arch/arm/kernel/hibernate.c.
> >
> > drivers/acpi/sleep.c is only built if ACPI_SYSTEM_POWER_STATES_SUPPORT
> > is defined, but it looks this option is not defined on ARM.
> >
> > It looks ARM does not support the ACPI S4 state, then how do we know
> > if an ARM host supports hibernation or not?
>
> Maybe we should start from understanding why you need to know whether
> Hibernate is possible to answer your question ?
>
> On ARM64 platforms system states are entered through PSCI firmware
> interface that works for ACPI and device tree alike.
>
> Lorenzo
Hi Lorenzo,
It looks I may have confused you as I didn't realize the word "ARM" only means
32-bit ARM. It looks the "ARM" arch and the "ARM64" arch are very different.
As far as I know, Hyper-V only supports x86 and "ARM64", and it's unlikely to
support 32-bit ARM in the future, so actually I don't really need to know if and
how a 32-bit ARM machine supports hibernation.
When a Linux guest runs on Hyper-V (x86_32, x86_64, or ARM64) , we have a
front-end balloon driver in the guest, which balloons up/down and
hot adds/removes the guest's memory when the host requests that. The problem
is: the back-end driver on the host can not really save and restore the states
related to the front-end balloon driver on guest hibernation, so we made the
decision that balloon up/down and hot-add/remove are not supported when
we enable hibernation for a guest; BTW, we still want to load the front-end
driver in the guest, because the dirver has a functionality of reporting the
guest's memory pressure to the host, which we think is useful.
On x86_32 and x86_64, we enable hibernation for a guest by enabling
the virtual ACPI S4 state for the guest; on ARM64, so far we don't have the
host side changes required to support guest hibernation, so the details are
still unclear.
After I discussed with Michael Kelley, it looks we don't really need to
export drivers/acpi/sleep.c: acpi_sleep_state_supported(), but I think we do
need to make it non-static.
Now I propose the below changes. I plan to submit a patch first for the
changes made to drivers/acpi/sleep.c and include/acpi/acpi_bus.h in a few
days, if there is no objection.
Please let me know how you think of this. Thanks!
diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index a34deccd7317..18d2e5922448 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -79,7 +79,7 @@ static int acpi_sleep_prepare(u32 acpi_state)
return 0;
}
-static bool acpi_sleep_state_supported(u8 sleep_state)
+bool acpi_sleep_state_supported(u8 sleep_state)
{
acpi_status status;
u8 type_a, type_b;
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 31b6c87d6240..3e6563e1a2c0 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -651,6 +651,12 @@ static inline int acpi_pm_set_bridge_wakeup(struct device *dev, bool enable)
}
#endif
+#ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT
+bool acpi_sleep_state_supported(u8 sleep_state);
+#else
+bool acpi_sleep_state_supported(u8 sleep_state) { return false; }
+#endif
+
#ifdef CONFIG_ACPI_SLEEP
u32 acpi_target_system_state(void);
#else
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 79f626ab8306..197e8fa32871 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -17,6 +17,7 @@
*
*/
+#include <linux/acpi.h>
#include <linux/efi.h>
#include <linux/types.h>
#include <asm/apic.h>
@@ -420,3 +421,9 @@ bool hv_is_hyperv_initialized(void)
return hypercall_msr.enable;
}
EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
+
+bool hv_is_hibernation_supported(void)
+{
+ return acpi_sleep_state_supported(ACPI_STATE_S4);
+}
+EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 0becb7d9704d..1cb40016ee53 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -166,9 +166,11 @@ static inline int cpumask_to_vpset(struct hv_vpset *vpset,
void hyperv_report_panic(struct pt_regs *regs, long err);
void hyperv_report_panic_msg(phys_addr_t pa, size_t size);
bool hv_is_hyperv_initialized(void);
+bool hv_is_hibernation_supported(void);
void hyperv_cleanup(void);
#else /* CONFIG_HYPERV */
static inline bool hv_is_hyperv_initialized(void) { return false; }
+static inline bool hv_is_hibernation_supported(void) { return false; }
static inline void hyperv_cleanup(void) {}
#endif /* CONFIG_HYPERV */
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 111ea3599659..39fd4e4a8fd7 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -34,6 +34,8 @@
#include <linux/hyperv.h>
+#include <asm/mshyperv.h>
+
#define CREATE_TRACE_POINTS
#include "hv_trace_balloon.h"
@@ -1682,6 +1684,9 @@ static int balloon_probe(struct hv_device *dev,
{
int ret;
+ if (hv_is_hibernation_supported())
+ hot_add = false;
+
#ifdef CONFIG_MEMORY_HOTPLUG
do_hot_add = hot_add;
#else
Thanks,
-- Dexuan
^ permalink raw reply related
* RE: [PATCH 1/8] scsi: add a host / host template field for the virt boundary
From: Kashyap Desai @ 2019-06-20 6:17 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig
Cc: Martin K . Petersen, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
linux-rdma, Linux SCSI List, PDL,MEGARAIDLINUX,
PDL-MPT-FUSIONLINUX, linux-hyperv, Linux Kernel Mailing List,
Kashyap Desai
In-Reply-To: <CACVXFVOwCeM2JzefBpKsVZrEaWpSBR0DF8qp4oKfoHm+pwLBYw@mail.gmail.com>
> -----Original Message-----
> From: megaraidlinux.pdl@broadcom.com
> [mailto:megaraidlinux.pdl@broadcom.com] On Behalf Of Ming Lei
> Sent: Tuesday, June 18, 2019 6:05 AM
> To: Christoph Hellwig <hch@lst.de>
> Cc: Martin K . Petersen <martin.petersen@oracle.com>; Sagi Grimberg
> <sagi@grimberg.me>; Max Gurtovoy <maxg@mellanox.com>; Bart Van
> Assche <bvanassche@acm.org>; linux-rdma <linux-rdma@vger.kernel.org>;
> Linux SCSI List <linux-scsi@vger.kernel.org>;
> megaraidlinux.pdl@broadcom.com; MPT-FusionLinux.pdl@broadcom.com;
> linux-hyperv@vger.kernel.org; Linux Kernel Mailing List <linux-
> kernel@vger.kernel.org>
> Subject: Re: [PATCH 1/8] scsi: add a host / host template field for the
> virt
> boundary
>
> On Mon, Jun 17, 2019 at 8:21 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > This allows drivers setting it up easily instead of branching out to
> > block layer calls in slave_alloc, and ensures the upgraded
> > max_segment_size setting gets picked up by the DMA layer.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> > drivers/scsi/hosts.c | 3 +++
> > drivers/scsi/scsi_lib.c | 3 ++-
> > include/scsi/scsi_host.h | 3 +++
> > 3 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index
> > ff0d8c6a8d0c..55522b7162d3 100644
> > --- a/drivers/scsi/hosts.c
> > +++ b/drivers/scsi/hosts.c
> > @@ -462,6 +462,9 @@ struct Scsi_Host *scsi_host_alloc(struct
> scsi_host_template *sht, int privsize)
> > else
> > shost->dma_boundary = 0xffffffff;
> >
> > + if (sht->virt_boundary_mask)
> > + shost->virt_boundary_mask = sht->virt_boundary_mask;
> > +
> > device_initialize(&shost->shost_gendev);
> > dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
> > shost->shost_gendev.bus = &scsi_bus_type; diff --git
> > a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index
> > 65d0a10c76ad..d333bb6b1c59 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1775,7 +1775,8 @@ void __scsi_init_queue(struct Scsi_Host *shost,
> struct request_queue *q)
> > dma_set_seg_boundary(dev, shost->dma_boundary);
> >
> > blk_queue_max_segment_size(q, shost->max_segment_size);
> > - dma_set_max_seg_size(dev, shost->max_segment_size);
> > + blk_queue_virt_boundary(q, shost->virt_boundary_mask);
> > + dma_set_max_seg_size(dev, queue_max_segment_size(q));
>
> The patch looks fine, also suggest to make sure that max_segment_size is
> block-size aligned, and un-aligned max segment size has caused trouble on
> mmc.
I validated changes on latest and few older series controllers.
Post changes, I noticed max_segment_size is updated.
find /sys/ -name max_segment_size |grep sdb |xargs grep -r .
/sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:04.0/0000:40:00.0/0000:41:00.0/0000:42:00.0/host0/target0:2:12/0:2:12:0/block/sdb/queue/max_segment_size:4294967295
I verify that single SGE having 1MB transfer length is working fine.
Acked-by: Kashyap Desai < kashyap.desai@broadcom.com>
>
> Thanks,
> Ming Lei
>
^ permalink raw reply
* Re: [PATCH 7/8] mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs
From: Suganath Prabu Subramani @ 2019-06-20 7:04 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Martin K . Petersen, Sagi Grimberg,
Max Gurtovoy, Bart Van Assche, linux-rdma, Linux SCSI List,
megaraidlinux.pdl, PDL-MPT-FUSIONLINUX, linux-hyperv,
Linux Kernel Mailing List
In-Reply-To: <CACVXFVObpdjN6V9qS-C9NG5xcrPqmx-X22qVamOSZf81Vog6zw@mail.gmail.com>
Please consider this as Acked-by: Suganath Prabu
<suganath-prabu.subramani@broadcom.com>
On Tue, Jun 18, 2019 at 6:16 AM Ming Lei <tom.leiming@gmail.com> wrote:
>
> On Mon, Jun 17, 2019 at 8:21 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > When using a virt_boundary_mask, as done for NVMe devices attached to
> > mpt3sas controllers we require an unlimited max_segment_size, as the
> > virt boundary merging code assumes that. But we also need to propagate
> > that to the DMA mapping layer to make dma-debug happy. The SCSI layer
> > takes care of that when using the per-host virt_boundary setting, but
> > given that mpt3sas only wants to set the virt_boundary for actual
> > NVMe devices we can't rely on that. The DMA layer maximum segment
> > is global to the HBA however, so we have to set it explicitly. This
> > patch assumes that mpt3sas does not have a segment size limitation,
> > which seems true based on the SGL format, but will need to be verified.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > index 1ccfbc7eebe0..c719b807f6d8 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > @@ -10222,6 +10222,7 @@ static struct scsi_host_template mpt3sas_driver_template = {
> > .this_id = -1,
> > .sg_tablesize = MPT3SAS_SG_DEPTH,
> > .max_sectors = 32767,
> > + .max_segment_size = 0xffffffff,
>
> .max_segment_size should be aligned, either setting it here correctly or
> forcing to make it aligned in scsi-core.
>
> Thanks,
> Ming Lei
^ permalink raw reply
* Re: [PATCH 7/8] mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs
From: Suganath Prabu Subramani @ 2019-06-20 7:08 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Martin K . Petersen, Sagi Grimberg,
Max Gurtovoy, Bart Van Assche, linux-rdma, Linux SCSI List,
megaraidlinux.pdl, PDL-MPT-FUSIONLINUX, linux-hyperv,
Linux Kernel Mailing List
In-Reply-To: <CA+RiK64sFfY79i7q2YbN5HcZ4wzVOcLWgDJnPbf6=ycdcmC-Mg@mail.gmail.com>
Acked-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
On Thu, Jun 20, 2019 at 12:34 PM Suganath Prabu Subramani
<suganath-prabu.subramani@broadcom.com> wrote:
>
> Please consider this as Acked-by: Suganath Prabu
> <suganath-prabu.subramani@broadcom.com>
>
>
> On Tue, Jun 18, 2019 at 6:16 AM Ming Lei <tom.leiming@gmail.com> wrote:
> >
> > On Mon, Jun 17, 2019 at 8:21 PM Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > When using a virt_boundary_mask, as done for NVMe devices attached to
> > > mpt3sas controllers we require an unlimited max_segment_size, as the
> > > virt boundary merging code assumes that. But we also need to propagate
> > > that to the DMA mapping layer to make dma-debug happy. The SCSI layer
> > > takes care of that when using the per-host virt_boundary setting, but
> > > given that mpt3sas only wants to set the virt_boundary for actual
> > > NVMe devices we can't rely on that. The DMA layer maximum segment
> > > is global to the HBA however, so we have to set it explicitly. This
> > > patch assumes that mpt3sas does not have a segment size limitation,
> > > which seems true based on the SGL format, but will need to be verified.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > index 1ccfbc7eebe0..c719b807f6d8 100644
> > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > @@ -10222,6 +10222,7 @@ static struct scsi_host_template mpt3sas_driver_template = {
> > > .this_id = -1,
> > > .sg_tablesize = MPT3SAS_SG_DEPTH,
> > > .max_sectors = 32767,
> > > + .max_segment_size = 0xffffffff,
> >
> > .max_segment_size should be aligned, either setting it here correctly or
> > forcing to make it aligned in scsi-core.
> >
> > Thanks,
> > Ming Lei
^ permalink raw reply
* Re: [PATCH] ACPI: PM: Export the function acpi_sleep_state_supported()
From: Pavel Machek @ 2019-06-20 11:30 UTC (permalink / raw)
To: Dexuan Cui
Cc: Lorenzo Pieralisi, linux-acpi@vger.kernel.org, rjw@rjwysocki.net,
lenb@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
Russell King, Russ Dill, Sebastian Capella, Michael Kelley,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, Haiyang Zhang, Sasha Levin,
olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB016902786ABA34BD01430F83BFE50@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
> > From: linux-hyperv-owner@vger.kernel.org
> > <linux-hyperv-owner@vger.kernel.org> On Behalf Of Lorenzo Pieralisi
> > Sent: Monday, June 17, 2019 9:15 AM
> > > ...
> > > + some ARM experts who worked on arch/arm/kernel/hibernate.c.
> > >
> > > drivers/acpi/sleep.c is only built if ACPI_SYSTEM_POWER_STATES_SUPPORT
> > > is defined, but it looks this option is not defined on ARM.
> > >
> > > It looks ARM does not support the ACPI S4 state, then how do we know
> > > if an ARM host supports hibernation or not?
> >
> > Maybe we should start from understanding why you need to know whether
> > Hibernate is possible to answer your question ?
> >
> > On ARM64 platforms system states are entered through PSCI firmware
> > interface that works for ACPI and device tree alike.
> >
> > Lorenzo
>
> Hi Lorenzo,
> It looks I may have confused you as I didn't realize the word "ARM" only means
> 32-bit ARM. It looks the "ARM" arch and the "ARM64" arch are very different.
>
> As far as I know, Hyper-V only supports x86 and "ARM64", and it's unlikely to
> support 32-bit ARM in the future, so actually I don't really need to know if and
> how a 32-bit ARM machine supports hibernation.
>
> When a Linux guest runs on Hyper-V (x86_32, x86_64, or ARM64) , we have a
> front-end balloon driver in the guest, which balloons up/down and
> hot adds/removes the guest's memory when the host requests that. The problem
> is: the back-end driver on the host can not really save and restore the states
> related to the front-end balloon driver on guest hibernation, so we made the
> decision that balloon up/down and hot-add/remove are not supported when
> we enable hibernation for a guest; BTW, we still want to load the front-end
> driver in the guest, because the dirver has a functionality of reporting the
> guest's memory pressure to the host, which we think is useful.
>
> On x86_32 and x86_64, we enable hibernation for a guest by enabling
> the virtual ACPI S4 state for the guest; on ARM64, so far we don't have the
> host side changes required to support guest hibernation, so the details are
> still unclear.
>
> After I discussed with Michael Kelley, it looks we don't really need to
> export drivers/acpi/sleep.c: acpi_sleep_state_supported(), but I think we do
> need to make it non-static.
>
> Now I propose the below changes. I plan to submit a patch first for the
> changes made to drivers/acpi/sleep.c and include/acpi/acpi_bus.h in a few
> days, if there is no objection.
>
> Please let me know how you think of this. Thanks!
No.
Hibernation should be always supported, no matter what firmware. If it
can powerdown, it can hibernate.
That is for x86-32/64, too.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply
* RE: [PATCH] ACPI: PM: Export the function acpi_sleep_state_supported()
From: Dexuan Cui @ 2019-06-21 7:15 UTC (permalink / raw)
To: Pavel Machek
Cc: Lorenzo Pieralisi, linux-acpi@vger.kernel.org, rjw@rjwysocki.net,
lenb@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
Russell King, Russ Dill, Sebastian Capella, Michael Kelley,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, Haiyang Zhang, Sasha Levin,
olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
marcelo.cerri@canonical.com
In-Reply-To: <20190620113057.GA16460@atrey.karlin.mff.cuni.cz>
> From: linux-hyperv-owner@vger.kernel.org
> > ...
> > When a Linux guest runs on Hyper-V (x86_32, x86_64, or ARM64) , we have a
> > front-end balloon driver in the guest, which balloons up/down and
> > hot adds/removes the guest's memory when the host requests that. The
> > problem
> > is: the back-end driver on the host can not really save and restore the states
> > related to the front-end balloon driver on guest hibernation, so we made the
> > decision that balloon up/down and hot-add/remove are not supported when
> > we enable hibernation for a guest; BTW, we still want to load the front-end
> > driver in the guest, because the driver has a functionality of reporting the
> > guest's memory pressure to the host, which we think is useful.
> >
> > On x86_32 and x86_64, we enable hibernation for a guest by enabling
> > the virtual ACPI S4 state for the guest; on ARM64, so far we don't have the
> > host side changes required to support guest hibernation, so the details are
> > still unclear.
> >
> > After I discussed with Michael Kelley, it looks we don't really need to
> > export drivers/acpi/sleep.c: acpi_sleep_state_supported(), but I think we do
> > need to make it non-static.
> >
> > Now I propose the below changes. I plan to submit a patch first for the
> > changes made to drivers/acpi/sleep.c and include/acpi/acpi_bus.h in a few
> > days, if there is no objection.
> >
> > Please let me know how you think of this. Thanks!
>
> No.
>
> Hibernation should be always supported, no matter what firmware. If it
> can powerdown, it can hibernate.
>
> That is for x86-32/64, too.
> Pavel
Hi Pavel,
Yes, I totally agree hibernation should be always supported, as long as the
system can power down. This is also true for Linux guest running on
Hyper-V, when the pava-virtualized drivers are not loaded in the guest
However, unluckily the situation is a little different when the pava-virtualized
drivers are loaded in the guest:
1) Old Hyper-V hosts are unable to support guest hibernation and these hosts
don't support the virtual ACPI S4 state.
2) The recent Hyper-V host is able to support guest hibernation, but when
guest hibernation is used, the guest needs to disable some features in the
guest balloon driver, as I explained in the previous mail; on the other hand,
we also want to keep the ability to enable the features, so the Hyper-V guys
decided to use the absence of the virtual ACPI S4 state to signify the ability
of enabling the features, and there is a host command tool which can
enable or disable the virtual ACPI S4 state for a guest.
In summary:
1) On old Hyper-V hosts or a recent host, the virtual ACPI S4 state is
unsupported or disabled, respectively, and we don't allow Linux guest
hibernation so we can use the full features of the guest balloon driver.
2) On a recent host, after we use the host command tool to enable the
virtual ACPI S4 state, we allow Linux guest hibernation and we only use a
subset of the full features of the guest balloon driver.
This is why I need to know if the virtual ACPI S4 state is supported-and-enabled.
Exporting acpi_sleep_state_supported() may be overkill, and now making it
non-static seems the only way to help me out. IMO the function is unlikely
to change in the future, and it should be stable enough to become non-static.
Looking forward to your insights!
Thanks,
-- Dexuan
^ permalink raw reply
* [PATCH] PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
From: Dexuan Cui @ 2019-06-21 19:02 UTC (permalink / raw)
To: linux-pci@vger.kernel.org, Lorenzo Pieralisi, bhelgaas@google.com,
Haiyang Zhang, KY Srinivasan, Stephen Hemminger, Sasha Levin,
linux-hyperv@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
Michael Kelley
Cc: Lili Deng (Wicresoft North America Ltd),
linux-kernel@vger.kernel.org,
driverdev-devel@linuxdriverproject.org
The commit 05f151a73ec2 itself is correct, but it exposes this
use-after-free bug, which is caught by some memory debug options.
Add the Fixes tag to indicate the dependency.
Fixes: 05f151a73ec2 ("PCI: hv: Fix a memory leak in hv_eject_device_work()")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
---
Sorry for not spotting the bug when sending 05f151a73ec2.
Now I have enabled the mm debug options to help catch such mistakes in future.
drivers/pci/controller/pci-hyperv.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 808a182830e5..42ace1a690f9 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1880,6 +1880,7 @@ static void hv_pci_devices_present(struct hv_pcibus_device *hbus,
static void hv_eject_device_work(struct work_struct *work)
{
struct pci_eject_response *ejct_pkt;
+ struct hv_pcibus_device *hbus;
struct hv_pci_dev *hpdev;
struct pci_dev *pdev;
unsigned long flags;
@@ -1890,6 +1891,7 @@ static void hv_eject_device_work(struct work_struct *work)
} ctxt;
hpdev = container_of(work, struct hv_pci_dev, wrk);
+ hbus = hpdev->hbus;
WARN_ON(hpdev->state != hv_pcichild_ejecting);
@@ -1929,7 +1931,9 @@ static void hv_eject_device_work(struct work_struct *work)
/* For the two refs got in new_pcichild_device() */
put_pcichild(hpdev);
put_pcichild(hpdev);
- put_hvpcibus(hpdev->hbus);
+ /* hpdev has been freed. Do not use it any more. */
+
+ put_hvpcibus(hbus);
}
/**
--
2.17.1
^ permalink raw reply related
* RE: [PATCH] PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
From: Michael Kelley @ 2019-06-21 23:24 UTC (permalink / raw)
To: Dexuan Cui, linux-pci@vger.kernel.org, Lorenzo Pieralisi,
bhelgaas@google.com, Haiyang Zhang, KY Srinivasan,
Stephen Hemminger, Sasha Levin, linux-hyperv@vger.kernel.org,
olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
marcelo.cerri@canonical.com
Cc: Lili Deng (Wicresoft North America Ltd),
linux-kernel@vger.kernel.org,
driverdev-devel@linuxdriverproject.org
In-Reply-To: <PU1P153MB01691036654142C7972F3ACDBFE70@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
From: Dexuan Cui <decui@microsoft.com> Sent: Friday, June 21, 2019 12:02 PM
>
> The commit 05f151a73ec2 itself is correct, but it exposes this
> use-after-free bug, which is caught by some memory debug options.
>
> Add the Fixes tag to indicate the dependency.
>
> Fixes: 05f151a73ec2 ("PCI: hv: Fix a memory leak in hv_eject_device_work()")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> ---
> Sorry for not spotting the bug when sending 05f151a73ec2.
>
> Now I have enabled the mm debug options to help catch such mistakes in future.
>
> drivers/pci/controller/pci-hyperv.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 808a182830e5..42ace1a690f9 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1880,6 +1880,7 @@ static void hv_pci_devices_present(struct hv_pcibus_device
> *hbus,
> static void hv_eject_device_work(struct work_struct *work)
> {
> struct pci_eject_response *ejct_pkt;
> + struct hv_pcibus_device *hbus;
> struct hv_pci_dev *hpdev;
> struct pci_dev *pdev;
> unsigned long flags;
> @@ -1890,6 +1891,7 @@ static void hv_eject_device_work(struct work_struct *work)
> } ctxt;
>
> hpdev = container_of(work, struct hv_pci_dev, wrk);
> + hbus = hpdev->hbus;
In the lines of code following this new assignment, there are four uses of
hpdev->hbus besides the one at the bottom of the function that causes the
use-after-free error. With 'hbus' now available as a local variable, it looks
rather strange to have those other places still using hpdev->hbus. I'm thinking
they should be shortened to just 'hbus' for consistency, even though such
changes aren't directly related to fixing the bug.
Michael
>
> WARN_ON(hpdev->state != hv_pcichild_ejecting);
>
> @@ -1929,7 +1931,9 @@ static void hv_eject_device_work(struct work_struct *work)
> /* For the two refs got in new_pcichild_device() */
> put_pcichild(hpdev);
> put_pcichild(hpdev);
> - put_hvpcibus(hpdev->hbus);
> + /* hpdev has been freed. Do not use it any more. */
> +
> + put_hvpcibus(hbus);
> }
>
> /**
> --
> 2.17.1
^ permalink raw reply
* RE: [PATCH] PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
From: Dexuan Cui @ 2019-06-21 23:31 UTC (permalink / raw)
To: Michael Kelley, linux-pci@vger.kernel.org, Lorenzo Pieralisi,
bhelgaas@google.com, Haiyang Zhang, KY Srinivasan,
Stephen Hemminger, Sasha Levin, linux-hyperv@vger.kernel.org,
olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
marcelo.cerri@canonical.com
Cc: Lili Deng (Wicresoft North America Ltd),
linux-kernel@vger.kernel.org,
driverdev-devel@linuxdriverproject.org
In-Reply-To: <BYAPR21MB13524CDAAD1F9A19935E5F9BD7E70@BYAPR21MB1352.namprd21.prod.outlook.com>
> From: Michael Kelley <mikelley@microsoft.com>
> > @@ -1880,6 +1880,7 @@ static void hv_pci_devices_present(struct
> hv_pcibus_device
> > *hbus,
> > static void hv_eject_device_work(struct work_struct *work)
> > {
> > struct pci_eject_response *ejct_pkt;
> > + struct hv_pcibus_device *hbus;
> > struct hv_pci_dev *hpdev;
> > struct pci_dev *pdev;
> > unsigned long flags;
> > @@ -1890,6 +1891,7 @@ static void hv_eject_device_work(struct
> work_struct *work)
> > } ctxt;
> >
> > hpdev = container_of(work, struct hv_pci_dev, wrk);
> > + hbus = hpdev->hbus;
>
> In the lines of code following this new assignment, there are four uses of
> hpdev->hbus besides the one at the bottom of the function that causes the
> use-after-free error. With 'hbus' now available as a local variable, it looks
> rather strange to have those other places still using hpdev->hbus. I'm
> thinking
> they should be shortened to just 'hbus' for consistency, even though such
> changes aren't directly related to fixing the bug.
>
> Michael
Ok, let me post a v2 for this.
Thanks,
Dexuan
^ permalink raw reply
* [PATCH v2] PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
From: Dexuan Cui @ 2019-06-21 23:45 UTC (permalink / raw)
To: linux-pci@vger.kernel.org, Lorenzo Pieralisi, bhelgaas@google.com,
Haiyang Zhang, KY Srinivasan, Stephen Hemminger, Sasha Levin,
linux-hyperv@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
Michael Kelley
Cc: Lili Deng (Wicresoft North America Ltd),
linux-kernel@vger.kernel.org,
driverdev-devel@linuxdriverproject.org
The commit 05f151a73ec2 itself is correct, but it exposes this
use-after-free bug, which is caught by some memory debug options.
Add a Fixes tag to indicate the dependency.
Fixes: 05f151a73ec2 ("PCI: hv: Fix a memory leak in hv_eject_device_work()")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
---
In v2:
Replaced "hpdev->hbus" with "hbus", since we have the new "hbus" variable. [Michael Kelley]
drivers/pci/controller/pci-hyperv.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 808a182830e5..5dadc964ad3b 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1880,6 +1880,7 @@ static void hv_pci_devices_present(struct hv_pcibus_device *hbus,
static void hv_eject_device_work(struct work_struct *work)
{
struct pci_eject_response *ejct_pkt;
+ struct hv_pcibus_device *hbus;
struct hv_pci_dev *hpdev;
struct pci_dev *pdev;
unsigned long flags;
@@ -1890,6 +1891,7 @@ static void hv_eject_device_work(struct work_struct *work)
} ctxt;
hpdev = container_of(work, struct hv_pci_dev, wrk);
+ hbus = hpdev->hbus;
WARN_ON(hpdev->state != hv_pcichild_ejecting);
@@ -1900,8 +1902,7 @@ static void hv_eject_device_work(struct work_struct *work)
* because hbus->pci_bus may not exist yet.
*/
wslot = wslot_to_devfn(hpdev->desc.win_slot.slot);
- pdev = pci_get_domain_bus_and_slot(hpdev->hbus->sysdata.domain, 0,
- wslot);
+ pdev = pci_get_domain_bus_and_slot(hbus->sysdata.domain, 0, wslot);
if (pdev) {
pci_lock_rescan_remove();
pci_stop_and_remove_bus_device(pdev);
@@ -1909,9 +1910,9 @@ static void hv_eject_device_work(struct work_struct *work)
pci_unlock_rescan_remove();
}
- spin_lock_irqsave(&hpdev->hbus->device_list_lock, flags);
+ spin_lock_irqsave(&hbus->device_list_lock, flags);
list_del(&hpdev->list_entry);
- spin_unlock_irqrestore(&hpdev->hbus->device_list_lock, flags);
+ spin_unlock_irqrestore(&hbus->device_list_lock, flags);
if (hpdev->pci_slot)
pci_destroy_slot(hpdev->pci_slot);
@@ -1920,7 +1921,7 @@ static void hv_eject_device_work(struct work_struct *work)
ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
ejct_pkt->wslot.slot = hpdev->desc.win_slot.slot;
- vmbus_sendpacket(hpdev->hbus->hdev->channel, ejct_pkt,
+ vmbus_sendpacket(hbus->hdev->channel, ejct_pkt,
sizeof(*ejct_pkt), (unsigned long)&ctxt.pkt,
VM_PKT_DATA_INBAND, 0);
@@ -1929,7 +1930,9 @@ static void hv_eject_device_work(struct work_struct *work)
/* For the two refs got in new_pcichild_device() */
put_pcichild(hpdev);
put_pcichild(hpdev);
- put_hvpcibus(hpdev->hbus);
+ /* hpdev has been freed. Do not use it any more. */
+
+ put_hvpcibus(hbus);
}
/**
--
2.17.1
^ permalink raw reply related
* RE: [PATCH v2] PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
From: Michael Kelley @ 2019-06-21 23:58 UTC (permalink / raw)
To: Dexuan Cui, linux-pci@vger.kernel.org, Lorenzo Pieralisi,
bhelgaas@google.com, Haiyang Zhang, KY Srinivasan,
Stephen Hemminger, Sasha Levin, linux-hyperv@vger.kernel.org,
olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
marcelo.cerri@canonical.com
Cc: Lili Deng (Wicresoft North America Ltd),
linux-kernel@vger.kernel.org,
driverdev-devel@linuxdriverproject.org
In-Reply-To: <PU1P153MB0169D420EAB61757DF4B337FBFE70@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
From: Dexuan Cui <decui@microsoft.com> Sent: Friday, June 21, 2019 4:45 PM
>
> The commit 05f151a73ec2 itself is correct, but it exposes this
> use-after-free bug, which is caught by some memory debug options.
>
> Add a Fixes tag to indicate the dependency.
>
> Fixes: 05f151a73ec2 ("PCI: hv: Fix a memory leak in hv_eject_device_work()")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> ---
>
> In v2:
> Replaced "hpdev->hbus" with "hbus", since we have the new "hbus" variable. [Michael
> Kelley]
>
> drivers/pci/controller/pci-hyperv.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 808a182830e5..5dadc964ad3b 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1880,6 +1880,7 @@ static void hv_pci_devices_present(struct hv_pcibus_device
> *hbus,
> static void hv_eject_device_work(struct work_struct *work)
> {
> struct pci_eject_response *ejct_pkt;
> + struct hv_pcibus_device *hbus;
> struct hv_pci_dev *hpdev;
> struct pci_dev *pdev;
> unsigned long flags;
> @@ -1890,6 +1891,7 @@ static void hv_eject_device_work(struct work_struct *work)
> } ctxt;
>
> hpdev = container_of(work, struct hv_pci_dev, wrk);
> + hbus = hpdev->hbus;
>
> WARN_ON(hpdev->state != hv_pcichild_ejecting);
>
> @@ -1900,8 +1902,7 @@ static void hv_eject_device_work(struct work_struct *work)
> * because hbus->pci_bus may not exist yet.
> */
> wslot = wslot_to_devfn(hpdev->desc.win_slot.slot);
> - pdev = pci_get_domain_bus_and_slot(hpdev->hbus->sysdata.domain, 0,
> - wslot);
> + pdev = pci_get_domain_bus_and_slot(hbus->sysdata.domain, 0, wslot);
> if (pdev) {
> pci_lock_rescan_remove();
> pci_stop_and_remove_bus_device(pdev);
> @@ -1909,9 +1910,9 @@ static void hv_eject_device_work(struct work_struct *work)
> pci_unlock_rescan_remove();
> }
>
> - spin_lock_irqsave(&hpdev->hbus->device_list_lock, flags);
> + spin_lock_irqsave(&hbus->device_list_lock, flags);
> list_del(&hpdev->list_entry);
> - spin_unlock_irqrestore(&hpdev->hbus->device_list_lock, flags);
> + spin_unlock_irqrestore(&hbus->device_list_lock, flags);
>
> if (hpdev->pci_slot)
> pci_destroy_slot(hpdev->pci_slot);
> @@ -1920,7 +1921,7 @@ static void hv_eject_device_work(struct work_struct *work)
> ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
> ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
> ejct_pkt->wslot.slot = hpdev->desc.win_slot.slot;
> - vmbus_sendpacket(hpdev->hbus->hdev->channel, ejct_pkt,
> + vmbus_sendpacket(hbus->hdev->channel, ejct_pkt,
> sizeof(*ejct_pkt), (unsigned long)&ctxt.pkt,
> VM_PKT_DATA_INBAND, 0);
>
> @@ -1929,7 +1930,9 @@ static void hv_eject_device_work(struct work_struct *work)
> /* For the two refs got in new_pcichild_device() */
> put_pcichild(hpdev);
> put_pcichild(hpdev);
> - put_hvpcibus(hpdev->hbus);
> + /* hpdev has been freed. Do not use it any more. */
> +
> + put_hvpcibus(hbus);
> }
>
> /**
> --
> 2.17.1
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
^ permalink raw reply
* Re: [PATCH v6 18/19] x86: Add support for generic vDSO
From: Thomas Gleixner @ 2019-06-22 14:46 UTC (permalink / raw)
To: Sasha Levin
Cc: Michael Kelley, Vincenzo Frascino, linux-arch@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
Arnd Bergmann, Russell King, Ralf Baechle, Paul Burton,
Daniel Lezcano, Mark Salyzyn, Peter Collingbourne, Shuah Khan,
Dmitry Safonov, Rasmus Villemoes, Huw Davies, linux-hyperv,
Greg KH, Stephen Rothwell
In-Reply-To: <20190614211710.GQ1513@sasha-vm>
On Fri, 14 Jun 2019, Sasha Levin wrote:
> On Fri, Jun 14, 2019 at 01:15:23PM +0200, Thomas Gleixner wrote:
> > On Thu, 30 May 2019, Michael Kelley wrote:
> > > Vincenzo -- these changes for Hyper-V are a subset of a larger patch set
> > > I have that moves all of the Hyper-V clock/timer code into a separate
> > > clocksource driver in drivers/clocksource, with an include file in
> > > includes/clocksource. That new include file should be able to work
> > > instead of your new mshyperv-tsc.h. It also has the benefit of being
> > > ISA neutral, so it will work with my in-progress patch set to support
> > > Linux on Hyper-V on ARM64. See https://lkml.org/lkml/2019/5/27/231
> > > for the new clocksource driver patch set.
> >
> > Grrr. That's queued in hyperv-next for whatever reasons.
>
> I queue up our future pull requests there to give them some soaking in
> -next.
What? You queue completely unreviewed stuff which touches two other
subsystems to let it soak in next?
> > Sasha, can you please provide me the branch to pull from so I can have a
> > common base for all the various changes floating around?
>
> I'll send you a unified pull request for these changes.
Which has not materialized yet.
TBH, I'm pretty grumpy about those clocksource changes. Here is the
diffstat:
MAINTAINERS | 2
arch/x86/entry/vdso/vclock_gettime.c | 1
arch/x86/entry/vdso/vma.c | 2
arch/x86/hyperv/hv_init.c | 91 ---------
arch/x86/include/asm/hyperv-tlfs.h | 6
arch/x86/include/asm/mshyperv.h | 81 +-------
arch/x86/kernel/cpu/mshyperv.c | 2
arch/x86/kvm/x86.c | 1
drivers/clocksource/Makefile | 1
drivers/clocksource/hyperv_timer.c | 322 +++++++++++++++++++++++++++++++++++
drivers/hv/Kconfig | 3
drivers/hv/hv.c | 156 ----------------
drivers/hv/hv_util.c | 1
drivers/hv/hyperv_vmbus.h | 3
drivers/hv/vmbus_drv.c | 42 ++--
include/clocksource/hyperv_timer.h | 105 +++++++++++
While the world and some more people have been CC'ed on those patches,
neither the clocksource nor the x86 maintainer have been.
When I gave Vincenzo the advise to base his code on that hyper-v branch, I
expected that I find the related patches in my mail backlog. No, they have
not been there because I was not on CC.
Folks, please stop chosing Cc lists as you like. We have well established
rules for that. And please stop queueing random unreviewed patches in
next. Next is not a playground for not ready and unreviewed stuff. No, the
hyper-v inbreed Reviewed-by is not sufficient for anything x86 and
clocksource related.
After chasing and looking at those patches, which have horrible subject
lines and changelogs btw, I was not able to judge quickly whether that
stuff is self contained or not. So no, I fixed up the fallout and rebased
Vincenzos VDSO stuff on mainline w/o those hyperv changes simply because if
they are not self contained they will break bisection badly.
I'm going to push out the VDSO series later today. That will nicely break
in combination with the hyper-next branch. Stephen, please drop that and do
not try to handle the fallout. That stuff needs to go through the proper
channels or at least be acked/reviewed by the relevant maintainers. So the
hyper-v folks can rebase themself and post it proper.
Yours grumpy,
tglx
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox