* [PATCH 1/1] drivers: hv: Don't OOPS when you cannot init vmbus
From: K. Y. Srinivasan @ 2011-12-01 17:59 UTC
To: gregkh, linux-kernel, devel, virtualization, ohering
Cc: K. Y. Srinivasan, Sasha Levin
The hv vmbus driver caused an OOPS because it tried to register drivers
on top of the bus even when initialization of the bus had failed for some
reason (for example, when an hv-enabled kernel is run in a non-hv
environment).
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
drivers/hv/vmbus_drv.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)
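Note for readers: the guard idea in this patch can be modelled outside the
kernel. What follows is a minimal user-space sketch, not the driver code.
struct fake_bus, bus_dev, bus_exists() and the other fake_* names are
hypothetical stand-ins; only the convention of returning -ENODEV when the
bus device pointer is NULL is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real vmbus types. */
struct fake_bus { int unused; };

static struct fake_bus fake_bus_instance;
static struct fake_bus *bus_dev;	/* stays NULL unless bus init succeeds */
static int registered_drivers;

/* Same shape as vmbus_exists(): 0 when the bus is up, -ENODEV otherwise. */
static int bus_exists(void)
{
	return bus_dev == NULL ? -ENODEV : 0;
}

/* Simulate the outcome of bus initialization. */
void fake_bus_init(int succeeded)
{
	bus_dev = succeeded ? &fake_bus_instance : NULL;
}

/* Refuse to register a driver on a bus that never came up. */
int fake_driver_register(void)
{
	int ret = bus_exists();

	if (ret < 0)
		return ret;
	registered_drivers++;
	return 0;
}

/* Nothing was registered if the bus is absent, so there is nothing to undo. */
void fake_driver_unregister(void)
{
	if (bus_exists() < 0)
		return;
	assert(registered_drivers > 0);
	registered_drivers--;
}

int fake_registered_count(void)
{
	return registered_drivers;
}
```

With the bus absent, fake_driver_register() reports -ENODEV and the
unregister path becomes a no-op instead of touching state that was never
set up.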
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 0c048dd..d3b0b4f 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -62,6 +62,14 @@ struct hv_device_info {
struct hv_dev_port_info outbound;
};
+static int vmbus_exists(void)
+{
+ if (hv_acpi_dev == NULL)
+ return -ENODEV;
+
+ return 0;
+}
+
static void get_channel_info(struct hv_device *device,
struct hv_device_info *info)
@@ -590,6 +598,10 @@ int __vmbus_driver_register(struct hv_driver *hv_driver, struct module *owner, c
pr_info("registering driver %s\n", hv_driver->name);
+ ret = vmbus_exists();
+ if (ret < 0)
+ return ret;
+
hv_driver->driver.name = hv_driver->name;
hv_driver->driver.owner = owner;
hv_driver->driver.mod_name = mod_name;
@@ -614,6 +626,9 @@ void vmbus_driver_unregister(struct hv_driver *hv_driver)
{
pr_info("unregistering driver %s\n", hv_driver->name);
+ if (vmbus_exists() < 0)
+ return;
+
driver_unregister(&hv_driver->driver);
}
@@ -776,6 +791,7 @@ static int __init hv_acpi_init(void)
cleanup:
acpi_bus_unregister_driver(&vmbus_acpi_driver);
+ hv_acpi_dev = NULL;
return ret;
}
--
1.7.4.1
* Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Catalin Marinas @ 2011-12-01 16:57 UTC
To: Arnd Bergmann
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Ian Campbell, Pawel Moll, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu, kvm@vger.kernel.org,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <201112011644.41101.arnd@arndb.de>
On Thu, Dec 01, 2011 at 04:44:40PM +0000, Arnd Bergmann wrote:
> On Thursday 01 December 2011, Catalin Marinas wrote:
> > On Thu, Dec 01, 2011 at 03:42:19PM +0000, Arnd Bergmann wrote:
> > > On Thursday 01 December 2011, Catalin Marinas wrote:
> > > How do you deal with signed integer arguments passed into SVC or HVC from
> > > a caller? If I understand the architecture correctly, the upper
> > > halves of the argument register end up zero-padded, while the callee
> > > expects sign-extension.
> >
> > If you treat it as an "int" (32-bit) and function prototype defined
> > accordingly, then the generated code only accesses it as a W (rather
> > than X) register and the top 32-bit part is ignored (no need for
> > sign-extension). If it is defined as a "long" in the 32-bit world, then
> > it indeed needs explicit conversion given the different sizes for long
> > (for example sys_lseek, the second argument is a 'long' and we do
> > explicit sign extension in the wrapper).
...
> What about unsigned long and pointer? Can we always rely on the upper
> half of the register to be zero-filled when we get an exception from 32
> bit into 64 bit state, or do we also have to zero-extend those?
They are also fine, no need for zero-extension.
--
Catalin
* Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Arnd Bergmann @ 2011-12-01 16:44 UTC
To: linux-arm-kernel
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Ian Campbell, Pawel Moll, Stefano Stabellini, Catalin Marinas,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu, kvm@vger.kernel.org,
embeddedxen-devel@lists.sourceforge.net
In-Reply-To: <20111201160241.GH27394@arm.com>
On Thursday 01 December 2011, Catalin Marinas wrote:
> On Thu, Dec 01, 2011 at 03:42:19PM +0000, Arnd Bergmann wrote:
> > On Thursday 01 December 2011, Catalin Marinas wrote:
> > How do you deal with signed integer arguments passed into SVC or HVC from
> > a caller? If I understand the architecture correctly, the upper
> > halves of the argument register end up zero-padded, while the callee
> > expects sign-extension.
>
> If you treat it as an "int" (32-bit) and function prototype defined
> accordingly, then the generated code only accesses it as a W (rather
> than X) register and the top 32-bit part is ignored (no need for
> sign-extension). If it is defined as a "long" in the 32-bit world, then
> it indeed needs explicit conversion given the different sizes for long
> (for example sys_lseek, the second argument is a 'long' and we do
> explicit sign extension in the wrapper).
Ok, so it's actually different from most other 64 bit architectures, which
normally operate on 64-bit registers and expect the caller to do the
correct sign-extension.
Doing the sign-extension for long arguments then falls into the same
category as long long and unsigned long long arguments, which also need
a wrapper, as you mentioned. Strictly speaking, we only need to do it
for those where the long argument has a meaning outside of the 0..2^31
range, e.g. io_submit can only take small positive numbers although
the type is 'long'.
What about unsigned long and pointer? Can we always rely on the upper
half of the register to be zero-filled when we get an exception from 32
bit into 64 bit state, or do we also have to zero-extend those?
Arnd
* Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Catalin Marinas @ 2011-12-01 16:02 UTC
To: Arnd Bergmann
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Ian Campbell, Pawel Moll, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu, kvm@vger.kernel.org,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <201112011542.19377.arnd@arndb.de>
On Thu, Dec 01, 2011 at 03:42:19PM +0000, Arnd Bergmann wrote:
> On Thursday 01 December 2011, Catalin Marinas wrote:
> > Given the way register banking is done on AArch64, issuing an HVC on a
> > 32-bit guest OS doesn't require translation on a 64-bit hypervisor. We
> > have a similar implementation at the SVC level (for 32-bit user apps on
> > a 64-bit kernel), the only modification was where a 32-bit SVC takes a
> > 64-bit parameter in two separate 32-bit registers, so packing needs to
> > be done in a syscall wrapper.
>
> How do you deal with signed integer arguments passed into SVC or HVC from
> a caller? If I understand the architecture correctly, the upper
> halves of the argument register end up zero-padded, while the callee
> expects sign-extension.
If you treat it as an "int" (32-bit) and function prototype defined
accordingly, then the generated code only accesses it as a W (rather
than X) register and the top 32-bit part is ignored (no need for
sign-extension). If it is defined as a "long" in the 32-bit world, then
it indeed needs explicit conversion given the different sizes for long
(for example sys_lseek, the second argument is a 'long' and we do
explicit sign extension in the wrapper).
--
Catalin
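Note for readers: the int versus long distinction described above can be
modelled in plain C. This is a sketch under stated assumptions, not the
kernel's wrapper code: a 64-bit register is modelled as a uint64_t whose
upper half is zero-filled for a 32-bit caller, and every function name here
is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model: a 32-bit caller's argument lands in a 64-bit register, upper half zero. */
uint64_t reg_from_compat(int32_t arg)
{
	return (uint64_t)(uint32_t)arg;
}

/* Sign-extend the low 32 bits of a register value, ignoring the upper half. */
static int64_t sign_extend32(uint64_t reg)
{
	uint32_t low = (uint32_t)reg;

	return (low & 0x80000000u) ? (int64_t)low - 0x100000000LL : (int64_t)low;
}

/* Callee declared with "int": only the low half (the W view) is read. */
int32_t read_as_int(uint64_t reg)
{
	return (int32_t)sign_extend32(reg);
}

/* Callee declared with a 64-bit "long" and no wrapper: all 64 bits are used. */
int64_t read_as_long_unwrapped(uint64_t reg)
{
	return (int64_t)reg;
}

/* Wrapper for a compat "long": explicit sign extension before the call. */
int64_t read_as_long_wrapped(uint64_t reg)
{
	return sign_extend32(reg);
}

/* Usage check: -1 from a 32-bit caller survives only with the wrapper. */
void sign_extension_selfcheck(void)
{
	uint64_t reg = reg_from_compat(-1);

	assert(read_as_int(reg) == -1);
	assert(read_as_long_unwrapped(reg) == 4294967295LL);
	assert(read_as_long_wrapped(reg) == -1);
}
```

The unwrapped 64-bit read turns -1 into 4294967295, which is the case the
explicit conversion in the wrapper is there to prevent.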
* Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Ian Campbell @ 2011-12-01 15:52 UTC
To: Catalin Marinas
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Arnd Bergmann, Pawel Moll, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu, kvm@vger.kernel.org,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <20111201151043.GG27394@arm.com>
On Thu, 2011-12-01 at 15:10 +0000, Catalin Marinas wrote:
> On Thu, Dec 01, 2011 at 10:26:37AM +0000, Ian Campbell wrote:
> > On Wed, 2011-11-30 at 18:32 +0000, Stefano Stabellini wrote:
> > > On Wed, 30 Nov 2011, Arnd Bergmann wrote:
> > > > KVM and Xen at least both fall into the single-return-value category,
> > > > so we should be able to agree on a calling convention. KVM does not
> > > > have an hcall API on ARM yet, and I see no reason not to use the
> > > > same implementation that you have in the Xen guest.
> > > >
> > > > Stefano, can you split out the generic parts of your asm/xen/hypercall.h
> > > > file into a common asm/hypercall.h and submit it for review to the
> > > > arm kernel list?
> > >
> > > Sure, I can do that.
> > > Usually the hypercall calling convention is very hypervisor specific,
> > > but if it turns out that we have the same requirements I am happy to design
> > > a common interface.
> >
> > I expect the only real decision to be made is hypercall page vs. raw hvc
> > instruction.
> >
> > The page was useful on x86 where there is a variety of instructions
> > which could be used (at least for PV there was sysenter/syscall/int, I
> > think vmcall instruction differs between AMD and Intel also) and gives
> > some additional flexibility. It's hard to predict but I don't think I'd
> > expect that to be necessary on ARM.
> >
> > Another reason for having a hypercall page instead of a raw instruction
> > might be wanting to support 32 bit guests (from ~today) on a 64 bit
> > hypervisor in the future and perhaps needing to do some shimming/arg
> > translation. It would be better to aim for having the interface just be
> > 32/64 agnostic but mistakes do happen.
>
> Given the way register banking is done on AArch64, issuing an HVC on a
> 32-bit guest OS doesn't require translation on a 64-bit hypervisor.
The issue I was thinking about was struct packing for arguments passed
as pointers etc. rather than the argument registers themselves. Since the
preference appears to be for raw hvc, we should just be careful that those
structures are 32/64 agnostic.
Ian.
> We
> have a similar implementation at the SVC level (for 32-bit user apps on
> a 64-bit kernel), the only modification was where a 32-bit SVC takes a
> 64-bit parameter in two separate 32-bit registers, so packing needs to
> be done in a syscall wrapper.
>
> I'm not closely involved with any of the Xen or KVM work but I would
> vote for using HVC rather than a hypercall page.
>
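Note for readers: the struct packing concern raised in this message can be
illustrated with a short sketch. Both structures are hypothetical and do not
come from any Xen or KVM header. The first has a layout that depends on the
guest ABI; the second carries the same information in fixed-width fields
with explicit padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical hypercall argument whose layout depends on the guest ABI:
 * "unsigned long" and pointers are 4 bytes on a 32-bit guest, 8 on 64-bit.
 */
struct fragile_args {
	unsigned long nr_pages;
	void *buffer;
	unsigned int flags;
};

/* Same information with fixed-width fields and explicit padding. */
struct agnostic_args {
	uint64_t nr_pages;
	uint64_t buffer;	/* guest address carried as an integer */
	uint32_t flags;
	uint32_t pad;		/* keeps the size a multiple of 8 */
};

/* Compile-time checks that hold for 32-bit and 64-bit builds alike. */
static_assert(sizeof(struct agnostic_args) == 24,
	      "layout must not depend on the ABI");
static_assert(offsetof(struct agnostic_args, flags) == 16,
	      "field offsets must not depend on the ABI");

size_t agnostic_args_size(void)
{
	return sizeof(struct agnostic_args);
}
```

A hypervisor reading struct agnostic_args through a guest pointer sees the
same offsets whichever width the guest was built for, so no shim is needed
for this argument.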
* Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Arnd Bergmann @ 2011-12-01 15:42 UTC
To: Catalin Marinas
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Ian Campbell, Pawel Moll, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu, kvm@vger.kernel.org,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <20111201151043.GG27394@arm.com>
On Thursday 01 December 2011, Catalin Marinas wrote:
> Given the way register banking is done on AArch64, issuing an HVC on a
> 32-bit guest OS doesn't require translation on a 64-bit hypervisor. We
> have a similar implementation at the SVC level (for 32-bit user apps on
> a 64-bit kernel), the only modification was where a 32-bit SVC takes a
> 64-bit parameter in two separate 32-bit registers, so packing needs to
> be done in a syscall wrapper.
How do you deal with signed integer arguments passed into SVC or HVC from
a caller? If I understand the architecture correctly, the upper
halves of the argument register end up zero-padded, while the callee
expects sign-extension.
Arnd
* Re: [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Stefano Stabellini @ 2011-12-01 15:12 UTC
To: Ian Campbell
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Pawel Moll, kvm@vger.kernel.org, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, Arnd Bergmann,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <1322735197.31810.191.camel@zakaz.uk.xensource.com>
On Thu, 1 Dec 2011, Ian Campbell wrote:
> On Wed, 2011-11-30 at 18:32 +0000, Stefano Stabellini wrote:
> > On Wed, 30 Nov 2011, Arnd Bergmann wrote:
> > > KVM and Xen at least both fall into the single-return-value category,
> > > so we should be able to agree on a calling convention. KVM does not
> > > have an hcall API on ARM yet, and I see no reason not to use the
> > > same implementation that you have in the Xen guest.
> > >
> > > Stefano, can you split out the generic parts of your asm/xen/hypercall.h
> > > file into a common asm/hypercall.h and submit it for review to the
> > > arm kernel list?
> >
> > Sure, I can do that.
> > Usually the hypercall calling convention is very hypervisor specific,
> > but if it turns out that we have the same requirements I am happy to design
> > a common interface.
>
> I expect the only real decision to be made is hypercall page vs. raw hvc
> instruction.
>
> The page was useful on x86 where there is a variety of instructions
> which could be used (at least for PV there was sysenter/syscall/int, I
> think vmcall instruction differs between AMD and Intel also) and gives
> some additional flexibility. It's hard to predict but I don't think I'd
> expect that to be necessary on ARM.
>
> Another reason for having a hypercall page instead of a raw instruction
> might be wanting to support 32 bit guests (from ~today) on a 64 bit
> hypervisor in the future and perhaps needing to do some shimming/arg
> translation. It would be better to aim for having the interface just be
> 32/64 agnostic but mistakes do happen.
I always like to keep things as simple as possible, so I am in favor of
using the raw hvc instruction.
Besides, with the bulk of mmu hypercalls gone, it should not be difficult
to design a 32/64 bit agnostic interface.
* Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Catalin Marinas @ 2011-12-01 15:10 UTC
To: Ian Campbell
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Arnd Bergmann, Pawel Moll, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu, kvm@vger.kernel.org,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <1322735197.31810.191.camel@zakaz.uk.xensource.com>
On Thu, Dec 01, 2011 at 10:26:37AM +0000, Ian Campbell wrote:
> On Wed, 2011-11-30 at 18:32 +0000, Stefano Stabellini wrote:
> > On Wed, 30 Nov 2011, Arnd Bergmann wrote:
> > > KVM and Xen at least both fall into the single-return-value category,
> > > so we should be able to agree on a calling convention. KVM does not
> > > have an hcall API on ARM yet, and I see no reason not to use the
> > > same implementation that you have in the Xen guest.
> > >
> > > Stefano, can you split out the generic parts of your asm/xen/hypercall.h
> > > file into a common asm/hypercall.h and submit it for review to the
> > > arm kernel list?
> >
> > Sure, I can do that.
> > Usually the hypercall calling convention is very hypervisor specific,
> > but if it turns out that we have the same requirements I am happy to design
> > a common interface.
>
> I expect the only real decision to be made is hypercall page vs. raw hvc
> instruction.
>
> The page was useful on x86 where there is a variety of instructions
> which could be used (at least for PV there was sysenter/syscall/int, I
> think vmcall instruction differs between AMD and Intel also) and gives
> some additional flexibility. It's hard to predict but I don't think I'd
> expect that to be necessary on ARM.
>
> Another reason for having a hypercall page instead of a raw instruction
> might be wanting to support 32 bit guests (from ~today) on a 64 bit
> hypervisor in the future and perhaps needing to do some shimming/arg
> translation. It would be better to aim for having the interface just be
> 32/64 agnostic but mistakes do happen.
Given the way register banking is done on AArch64, issuing an HVC on a
32-bit guest OS doesn't require translation on a 64-bit hypervisor. We
have a similar implementation at the SVC level (for 32-bit user apps on
a 64-bit kernel), the only modification was where a 32-bit SVC takes a
64-bit parameter in two separate 32-bit registers, so packing needs to
be done in a syscall wrapper.
I'm not closely involved with any of the Xen or KVM work but I would
vote for using HVC rather than a hypercall page.
--
Catalin
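Note for readers: the packing step mentioned above, where a 32-bit caller
passes one 64-bit parameter in two 32-bit registers, might look like the
sketch below. The handler and wrapper names are hypothetical, and which
register holds the low half is an assumption here; the real ordering is
fixed by the 32-bit ABI in use.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 32-bit halves into the single 64-bit value the callee expects. */
uint64_t pack_u64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical native 64-bit handler: takes the offset as one argument. */
uint64_t native_handler(int fd, uint64_t offset)
{
	(void)fd;
	return offset;
}

/* Hypothetical compat wrapper: the same offset arrives split in two registers. */
uint64_t compat_handler(int fd, uint32_t off_lo, uint32_t off_hi)
{
	return native_handler(fd, pack_u64(off_lo, off_hi));
}

/* Usage check: the two halves recombine into the original 64-bit value. */
void pack_u64_selfcheck(void)
{
	assert(pack_u64(0x89abcdefu, 0x01234567u) == 0x0123456789abcdefULL);
	assert(compat_handler(3, 0, 1) == 0x100000000ULL);
}
```

Only entry points that take a 64-bit parameter need such a wrapper; plain
32-bit arguments pass through unchanged.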
* [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the staging area
From: K. Y. Srinivasan @ 2011-12-01 13:43 UTC
To: gregkh, linux-kernel, devel, virtualization, ohering,
James.Bottomley, hch, linux-scsi
Cc: K. Y. Srinivasan
The storage driver (storvsc_drv.c) handles all block storage devices
assigned to Linux guests hosted on Hyper-V. This driver has been in the
staging tree for a while and this patch moves it out of the staging area.
As per Greg's recommendation, this patch makes no changes to the staging/hv
directory. Once the driver moves out of staging, we will clean up the
staging/hv directory.
This patch includes all the patches that I have sent against the staging/hv
tree to address the comments I have gotten to date on this storage driver.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
drivers/scsi/Kconfig | 7 +
drivers/scsi/Makefile | 3 +
drivers/scsi/storvsc_drv.c | 1583 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 1593 insertions(+), 0 deletions(-)
create mode 100644 drivers/scsi/storvsc_drv.c
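Note for readers: the VMSTOR_PROTOCOL_* macros in the new file pack a major
and a minor number into 16 bits, with the minor number in the low byte. The
following user-space sketch shows that encoding. The three macros are copied
from the patch; vmstor_version(), vmstor_major() and vmstor_minor() are
hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch: major in the high byte, minor in the low byte. */
#define VMSTOR_PROTOCOL_MAJOR(VERSION_)		(((VERSION_) >> 8) & 0xff)
#define VMSTOR_PROTOCOL_MINOR(VERSION_)		(((VERSION_)) & 0xff)
#define VMSTOR_PROTOCOL_VERSION(MAJOR_, MINOR_)	((((MAJOR_) & 0xff) << 8) | \
						(((MINOR_) & 0xff)))

/* Hypothetical helper: build the 16-bit value carried in the version packet. */
uint16_t vmstor_version(unsigned int major, unsigned int minor)
{
	assert(major <= 0xff && minor <= 0xff);
	return (uint16_t)VMSTOR_PROTOCOL_VERSION(major, minor);
}

/* Hypothetical helpers: split a 16-bit version back into its parts. */
unsigned int vmstor_major(uint16_t version)
{
	return VMSTOR_PROTOCOL_MAJOR(version);
}

unsigned int vmstor_minor(uint16_t version)
{
	return VMSTOR_PROTOCOL_MINOR(version);
}
```

Version 4.2 encodes as 0x0402, and an old flat version number such as 1
decodes as 0.1, as the comment above the macros in the patch describes.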
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 06ea3bc..4910269 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -662,6 +662,13 @@ config VMWARE_PVSCSI
To compile this driver as a module, choose M here: the
module will be called vmw_pvscsi.
+config HYPERV_STORAGE
+ tristate "Microsoft Hyper-V virtual storage driver"
+ depends on SCSI && HYPERV
+ default HYPERV
+ help
+ Select this option to enable the Hyper-V virtual storage driver.
+
config LIBFC
tristate "LibFC module"
select SCSI_FC_ATTRS
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 2b88749..e4c1a69 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -142,6 +142,7 @@ obj-$(CONFIG_SCSI_BNX2_ISCSI) += libiscsi.o bnx2i/
obj-$(CONFIG_BE2ISCSI) += libiscsi.o be2iscsi/
obj-$(CONFIG_SCSI_PMCRAID) += pmcraid.o
obj-$(CONFIG_VMWARE_PVSCSI) += vmw_pvscsi.o
+obj-$(CONFIG_HYPERV_STORAGE) += hv_storvsc.o
obj-$(CONFIG_ARM) += arm/
@@ -170,6 +171,8 @@ scsi_mod-$(CONFIG_SCSI_PROC_FS) += scsi_proc.o
scsi_mod-y += scsi_trace.o
scsi_mod-$(CONFIG_PM) += scsi_pm.o
+hv_storvsc-y := storvsc_drv.o
+
scsi_tgt-y += scsi_tgt_lib.o scsi_tgt_if.o
sd_mod-objs := sd.o
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
new file mode 100644
index 0000000..18f8771
--- /dev/null
+++ b/drivers/scsi/storvsc_drv.c
@@ -0,0 +1,1583 @@
+/*
+ * Copyright (c) 2009, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Authors:
+ * Haiyang Zhang <haiyangz@microsoft.com>
+ * Hank Janssen <hjanssen@microsoft.com>
+ * K. Y. Srinivasan <kys@microsoft.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/wait.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/string.h>
+#include <linux/mm.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/hyperv.h>
+#include <linux/mempool.h>
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_tcq.h>
+#include <scsi/scsi_eh.h>
+#include <scsi/scsi_devinfo.h>
+#include <scsi/scsi_dbg.h>
+
+
+#define STORVSC_MIN_BUF_NR 64
+#define STORVSC_RING_BUFFER_SIZE (20*PAGE_SIZE)
+static int storvsc_ringbuffer_size = STORVSC_RING_BUFFER_SIZE;
+
+module_param(storvsc_ringbuffer_size, int, S_IRUGO);
+MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
+
+/* The revision number is used to alert the user that structure sizes may be */
+/* mismatched even though the protocol versions match. */
+
+
+#define REVISION_STRING(REVISION_) #REVISION_
+#define FILL_VMSTOR_REVISION(RESULT_LVALUE_) \
+ do { \
+ char *revision_string \
+ = REVISION_STRING($Rev : 6 $) + 6; \
+ RESULT_LVALUE_ = 0; \
+ while (*revision_string >= '0' \
+ && *revision_string <= '9') { \
+ RESULT_LVALUE_ *= 10; \
+ RESULT_LVALUE_ += *revision_string - '0'; \
+ revision_string++; \
+ } \
+ } while (0)
+
+/* Major/minor macros. Minor version is in LSB, meaning that earlier flat */
+/* version numbers will be interpreted as "0.x" (i.e., 1 becomes 0.1). */
+#define VMSTOR_PROTOCOL_MAJOR(VERSION_) (((VERSION_) >> 8) & 0xff)
+#define VMSTOR_PROTOCOL_MINOR(VERSION_) (((VERSION_)) & 0xff)
+#define VMSTOR_PROTOCOL_VERSION(MAJOR_, MINOR_) ((((MAJOR_) & 0xff) << 8) | \
+ (((MINOR_) & 0xff)))
+#define VMSTOR_INVALID_PROTOCOL_VERSION (-1)
+
+/* Version history: */
+/* V1 Beta 0.1 */
+/* V1 RC < 2008/1/31 1.0 */
+/* V1 RC > 2008/1/31 2.0 */
+#define VMSTOR_PROTOCOL_VERSION_CURRENT VMSTOR_PROTOCOL_VERSION(4, 2)
+
+
+
+
+/* This will get replaced with the max transfer length that is possible on */
+/* the host adapter. */
+/* The max transfer length will be published when we offer a vmbus channel. */
+#define MAX_TRANSFER_LENGTH 0x40000
+#define DEFAULT_PACKET_SIZE (sizeof(struct vmdata_gpa_direct) + \
+ sizeof(struct vstor_packet) + \
+ (sizeof(u64) * (MAX_TRANSFER_LENGTH / PAGE_SIZE)))
+
+
+/* Packet structure describing virtual storage requests. */
+enum vstor_packet_operation {
+ VSTOR_OPERATION_COMPLETE_IO = 1,
+ VSTOR_OPERATION_REMOVE_DEVICE = 2,
+ VSTOR_OPERATION_EXECUTE_SRB = 3,
+ VSTOR_OPERATION_RESET_LUN = 4,
+ VSTOR_OPERATION_RESET_ADAPTER = 5,
+ VSTOR_OPERATION_RESET_BUS = 6,
+ VSTOR_OPERATION_BEGIN_INITIALIZATION = 7,
+ VSTOR_OPERATION_END_INITIALIZATION = 8,
+ VSTOR_OPERATION_QUERY_PROTOCOL_VERSION = 9,
+ VSTOR_OPERATION_QUERY_PROPERTIES = 10,
+ VSTOR_OPERATION_ENUMERATE_BUS = 11,
+ VSTOR_OPERATION_MAXIMUM = 11
+};
+
+/*
+ * Platform neutral description of a scsi request -
+ * this remains the same across the wire regardless of 32/64 bit
+ * note: it's patterned off the SCSI_PASS_THROUGH structure
+ */
+#define CDB16GENERIC_LENGTH 0x10
+
+#ifndef SENSE_BUFFER_SIZE
+#define SENSE_BUFFER_SIZE 0x12
+#endif
+
+#define MAX_DATA_BUF_LEN_WITH_PADDING 0x14
+
+struct vmscsi_request {
+ unsigned short length;
+ unsigned char srb_status;
+ unsigned char scsi_status;
+
+ unsigned char port_number;
+ unsigned char path_id;
+ unsigned char target_id;
+ unsigned char lun;
+
+ unsigned char cdb_length;
+ unsigned char sense_info_length;
+ unsigned char data_in;
+ unsigned char reserved;
+
+ unsigned int data_transfer_length;
+
+ union {
+ unsigned char cdb[CDB16GENERIC_LENGTH];
+ unsigned char sense_data[SENSE_BUFFER_SIZE];
+ unsigned char reserved_array[MAX_DATA_BUF_LEN_WITH_PADDING];
+ };
+} __packed;
+
+
+/*
+ * This structure is sent during the initialization phase to get the different
+ * properties of the channel.
+ */
+struct vmstorage_channel_properties {
+ unsigned short protocol_version;
+ unsigned char path_id;
+ unsigned char target_id;
+
+ /* Note: port number is only really known on the client side */
+ unsigned int port_number;
+ unsigned int flags;
+ unsigned int max_transfer_bytes;
+
+ /* This id is unique for each channel and will correspond with */
+ /* vendor specific data in the inquirydata */
+ unsigned long long unique_id;
+} __packed;
+
+/* This structure is sent during the storage protocol negotiations. */
+struct vmstorage_protocol_version {
+ /* Major (MSB) and minor (LSB) version numbers. */
+ unsigned short major_minor;
+
+ /*
+ * Revision number is auto-incremented whenever this file is changed
+ * (See FILL_VMSTOR_REVISION macro above). Mismatch does not
+ * definitely indicate incompatibility--but it does indicate mismatched
+ * builds.
+ */
+ unsigned short revision;
+} __packed;
+
+/* Channel Property Flags */
+#define STORAGE_CHANNEL_REMOVABLE_FLAG 0x1
+#define STORAGE_CHANNEL_EMULATED_IDE_FLAG 0x2
+
+struct vstor_packet {
+ /* Requested operation type */
+ enum vstor_packet_operation operation;
+
+ /* Flags - see below for values */
+ unsigned int flags;
+
+ /* Status of the request returned from the server side. */
+ unsigned int status;
+
+ /* Data payload area */
+ union {
+ /*
+ * Structure used to forward SCSI commands from the
+ * client to the server.
+ */
+ struct vmscsi_request vm_srb;
+
+ /* Structure used to query channel properties. */
+ struct vmstorage_channel_properties storage_channel_properties;
+
+ /* Used during version negotiations. */
+ struct vmstorage_protocol_version version;
+ };
+} __packed;
+
+/* Packet flags */
+/*
+ * This flag indicates that the server should send back a completion for this
+ * packet.
+ */
+#define REQUEST_COMPLETION_FLAG 0x1
+
+/* This is the set of flags that the vsc can set in any packets it sends */
+#define VSC_LEGAL_FLAGS (REQUEST_COMPLETION_FLAG)
+
+
+/* Defines */
+
+#define STORVSC_MAX_IO_REQUESTS 128
+
+/*
+ * In Hyper-V, each port/path/target maps to 1 scsi host adapter. In
+ * reality, the path/target is not used (ie always set to 0) so our
+ * scsi host adapter essentially has 1 bus with 1 target that contains
+ * up to 256 luns.
+ */
+#define STORVSC_MAX_LUNS_PER_TARGET 64
+#define STORVSC_MAX_TARGETS 1
+#define STORVSC_MAX_CHANNELS 1
+#define STORVSC_MAX_CMD_LEN 16
+
+/* Matches Windows-end */
+enum storvsc_request_type {
+ WRITE_TYPE,
+ READ_TYPE,
+ UNKNOWN_TYPE,
+};
+
+
+struct hv_storvsc_request {
+ struct hv_device *device;
+
+ /* Synchronize the request/response if needed */
+ struct completion wait_event;
+
+ unsigned char *sense_buffer;
+ void *context;
+ void (*on_io_completion)(struct hv_storvsc_request *request);
+ struct hv_multipage_buffer data_buffer;
+
+ struct vstor_packet vstor_packet;
+};
+
+
+/* A storvsc device is a device object that contains a vmbus channel */
+struct storvsc_device {
+ struct hv_device *device;
+
+ bool destroy;
+ bool drain_notify;
+ atomic_t num_outstanding_req;
+ struct Scsi_Host *host;
+
+ wait_queue_head_t waiting_to_drain;
+
+ /*
+ * Each unique Port/Path/Target represents 1 channel ie scsi
+ * controller. In reality, the pathid, targetid is always 0
+ * and the port is set by us
+ */
+ unsigned int port_number;
+ unsigned char path_id;
+ unsigned char target_id;
+
+ /* Used for vsc/vsp channel reset process */
+ struct hv_storvsc_request init_request;
+ struct hv_storvsc_request reset_request;
+};
+
+struct stor_mem_pools {
+ struct kmem_cache *request_pool;
+ mempool_t *request_mempool;
+};
+
+struct hv_host_device {
+ struct hv_device *dev;
+ unsigned int port;
+ unsigned char path;
+ unsigned char target;
+};
+
+struct storvsc_cmd_request {
+ struct list_head entry;
+ struct scsi_cmnd *cmd;
+
+ unsigned int bounce_sgl_count;
+ struct scatterlist *bounce_sgl;
+
+ struct hv_storvsc_request request;
+};
+
+struct storvsc_scan_work {
+ struct work_struct work;
+ struct Scsi_Host *host;
+ uint lun;
+};
+
+static void storvsc_bus_scan(struct work_struct *work)
+{
+ struct storvsc_scan_work *wrk;
+ int id, order_id;
+
+ wrk = container_of(work, struct storvsc_scan_work, work);
+ for (id = 0; id < wrk->host->max_id; ++id) {
+ if (wrk->host->reverse_ordering)
+ order_id = wrk->host->max_id - id - 1;
+ else
+ order_id = id;
+
+ scsi_scan_target(&wrk->host->shost_gendev, 0,
+ order_id, SCAN_WILD_CARD, 1);
+ }
+ kfree(wrk);
+}
+
+static void storvsc_remove_lun(struct work_struct *work)
+{
+ struct storvsc_scan_work *wrk;
+ struct scsi_device *sdev;
+
+ wrk = container_of(work, struct storvsc_scan_work, work);
+ if (!scsi_host_get(wrk->host))
+ goto done;
+
+ sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
+
+ if (sdev) {
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ }
+ scsi_host_put(wrk->host);
+
+done:
+ kfree(wrk);
+}
+
+static inline struct storvsc_device *get_out_stor_device(
+ struct hv_device *device)
+{
+ struct storvsc_device *stor_device;
+
+ stor_device = hv_get_drvdata(device);
+
+ if (stor_device && stor_device->destroy)
+ stor_device = NULL;
+
+ return stor_device;
+}
+
+
+static inline void storvsc_wait_to_drain(struct storvsc_device *dev)
+{
+ dev->drain_notify = true;
+ wait_event(dev->waiting_to_drain,
+ atomic_read(&dev->num_outstanding_req) == 0);
+ dev->drain_notify = false;
+}
+
+static inline struct storvsc_device *get_in_stor_device(
+ struct hv_device *device)
+{
+ struct storvsc_device *stor_device;
+
+ stor_device = hv_get_drvdata(device);
+
+ if (!stor_device)
+ goto get_in_err;
+
+ /*
+ * If the device is being destroyed, allow incoming
+ * traffic only to clean up outstanding requests.
+ */
+
+ if (stor_device->destroy &&
+ (atomic_read(&stor_device->num_outstanding_req) == 0))
+ stor_device = NULL;
+
+get_in_err:
+ return stor_device;
+
+}
+
+static int storvsc_channel_init(struct hv_device *device)
+{
+ struct storvsc_device *stor_device;
+ struct hv_storvsc_request *request;
+ struct vstor_packet *vstor_packet;
+ int ret, t;
+
+ stor_device = get_out_stor_device(device);
+ if (!stor_device)
+ return -ENODEV;
+
+ request = &stor_device->init_request;
+ vstor_packet = &request->vstor_packet;
+
+ /*
+ * Now, initiate the vsc/vsp initialization protocol on the open
+ * channel
+ */
+ memset(request, 0, sizeof(struct hv_storvsc_request));
+ init_completion(&request->wait_event);
+ vstor_packet->operation = VSTOR_OPERATION_BEGIN_INITIALIZATION;
+ vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+
+ ret = vmbus_sendpacket(device->channel, vstor_packet,
+ sizeof(struct vstor_packet),
+ (unsigned long)request,
+ VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+ if (ret != 0)
+ goto cleanup;
+
+ t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+ if (t == 0) {
+ ret = -ETIMEDOUT;
+ goto cleanup;
+ }
+
+ if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
+ vstor_packet->status != 0)
+ goto cleanup;
+
+
+ /* reuse the packet for version range supported */
+ memset(vstor_packet, 0, sizeof(struct vstor_packet));
+ vstor_packet->operation = VSTOR_OPERATION_QUERY_PROTOCOL_VERSION;
+ vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+
+ vstor_packet->version.major_minor = VMSTOR_PROTOCOL_VERSION_CURRENT;
+ FILL_VMSTOR_REVISION(vstor_packet->version.revision);
+
+ ret = vmbus_sendpacket(device->channel, vstor_packet,
+ sizeof(struct vstor_packet),
+ (unsigned long)request,
+ VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+ if (ret != 0)
+ goto cleanup;
+
+ t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+ if (t == 0) {
+ ret = -ETIMEDOUT;
+ goto cleanup;
+ }
+
+ if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
+ vstor_packet->status != 0)
+ goto cleanup;
+
+
+ memset(vstor_packet, 0, sizeof(struct vstor_packet));
+ vstor_packet->operation = VSTOR_OPERATION_QUERY_PROPERTIES;
+ vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+ vstor_packet->storage_channel_properties.port_number =
+ stor_device->port_number;
+
+ ret = vmbus_sendpacket(device->channel, vstor_packet,
+ sizeof(struct vstor_packet),
+ (unsigned long)request,
+ VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+
+ if (ret != 0)
+ goto cleanup;
+
+ t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+ if (t == 0) {
+ ret = -ETIMEDOUT;
+ goto cleanup;
+ }
+
+ if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
+ vstor_packet->status != 0)
+ goto cleanup;
+
+ stor_device->path_id = vstor_packet->storage_channel_properties.path_id;
+ stor_device->target_id
+ = vstor_packet->storage_channel_properties.target_id;
+
+ memset(vstor_packet, 0, sizeof(struct vstor_packet));
+ vstor_packet->operation = VSTOR_OPERATION_END_INITIALIZATION;
+ vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+
+ ret = vmbus_sendpacket(device->channel, vstor_packet,
+ sizeof(struct vstor_packet),
+ (unsigned long)request,
+ VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+
+ if (ret != 0)
+ goto cleanup;
+
+ t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+ if (t == 0) {
+ ret = -ETIMEDOUT;
+ goto cleanup;
+ }
+
+ if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
+ vstor_packet->status != 0)
+ goto cleanup;
+
+
+cleanup:
+ return ret;
+}
+
+static void storvsc_on_io_completion(struct hv_device *device,
+ struct vstor_packet *vstor_packet,
+ struct hv_storvsc_request *request)
+{
+ struct storvsc_device *stor_device;
+ struct vstor_packet *stor_pkt;
+
+ stor_device = hv_get_drvdata(device);
+ stor_pkt = &request->vstor_packet;
+
+ /*
+ * The current SCSI handling on the host side does
+ * not correctly handle:
+ * INQUIRY command with page code parameter set to 0x80
+ * MODE_SENSE command with cmd[2] == 0x1c
+ *
+ * Setup srb and scsi status so this won't be fatal.
+ * We do this so we can distinguish truly fatal failures
+ * (srb status == 0x4) and off-line the device in that case.
+ */
+
+ if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
+ (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
+ vstor_packet->vm_srb.scsi_status = 0;
+ vstor_packet->vm_srb.srb_status = 0x1;
+ }
+
+
+ /* Copy over the status...etc */
+ stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
+ stor_pkt->vm_srb.srb_status = vstor_packet->vm_srb.srb_status;
+ stor_pkt->vm_srb.sense_info_length =
+ vstor_packet->vm_srb.sense_info_length;
+
+ if (vstor_packet->vm_srb.scsi_status != 0 ||
+ vstor_packet->vm_srb.srb_status != 1){
+ dev_warn(&device->device,
+ "cmd 0x%x scsi status 0x%x srb status 0x%x\n",
+ stor_pkt->vm_srb.cdb[0],
+ vstor_packet->vm_srb.scsi_status,
+ vstor_packet->vm_srb.srb_status);
+ }
+
+ if ((vstor_packet->vm_srb.scsi_status & 0xFF) == 0x02) {
+ /* CHECK_CONDITION */
+ if (vstor_packet->vm_srb.srb_status & 0x80) {
+ /* autosense data available */
+ dev_warn(&device->device,
+ "stor pkt %p autosense data valid - len %d\n",
+ request,
+ vstor_packet->vm_srb.sense_info_length);
+
+ memcpy(request->sense_buffer,
+ vstor_packet->vm_srb.sense_data,
+ vstor_packet->vm_srb.sense_info_length);
+
+ }
+ }
+
+ stor_pkt->vm_srb.data_transfer_length =
+ vstor_packet->vm_srb.data_transfer_length;
+
+ request->on_io_completion(request);
+
+ if (atomic_dec_and_test(&stor_device->num_outstanding_req) &&
+ stor_device->drain_notify)
+ wake_up(&stor_device->waiting_to_drain);
+
+
+}
+
+static void storvsc_on_receive(struct hv_device *device,
+ struct vstor_packet *vstor_packet,
+ struct hv_storvsc_request *request)
+{
+ struct storvsc_scan_work *work;
+ struct storvsc_device *stor_device;
+
+ switch (vstor_packet->operation) {
+ case VSTOR_OPERATION_COMPLETE_IO:
+ storvsc_on_io_completion(device, vstor_packet, request);
+ break;
+
+ case VSTOR_OPERATION_REMOVE_DEVICE:
+ case VSTOR_OPERATION_ENUMERATE_BUS:
+ stor_device = get_in_stor_device(device);
+ work = kmalloc(sizeof(struct storvsc_scan_work), GFP_ATOMIC);
+ if (!work)
+ return;
+
+ INIT_WORK(&work->work, storvsc_bus_scan);
+ work->host = stor_device->host;
+ schedule_work(&work->work);
+ break;
+
+ default:
+ break;
+ }
+}
+
+static void storvsc_on_channel_callback(void *context)
+{
+ struct hv_device *device = (struct hv_device *)context;
+ struct storvsc_device *stor_device;
+ u32 bytes_recvd;
+ u64 request_id;
+ unsigned char packet[ALIGN(sizeof(struct vstor_packet), 8)];
+ struct hv_storvsc_request *request;
+ int ret;
+
+
+ stor_device = get_in_stor_device(device);
+ if (!stor_device)
+ return;
+
+ do {
+ ret = vmbus_recvpacket(device->channel, packet,
+ ALIGN(sizeof(struct vstor_packet), 8),
+ &bytes_recvd, &request_id);
+ if (ret == 0 && bytes_recvd > 0) {
+
+ request = (struct hv_storvsc_request *)
+ (unsigned long)request_id;
+
+ if ((request == &stor_device->init_request) ||
+ (request == &stor_device->reset_request)) {
+
+ memcpy(&request->vstor_packet, packet,
+ sizeof(struct vstor_packet));
+ complete(&request->wait_event);
+ } else {
+ storvsc_on_receive(device,
+ (struct vstor_packet *)packet,
+ request);
+ }
+ } else {
+ break;
+ }
+ } while (1);
+
+ return;
+}
+
+static int storvsc_connect_to_vsp(struct hv_device *device, u32 ring_size)
+{
+ struct vmstorage_channel_properties props;
+ int ret;
+
+ memset(&props, 0, sizeof(struct vmstorage_channel_properties));
+
+ /* Open the channel */
+ ret = vmbus_open(device->channel,
+ ring_size,
+ ring_size,
+ (void *)&props,
+ sizeof(struct vmstorage_channel_properties),
+ storvsc_on_channel_callback, device);
+
+ if (ret != 0)
+ return ret;
+
+ ret = storvsc_channel_init(device);
+
+ return ret;
+}
+
+static int storvsc_dev_remove(struct hv_device *device)
+{
+ struct storvsc_device *stor_device;
+ unsigned long flags;
+
+ stor_device = hv_get_drvdata(device);
+
+ spin_lock_irqsave(&device->channel->inbound_lock, flags);
+ stor_device->destroy = true;
+ spin_unlock_irqrestore(&device->channel->inbound_lock, flags);
+
+ /*
+ * At this point, all outbound traffic should be disabled. We
+ * only allow inbound traffic (responses) to proceed so that
+ * outstanding requests can be completed.
+ */
+
+ storvsc_wait_to_drain(stor_device);
+
+ /*
+ * Since we have already drained, we don't need to busy wait
+ * as was done in final_release_stor_device()
+ * Note that we cannot set the ext pointer to NULL until
+ * we have drained - to drain the outgoing packets, we need to
+ * allow incoming packets.
+ */
+ spin_lock_irqsave(&device->channel->inbound_lock, flags);
+ hv_set_drvdata(device, NULL);
+ spin_unlock_irqrestore(&device->channel->inbound_lock, flags);
+
+ /* Close the channel */
+ vmbus_close(device->channel);
+
+ kfree(stor_device);
+ return 0;
+}
+
+static int storvsc_do_io(struct hv_device *device,
+ struct hv_storvsc_request *request)
+{
+ struct storvsc_device *stor_device;
+ struct vstor_packet *vstor_packet;
+ int ret = 0;
+
+ vstor_packet = &request->vstor_packet;
+ stor_device = get_out_stor_device(device);
+
+ if (!stor_device)
+ return -ENODEV;
+
+
+ request->device = device;
+
+
+ vstor_packet->flags |= REQUEST_COMPLETION_FLAG;
+
+ vstor_packet->vm_srb.length = sizeof(struct vmscsi_request);
+
+
+ vstor_packet->vm_srb.sense_info_length = SENSE_BUFFER_SIZE;
+
+
+ vstor_packet->vm_srb.data_transfer_length =
+ request->data_buffer.len;
+
+ vstor_packet->operation = VSTOR_OPERATION_EXECUTE_SRB;
+
+ if (request->data_buffer.len) {
+ ret = vmbus_sendpacket_multipagebuffer(device->channel,
+ &request->data_buffer,
+ vstor_packet,
+ sizeof(struct vstor_packet),
+ (unsigned long)request);
+ } else {
+ ret = vmbus_sendpacket(device->channel, vstor_packet,
+ sizeof(struct vstor_packet),
+ (unsigned long)request,
+ VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+ }
+
+ if (ret != 0)
+ return ret;
+
+ atomic_inc(&stor_device->num_outstanding_req);
+
+ return ret;
+}
+
+static void storvsc_get_ide_info(struct hv_device *dev, int *target, int *path)
+{
+ *target =
+ dev->dev_instance.b[5] << 8 | dev->dev_instance.b[4];
+
+ *path =
+ dev->dev_instance.b[3] << 24 |
+ dev->dev_instance.b[2] << 16 |
+ dev->dev_instance.b[1] << 8 | dev->dev_instance.b[0];
+}
+
+
+static int storvsc_device_alloc(struct scsi_device *sdevice)
+{
+ struct stor_mem_pools *memp;
+ int number = STORVSC_MIN_BUF_NR;
+
+ memp = kzalloc(sizeof(struct stor_mem_pools), GFP_KERNEL);
+ if (!memp)
+ return -ENOMEM;
+
+ memp->request_pool =
+ kmem_cache_create(dev_name(&sdevice->sdev_dev),
+ sizeof(struct storvsc_cmd_request), 0,
+ SLAB_HWCACHE_ALIGN, NULL);
+
+ if (!memp->request_pool)
+ goto err0;
+
+ memp->request_mempool = mempool_create(number, mempool_alloc_slab,
+ mempool_free_slab,
+ memp->request_pool);
+
+ if (!memp->request_mempool)
+ goto err1;
+
+ sdevice->hostdata = memp;
+
+ return 0;
+
+err1:
+ kmem_cache_destroy(memp->request_pool);
+
+err0:
+ kfree(memp);
+ return -ENOMEM;
+}
+
+static void storvsc_device_destroy(struct scsi_device *sdevice)
+{
+ struct stor_mem_pools *memp = sdevice->hostdata;
+
+ mempool_destroy(memp->request_mempool);
+ kmem_cache_destroy(memp->request_pool);
+ kfree(memp);
+ sdevice->hostdata = NULL;
+}
+
+static int storvsc_device_configure(struct scsi_device *sdevice)
+{
+ scsi_adjust_queue_depth(sdevice, MSG_SIMPLE_TAG,
+ STORVSC_MAX_IO_REQUESTS);
+
+ blk_queue_max_segment_size(sdevice->request_queue, PAGE_SIZE);
+
+ blk_queue_bounce_limit(sdevice->request_queue, BLK_BOUNCE_ANY);
+
+ return 0;
+}
+
+static void destroy_bounce_buffer(struct scatterlist *sgl,
+ unsigned int sg_count)
+{
+ int i;
+ struct page *page_buf;
+
+ for (i = 0; i < sg_count; i++) {
+ page_buf = sg_page((&sgl[i]));
+ if (page_buf != NULL)
+ __free_page(page_buf);
+ }
+
+ kfree(sgl);
+}
+
+static int do_bounce_buffer(struct scatterlist *sgl, unsigned int sg_count)
+{
+ int i;
+
+ /* No need to check */
+ if (sg_count < 2)
+ return -1;
+
+ /* We have at least 2 sg entries */
+ for (i = 0; i < sg_count; i++) {
+ if (i == 0) {
+ /* make sure 1st one does not have hole */
+ if (sgl[i].offset + sgl[i].length != PAGE_SIZE)
+ return i;
+ } else if (i == sg_count - 1) {
+ /* make sure last one does not have hole */
+ if (sgl[i].offset != 0)
+ return i;
+ } else {
+ /* make sure no hole in the middle */
+ if (sgl[i].length != PAGE_SIZE || sgl[i].offset != 0)
+ return i;
+ }
+ }
+ return -1;
+}
+
+static struct scatterlist *create_bounce_buffer(struct scatterlist *sgl,
+ unsigned int sg_count,
+ unsigned int len)
+{
+ int i;
+ int num_pages;
+ struct scatterlist *bounce_sgl;
+ struct page *page_buf;
+
+ num_pages = ALIGN(len, PAGE_SIZE) >> PAGE_SHIFT;
+
+ bounce_sgl = kcalloc(num_pages, sizeof(struct scatterlist), GFP_ATOMIC);
+ if (!bounce_sgl)
+ return NULL;
+
+ for (i = 0; i < num_pages; i++) {
+ page_buf = alloc_page(GFP_ATOMIC);
+ if (!page_buf)
+ goto cleanup;
+ sg_set_page(&bounce_sgl[i], page_buf, 0, 0);
+ }
+
+ return bounce_sgl;
+
+cleanup:
+ destroy_bounce_buffer(bounce_sgl, num_pages);
+ return NULL;
+}
+
+
+/* Assume the original sgl has enough room */
+static unsigned int copy_from_bounce_buffer(struct scatterlist *orig_sgl,
+ struct scatterlist *bounce_sgl,
+ unsigned int orig_sgl_count,
+ unsigned int bounce_sgl_count)
+{
+ int i;
+ int j = 0;
+ unsigned long src, dest;
+ unsigned int srclen, destlen, copylen;
+ unsigned int total_copied = 0;
+ unsigned long bounce_addr = 0;
+ unsigned long dest_addr = 0;
+ unsigned long flags;
+
+ local_irq_save(flags);
+
+ for (i = 0; i < orig_sgl_count; i++) {
+ dest_addr = (unsigned long)kmap_atomic(sg_page((&orig_sgl[i])),
+ KM_IRQ0) + orig_sgl[i].offset;
+ dest = dest_addr;
+ destlen = orig_sgl[i].length;
+
+ if (bounce_addr == 0)
+ bounce_addr =
+ (unsigned long)kmap_atomic(sg_page((&bounce_sgl[j])),
+ KM_IRQ0);
+
+ while (destlen) {
+ src = bounce_addr + bounce_sgl[j].offset;
+ srclen = bounce_sgl[j].length - bounce_sgl[j].offset;
+
+ copylen = min(srclen, destlen);
+ memcpy((void *)dest, (void *)src, copylen);
+
+ total_copied += copylen;
+ bounce_sgl[j].offset += copylen;
+ destlen -= copylen;
+ dest += copylen;
+
+ if (bounce_sgl[j].offset == bounce_sgl[j].length) {
+ /* full */
+ kunmap_atomic((void *)bounce_addr, KM_IRQ0);
+ j++;
+
+ /*
+ * It is possible that the number of elements
+ * in the bounce buffer may not be equal to
+ * the number of elements in the original
+ * scatter list. Handle this correctly.
+ */
+
+ if (j == bounce_sgl_count) {
+ /*
+ * We are done; cleanup and return.
+ */
+ kunmap_atomic((void *)(dest_addr -
+ orig_sgl[i].offset),
+ KM_IRQ0);
+ local_irq_restore(flags);
+ return total_copied;
+ }
+
+ /* if we need to use another bounce buffer */
+ if (destlen || i != orig_sgl_count - 1)
+ bounce_addr =
+ (unsigned long)kmap_atomic(
+ sg_page((&bounce_sgl[j])), KM_IRQ0);
+ } else if (destlen == 0 && i == orig_sgl_count - 1) {
+ /* unmap the last bounce that is < PAGE_SIZE */
+ kunmap_atomic((void *)bounce_addr, KM_IRQ0);
+ }
+ }
+
+ kunmap_atomic((void *)(dest_addr - orig_sgl[i].offset),
+ KM_IRQ0);
+ }
+
+ local_irq_restore(flags);
+
+ return total_copied;
+}
+
+
+/* Assume the bounce_sgl has enough room, i.e. it came from create_bounce_buffer() */
+static unsigned int copy_to_bounce_buffer(struct scatterlist *orig_sgl,
+ struct scatterlist *bounce_sgl,
+ unsigned int orig_sgl_count)
+{
+ int i;
+ int j = 0;
+ unsigned long src, dest;
+ unsigned int srclen, destlen, copylen;
+ unsigned int total_copied = 0;
+ unsigned long bounce_addr = 0;
+ unsigned long src_addr = 0;
+ unsigned long flags;
+
+ local_irq_save(flags);
+
+ for (i = 0; i < orig_sgl_count; i++) {
+ src_addr = (unsigned long)kmap_atomic(sg_page((&orig_sgl[i])),
+ KM_IRQ0) + orig_sgl[i].offset;
+ src = src_addr;
+ srclen = orig_sgl[i].length;
+
+ if (bounce_addr == 0)
+ bounce_addr =
+ (unsigned long)kmap_atomic(sg_page((&bounce_sgl[j])),
+ KM_IRQ0);
+
+ while (srclen) {
+ /* assume bounce offset always == 0 */
+ dest = bounce_addr + bounce_sgl[j].length;
+ destlen = PAGE_SIZE - bounce_sgl[j].length;
+
+ copylen = min(srclen, destlen);
+ memcpy((void *)dest, (void *)src, copylen);
+
+ total_copied += copylen;
+ bounce_sgl[j].length += copylen;
+ srclen -= copylen;
+ src += copylen;
+
+ if (bounce_sgl[j].length == PAGE_SIZE) {
+ /* full..move to next entry */
+ kunmap_atomic((void *)bounce_addr, KM_IRQ0);
+ j++;
+
+ /* if we need to use another bounce buffer */
+ if (srclen || i != orig_sgl_count - 1)
+ bounce_addr =
+ (unsigned long)kmap_atomic(
+ sg_page((&bounce_sgl[j])), KM_IRQ0);
+
+ } else if (srclen == 0 && i == orig_sgl_count - 1) {
+ /* unmap the last bounce that is < PAGE_SIZE */
+ kunmap_atomic((void *)bounce_addr, KM_IRQ0);
+ }
+ }
+
+ kunmap_atomic((void *)(src_addr - orig_sgl[i].offset), KM_IRQ0);
+ }
+
+ local_irq_restore(flags);
+
+ return total_copied;
+}
+
+
+static int storvsc_remove(struct hv_device *dev)
+{
+ struct storvsc_device *stor_device = hv_get_drvdata(dev);
+ struct Scsi_Host *host = stor_device->host;
+
+ scsi_remove_host(host);
+
+ scsi_host_put(host);
+
+ storvsc_dev_remove(dev);
+
+ return 0;
+}
+
+
+static int storvsc_get_chs(struct scsi_device *sdev, struct block_device * bdev,
+ sector_t capacity, int *info)
+{
+ sector_t nsect = capacity;
+ sector_t cylinders = nsect;
+ int heads, sectors_pt;
+
+ /*
+ * We are making up these values; let us keep it simple.
+ */
+ heads = 0xff;
+ sectors_pt = 0x3f; /* Sectors per track */
+ sector_div(cylinders, heads * sectors_pt);
+ if ((sector_t)(cylinders + 1) * heads * sectors_pt < nsect)
+ cylinders = 0xffff;
+
+ info[0] = heads;
+ info[1] = sectors_pt;
+ info[2] = (int)cylinders;
+
+ return 0;
+}
+
+static int storvsc_host_reset(struct hv_device *device)
+{
+ struct storvsc_device *stor_device;
+ struct hv_storvsc_request *request;
+ struct vstor_packet *vstor_packet;
+ int ret, t;
+
+
+ stor_device = get_out_stor_device(device);
+ if (!stor_device)
+ return FAILED;
+
+ request = &stor_device->reset_request;
+ vstor_packet = &request->vstor_packet;
+
+ init_completion(&request->wait_event);
+
+ vstor_packet->operation = VSTOR_OPERATION_RESET_BUS;
+ vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+ vstor_packet->vm_srb.path_id = stor_device->path_id;
+
+ ret = vmbus_sendpacket(device->channel, vstor_packet,
+ sizeof(struct vstor_packet),
+ (unsigned long)&stor_device->reset_request,
+ VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+ if (ret != 0)
+ return FAILED;
+
+ t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+ if (t == 0)
+ return TIMEOUT_ERROR;
+
+
+ /*
+ * At this point, all outstanding requests in the adapter
+ * should have been flushed out and returned to us
+ */
+
+ return SUCCESS;
+}
+
+
+/*
+ * storvsc_host_reset_handler - Reset the scsi HBA
+ */
+static int storvsc_host_reset_handler(struct scsi_cmnd *scmnd)
+{
+ struct hv_host_device *host_dev = shost_priv(scmnd->device->host);
+ struct hv_device *dev = host_dev->dev;
+
+ return storvsc_host_reset(dev);
+}
+
+
+/*
+ * storvsc_command_completion - Command completion processing
+ */
+static void storvsc_command_completion(struct hv_storvsc_request *request)
+{
+ struct storvsc_cmd_request *cmd_request =
+ (struct storvsc_cmd_request *)request->context;
+ struct scsi_cmnd *scmnd = cmd_request->cmd;
+ struct hv_host_device *host_dev = shost_priv(scmnd->device->host);
+ void (*scsi_done_fn)(struct scsi_cmnd *);
+ struct scsi_sense_hdr sense_hdr;
+ struct vmscsi_request *vm_srb;
+ struct storvsc_scan_work *wrk;
+ struct stor_mem_pools *memp = scmnd->device->hostdata;
+
+ vm_srb = &request->vstor_packet.vm_srb;
+ if (cmd_request->bounce_sgl_count) {
+ if (vm_srb->data_in == READ_TYPE)
+ copy_from_bounce_buffer(scsi_sglist(scmnd),
+ cmd_request->bounce_sgl,
+ scsi_sg_count(scmnd),
+ cmd_request->bounce_sgl_count);
+ destroy_bounce_buffer(cmd_request->bounce_sgl,
+ cmd_request->bounce_sgl_count);
+ }
+
+ /*
+ * If there is an error, offline the device since all
+ * error recovery strategies would have already been
+ * deployed on the host side.
+ */
+ if (vm_srb->srb_status == 0x4)
+ scmnd->result = DID_TARGET_FAILURE << 16;
+ else
+ scmnd->result = vm_srb->scsi_status;
+
+ /*
+ * If the LUN is invalid, remove the device.
+ */
+ if (vm_srb->srb_status == 0x20) {
+ struct storvsc_device *stor_dev;
+ struct hv_device *dev = host_dev->dev;
+ struct Scsi_Host *host;
+
+ stor_dev = get_in_stor_device(dev);
+ host = stor_dev->host;
+
+ wrk = kmalloc(sizeof(struct storvsc_scan_work),
+ GFP_ATOMIC);
+ if (!wrk) {
+ scmnd->result = DID_TARGET_FAILURE << 16;
+ } else {
+ wrk->host = host;
+ wrk->lun = vm_srb->lun;
+ INIT_WORK(&wrk->work, storvsc_remove_lun);
+ schedule_work(&wrk->work);
+ }
+ }
+
+ if (scmnd->result) {
+ if (scsi_normalize_sense(scmnd->sense_buffer,
+ SCSI_SENSE_BUFFERSIZE, &sense_hdr))
+ scsi_print_sense_hdr("storvsc", &sense_hdr);
+ }
+
+ scsi_set_resid(scmnd,
+ request->data_buffer.len -
+ vm_srb->data_transfer_length);
+
+ scsi_done_fn = scmnd->scsi_done;
+
+ scmnd->host_scribble = NULL;
+ scmnd->scsi_done = NULL;
+
+ scsi_done_fn(scmnd);
+
+ mempool_free(cmd_request, memp->request_mempool);
+}
+
+static bool storvsc_check_scsi_cmd(struct scsi_cmnd *scmnd)
+{
+ bool allowed = true;
+ u8 scsi_op = scmnd->cmnd[0];
+
+ switch (scsi_op) {
+ /* smartd sends this command, which will offline the device */
+ case SET_WINDOW:
+ scmnd->result = ILLEGAL_REQUEST << 16;
+ allowed = false;
+ break;
+ default:
+ break;
+ }
+ return allowed;
+}
+
+/*
+ * storvsc_queuecommand - Initiate command processing
+ */
+static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
+{
+ int ret;
+ struct hv_host_device *host_dev = shost_priv(host);
+ struct hv_device *dev = host_dev->dev;
+ struct hv_storvsc_request *request;
+ struct storvsc_cmd_request *cmd_request;
+ unsigned int request_size = 0;
+ int i;
+ struct scatterlist *sgl;
+ unsigned int sg_count = 0;
+ struct vmscsi_request *vm_srb;
+ struct stor_mem_pools *memp = scmnd->device->hostdata;
+
+ if (storvsc_check_scsi_cmd(scmnd) == false) {
+ scmnd->scsi_done(scmnd);
+ return 0;
+ }
+
+ /* If retrying, no need to prep the cmd */
+ if (scmnd->host_scribble) {
+
+ cmd_request =
+ (struct storvsc_cmd_request *)scmnd->host_scribble;
+
+ goto retry_request;
+ }
+
+ request_size = sizeof(struct storvsc_cmd_request);
+
+ cmd_request = mempool_alloc(memp->request_mempool,
+ GFP_ATOMIC);
+ if (!cmd_request)
+ return SCSI_MLQUEUE_DEVICE_BUSY;
+
+ memset(cmd_request, 0, sizeof(struct storvsc_cmd_request));
+
+ /* Setup the cmd request */
+ cmd_request->bounce_sgl_count = 0;
+ cmd_request->bounce_sgl = NULL;
+ cmd_request->cmd = scmnd;
+
+ scmnd->host_scribble = (unsigned char *)cmd_request;
+
+ request = &cmd_request->request;
+ vm_srb = &request->vstor_packet.vm_srb;
+
+
+ /* Build the SRB */
+ switch (scmnd->sc_data_direction) {
+ case DMA_TO_DEVICE:
+ vm_srb->data_in = WRITE_TYPE;
+ break;
+ case DMA_FROM_DEVICE:
+ vm_srb->data_in = READ_TYPE;
+ break;
+ default:
+ vm_srb->data_in = UNKNOWN_TYPE;
+ break;
+ }
+
+ request->on_io_completion = storvsc_command_completion;
+ request->context = cmd_request;/* scmnd; */
+
+ vm_srb->port_number = host_dev->port;
+ vm_srb->path_id = scmnd->device->channel;
+ vm_srb->target_id = scmnd->device->id;
+ vm_srb->lun = scmnd->device->lun;
+
+ vm_srb->cdb_length = scmnd->cmd_len;
+
+ memcpy(vm_srb->cdb, scmnd->cmnd, vm_srb->cdb_length);
+
+ request->sense_buffer = scmnd->sense_buffer;
+
+
+ request->data_buffer.len = scsi_bufflen(scmnd);
+ if (scsi_sg_count(scmnd)) {
+ sgl = (struct scatterlist *)scsi_sglist(scmnd);
+ sg_count = scsi_sg_count(scmnd);
+
+ /* check if we need to bounce the sgl */
+ if (do_bounce_buffer(sgl, scsi_sg_count(scmnd)) != -1) {
+ cmd_request->bounce_sgl =
+ create_bounce_buffer(sgl, scsi_sg_count(scmnd),
+ scsi_bufflen(scmnd));
+ if (!cmd_request->bounce_sgl) {
+ scmnd->host_scribble = NULL;
+ mempool_free(cmd_request,
+ memp->request_mempool);
+
+ return SCSI_MLQUEUE_HOST_BUSY;
+ }
+
+ cmd_request->bounce_sgl_count =
+ ALIGN(scsi_bufflen(scmnd), PAGE_SIZE) >>
+ PAGE_SHIFT;
+
+ if (vm_srb->data_in == WRITE_TYPE)
+ copy_to_bounce_buffer(sgl,
+ cmd_request->bounce_sgl,
+ scsi_sg_count(scmnd));
+
+ sgl = cmd_request->bounce_sgl;
+ sg_count = cmd_request->bounce_sgl_count;
+ }
+
+ request->data_buffer.offset = sgl[0].offset;
+
+ for (i = 0; i < sg_count; i++)
+ request->data_buffer.pfn_array[i] =
+ page_to_pfn(sg_page((&sgl[i])));
+
+ } else if (scsi_sglist(scmnd)) {
+ request->data_buffer.offset =
+ virt_to_phys(scsi_sglist(scmnd)) & (PAGE_SIZE-1);
+ request->data_buffer.pfn_array[0] =
+ virt_to_phys(scsi_sglist(scmnd)) >> PAGE_SHIFT;
+ }
+
+retry_request:
+ /* Invokes the vsc to start an IO */
+ ret = storvsc_do_io(dev, &cmd_request->request);
+
+ if (ret == -EAGAIN) {
+ /* no more space */
+
+ if (cmd_request->bounce_sgl_count)
+ destroy_bounce_buffer(cmd_request->bounce_sgl,
+ cmd_request->bounce_sgl_count);
+
+ mempool_free(cmd_request, memp->request_mempool);
+
+ scmnd->host_scribble = NULL;
+
+ ret = SCSI_MLQUEUE_DEVICE_BUSY;
+ }
+
+ return ret;
+}
+
+/* Scsi driver */
+static struct scsi_host_template scsi_driver = {
+ .module = THIS_MODULE,
+ .name = "storvsc_host_t",
+ .bios_param = storvsc_get_chs,
+ .queuecommand = storvsc_queuecommand,
+ .eh_host_reset_handler = storvsc_host_reset_handler,
+ .slave_alloc = storvsc_device_alloc,
+ .slave_destroy = storvsc_device_destroy,
+ .slave_configure = storvsc_device_configure,
+ .cmd_per_lun = 1,
+ /* 64 max_queue * 1 target */
+ .can_queue = STORVSC_MAX_IO_REQUESTS*STORVSC_MAX_TARGETS,
+ .this_id = -1,
+ /* no use setting to 0 since ll_blk_rw reset it to 1 */
+ /* currently 32 */
+ .sg_tablesize = MAX_MULTIPAGE_BUFFER_COUNT,
+ .use_clustering = DISABLE_CLUSTERING,
+ /* Make sure we don't get an sg segment that crosses a page boundary */
+ .dma_boundary = PAGE_SIZE-1,
+};
+
+enum {
+ SCSI_GUID,
+ IDE_GUID,
+};
+
+static const struct hv_vmbus_device_id id_table[] = {
+ /* SCSI guid */
+ { VMBUS_DEVICE(0xd9, 0x63, 0x61, 0xba, 0xa1, 0x04, 0x29, 0x4d,
+ 0xb6, 0x05, 0x72, 0xe2, 0xff, 0xb1, 0xdc, 0x7f)
+ .driver_data = SCSI_GUID },
+ /* IDE guid */
+ { VMBUS_DEVICE(0x32, 0x26, 0x41, 0x32, 0xcb, 0x86, 0xa2, 0x44,
+ 0x9b, 0x5c, 0x50, 0xd1, 0x41, 0x73, 0x54, 0xf5)
+ .driver_data = IDE_GUID },
+ { },
+};
+
+MODULE_DEVICE_TABLE(vmbus, id_table);
+
+
+/*
+ * storvsc_probe - Add a new device for this driver
+ */
+
+static int storvsc_probe(struct hv_device *device,
+ const struct hv_vmbus_device_id *dev_id)
+{
+ int ret;
+ struct Scsi_Host *host;
+ struct hv_host_device *host_dev;
+ bool dev_is_ide = ((dev_id->driver_data == IDE_GUID) ? true : false);
+ int path = 0;
+ int target = 0;
+ struct storvsc_device *stor_device;
+
+ host = scsi_host_alloc(&scsi_driver,
+ sizeof(struct hv_host_device));
+ if (!host)
+ return -ENOMEM;
+
+ host_dev = shost_priv(host);
+ memset(host_dev, 0, sizeof(struct hv_host_device));
+
+ host_dev->port = host->host_no;
+ host_dev->dev = device;
+
+
+ stor_device = kzalloc(sizeof(struct storvsc_device), GFP_KERNEL);
+ if (!stor_device) {
+ ret = -ENOMEM;
+ goto err_out0;
+ }
+
+ stor_device->destroy = false;
+ init_waitqueue_head(&stor_device->waiting_to_drain);
+ stor_device->device = device;
+ stor_device->host = host;
+ hv_set_drvdata(device, stor_device);
+
+ stor_device->port_number = host->host_no;
+ ret = storvsc_connect_to_vsp(device, storvsc_ringbuffer_size);
+ if (ret)
+ goto err_out1;
+
+ if (dev_is_ide)
+ storvsc_get_ide_info(device, &target, &path);
+
+ host_dev->path = stor_device->path_id;
+ host_dev->target = stor_device->target_id;
+
+ /* max # of devices per target */
+ host->max_lun = STORVSC_MAX_LUNS_PER_TARGET;
+ /* max # of targets per channel */
+ host->max_id = STORVSC_MAX_TARGETS;
+ /* max # of channels */
+ host->max_channel = STORVSC_MAX_CHANNELS - 1;
+ /* max cmd length */
+ host->max_cmd_len = STORVSC_MAX_CMD_LEN;
+
+ /* Register the HBA and start the scsi bus scan */
+ ret = scsi_add_host(host, &device->device);
+ if (ret != 0)
+ goto err_out2;
+
+ if (!dev_is_ide) {
+ scsi_scan_host(host);
+ return 0;
+ }
+ ret = scsi_add_device(host, 0, target, 0);
+ if (ret) {
+ scsi_remove_host(host);
+ goto err_out2;
+ }
+ return 0;
+
+err_out2:
+ /*
+ * Once we have connected with the host, we need to invoke
+ * storvsc_dev_remove() to roll back this state. That call also
+ * frees the stor_device, hence the jump around the
+ * err_out1 label.
+ */
+ storvsc_dev_remove(device);
+ goto err_out0;
+
+err_out1:
+ kfree(stor_device);
+
+err_out0:
+ scsi_host_put(host);
+ return ret;
+}
+
+/* The one and only one */
+
+static struct hv_driver storvsc_drv = {
+ .name = KBUILD_MODNAME,
+ .id_table = id_table,
+ .probe = storvsc_probe,
+ .remove = storvsc_remove,
+};
+
+static int __init storvsc_drv_init(void)
+{
+ u32 max_outstanding_req_per_channel;
+
+ /*
+ * Divide the ring buffer data size (which is 1 page less
+ * than the ring buffer size since that page is reserved for
+ * the ring buffer indices) by the max request size (which is
+ * vmbus_channel_packet_multipage_buffer + struct vstor_packet + u64)
+ */
+ max_outstanding_req_per_channel =
+ ((storvsc_ringbuffer_size - PAGE_SIZE) /
+ ALIGN(MAX_MULTIPAGE_BUFFER_PACKET +
+ sizeof(struct vstor_packet) + sizeof(u64),
+ sizeof(u64)));
+
+ if (max_outstanding_req_per_channel <
+ STORVSC_MAX_IO_REQUESTS)
+ return -EINVAL;
+
+ return vmbus_driver_register(&storvsc_drv);
+}
+
+static void __exit storvsc_drv_exit(void)
+{
+ vmbus_driver_unregister(&storvsc_drv);
+}
+
+MODULE_LICENSE("GPL");
+MODULE_VERSION(HV_DRV_VERSION);
+MODULE_DESCRIPTION("Microsoft Hyper-V virtual storage driver");
+module_init(storvsc_drv_init);
+module_exit(storvsc_drv_exit);
--
1.7.4.1
^ permalink raw reply related
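The hole check in do_bounce_buffer() in the patch above decides whether a
scatterlist can be handed to the host as one contiguous multi-page buffer:
the first segment must run to the end of its page, the last must start at
offset 0, and every middle segment must cover a full page. Below is a minimal
user-space sketch of that rule. The names struct seg, first_hole() and
SKETCH_PAGE_SIZE are hypothetical stand-ins for struct scatterlist,
do_bounce_buffer() and PAGE_SIZE; the 4096-byte page size is an assumption.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, stands in for PAGE_SIZE */

/* Hypothetical reduction of struct scatterlist to the two fields checked. */
struct seg {
	unsigned int offset;	/* start of the data within its page */
	unsigned int length;	/* number of bytes in this segment */
};

/*
 * Return the index of the first segment that would leave a hole,
 * or -1 if the list can be used without a bounce buffer.
 */
static int first_hole(const struct seg *sgl, unsigned int count)
{
	unsigned int i;

	/* A single segment cannot have a hole between segments. */
	if (count < 2)
		return -1;

	for (i = 0; i < count; i++) {
		if (i == 0) {
			/* first segment must end exactly at the page end */
			if (sgl[i].offset + sgl[i].length != SKETCH_PAGE_SIZE)
				return (int)i;
		} else if (i == count - 1) {
			/* last segment must start at the page start */
			if (sgl[i].offset != 0)
				return (int)i;
		} else {
			/* middle segments must cover a whole page */
			if (sgl[i].length != SKETCH_PAGE_SIZE ||
			    sgl[i].offset != 0)
				return (int)i;
		}
	}
	return -1;
}
```

A return value other than -1 corresponds to the case in which
storvsc_queuecommand() allocates a bounce buffer and, for writes, copies the
data into it before issuing the I/O.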
* [PATCH 5/5] Staging: hv: storvsc: Implement per device memory pools
From: K. Y. Srinivasan @ 2011-12-01 12:59 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering,
James.Bottomley, hch, linux-scsi
Cc: K. Y. Srinivasan, Haiyang Zhang
In-Reply-To: <1322744360-13067-1-git-send-email-kys@microsoft.com>
The current code implements a per-HBA memory pool mechanism. For IDE disks
managed by this driver, there is a one-to-one correspondence between the
block device and the associated virtual HBA, and since currently only IDE devices
can be the boot device, this addressed the deadlock issues that were raised during
the review process. This patch implements a per-LUN memory pool mechanism.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
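To illustrate the per-LUN pooling described in the changelog, here is a
minimal user-space sketch. It is not the driver code: struct lun_pool and
the lun_pool_*() helpers are hypothetical stand-ins for struct stor_mem_pools
and the slave_alloc/slave_destroy hooks, MIN_BUF_NR is an assumed value
standing in for STORVSC_MIN_BUF_NR, and the pool hands out only its reserved
elements, whereas the kernel's mempool tries the backing allocator first and
falls back to the reserve.

```c
#include <stddef.h>
#include <stdlib.h>

#define MIN_BUF_NR 4	/* assumed value; stands in for STORVSC_MIN_BUF_NR */

/* Hypothetical user-space stand-in for the per-LUN struct stor_mem_pools. */
struct lun_pool {
	void *reserve[MIN_BUF_NR];	/* preallocated request buffers */
	int nr_free;			/* buffers currently in the reserve */
};

/* Plays the role of slave_alloc: build the reserve when the LUN appears. */
static struct lun_pool *lun_pool_create(size_t elem_size)
{
	struct lun_pool *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;

	while (p->nr_free < MIN_BUF_NR) {
		void *elem = malloc(elem_size);

		if (!elem)
			goto err;
		p->reserve[p->nr_free++] = elem;
	}
	return p;

err:
	while (p->nr_free > 0)
		free(p->reserve[--p->nr_free]);
	free(p);
	return NULL;
}

/* Hand out a reserved buffer; NULL once this LUN's reserve is exhausted. */
static void *lun_pool_alloc(struct lun_pool *p)
{
	return p->nr_free ? p->reserve[--p->nr_free] : NULL;
}

/* Return a buffer that was obtained from lun_pool_alloc() on the same pool. */
static void lun_pool_free(struct lun_pool *p, void *elem)
{
	p->reserve[p->nr_free++] = elem;
}

/* Plays the role of slave_destroy; assumes every buffer has been returned. */
static void lun_pool_destroy(struct lun_pool *p)
{
	while (p->nr_free > 0)
		free(p->reserve[--p->nr_free]);
	free(p);
}
```

Because each LUN owns its reserve, exhausting the buffers of one LUN leaves
the reserve of every other LUN untouched, which is the property the per-HBA
pool could only provide when an HBA carried a single disk.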
drivers/staging/hv/storvsc_drv.c | 106 ++++++++++++++++++++++----------------
1 files changed, 62 insertions(+), 44 deletions(-)
diff --git a/drivers/staging/hv/storvsc_drv.c b/drivers/staging/hv/storvsc_drv.c
index c22de06..18f8771 100644
--- a/drivers/staging/hv/storvsc_drv.c
+++ b/drivers/staging/hv/storvsc_drv.c
@@ -285,10 +285,13 @@ struct storvsc_device {
struct hv_storvsc_request reset_request;
};
-struct hv_host_device {
- struct hv_device *dev;
+struct stor_mem_pools {
struct kmem_cache *request_pool;
mempool_t *request_mempool;
+};
+
+struct hv_host_device {
+ struct hv_device *dev;
unsigned int port;
unsigned char path;
unsigned char target;
@@ -790,7 +793,48 @@ static void storvsc_get_ide_info(struct hv_device *dev, int *target, int *path)
static int storvsc_device_alloc(struct scsi_device *sdevice)
{
+ struct stor_mem_pools *memp;
+ int number = STORVSC_MIN_BUF_NR;
+
+ memp = kzalloc(sizeof(struct stor_mem_pools), GFP_KERNEL);
+ if (!memp)
+ return -ENOMEM;
+
+ memp->request_pool =
+ kmem_cache_create(dev_name(&sdevice->sdev_dev),
+ sizeof(struct storvsc_cmd_request), 0,
+ SLAB_HWCACHE_ALIGN, NULL);
+
+ if (!memp->request_pool)
+ goto err0;
+
+ memp->request_mempool = mempool_create(number, mempool_alloc_slab,
+ mempool_free_slab,
+ memp->request_pool);
+
+ if (!memp->request_mempool)
+ goto err1;
+
+ sdevice->hostdata = memp;
+
return 0;
+
+err1:
+ kmem_cache_destroy(memp->request_pool);
+
+err0:
+ kfree(memp);
+ return -ENOMEM;
+}
+
+static void storvsc_device_destroy(struct scsi_device *sdevice)
+{
+ struct stor_mem_pools *memp = sdevice->hostdata;
+
+ mempool_destroy(memp->request_mempool);
+ kmem_cache_destroy(memp->request_pool);
+ kfree(memp);
+ sdevice->hostdata = NULL;
}
static int storvsc_device_configure(struct scsi_device *sdevice)
@@ -1031,19 +1075,13 @@ static int storvsc_remove(struct hv_device *dev)
{
struct storvsc_device *stor_device = hv_get_drvdata(dev);
struct Scsi_Host *host = stor_device->host;
- struct hv_host_device *host_dev = shost_priv(host);
scsi_remove_host(host);
scsi_host_put(host);
storvsc_dev_remove(dev);
- if (host_dev->request_pool) {
- mempool_destroy(host_dev->request_mempool);
- kmem_cache_destroy(host_dev->request_pool);
- host_dev->request_pool = NULL;
- host_dev->request_mempool = NULL;
- }
+
return 0;
}
@@ -1139,6 +1177,7 @@ static void storvsc_command_completion(struct hv_storvsc_request *request)
struct scsi_sense_hdr sense_hdr;
struct vmscsi_request *vm_srb;
struct storvsc_scan_work *wrk;
+ struct stor_mem_pools *memp = scmnd->device->hostdata;
vm_srb = &request->vstor_packet.vm_srb;
if (cmd_request->bounce_sgl_count) {
@@ -1201,7 +1240,7 @@ static void storvsc_command_completion(struct hv_storvsc_request *request)
scsi_done_fn(scmnd);
- mempool_free(cmd_request, host_dev->request_mempool);
+ mempool_free(cmd_request, memp->request_mempool);
}
static bool storvsc_check_scsi_cmd(struct scsi_cmnd *scmnd)
@@ -1236,6 +1275,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct scatterlist *sgl;
unsigned int sg_count = 0;
struct vmscsi_request *vm_srb;
+ struct stor_mem_pools *memp = scmnd->device->hostdata;
if (storvsc_check_scsi_cmd(scmnd) == false) {
scmnd->scsi_done(scmnd);
@@ -1253,7 +1293,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
request_size = sizeof(struct storvsc_cmd_request);
- cmd_request = mempool_alloc(host_dev->request_mempool,
+ cmd_request = mempool_alloc(memp->request_mempool,
GFP_ATOMIC);
if (!cmd_request)
return SCSI_MLQUEUE_DEVICE_BUSY;
@@ -1312,7 +1352,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
if (!cmd_request->bounce_sgl) {
scmnd->host_scribble = NULL;
mempool_free(cmd_request,
- host_dev->request_mempool);
+ memp->request_mempool);
return SCSI_MLQUEUE_HOST_BUSY;
}
@@ -1354,7 +1394,7 @@ retry_request:
destroy_bounce_buffer(cmd_request->bounce_sgl,
cmd_request->bounce_sgl_count);
- mempool_free(cmd_request, host_dev->request_mempool);
+ mempool_free(cmd_request, memp->request_mempool);
scmnd->host_scribble = NULL;
@@ -1372,6 +1412,7 @@ static struct scsi_host_template scsi_driver = {
.queuecommand = storvsc_queuecommand,
.eh_host_reset_handler = storvsc_host_reset_handler,
.slave_alloc = storvsc_device_alloc,
+ .slave_destroy = storvsc_device_destroy,
.slave_configure = storvsc_device_configure,
.cmd_per_lun = 1,
/* 64 max_queue * 1 target */
@@ -1413,7 +1454,6 @@ static int storvsc_probe(struct hv_device *device,
const struct hv_vmbus_device_id *dev_id)
{
int ret;
- int number = STORVSC_MIN_BUF_NR;
struct Scsi_Host *host;
struct hv_host_device *host_dev;
bool dev_is_ide = ((dev_id->driver_data == IDE_GUID) ? true : false);
@@ -1432,29 +1472,11 @@ static int storvsc_probe(struct hv_device *device,
host_dev->port = host->host_no;
host_dev->dev = device;
- host_dev->request_pool =
- kmem_cache_create(dev_name(&device->device),
- sizeof(struct storvsc_cmd_request), 0,
- SLAB_HWCACHE_ALIGN, NULL);
-
- if (!host_dev->request_pool) {
- scsi_host_put(host);
- return -ENOMEM;
- }
-
- host_dev->request_mempool = mempool_create(number, mempool_alloc_slab,
- mempool_free_slab,
- host_dev->request_pool);
-
- if (!host_dev->request_mempool) {
- ret = -ENOMEM;
- goto err_out0;
- }
stor_device = kzalloc(sizeof(struct storvsc_device), GFP_KERNEL);
if (!stor_device) {
ret = -ENOMEM;
- goto err_out1;
+ goto err_out0;
}
stor_device->destroy = false;
@@ -1466,7 +1488,7 @@ static int storvsc_probe(struct hv_device *device,
stor_device->port_number = host->host_no;
ret = storvsc_connect_to_vsp(device, storvsc_ringbuffer_size);
if (ret)
- goto err_out2;
+ goto err_out1;
if (dev_is_ide)
storvsc_get_ide_info(device, &target, &path);
@@ -1486,7 +1508,7 @@ static int storvsc_probe(struct hv_device *device,
/* Register the HBA and start the scsi bus scan */
ret = scsi_add_host(host, &device->device);
if (ret != 0)
- goto err_out3;
+ goto err_out2;
if (!dev_is_ide) {
scsi_scan_host(host);
@@ -1495,28 +1517,24 @@ static int storvsc_probe(struct hv_device *device,
ret = scsi_add_device(host, 0, target, 0);
if (ret) {
scsi_remove_host(host);
- goto err_out3;
+ goto err_out2;
}
return 0;
-err_out3:
+err_out2:
/*
* Once we have connected with the host, we would need to
* to invoke storvsc_dev_remove() to rollback this state and
* this call also frees up the stor_device; hence the jump around
- * err_out2 label.
+ * err_out1 label.
*/
storvsc_dev_remove(device);
- goto err_out1;
-
-err_out2:
- kfree(stor_device);
+ goto err_out0;
err_out1:
- mempool_destroy(host_dev->request_mempool);
+ kfree(stor_device);
err_out0:
- kmem_cache_destroy(host_dev->request_pool);
scsi_host_put(host);
return ret;
}
--
1.7.4.1
^ permalink raw reply related
* [PATCH 4/5] Staging: hv: storvsc: Fix a bug in copy_from_bounce_buffer()
From: K. Y. Srinivasan @ 2011-12-01 12:59 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering,
James.Bottomley, hch, linux-scsi
Cc: K. Y. Srinivasan, Haiyang Zhang
In-Reply-To: <1322744360-13067-1-git-send-email-kys@microsoft.com>
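Fix a bug in copy_from_bounce_buffer(): the number of elements in the
bounce buffer may differ from the number of elements in the original
scatter list, so stop copying once the last bounce element has been
consumed.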
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
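Illustration only, not part of the patch: a minimal userspace sketch of the
termination condition added here, assuming plain byte buffers instead of
kmapped scatterlist pages. The names struct seg and copy_from_bounce() are
hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist element. */
struct seg {
	unsigned char *buf;
	size_t len;
};

/*
 * Copy data from the bounce segments into the original segments.
 * Stops as soon as either list is exhausted, so a bounce list with
 * fewer elements than the original list is never overrun.
 */
static size_t copy_from_bounce(struct seg *orig, size_t orig_count,
			       const struct seg *bounce, size_t bounce_count)
{
	size_t i = 0, j = 0;			/* current orig / bounce element */
	size_t dst_off = 0, src_off = 0;	/* offsets inside those elements */
	size_t total = 0;

	assert(orig != NULL && bounce != NULL);

	while (i < orig_count && j < bounce_count) {
		size_t dst_left = orig[i].len - dst_off;
		size_t src_left = bounce[j].len - src_off;
		size_t n = dst_left < src_left ? dst_left : src_left;

		memcpy(orig[i].buf + dst_off, bounce[j].buf + src_off, n);
		dst_off += n;
		src_off += n;
		total += n;

		if (dst_off == orig[i].len) {	/* destination element full */
			i++;
			dst_off = 0;
		}
		if (src_off == bounce[j].len) {	/* bounce element drained */
			j++;
			src_off = 0;
		}
	}
	return total;
}
```

The loop condition on j plays the role of the new "j == bounce_sgl_count"
check: once the bounce data runs out, the remaining original elements are
left untouched.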
drivers/staging/hv/storvsc_drv.c | 24 ++++++++++++++++++++++--
1 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/hv/storvsc_drv.c b/drivers/staging/hv/storvsc_drv.c
index 8dafe52..c22de06 100644
--- a/drivers/staging/hv/storvsc_drv.c
+++ b/drivers/staging/hv/storvsc_drv.c
@@ -880,7 +880,8 @@ cleanup:
/* Assume the original sgl has enough room */
static unsigned int copy_from_bounce_buffer(struct scatterlist *orig_sgl,
struct scatterlist *bounce_sgl,
- unsigned int orig_sgl_count)
+ unsigned int orig_sgl_count,
+ unsigned int bounce_sgl_count)
{
int i;
int j = 0;
@@ -921,6 +922,24 @@ static unsigned int copy_from_bounce_buffer(struct scatterlist *orig_sgl,
kunmap_atomic((void *)bounce_addr, KM_IRQ0);
j++;
+ /*
+ * It is possible that the number of elements
+ * in the bounce buffer may not be equal to
+ * the number of elements in the original
+ * scatter list. Handle this correctly.
+ */
+
+ if (j == bounce_sgl_count) {
+ /*
+ * We are done; cleanup and return.
+ */
+ kunmap_atomic((void *)(dest_addr -
+ orig_sgl[i].offset),
+ KM_IRQ0);
+ local_irq_restore(flags);
+ return total_copied;
+ }
+
/* if we need to use another bounce buffer */
if (destlen || i != orig_sgl_count - 1)
bounce_addr =
@@ -1126,7 +1145,8 @@ static void storvsc_command_completion(struct hv_storvsc_request *request)
if (vm_srb->data_in == READ_TYPE)
copy_from_bounce_buffer(scsi_sglist(scmnd),
cmd_request->bounce_sgl,
- scsi_sg_count(scmnd));
+ scsi_sg_count(scmnd),
+ cmd_request->bounce_sgl_count);
destroy_bounce_buffer(cmd_request->bounce_sgl,
cmd_request->bounce_sgl_count);
}
--
1.7.4.1
^ permalink raw reply related
* [PATCH 3/5] Staging: hv: storvsc: Fix a bug in storvsc_command_completion()
From: K. Y. Srinivasan @ 2011-12-01 12:59 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering,
James.Bottomley, hch, linux-scsi
Cc: K. Y. Srinivasan, Haiyang Zhang
In-Reply-To: <1322744360-13067-1-git-send-email-kys@microsoft.com>
Fix a bug in storvsc_command_completion() that leaks memory when scatter/gather
lists are used on the "write" side.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
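Illustration only, not part of the patch: a small userspace model of the
leak, assuming malloc()/free() in place of the real bounce buffer helpers.
All names below (bounce_create(), bounce_destroy(), complete_request(),
enum data_dir) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum data_dir { DIR_WRITE, DIR_READ };	/* hypothetical direction flag */

static int live_bounce_buffers;	/* allocations not yet released */

static void *bounce_create(size_t len)
{
	void *p = malloc(len);

	if (p)
		live_bounce_buffers++;
	return p;
}

static void bounce_destroy(void *p)
{
	assert(p != NULL);
	free(p);
	live_bounce_buffers--;
}

/*
 * Completion path: only reads need data copied back, but the bounce
 * buffer has to be released for reads and writes alike.
 */
static void complete_request(enum data_dir dir, void *bounce, int *copied_back)
{
	if (!bounce)
		return;
	if (dir == DIR_READ)
		*copied_back = 1;	/* stands in for the copy-back step */
	bounce_destroy(bounce);
}
```

With the destroy call inside the read-only branch, as before this patch, a
completed write would leave live_bounce_buffers above zero.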
drivers/staging/hv/storvsc_drv.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/hv/storvsc_drv.c b/drivers/staging/hv/storvsc_drv.c
index 14ecb69..8dafe52 100644
--- a/drivers/staging/hv/storvsc_drv.c
+++ b/drivers/staging/hv/storvsc_drv.c
@@ -1123,13 +1123,12 @@ static void storvsc_command_completion(struct hv_storvsc_request *request)
vm_srb = &request->vstor_packet.vm_srb;
if (cmd_request->bounce_sgl_count) {
- if (vm_srb->data_in == READ_TYPE) {
+ if (vm_srb->data_in == READ_TYPE)
copy_from_bounce_buffer(scsi_sglist(scmnd),
cmd_request->bounce_sgl,
scsi_sg_count(scmnd));
- destroy_bounce_buffer(cmd_request->bounce_sgl,
+ destroy_bounce_buffer(cmd_request->bounce_sgl,
cmd_request->bounce_sgl_count);
- }
}
/*
--
1.7.4.1
^ permalink raw reply related
* [PATCH 2/5] Staging: hv: storvsc: Cleanup storvsc_device_alloc()
From: K. Y. Srinivasan @ 2011-12-01 12:59 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering,
James.Bottomley, hch, linux-scsi
Cc: K. Y. Srinivasan, Haiyang Zhang
In-Reply-To: <1322744360-13067-1-git-send-email-kys@microsoft.com>
The code in storvsc_device_alloc() is not needed as this would be
done by default. Get rid of it. We still keep the function as we use
this hook to allocate per-LUN memory pools in a later patch.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/staging/hv/storvsc_drv.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/drivers/staging/hv/storvsc_drv.c b/drivers/staging/hv/storvsc_drv.c
index 9153f98..14ecb69 100644
--- a/drivers/staging/hv/storvsc_drv.c
+++ b/drivers/staging/hv/storvsc_drv.c
@@ -790,11 +790,6 @@ static void storvsc_get_ide_info(struct hv_device *dev, int *target, int *path)
static int storvsc_device_alloc(struct scsi_device *sdevice)
{
- /*
- * This enables luns to be located sparsely. Otherwise, we may not
- * discovered them.
- */
- sdevice->sdev_bflags |= BLIST_SPARSELUN | BLIST_LARGELUN;
return 0;
}
--
1.7.4.1
^ permalink raw reply related
* [PATCH 1/5] Staging: hv: storvsc: Disable clustering
From: K. Y. Srinivasan @ 2011-12-01 12:59 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering,
James.Bottomley, hch, linux-scsi
Cc: K. Y. Srinivasan, Haiyang Zhang
In-Reply-To: <1322744315-13017-1-git-send-email-kys@microsoft.com>
Disable clustering, since the host side on Hyper-V requires that
each I/O element not exceed the page size. As part of this
cleanup, get rid of the function to merge bvecs, as the primary
reason for this function was to avoid having an element exceed
the page size.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
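Illustration only, not part of the patch: a sketch of the constraint the
host imposes, namely that an I/O element neither exceeds the page size nor
crosses a page boundary. The helper segment_fits_in_page() is hypothetical
and the page size is passed in rather than taken from PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the segment [addr, addr + len) stays within a single
 * page. page_size must be a power of two, as PAGE_SIZE is.
 */
static bool segment_fits_in_page(uint64_t addr, uint32_t len,
				 uint32_t page_size)
{
	uint64_t boundary_mask = (uint64_t)page_size - 1; /* like .dma_boundary */

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	if (len == 0 || len > page_size)
		return false;
	/* First and last byte must share the same page frame. */
	return (addr & ~boundary_mask) == ((addr + len - 1) & ~boundary_mask);
}
```

A page-aligned, page-sized element passes; the same length starting in the
middle of a page does not, which is what max_segment_size together with
dma_boundary is meant to rule out.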
drivers/staging/hv/storvsc_drv.c | 18 +-----------------
1 files changed, 1 insertions(+), 17 deletions(-)
diff --git a/drivers/staging/hv/storvsc_drv.c b/drivers/staging/hv/storvsc_drv.c
index 0245143..9153f98 100644
--- a/drivers/staging/hv/storvsc_drv.c
+++ b/drivers/staging/hv/storvsc_drv.c
@@ -798,13 +798,6 @@ static int storvsc_device_alloc(struct scsi_device *sdevice)
return 0;
}
-static int storvsc_merge_bvec(struct request_queue *q,
- struct bvec_merge_data *bmd, struct bio_vec *bvec)
-{
- /* checking done by caller. */
- return bvec->bv_len;
-}
-
static int storvsc_device_configure(struct scsi_device *sdevice)
{
scsi_adjust_queue_depth(sdevice, MSG_SIMPLE_TAG,
@@ -812,8 +805,6 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
blk_queue_max_segment_size(sdevice->request_queue, PAGE_SIZE);
- blk_queue_merge_bvec(sdevice->request_queue, storvsc_merge_bvec);
-
blk_queue_bounce_limit(sdevice->request_queue, BLK_BOUNCE_ANY);
return 0;
@@ -1375,14 +1366,7 @@ static struct scsi_host_template scsi_driver = {
/* no use setting to 0 since ll_blk_rw reset it to 1 */
/* currently 32 */
.sg_tablesize = MAX_MULTIPAGE_BUFFER_COUNT,
- /*
- * ENABLE_CLUSTERING allows mutiple physically contig bio_vecs to merge
- * into 1 sg element. If set, we must limit the max_segment_size to
- * PAGE_SIZE, otherwise we may get 1 sg element that represents
- * multiple
- */
- /* physically contig pfns (ie sg[x].length > PAGE_SIZE). */
- .use_clustering = ENABLE_CLUSTERING,
+ .use_clustering = DISABLE_CLUSTERING,
/* Make sure we dont get a sg segment crosses a page boundary */
.dma_boundary = PAGE_SIZE-1,
};
--
1.7.4.1
^ permalink raw reply related
* [PATCH 0000/0005] Staging: hv: storvsc cleanup
From: K. Y. Srinivasan @ 2011-12-01 12:58 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering,
James.Bottomley, hch, linux-scsi
Cc: K. Y. Srinivasan
Cleanup storvsc driver based on review comments from James.
1) While the existing per (virtual) HBA memory pools mechanism
does address the deadlock issues raised earlier since for IDE
devices there is a HBA per device and presently only IDE devices
can be drain devices, here we implement a generic per-LUN memory
pools mechanism.
2) Fix a couple of bugs in the bounce buffer handling code (now that I
have been able to trigger these code paths!)
3) Cleanup some unnecessary code.
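As a rough userspace model of item 1 (not the driver code): each LUN owns a
small reserve of request structures, so exhausting one LUN's reserve cannot
starve another. The names struct lun_pool, lun_pool_*() and LUN_POOL_MIN_NR
are hypothetical, and the "pressure" flag is only a way to simulate the
general allocator failing.

```c
#include <assert.h>
#include <stdlib.h>

#define LUN_POOL_MIN_NR 2	/* hypothetical reserve size */

struct request {
	int tag;
};

/* Hypothetical per-LUN pool: a reserve that only this LUN may use. */
struct lun_pool {
	struct request *reserve[LUN_POOL_MIN_NR];
	int nr_reserved;
};

/* Called when the LUN goes away, e.g. from a slave_destroy style hook. */
static void lun_pool_destroy(struct lun_pool *pool)
{
	if (!pool)
		return;
	while (pool->nr_reserved > 0)
		free(pool->reserve[--pool->nr_reserved]);
	free(pool);
}

/* Called once per LUN, e.g. from a slave_alloc style hook. */
static struct lun_pool *lun_pool_create(void)
{
	struct lun_pool *pool = calloc(1, sizeof(*pool));

	if (!pool)
		return NULL;
	while (pool->nr_reserved < LUN_POOL_MIN_NR) {
		struct request *req = malloc(sizeof(*req));

		if (!req) {
			lun_pool_destroy(pool);
			return NULL;
		}
		pool->reserve[pool->nr_reserved++] = req;
	}
	return pool;
}

/*
 * Try the general allocator first; under memory pressure fall back
 * to this LUN's own reserve.
 */
static struct request *lun_pool_alloc(struct lun_pool *pool, int pressure)
{
	assert(pool != NULL);
	if (!pressure) {
		struct request *req = malloc(sizeof(*req));

		if (req)
			return req;
	}
	if (pool->nr_reserved > 0)
		return pool->reserve[--pool->nr_reserved];
	return NULL;
}

/* Refill the reserve first; release anything beyond it. */
static void lun_pool_free(struct lun_pool *pool, struct request *req)
{
	assert(pool != NULL && req != NULL);
	if (pool->nr_reserved < LUN_POOL_MIN_NR)
		pool->reserve[pool->nr_reserved++] = req;
	else
		free(req);
}
```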
With this, I believe I have addressed all of the review comments I have received
on this driver. I will submit a separate patch for moving the driver out of staging
(that includes these patches).
Regards,
K. Y
^ permalink raw reply
* Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
From: Avi Kivity @ 2011-12-01 11:11 UTC (permalink / raw)
To: Raghavendra K T
Cc: Peter Zijlstra, Virtualization, H. Peter Anvin,
Stefano Stabellini, Jeremy Fitzhardinge, Dave Jiang, KVM, x86,
Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
Srivatsa Vaddagiri, Xen, Sasha Levin, Sedat Dilek,
Thomas Gleixner, Yinghai Lu, Greg Kroah-Hartman, LKML,
Dave Hansen, Suzuki Poulose
In-Reply-To: <20111130085959.23386.69166.sendpatchset@oc5400248562.ibm.com>
On 11/30/2011 10:59 AM, Raghavendra K T wrote:
> Add a hypercall to KVM hypervisor to support pv-ticketlocks
>
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
>
> The presence of these hypercalls is indicated to guest via
> KVM_FEATURE_KICK_VCPU/KVM_CAP_KICK_VCPU.
>
> Qemu needs a corresponding patch to pass up the presence of this feature to
> guest via cpuid. Patch to qemu will be sent separately.
>
> There is no Xen/KVM hypercall interface to await kick from.
The hypercall needs to be documented in
Documentation/virtual/kvm/hypercalls.txt.
Have you tested it on AMD machines? There are some differences in the
hypercall infrastructure there.
> /* This indicates that the new set of kvmclock msrs
> * are available. The use of 0x11 and 0x12 is deprecated
> */
> #define KVM_FEATURE_CLOCKSOURCE2 3
> #define KVM_FEATURE_ASYNC_PF 4
> #define KVM_FEATURE_STEAL_TIME 5
> +#define KVM_FEATURE_KICK_VCPU 6
Documentation/virtual/kvm/cpuid.txt.
>
> /* The last 8 bits are used to indicate how to interpret the flags field
> * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c38efd7..6e1c8b4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2103,6 +2103,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> case KVM_CAP_XSAVE:
> case KVM_CAP_ASYNC_PF:
> case KVM_CAP_GET_TSC_KHZ:
> + case KVM_CAP_KICK_VCPU:
This is redundant with the feature bit? In general, KVM_CAP is for the
host API, while KVM_FEATURE is for the guest API.
>
> +/*
> + * kvm_pv_kick_cpu_op: Kick a vcpu.
> + *
> + * @cpu - vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
> +{
> + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
There is no guarantee that guest cpu numbers match host vcpu numbers.
Use APIC IDs, and kvm_apic_match_dest().
> + struct kvm_mp_state mp_state;
> +
> + mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
> + if (vcpu) {
> + vcpu->kicked = 1;
> + /* Ensure kicked is always set before wakeup */
> + barrier();
> + }
> + kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state);
This must only be called from the vcpu thread.
> + kvm_vcpu_kick(vcpu);
> +}
> +
>
> struct kvm_vcpu_arch arch;
> +
> + /*
> + * blocked vcpu wakes up by checking this flag set by unlocker.
> + */
> + int kicked;
>
Write only variable.
An alternative approach is to use an MSR protocol like x2apic ICR.
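To make the APIC ID comment concrete, here is a userspace sketch (it does
not use kvm_apic_match_dest() or any KVM internals) of looking up the
target by an identifier the guest knows rather than by array index. The
types and helpers, struct vcpu, find_vcpu_by_apic_id() and kick_vcpu(), are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down vcpu. */
struct vcpu {
	uint32_t apic_id;	/* identifier the guest uses to name this cpu */
	int kicked;
};

/* Find the vcpu the guest means by 'apic_id'; NULL if there is none. */
static struct vcpu *find_vcpu_by_apic_id(struct vcpu *vcpus, size_t n,
					 uint32_t apic_id)
{
	size_t i;

	assert(vcpus != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (vcpus[i].apic_id == apic_id)
			return &vcpus[i];
	return NULL;
}

/* Kick by APIC ID; returns 0 on success, -1 if no such vcpu exists. */
static int kick_vcpu(struct vcpu *vcpus, size_t n, uint32_t apic_id)
{
	struct vcpu *v = find_vcpu_by_apic_id(vcpus, n, apic_id);

	if (!v)
		return -1;	/* never touch a vcpu that was not found */
	v->kicked = 1;
	return 0;
}
```

Note that the array index and the APIC ID are deliberately unrelated in
this model, which is the situation described above.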
--
error compiling committee.c: too many arguments to function
^ permalink raw reply
* Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Ian Campbell @ 2011-12-01 10:34 UTC (permalink / raw)
To: Arnd Bergmann
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Pawel Moll, kvm@vger.kernel.org, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <201111301815.01297.arnd@arndb.de>
On Wed, 2011-11-30 at 18:15 +0000, Arnd Bergmann wrote:
> On Wednesday 30 November 2011, Ian Campbell wrote:
> > On Wed, 2011-11-30 at 14:32 +0000, Arnd Bergmann wrote:
> > > On Wednesday 30 November 2011, Ian Campbell wrote:
> > > What I suggested to the KVM developers is to start out with the
> > > vexpress platform, but then generalize it to the point where it fits
> > > your needs. All hardware that one expects a guest to have (GIC, timer,
> > > ...) will still show up in the same location as on a real vexpress,
> > > while anything that makes no sense or is better paravirtualized (LCD,
> > > storage, ...) just becomes optional and has to be described in the
> > > device tree if it's actually there.
> >
> > That's along the lines of what I was thinking as well.
> >
> > The DT contains the address of GIC, timer etc as well right? So at least
> > in principle we needn't provide e.g. the GIC at the same address as any
> > real platform but in practice I expect we will.
>
> Yes.
>
> > In principle we could also offer the user options as to which particular
> > platform a guest looks like.
>
> At least when using a qemu based simulation. Most platforms have some
> characteristics that are not meaningful in a classic virtualization
> scenario, but it would certainly be helpful to use the virtualization
> extensions to run a kernel that was built for a particular platform
> faster than with pure qemu, when you want to test that kernel image.
>
> It has been suggested in the past that it would be nice to run the
> guest kernel built for the same platform as the host kernel by
> default, but I think it would be much better to have just one
> platform that we end up using for guests on any host platform,
> unless there is a strong reason to do otherwise.
Yes, I agree, certainly that is what we were planning to target in the
first instance. Doing this means that we can get away with minimal
emulation of actual hardware, relying instead on PV drivers or hardware
virtualisation features.
Supporting specific board platforms as guests would be nice to have
eventually. We would need to do more emulation (e.g. running qemu as a
device model) for that case.
> There is also ongoing restructuring in the ARM Linux kernel to
> allow running the same kernel binary on multiple platforms. While
> there is still a lot of work to be done, you should assume that
> we will finish it before you see lots of users in production, there
> is no need to plan for the current one-kernel-per-board case.
We were absolutely banking on targeting the results of this work, so
that's good ;-)
Ian.
^ permalink raw reply
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
From: Michael S. Tsirkin @ 2011-12-01 10:26 UTC (permalink / raw)
To: Sasha Levin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
In-Reply-To: <1322726977.3259.3.camel@lappy>
On Thu, Dec 01, 2011 at 10:09:37AM +0200, Sasha Levin wrote:
> On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 01, 2011 at 01:12:25PM +1030, Rusty Russell wrote:
> > > On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > > On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > > > > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > > > >
> > > > > > > Which is actually strange, weren't indirect buffers introduced to make
> > > > > > > the performance *better*? From what I see it's pretty much the
> > > > > > > same/worse for virtio-blk.
> > > > > >
> > > > > > I know they were introduced to allow adding very large bufs.
> > > > > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > > > > Mark, you wrote the patch, could you tell us which workloads
> > > > > > benefit the most from indirect bufs?
> > > > > >
> > > > >
> > > > > Indirects are really for block devices with many spindles, since there
> > > > > the limiting factor is the number of requests in flight. Network
> > > > > interfaces are limited by bandwidth, it's better to increase the ring
> > > > > size and use direct buffers there (so the ring size more or less
> > > > > corresponds to the buffer size).
> > > > >
> > > >
> > > > I did some testing of indirect descriptors under different workloads.
> > >
> > > MST and I discussed getting clever with dynamic limits ages ago, but it
> > > was down low on the TODO list. Thanks for diving into this...
> > >
> > > AFAICT, if the ring never fills, direct is optimal. When the ring
> > > fills, indirect is optimal (we're better to queue now than later).
> > >
> > > Why not something simple, like a threshold which drops every time we
> > > fill the ring?
> > >
> > > struct vring_virtqueue
> > > {
> > > ...
> > > int indirect_thresh;
> > > ...
> > > }
> > >
> > > virtqueue_add_buf_gfp()
> > > {
> > > ...
> > >
> > > if (vq->indirect &&
> > > (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
> > > return indirect()
> > > ...
> > >
> > > if (vq->num_free < out + in) {
> > > if (vq->indirect && vq->indirect_thresh > 0)
> > > vq->indirect_thresh--;
> > >
> > > ...
> > > }
> > >
> > > Too dumb?
> > >
> > > Cheers,
> > > Rusty.
> >
> > We'll presumably need some logic to increment it back,
> > to account for random workload changes.
> > Something like slow start?
>
> We can increment it each time the queue was less than 10% full, it
> should act like slow start, no?
No, we really shouldn't get an empty ring as long as things behave
well. What I meant is something like:
#define VIRTIO_DECREMENT 2
#define VIRTIO_INCREMENT 1
if (vq->num_free < out + in) {
if (vq->indirect && vq->indirect_thresh > VIRTIO_DECREMENT)
vq->indirect_thresh /= VIRTIO_DECREMENT;
} else {
if (vq->indirect_thresh < vq->vring.num)
vq->indirect_thresh += VIRTIO_INCREMENT;
}
So we try to avoid indirect but the moment there's no space, we decrease
the threshold drastically. If you make the increment/decrement module
parameters it's easy to try different values.
> --
>
> Sasha.
^ permalink raw reply
* Re: [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Ian Campbell @ 2011-12-01 10:26 UTC (permalink / raw)
To: Stefano Stabellini
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Arnd Bergmann, Pawel Moll, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu, kvm@vger.kernel.org,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <alpine.DEB.2.00.1111301820290.31179@kaball-desktop>
On Wed, 2011-11-30 at 18:32 +0000, Stefano Stabellini wrote:
> On Wed, 30 Nov 2011, Arnd Bergmann wrote:
>
> > KVM and Xen at least both fall into the single-return-value category,
> > so we should be able to agree on a calling conventions. KVM does not
> > have an hcall API on ARM yet, and I see no reason not to use the
> > same implementation that you have in the Xen guest.
> >
> > Stefano, can you split out the generic parts of your asm/xen/hypercall.h
> > file into a common asm/hypercall.h and submit it for review to the
> > arm kernel list?
>
> Sure, I can do that.
> Usually the hypercall calling convention is very hypervisor specific,
> but if it turns out that we have the same requirements I am happy to design
> a common interface.
I expect the only real decision to be made is hypercall page vs. raw hvc
instruction.
The page was useful on x86 where there is a variety of instructions
which could be used (at least for PV there was sysenter/syscall/int, I
think vmcall instruction differs between AMD and Intel also) and gives
some additional flexibility. It's hard to predict but I don't think I'd
expect that to be necessary on ARM.
Another reason for having a hypercall page instead of a raw instruction
might be wanting to support 32 bit guests (from ~today) on a 64 bit
hypervisor in the future and perhaps needing to do some shimming/arg
translation. It would be better to aim for having the interface just be
32/64 agnostic but mistakes do happen.
Ian.
^ permalink raw reply
* Re: virtio-scsi spec (was Re: [PATCH] Add virtio-scsi to the virtio spec)
From: Hannes Reinecke @ 2011-12-01 9:52 UTC (permalink / raw)
To: Paolo Bonzini
Cc: LKML, linux-scsi, virtualization, Stefan Hajnoczi,
Michael S. Tsirkin
In-Reply-To: <4ED65BA4.3000003@redhat.com>
On 11/30/2011 05:36 PM, Paolo Bonzini wrote:
> On 11/30/2011 03:17 PM, Hannes Reinecke wrote:
>>> seg_max is the maximum number of segments that can be in a
>>> command. A bidirectional command can include seg_max input
>>> segments and seg_max output segments.
>>>
>> I would like to have the other request_queue limitations exposed
>> here, too.
>> Most notably we're missing the maximum size of an individual segment
>> and the maximum size of the overall I/O request.
>
> The virtio transport does not put any limit, as far as I know.
>
Virtio doesn't, but the underlying device/driver might.
And if we don't expose these values we cannot format the request correctly.
>> As this is the host specification I really would like to see an host
>> identifier somewhere in there.
>> Otherwise we won't be able to reliably identify a virtio SCSI host.
>
> I thought about it, but I couldn't figure out exactly how to use it. If
> it's just allocating 64 bits in the configuration space (with the
> stipulation that they could be zero), let's do it now. Otherwise a
> controlq command is indeed better, and it can come later.
>
> But even if it's just a 64-bit value, then: 1) where would you place it
> in sysfs for userspace? I can make up a random name, but existing user
> tools won't find it and that's against the design of virtio-scsi. 2) How
> would it be encoded as a transport ID? Is it FC, or firewire, or SAS, or
> what?
>
I was thinking of something along the lines of the TransportID as
defined in SPC.
Main idea is to have a unique ID by which we can identify a given
virtio-scsi host. Admittedly it might not be useful in general, so it
might be an idea to delegate this to another controlq command.
>> Plus you can't calculate the ITL nexus information, making
>> Persistent Reservations impossible.
>
> They are not impossible, only some features such as SPEC_I_PT. If you
> use NPIV or iSCSI in the host, then the persistent reservations will
> already get the correct initiator port. If not, much more work is needed.
>
Yes, for a shared (physical) SCSI host persistent reservations will be
tricky.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
^ permalink raw reply
* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Michael S. Tsirkin @ 2011-12-01 9:09 UTC (permalink / raw)
To: Ohad Ben-Cohen, dhowells, paulmck
Cc: virtualization, linux-kernel, linux-arm-kernel, kvm
In-Reply-To: <CAK=Wgbb5M62LgXY-sn7R6YautHxwqyDbPTeXYJxgmrS9fPXR9A@mail.gmail.com>
On Thu, Dec 01, 2011 at 08:14:26AM +0200, Ohad Ben-Cohen wrote:
> On Thu, Dec 1, 2011 at 1:13 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > For x86, stores into memory are ordered. So I think that yes, smp_XXX
> > can be selected at compile time.
>
> But then you can't use the same kernel image for both scenarios.
I was talking about virtio-pci. That always allocates the ring
in the normal memory.
> It won't take long until people will use virtio on ARM for both
> virtualization and for talking to devices, and having to rebuild the
> kernel for different use cases is nasty.
Yes, I understand that it's nasty.
^ permalink raw reply
* Re: [PATCH] Add virtio-scsi to the virtio spec
From: Paolo Bonzini @ 2011-12-01 8:55 UTC (permalink / raw)
To: Rusty Russell; +Cc: Michael S. Tsirkin, LKML, Stefan Hajnoczi, virtualization
In-Reply-To: <87sjl5rnj6.fsf@rustcorp.com.au>
On 12/01/2011 04:14 AM, Rusty Russell wrote:
> I'd prefer to see the spec only cover things
> which are implemented and tested, otherwise the risk of a flaw in the
> spec is really high in my experience.
In general I agree, and I did that for virtio-specific things such as
the eventq and the configuration space. This is also why I don't want
to add untested controlq requests that people suggested.
However, there's tension between this and providing a complete SCSI
transport. SCSI is roughly defined as a set of RPC interfaces ("Send
command", "Abort task", etc.) and transports provide the RPC protocol.
The SCSI command set changes relatively often, but the RPC interfaces
are pretty stable. This stability limits the risk, and having a mapping
for all interfaces also makes future changes less likely.
> Comments below:
>
>> num_queues is the total number of virtqueues exposed by the
>> device. The driver is free to use only one request queue, or
>> it can use more to achieve better performance.
>
> s/total number of virtqueues/total number of request virtqueues/ ?
Ok.
>> max_channel, max_target and max_lun can be used by the driver
>> as hints for scanning the logical units on the host. In the
>> current version of the spec, they will always be respectively
>> 0, 255 and 16383.
>
> s/hints for scanning/hints to constrain scanning/ ? (I assume).
Yes.
>> The driver queues requests to an arbitrary request queue, and they are
>> used by the device on that same queue. In this version of the spec,
>> if a driver uses more than one queue it is the responsibility of the
>> driver to ensure strict request ordering; commands placed on different
>> queue will be consumed with no order constraints.
>
> Suggest simplification of second sentence:
>
> It is the responsibility of the driver to ensure strict request
> ordering; commands placed on different queues will be consumed with no
> order constraints.
Agreed.
>> Task_attr, prio and crn should be left to zero: command priority
>> is explicitly not supported by this version of the device;
>> task_attr defines the task attribute as in the table above, but
>> all task attributes may be mapped to SIMPLE by the device; crn
>> may also be provided by clients, but is generally expected to be
>> 0. The maximum CRN value defined by the protocol is 255, since
>> CRN is stored in an 8-bit integer.
>
> Be braver in your language please. It helps poor implementers who are
> already confused by learning SCSI and virtio:
>
> Task_attr, and prio must be zero.[1] task_attr defines the task
> attribute as in the table above, but all task attributes may be mapped
> to SIMPLE by the device; crn may also be provided by clients, but is
> generally expected to be 0.
>
> [1] Future extensions may use these fields.
>
> Is it useful for a driver to specify ordered (or other) modes, knowing
> it could be reduced to SIMPLE without it being aware? Or should we use
> feature bits to indicate what the device supports?
This is actually mandated by SCSI. (!) It defines all the modes, but
explicitly says that they can be reduced to SIMPLE.
>> Note that since ACA is not supported by this version of the
>> spec, VIRTIO_SCSI_T_TMF_CLEAR_ACA is always a no-operation.
>
> I think if you don't support ACA in the spec, don't define this. How
> will a driver author use this information?
I will remove the text, no one will notice. :)
However, leaving the #define is preferable because it keeps the SCSI
transport complete. SCSI unfortunately is full of obsolete concepts
that no one implements but are still in the standard (and have funny
names: ACA stands for Auto Contingent Allegiance). Fallbacks are
allowed and indeed defined by the standard, but an implementation is
still supposed to provide the "concepts". You can see this everywhere
in drivers/target.
>> struct virtio_scsi_ctrl_an {
>> u32 type;
>> u8 lun[8];
>> u32 event_requested;
>> u32 event_actual;
>> u8 response;
>> }
>
> With all these structures, you might want a comment indicating the
> read-only and write-only (from the device POV) parts of the struct, eg:
>
> struct virtio_scsi_ctrl_an {
> // Read-only part
> u32 type;
> u8 lun[8];
> u32 event_requested;
> // Write-only part
> u32 event_actual;
> u8 response;
> }
(Very) good idea.
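A sketch of how that annotation could translate into buffer lengths,
assuming fixed-width C types for the u32/u8 fields and the usual natural
alignment; the two helper functions are hypothetical and not part of the
spec:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct virtio_scsi_ctrl_an {
	/* Read-only part (from the device point of view) */
	uint32_t type;
	uint8_t lun[8];
	uint32_t event_requested;
	/* Write-only part (from the device point of view) */
	uint32_t event_actual;
	uint8_t response;
};

/* Bytes the driver exposes to the device as readable. */
static size_t ctrl_an_readable_len(void)
{
	return offsetof(struct virtio_scsi_ctrl_an, event_actual);
}

/* Bytes the device may write, excluding trailing compiler padding. */
static size_t ctrl_an_writable_len(void)
{
	size_t end = offsetof(struct virtio_scsi_ctrl_an, response) +
		     sizeof(uint8_t);

	assert(end > ctrl_an_readable_len());
	return end - ctrl_an_readable_len();
}
```

The split point falls exactly at event_actual, so the comment in the struct
and the two lengths cannot drift apart.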
Paolo
^ permalink raw reply
* Re: virtio-scsi spec (was Re: [PATCH] Add virtio-scsi to the virtio spec)
From: Paolo Bonzini @ 2011-12-01 8:49 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Michael S. Tsirkin, LKML, Stefan Hajnoczi, virtualization
In-Reply-To: <4ED74E54.3070303@suse.de>
On 12/01/2011 10:52 AM, Hannes Reinecke wrote:
>>>>
>>> I would like to have the other request_queue limitations exposed
>>> here, too.
>>> Most notably we're missing the maximum size of an individual segment
>>> and the maximum size of the overall I/O request.
>>
>> The virtio transport does not put any limit, as far as I know.
>>
> Virtio doesn't, but the underlying device/driver might.
> And if we don't expose these values we cannot format the request correctly.
These limits should be per target/LUN, so it seems like material for
another controlq command when the need arises. For now, I'd really
prefer to have the spec match the implementation (plus a few SAM
bogosities).
Paolo
^ permalink raw reply
* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Michael S. Tsirkin @ 2011-12-01 8:12 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <87zkfdrpn8.fsf@rustcorp.com.au>
On Thu, Dec 01, 2011 at 12:58:59PM +1030, Rusty Russell wrote:
> On Thu, 1 Dec 2011 01:13:07 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > For x86, stores into memory are ordered. So I think that yes, smp_XXX
> > can be selected at compile time.
> >
> > So let's forget the virtio strangeness for a minute,
>
> Hmm, we got away with light barriers because we knew we were not
> *really* talking to a device. But now with virtio-mmio, turns out we
> are :)
You think virtio-mmio has this issue too? It's reported on remoteproc...
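
For reference, the difference between the light and mandatory barriers being discussed can be sketched roughly as below. This is an illustration only, not the kernel's code: the SKETCH_REMOTE_DEVICE switch, ring_wmb(), struct sketch_ring and sketch_publish() are made-up names, and it assumes a GCC-compatible compiler.

```c
/* Compiler-only barrier: prevents compiler reordering, emits no instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")
/* Full hardware barrier via the GCC/Clang builtin. */
#define mb() __sync_synchronize()

/*
 * Hypothetical switch: when the other side is a real device or a
 * remote processor, use the mandatory barrier; otherwise the light
 * one is enough on x86, where stores are ordered.
 */
#ifdef SKETCH_REMOTE_DEVICE
#define ring_wmb() mb()
#else
#define ring_wmb() barrier()
#endif

struct sketch_ring {
	unsigned int slot[4];
	volatile unsigned int idx;
};

/* Write the payload first, then make it visible by bumping the index. */
static void sketch_publish(struct sketch_ring *r, unsigned int v)
{
	r->slot[r->idx % 4] = v;
	ring_wmb();	/* payload must be visible before the index update */
	r->idx = r->idx + 1;
}
```

The point of the sketch is only that the choice of barrier depends on who reads the ring, which is what makes a compile-time selection questionable once a real device is involved.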
> I'm really tempted to revert d57ed95 for 3.2, and we can revisit this
> optimization later if it proves worthwhile.
>
> Thoughts?
> Rusty.
Generally it does seem the best we can do for 3.2.
Given it's rc3, I'd be a bit wary of introducing regressions - I'll try
to find some real setups (as in - not my laptop) to run some benchmarks
on, to verify there's no major problem.
I hope I can report on this in about a week from now - want to hold onto this meanwhile?
Further, if we do revert, need to remember to apply the following
beforehand, to avoid breaking virtio tool:
tools/virtio: implement mandatory barriers for x86
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 68b8b8d..1bf0e80 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -172,11 +172,18 @@ struct virtqueue {
#define MODULE_LICENSE(__MODULE_LICENSE_value) \
const char *__MODULE_LICENSE_name = __MODULE_LICENSE_value
#define CONFIG_SMP
#if defined(__i386__) || defined(__x86_64__)
#define barrier() asm volatile("" ::: "memory")
#define mb() __sync_synchronize()
+#if defined(__i386__)
+#define wmb() mb()
+#define rmb() mb()
+#else
+#define wmb() asm volatile("sfence" ::: "memory")
+#define rmb() asm volatile("lfence" ::: "memory")
+#endif
#define smp_mb() mb()
# define smp_rmb() barrier()
--
MST
^ permalink raw reply related
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
From: Sasha Levin @ 2011-12-01 8:09 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
In-Reply-To: <20111201075847.GA5479@redhat.com>
On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 01, 2011 at 01:12:25PM +1030, Rusty Russell wrote:
> > On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > > > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > > >
> > > > > > Which is actually strange, weren't indirect buffers introduced to make
> > > > > > the performance *better*? From what I see it's pretty much the
> > > > > > same/worse for virtio-blk.
> > > > >
> > > > > I know they were introduced to allow adding very large bufs.
> > > > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > > > Mark, you wrote the patch, could you tell us which workloads
> > > > > benefit the most from indirect bufs?
> > > > >
> > > >
> > > > Indirects are really for block devices with many spindles, since there
> > > > the limiting factor is the number of requests in flight. Network
> > > > interfaces are limited by bandwidth, it's better to increase the ring
> > > > size and use direct buffers there (so the ring size more or less
> > > > corresponds to the buffer size).
> > > >
> > >
> > > I did some testing of indirect descriptors under different workloads.
> >
> > MST and I discussed getting clever with dynamic limits ages ago, but it
> > was down low on the TODO list. Thanks for diving into this...
> >
> > AFAICT, if the ring never fills, direct is optimal. When the ring
> > fills, indirect is optimal (we're better to queue now than later).
> >
> > Why not something simple, like a threshold which drops every time we
> > fill the ring?
> >
> > struct vring_virtqueue
> > {
> > ...
> > int indirect_thresh;
> > ...
> > }
> >
> > virtqueue_add_buf_gfp()
> > {
> > ...
> >
> > if (vq->indirect &&
> > (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
> > return indirect()
> > ...
> >
> > if (vq->num_free < out + in) {
> > if (vq->indirect && vq->indirect_thresh > 0)
> > vq->indirect_thresh--;
> >
> > ...
> > }
> >
> > Too dumb?
> >
> > Cheers,
> > Rusty.
>
> We'll presumably need some logic to increment it back,
> to account for random workload changes.
> Something like slow start?
We can increment it each time the queue was less than 10% full, it
should act like slow start, no?
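
Roughly, combining Rusty's threshold with the increment described above could look like the sketch below. It is a standalone illustration, not a patch: struct vq_sketch and vq_use_indirect() are hypothetical stand-ins for the relevant bits of struct vring_virtqueue and virtqueue_add_buf_gfp(), and the 10% figure is just the number floated above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct vring_virtqueue. */
struct vq_sketch {
	unsigned int num;		/* ring size (vring.num) */
	unsigned int num_free;		/* free descriptors */
	bool indirect;			/* indirect descriptors available */
	unsigned int indirect_thresh;	/* switch to indirect above this */
};

/* Returns true if a request with out + in buffers should go indirect. */
static bool vq_use_indirect(struct vq_sketch *vq,
			    unsigned int out, unsigned int in)
{
	unsigned int used = vq->num - vq->num_free;
	unsigned int need = out + in;

	/* Queue less than 10% full: raise the threshold again. */
	if (used * 10 < vq->num && vq->indirect_thresh < vq->num)
		vq->indirect_thresh++;

	/* Ring cannot take the request directly: lower the threshold. */
	if (vq->num_free < need && vq->indirect && vq->indirect_thresh > 0)
		vq->indirect_thresh--;

	return vq->indirect && used + need > vq->indirect_thresh;
}
```

Whether a +1 step per request is the right growth rate would need the same kind of benchmarking as the original patch.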
--
Sasha.
^ permalink raw reply