Linux virtualization list
* Re: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the staging area
From: Greg KH @ 2012-02-09 20:24 UTC
  To: James Bottomley
  Cc: K. Y. Srinivasan, linux-kernel, devel, virtualization, ohering,
	hch, linux-scsi
In-Reply-To: <1328818718.2944.20.camel@dabdike.int.hansenpartnership.com>

On Thu, Feb 09, 2012 at 02:18:38PM -0600, James Bottomley wrote:
> On Thu, 2012-02-09 at 12:04 -0800, Greg KH wrote:
> > On Thu, Feb 09, 2012 at 12:04:11PM -0800, K. Y. Srinivasan wrote:
> > > The storage driver (storvsc_drv.c) handles all block storage devices
> > > assigned to Linux guests hosted on Hyper-V. This driver has been in the
> > > staging tree for a while and this patch moves it out of the staging area.
> > > As per Greg's recommendation, this patch makes no changes to the staging/hv
> > > directory. Once the driver moves out of staging, we will cleanup the
> > > staging/hv directory.
> > > 
> > > James was willing to apply this patch during the 3.3-rc phase and a decision
> > > was taken to defer this to 3.4 since Greg had queued up a bunch of storvsc
> > > patches for 3.4. Now that Greg has applied all of the pending storvsc patches,
> > > I am re-sending this patch to move this driver out of staging.
> > > 
> > > 
> > > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > > ---
> > >  drivers/scsi/Kconfig       |    7 +
> > >  drivers/scsi/Makefile      |    3 +
> > >  drivers/scsi/storvsc_drv.c | 1548 ++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 1558 insertions(+), 0 deletions(-)
> > >  create mode 100644 drivers/scsi/storvsc_drv.c
> > 
> > James, any objection to me applying this to the staging-next tree, and
> > at the same time, deleting this version of the driver from the
> > drivers/staging/hv/ directory?
> 
> Well, yes, it has the same build failure as the previous one and that
> will make it non bisectable.

Ah, that would make things a bit difficult. KY, please test your
patches...


* RE: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the staging area
From: KY Srinivasan @ 2012-02-09 20:21 UTC
  To: James Bottomley, Greg KH
  Cc: linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
	virtualization@lists.osdl.org, ohering@suse.com,
	hch@infradead.org, linux-scsi@vger.kernel.org
In-Reply-To: <1328818718.2944.20.camel@dabdike.int.hansenpartnership.com>



> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
> Sent: Thursday, February 09, 2012 3:19 PM
> To: Greg KH
> Cc: KY Srinivasan; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; ohering@suse.com; hch@infradead.org;
> linux-scsi@vger.kernel.org
> Subject: Re: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the
> staging area
> 
> On Thu, 2012-02-09 at 12:04 -0800, Greg KH wrote:
> > On Thu, Feb 09, 2012 at 12:04:11PM -0800, K. Y. Srinivasan wrote:
> > > The storage driver (storvsc_drv.c) handles all block storage devices
> > > assigned to Linux guests hosted on Hyper-V. This driver has been in the
> > > staging tree for a while and this patch moves it out of the staging area.
> > > As per Greg's recommendation, this patch makes no changes to the staging/hv
> > > directory. Once the driver moves out of staging, we will cleanup the
> > > staging/hv directory.
> > >
> > > James was willing to apply this patch during the 3.3-rc phase and a decision
> > > was taken to defer this to 3.4 since Greg had queued up a bunch of storvsc
> > > patches for 3.4. Now that Greg has applied all of the pending storvsc patches,
> > > I am re-sending this patch to move this driver out of staging.
> > >
> > >
> > > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > > ---
> > >  drivers/scsi/Kconfig       |    7 +
> > >  drivers/scsi/Makefile      |    3 +
> > >  drivers/scsi/storvsc_drv.c | 1548 ++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 1558 insertions(+), 0 deletions(-)
> > >  create mode 100644 drivers/scsi/storvsc_drv.c
> >
> > James, any objection to me applying this to the staging-next tree, and
> > at the same time, deleting this version of the driver from the
> > drivers/staging/hv/ directory?
> 
> Well, yes, it has the same build failure as the previous one and that
> will make it non bisectable.

James,

What build failure is this? I just pulled down Greg's tree earlier today, applied this
patch, and built it.

Regards,

K. Y 
 



* Re: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the staging area
From: James Bottomley @ 2012-02-09 20:18 UTC
  To: Greg KH
  Cc: K. Y. Srinivasan, linux-kernel, devel, virtualization, ohering,
	hch, linux-scsi
In-Reply-To: <20120209200404.GA8816@kroah.com>

On Thu, 2012-02-09 at 12:04 -0800, Greg KH wrote:
> On Thu, Feb 09, 2012 at 12:04:11PM -0800, K. Y. Srinivasan wrote:
> > The storage driver (storvsc_drv.c) handles all block storage devices
> > assigned to Linux guests hosted on Hyper-V. This driver has been in the
> > staging tree for a while and this patch moves it out of the staging area.
> > As per Greg's recommendation, this patch makes no changes to the staging/hv
> > directory. Once the driver moves out of staging, we will cleanup the
> > staging/hv directory.
> > 
> > James was willing to apply this patch during the 3.3-rc phase and a decision
> > was taken to defer this to 3.4 since Greg had queued up a bunch of storvsc
> > patches for 3.4. Now that Greg has applied all of the pending storvsc patches,
> > I am re-sending this patch to move this driver out of staging.
> > 
> > 
> > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > ---
> >  drivers/scsi/Kconfig       |    7 +
> >  drivers/scsi/Makefile      |    3 +
> >  drivers/scsi/storvsc_drv.c | 1548 ++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 1558 insertions(+), 0 deletions(-)
> >  create mode 100644 drivers/scsi/storvsc_drv.c
> 
> James, any objection to me applying this to the staging-next tree, and
> at the same time, deleting this version of the driver from the
> drivers/staging/hv/ directory?

Well, yes, it has the same build failure as the previous one and that
will make it non bisectable.

James


* [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the staging area
From: K. Y. Srinivasan @ 2012-02-09 20:04 UTC
  To: gregkh, linux-kernel, devel, virtualization, ohering, jbottomley,
	hch, linux-scsi
  Cc: K. Y. Srinivasan

The storage driver (storvsc_drv.c) handles all block storage devices
assigned to Linux guests hosted on Hyper-V. This driver has been in the
staging tree for a while and this patch moves it out of the staging area.
As per Greg's recommendation, this patch makes no changes to the staging/hv
directory. Once the driver moves out of staging, we will cleanup the
staging/hv directory.

James was willing to apply this patch during the 3.3-rc phase and a decision
was taken to defer this to 3.4 since Greg had queued up a bunch of storvsc
patches for 3.4. Now that Greg has applied all of the pending storvsc patches,
I am re-sending this patch to move this driver out of staging.


Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/scsi/Kconfig       |    7 +
 drivers/scsi/Makefile      |    3 +
 drivers/scsi/storvsc_drv.c | 1548 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1558 insertions(+), 0 deletions(-)
 create mode 100644 drivers/scsi/storvsc_drv.c
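
Reading aid for the bounce-buffer code in the new file: do_bounce_buffer()
walks a scatter-gather list and reports the first entry that would leave a
hole. The first entry must end on a page boundary, the last must start at
offset 0, and every middle entry must cover a whole page. The following is
only a user-space sketch of that rule, not the driver code. It assumes 4 KiB
pages, and `struct seg`, `SKETCH_PAGE_SIZE` and `first_hole()` are
hypothetical names standing in for `struct scatterlist`, PAGE_SIZE and
do_bounce_buffer().

```c
#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the offset/length fields of struct scatterlist. */
struct seg {
	unsigned int offset;	/* byte offset of the data within its page */
	unsigned int length;	/* number of data bytes in this entry */
};

/*
 * Return the index of the first entry that leaves a hole in the
 * page-granular run described by sgl, or -1 if there is none.
 * Lists with fewer than two entries cannot have a hole.
 */
static int first_hole(const struct seg *sgl, unsigned int count)
{
	unsigned int i;

	if (count < 2)
		return -1;

	for (i = 0; i < count; i++) {
		if (i == 0) {
			/* first entry must run up to the end of its page */
			if (sgl[i].offset + sgl[i].length != SKETCH_PAGE_SIZE)
				return (int)i;
		} else if (i == count - 1) {
			/* last entry must start at the top of its page */
			if (sgl[i].offset != 0)
				return (int)i;
		} else {
			/* middle entries must cover exactly one full page */
			if (sgl[i].length != SKETCH_PAGE_SIZE ||
			    sgl[i].offset != 0)
				return (int)i;
		}
	}
	return -1;
}
```

For example, the entries {512, 3584}, {0, 4096}, {0, 1024} form one gap-free
run and give -1, while a 2048-byte middle entry at index 1 gives 1. That
second case is presumably the one in which the driver copies the data through
a bounce buffer.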

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 06ea3bc..4910269 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -662,6 +662,13 @@ config VMWARE_PVSCSI
 	  To compile this driver as a module, choose M here: the
 	  module will be called vmw_pvscsi.
 
+config HYPERV_STORAGE
+	tristate "Microsoft Hyper-V virtual storage driver"
+	depends on SCSI && HYPERV
+	default HYPERV
+	help
+	  Select this option to enable the Hyper-V virtual storage driver.
+
 config LIBFC
 	tristate "LibFC module"
 	select SCSI_FC_ATTRS
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 2b88749..e4c1a69 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -142,6 +142,7 @@ obj-$(CONFIG_SCSI_BNX2_ISCSI)	+= libiscsi.o bnx2i/
 obj-$(CONFIG_BE2ISCSI)		+= libiscsi.o be2iscsi/
 obj-$(CONFIG_SCSI_PMCRAID)	+= pmcraid.o
 obj-$(CONFIG_VMWARE_PVSCSI)	+= vmw_pvscsi.o
+obj-$(CONFIG_HYPERV_STORAGE)	+= hv_storvsc.o
 
 obj-$(CONFIG_ARM)		+= arm/
 
@@ -170,6 +171,8 @@ scsi_mod-$(CONFIG_SCSI_PROC_FS)	+= scsi_proc.o
 scsi_mod-y			+= scsi_trace.o
 scsi_mod-$(CONFIG_PM)		+= scsi_pm.o
 
+hv_storvsc-y			:= storvsc_drv.o
+
 scsi_tgt-y			+= scsi_tgt_lib.o scsi_tgt_if.o
 
 sd_mod-objs	:= sd.o
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
new file mode 100644
index 0000000..695ffc3
--- /dev/null
+++ b/drivers/scsi/storvsc_drv.c
@@ -0,0 +1,1548 @@
+/*
+ * Copyright (c) 2009, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Authors:
+ *   Haiyang Zhang <haiyangz@microsoft.com>
+ *   Hank Janssen  <hjanssen@microsoft.com>
+ *   K. Y. Srinivasan <kys@microsoft.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/wait.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/string.h>
+#include <linux/mm.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/hyperv.h>
+#include <linux/mempool.h>
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_tcq.h>
+#include <scsi/scsi_eh.h>
+#include <scsi/scsi_devinfo.h>
+#include <scsi/scsi_dbg.h>
+
+/*
+ * All wire protocol details (storage protocol between the guest and the host)
+ * are consolidated here.
+ *
+ * Begin protocol definitions.
+ */
+
+/*
+ * Version history:
+ * V1 Beta: 0.1
+ * V1 RC < 2008/1/31: 1.0
+ * V1 RC > 2008/1/31:  2.0
+ * Win7: 4.2
+ */
+
+#define VMSTOR_CURRENT_MAJOR  4
+#define VMSTOR_CURRENT_MINOR  2
+
+
+/*  Packet structure describing virtual storage requests. */
+enum vstor_packet_operation {
+	VSTOR_OPERATION_COMPLETE_IO		= 1,
+	VSTOR_OPERATION_REMOVE_DEVICE		= 2,
+	VSTOR_OPERATION_EXECUTE_SRB		= 3,
+	VSTOR_OPERATION_RESET_LUN		= 4,
+	VSTOR_OPERATION_RESET_ADAPTER		= 5,
+	VSTOR_OPERATION_RESET_BUS		= 6,
+	VSTOR_OPERATION_BEGIN_INITIALIZATION	= 7,
+	VSTOR_OPERATION_END_INITIALIZATION	= 8,
+	VSTOR_OPERATION_QUERY_PROTOCOL_VERSION	= 9,
+	VSTOR_OPERATION_QUERY_PROPERTIES	= 10,
+	VSTOR_OPERATION_ENUMERATE_BUS		= 11,
+	VSTOR_OPERATION_MAXIMUM			= 11
+};
+
+/*
+ * Platform neutral description of a scsi request -
+ * this remains the same across the wire regardless of 32/64 bit
+ * note: it's patterned off the SCSI_PASS_THROUGH structure
+ */
+#define STORVSC_MAX_CMD_LEN			0x10
+#define STORVSC_SENSE_BUFFER_SIZE		0x12
+#define STORVSC_MAX_BUF_LEN_WITH_PADDING	0x14
+
+struct vmscsi_request {
+	u16 length;
+	u8 srb_status;
+	u8 scsi_status;
+
+	u8  port_number;
+	u8  path_id;
+	u8  target_id;
+	u8  lun;
+
+	u8  cdb_length;
+	u8  sense_info_length;
+	u8  data_in;
+	u8  reserved;
+
+	u32 data_transfer_length;
+
+	union {
+		u8 cdb[STORVSC_MAX_CMD_LEN];
+		u8 sense_data[STORVSC_SENSE_BUFFER_SIZE];
+		u8 reserved_array[STORVSC_MAX_BUF_LEN_WITH_PADDING];
+	};
+} __packed;
+
+
+/*
+ * This structure is sent during the initialization phase to get the different
+ * properties of the channel.
+ */
+struct vmstorage_channel_properties {
+	u16 protocol_version;
+	u8  path_id;
+	u8 target_id;
+
+	/* Note: port number is only really known on the client side */
+	u32  port_number;
+	u32  flags;
+	u32   max_transfer_bytes;
+
+	/*
+	 * This id is unique for each channel and will correspond with
+	 * vendor specific data in the inquiry data.
+	 */
+
+	u64  unique_id;
+} __packed;
+
+/*  This structure is sent during the storage protocol negotiations. */
+struct vmstorage_protocol_version {
+	/* Major (MSB) and minor (LSB) version numbers. */
+	u16 major_minor;
+
+	/*
+	 * Revision number is auto-incremented whenever this file is changed
+	 * (See FILL_VMSTOR_REVISION macro above).  Mismatch does not
+	 * definitely indicate incompatibility--but it does indicate mismatched
+	 * builds.
+	 * This is only used on the windows side. Just set it to 0.
+	 */
+	u16 revision;
+} __packed;
+
+/* Channel Property Flags */
+#define STORAGE_CHANNEL_REMOVABLE_FLAG		0x1
+#define STORAGE_CHANNEL_EMULATED_IDE_FLAG	0x2
+
+struct vstor_packet {
+	/* Requested operation type */
+	enum vstor_packet_operation operation;
+
+	/*  Flags - see below for values */
+	u32 flags;
+
+	/* Status of the request returned from the server side. */
+	u32 status;
+
+	/* Data payload area */
+	union {
+		/*
+		 * Structure used to forward SCSI commands from the
+		 * client to the server.
+		 */
+		struct vmscsi_request vm_srb;
+
+		/* Structure used to query channel properties. */
+		struct vmstorage_channel_properties storage_channel_properties;
+
+		/* Used during version negotiations. */
+		struct vmstorage_protocol_version version;
+	};
+} __packed;
+
+/*
+ * Packet Flags:
+ *
+ * This flag indicates that the server should send back a completion for this
+ * packet.
+ */
+
+#define REQUEST_COMPLETION_FLAG	0x1
+
+/* Matches Windows-end */
+enum storvsc_request_type {
+	WRITE_TYPE = 0,
+	READ_TYPE,
+	UNKNOWN_TYPE,
+};
+
+/*
+ * SRB status codes and masks; a subset of the codes used here.
+ */
+
+#define SRB_STATUS_AUTOSENSE_VALID	0x80
+#define SRB_STATUS_INVALID_LUN	0x20
+#define SRB_STATUS_SUCCESS	0x01
+#define SRB_STATUS_ERROR	0x04
+
+/*
+ * This is the end of Protocol specific defines.
+ */
+
+
+/*
+ * We set up a mempool to allocate request structures for this driver
+ * on a per-lun basis. The following define specifies the number of
+ * elements in the pool.
+ */
+
+#define STORVSC_MIN_BUF_NR				64
+static int storvsc_ringbuffer_size = (20 * PAGE_SIZE);
+
+module_param(storvsc_ringbuffer_size, int, S_IRUGO);
+MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
+
+#define STORVSC_MAX_IO_REQUESTS				128
+
+/*
+ * In Hyper-V, each port/path/target maps to 1 scsi host adapter.  In
+ * reality, the path/target is not used (i.e. always set to 0), so our
+ * scsi host adapter essentially has 1 bus with 1 target that contains
+ * up to 256 luns; the driver limit is STORVSC_MAX_LUNS_PER_TARGET.
+ */
+#define STORVSC_MAX_LUNS_PER_TARGET			64
+#define STORVSC_MAX_TARGETS				1
+#define STORVSC_MAX_CHANNELS				1
+
+
+
+struct storvsc_cmd_request {
+	struct list_head entry;
+	struct scsi_cmnd *cmd;
+
+	unsigned int bounce_sgl_count;
+	struct scatterlist *bounce_sgl;
+
+	struct hv_device *device;
+
+	/* Synchronize the request/response if needed */
+	struct completion wait_event;
+
+	unsigned char *sense_buffer;
+	struct hv_multipage_buffer data_buffer;
+	struct vstor_packet vstor_packet;
+};
+
+
+/* A storvsc device is a device object that contains a vmbus channel */
+struct storvsc_device {
+	struct hv_device *device;
+
+	bool	 destroy;
+	bool	 drain_notify;
+	atomic_t num_outstanding_req;
+	struct Scsi_Host *host;
+
+	wait_queue_head_t waiting_to_drain;
+
+	/*
+	 * Each unique Port/Path/Target represents 1 channel, i.e. a scsi
+	 * controller. In reality, the pathid and targetid are always 0
+	 * and the port is set by us.
+	 */
+	unsigned int port_number;
+	unsigned char path_id;
+	unsigned char target_id;
+
+	/* Used for vsc/vsp channel reset process */
+	struct storvsc_cmd_request init_request;
+	struct storvsc_cmd_request reset_request;
+};
+
+struct stor_mem_pools {
+	struct kmem_cache *request_pool;
+	mempool_t *request_mempool;
+};
+
+struct hv_host_device {
+	struct hv_device *dev;
+	unsigned int port;
+	unsigned char path;
+	unsigned char target;
+};
+
+struct storvsc_scan_work {
+	struct work_struct work;
+	struct Scsi_Host *host;
+	uint lun;
+};
+
+static void storvsc_bus_scan(struct work_struct *work)
+{
+	struct storvsc_scan_work *wrk;
+	int id, order_id;
+
+	wrk = container_of(work, struct storvsc_scan_work, work);
+	for (id = 0; id < wrk->host->max_id; ++id) {
+		if (wrk->host->reverse_ordering)
+			order_id = wrk->host->max_id - id - 1;
+		else
+			order_id = id;
+
+		scsi_scan_target(&wrk->host->shost_gendev, 0,
+				order_id, SCAN_WILD_CARD, 1);
+	}
+	kfree(wrk);
+}
+
+static void storvsc_remove_lun(struct work_struct *work)
+{
+	struct storvsc_scan_work *wrk;
+	struct scsi_device *sdev;
+
+	wrk = container_of(work, struct storvsc_scan_work, work);
+	if (!scsi_host_get(wrk->host))
+		goto done;
+
+	sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
+
+	if (sdev) {
+		scsi_remove_device(sdev);
+		scsi_device_put(sdev);
+	}
+	scsi_host_put(wrk->host);
+
+done:
+	kfree(wrk);
+}
+
+/*
+ * Major/minor macros.  Minor version is in LSB, meaning that earlier flat
+ * version numbers will be interpreted as "0.x" (i.e., 1 becomes 0.1).
+ */
+
+static inline u16 storvsc_get_version(u8 major, u8 minor)
+{
+	u16 version;
+
+	version = ((major << 8) | minor);
+	return version;
+}
+
+/*
+ * We can get incoming messages from the host that are not in response to
+ * messages that we have sent out. An example of this would be messages
+ * received by the guest to notify dynamic addition/removal of LUNs. To
+ * deal with potential race conditions where the driver may be in the
+ * midst of being unloaded when we might receive an unsolicited message
+ * from the host, we have implemented a mechanism to guarantee sequential
+ * consistency:
+ *
+ * 1) Once the device is marked as being destroyed, we will fail all
+ *    outgoing messages.
+ * 2) We permit incoming messages when the device is being destroyed,
+ *    only to properly account for messages already sent out.
+ */
+
+static inline struct storvsc_device *get_out_stor_device(
+					struct hv_device *device)
+{
+	struct storvsc_device *stor_device;
+
+	stor_device = hv_get_drvdata(device);
+
+	if (stor_device && stor_device->destroy)
+		stor_device = NULL;
+
+	return stor_device;
+}
+
+
+static inline void storvsc_wait_to_drain(struct storvsc_device *dev)
+{
+	dev->drain_notify = true;
+	wait_event(dev->waiting_to_drain,
+		   atomic_read(&dev->num_outstanding_req) == 0);
+	dev->drain_notify = false;
+}
+
+static inline struct storvsc_device *get_in_stor_device(
+					struct hv_device *device)
+{
+	struct storvsc_device *stor_device;
+
+	stor_device = hv_get_drvdata(device);
+
+	if (!stor_device)
+		goto get_in_err;
+
+	/*
+	 * If the device is being destroyed, allow incoming
+	 * traffic only to cleanup outstanding requests.
+	 */
+
+	if (stor_device->destroy &&
+		(atomic_read(&stor_device->num_outstanding_req) == 0))
+		stor_device = NULL;
+
+get_in_err:
+	return stor_device;
+
+}
+
+static void destroy_bounce_buffer(struct scatterlist *sgl,
+				  unsigned int sg_count)
+{
+	int i;
+	struct page *page_buf;
+
+	for (i = 0; i < sg_count; i++) {
+		page_buf = sg_page((&sgl[i]));
+		if (page_buf != NULL)
+			__free_page(page_buf);
+	}
+
+	kfree(sgl);
+}
+
+static int do_bounce_buffer(struct scatterlist *sgl, unsigned int sg_count)
+{
+	int i;
+
+	/* No need to check */
+	if (sg_count < 2)
+		return -1;
+
+	/* We have at least 2 sg entries */
+	for (i = 0; i < sg_count; i++) {
+		if (i == 0) {
+			/* make sure 1st one does not have hole */
+			if (sgl[i].offset + sgl[i].length != PAGE_SIZE)
+				return i;
+		} else if (i == sg_count - 1) {
+			/* make sure last one does not have hole */
+			if (sgl[i].offset != 0)
+				return i;
+		} else {
+			/* make sure no hole in the middle */
+			if (sgl[i].length != PAGE_SIZE || sgl[i].offset != 0)
+				return i;
+		}
+	}
+	return -1;
+}
+
+static struct scatterlist *create_bounce_buffer(struct scatterlist *sgl,
+						unsigned int sg_count,
+						unsigned int len,
+						int write)
+{
+	int i;
+	int num_pages;
+	struct scatterlist *bounce_sgl;
+	struct page *page_buf;
+	unsigned int buf_len = ((write == WRITE_TYPE) ? 0 : PAGE_SIZE);
+
+	num_pages = ALIGN(len, PAGE_SIZE) >> PAGE_SHIFT;
+
+	bounce_sgl = kcalloc(num_pages, sizeof(struct scatterlist), GFP_ATOMIC);
+	if (!bounce_sgl)
+		return NULL;
+
+	for (i = 0; i < num_pages; i++) {
+		page_buf = alloc_page(GFP_ATOMIC);
+		if (!page_buf)
+			goto cleanup;
+		sg_set_page(&bounce_sgl[i], page_buf, buf_len, 0);
+	}
+
+	return bounce_sgl;
+
+cleanup:
+	destroy_bounce_buffer(bounce_sgl, num_pages);
+	return NULL;
+}
+
+/* Assume the original sgl has enough room */
+static unsigned int copy_from_bounce_buffer(struct scatterlist *orig_sgl,
+					    struct scatterlist *bounce_sgl,
+					    unsigned int orig_sgl_count,
+					    unsigned int bounce_sgl_count)
+{
+	int i;
+	int j = 0;
+	unsigned long src, dest;
+	unsigned int srclen, destlen, copylen;
+	unsigned int total_copied = 0;
+	unsigned long bounce_addr = 0;
+	unsigned long dest_addr = 0;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	for (i = 0; i < orig_sgl_count; i++) {
+		dest_addr = (unsigned long)kmap_atomic(sg_page((&orig_sgl[i])),
+					KM_IRQ0) + orig_sgl[i].offset;
+		dest = dest_addr;
+		destlen = orig_sgl[i].length;
+
+		if (bounce_addr == 0)
+			bounce_addr =
+			(unsigned long)kmap_atomic(sg_page((&bounce_sgl[j])),
+							KM_IRQ0);
+
+		while (destlen) {
+			src = bounce_addr + bounce_sgl[j].offset;
+			srclen = bounce_sgl[j].length - bounce_sgl[j].offset;
+
+			copylen = min(srclen, destlen);
+			memcpy((void *)dest, (void *)src, copylen);
+
+			total_copied += copylen;
+			bounce_sgl[j].offset += copylen;
+			destlen -= copylen;
+			dest += copylen;
+
+			if (bounce_sgl[j].offset == bounce_sgl[j].length) {
+				/* full */
+				kunmap_atomic((void *)bounce_addr, KM_IRQ0);
+				j++;
+
+				/*
+				 * It is possible that the number of elements
+				 * in the bounce buffer may not be equal to
+				 * the number of elements in the original
+				 * scatter list. Handle this correctly.
+				 */
+
+				if (j == bounce_sgl_count) {
+					/*
+					 * We are done; cleanup and return.
+					 */
+					kunmap_atomic((void *)(dest_addr -
+							orig_sgl[i].offset),
+							KM_IRQ0);
+					local_irq_restore(flags);
+					return total_copied;
+				}
+
+				/* if we need to use another bounce buffer */
+				if (destlen || i != orig_sgl_count - 1)
+					bounce_addr =
+					(unsigned long)kmap_atomic(
+					sg_page((&bounce_sgl[j])), KM_IRQ0);
+			} else if (destlen == 0 && i == orig_sgl_count - 1) {
+				/* unmap the last bounce that is < PAGE_SIZE */
+				kunmap_atomic((void *)bounce_addr, KM_IRQ0);
+			}
+		}
+
+		kunmap_atomic((void *)(dest_addr - orig_sgl[i].offset),
+			      KM_IRQ0);
+	}
+
+	local_irq_restore(flags);
+
+	return total_copied;
+}
+
+/* Assume bounce_sgl has enough room (allocated by create_bounce_buffer()) */
+static unsigned int copy_to_bounce_buffer(struct scatterlist *orig_sgl,
+					  struct scatterlist *bounce_sgl,
+					  unsigned int orig_sgl_count)
+{
+	int i;
+	int j = 0;
+	unsigned long src, dest;
+	unsigned int srclen, destlen, copylen;
+	unsigned int total_copied = 0;
+	unsigned long bounce_addr = 0;
+	unsigned long src_addr = 0;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	for (i = 0; i < orig_sgl_count; i++) {
+		src_addr = (unsigned long)kmap_atomic(sg_page((&orig_sgl[i])),
+				KM_IRQ0) + orig_sgl[i].offset;
+		src = src_addr;
+		srclen = orig_sgl[i].length;
+
+		if (bounce_addr == 0)
+			bounce_addr =
+			(unsigned long)kmap_atomic(sg_page((&bounce_sgl[j])),
+						KM_IRQ0);
+
+		while (srclen) {
+			/* assume bounce offset always == 0 */
+			dest = bounce_addr + bounce_sgl[j].length;
+			destlen = PAGE_SIZE - bounce_sgl[j].length;
+
+			copylen = min(srclen, destlen);
+			memcpy((void *)dest, (void *)src, copylen);
+
+			total_copied += copylen;
+			bounce_sgl[j].length += copylen;
+			srclen -= copylen;
+			src += copylen;
+
+			if (bounce_sgl[j].length == PAGE_SIZE) {
+				/* full..move to next entry */
+				kunmap_atomic((void *)bounce_addr, KM_IRQ0);
+				j++;
+
+				/* if we need to use another bounce buffer */
+				if (srclen || i != orig_sgl_count - 1)
+					bounce_addr =
+					(unsigned long)kmap_atomic(
+					sg_page((&bounce_sgl[j])), KM_IRQ0);
+
+			} else if (srclen == 0 && i == orig_sgl_count - 1) {
+				/* unmap the last bounce that is < PAGE_SIZE */
+				kunmap_atomic((void *)bounce_addr, KM_IRQ0);
+			}
+		}
+
+		kunmap_atomic((void *)(src_addr - orig_sgl[i].offset), KM_IRQ0);
+	}
+
+	local_irq_restore(flags);
+
+	return total_copied;
+}
+
+static int storvsc_channel_init(struct hv_device *device)
+{
+	struct storvsc_device *stor_device;
+	struct storvsc_cmd_request *request;
+	struct vstor_packet *vstor_packet;
+	int ret, t;
+
+	stor_device = get_out_stor_device(device);
+	if (!stor_device)
+		return -ENODEV;
+
+	request = &stor_device->init_request;
+	vstor_packet = &request->vstor_packet;
+
+	/*
+	 * Now, initiate the vsc/vsp initialization protocol on the open
+	 * channel
+	 */
+	memset(request, 0, sizeof(struct storvsc_cmd_request));
+	init_completion(&request->wait_event);
+	vstor_packet->operation = VSTOR_OPERATION_BEGIN_INITIALIZATION;
+	vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+
+	ret = vmbus_sendpacket(device->channel, vstor_packet,
+			       sizeof(struct vstor_packet),
+			       (unsigned long)request,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+	if (ret != 0)
+		goto cleanup;
+
+	t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+	if (t == 0) {
+		ret = -ETIMEDOUT;
+		goto cleanup;
+	}
+
+	if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
+	    vstor_packet->status != 0)
+		goto cleanup;
+
+
+	/* reuse the packet for version range supported */
+	memset(vstor_packet, 0, sizeof(struct vstor_packet));
+	vstor_packet->operation = VSTOR_OPERATION_QUERY_PROTOCOL_VERSION;
+	vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+
+	vstor_packet->version.major_minor =
+		storvsc_get_version(VMSTOR_CURRENT_MAJOR, VMSTOR_CURRENT_MINOR);
+
+	/*
+	 * The revision number is only used in Windows; set it to 0.
+	 */
+	vstor_packet->version.revision = 0;
+
+	ret = vmbus_sendpacket(device->channel, vstor_packet,
+			       sizeof(struct vstor_packet),
+			       (unsigned long)request,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+	if (ret != 0)
+		goto cleanup;
+
+	t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+	if (t == 0) {
+		ret = -ETIMEDOUT;
+		goto cleanup;
+	}
+
+	if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
+	    vstor_packet->status != 0)
+		goto cleanup;
+
+
+	memset(vstor_packet, 0, sizeof(struct vstor_packet));
+	vstor_packet->operation = VSTOR_OPERATION_QUERY_PROPERTIES;
+	vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+	vstor_packet->storage_channel_properties.port_number =
+					stor_device->port_number;
+
+	ret = vmbus_sendpacket(device->channel, vstor_packet,
+			       sizeof(struct vstor_packet),
+			       (unsigned long)request,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+
+	if (ret != 0)
+		goto cleanup;
+
+	t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+	if (t == 0) {
+		ret = -ETIMEDOUT;
+		goto cleanup;
+	}
+
+	if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
+	    vstor_packet->status != 0)
+		goto cleanup;
+
+	stor_device->path_id = vstor_packet->storage_channel_properties.path_id;
+	stor_device->target_id
+		= vstor_packet->storage_channel_properties.target_id;
+
+	memset(vstor_packet, 0, sizeof(struct vstor_packet));
+	vstor_packet->operation = VSTOR_OPERATION_END_INITIALIZATION;
+	vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+
+	ret = vmbus_sendpacket(device->channel, vstor_packet,
+			       sizeof(struct vstor_packet),
+			       (unsigned long)request,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+
+	if (ret != 0)
+		goto cleanup;
+
+	t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+	if (t == 0) {
+		ret = -ETIMEDOUT;
+		goto cleanup;
+	}
+
+	if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
+	    vstor_packet->status != 0)
+		goto cleanup;
+
+
+cleanup:
+	return ret;
+}
+
+
+static void storvsc_command_completion(struct storvsc_cmd_request *cmd_request)
+{
+	struct scsi_cmnd *scmnd = cmd_request->cmd;
+	struct hv_host_device *host_dev = shost_priv(scmnd->device->host);
+	void (*scsi_done_fn)(struct scsi_cmnd *);
+	struct scsi_sense_hdr sense_hdr;
+	struct vmscsi_request *vm_srb;
+	struct storvsc_scan_work *wrk;
+	struct stor_mem_pools *memp = scmnd->device->hostdata;
+
+	vm_srb = &cmd_request->vstor_packet.vm_srb;
+	if (cmd_request->bounce_sgl_count) {
+		if (vm_srb->data_in == READ_TYPE)
+			copy_from_bounce_buffer(scsi_sglist(scmnd),
+					cmd_request->bounce_sgl,
+					scsi_sg_count(scmnd),
+					cmd_request->bounce_sgl_count);
+		destroy_bounce_buffer(cmd_request->bounce_sgl,
+					cmd_request->bounce_sgl_count);
+	}
+
+	/*
+	 * If there is an error, offline the device since all
+	 * error recovery strategies would have already been
+	 * deployed on the host side.
+	 */
+	if (vm_srb->srb_status == SRB_STATUS_ERROR)
+		scmnd->result = DID_TARGET_FAILURE << 16;
+	else
+		scmnd->result = vm_srb->scsi_status;
+
+	/*
+	 * If the LUN is invalid, remove the device.
+	 */
+	if (vm_srb->srb_status == SRB_STATUS_INVALID_LUN) {
+		struct storvsc_device *stor_dev;
+		struct hv_device *dev = host_dev->dev;
+		struct Scsi_Host *host;
+
+		stor_dev = get_in_stor_device(dev);
+		host = stor_dev->host;
+
+		wrk = kmalloc(sizeof(struct storvsc_scan_work),
+				GFP_ATOMIC);
+		if (!wrk) {
+			scmnd->result = DID_TARGET_FAILURE << 16;
+		} else {
+			wrk->host = host;
+			wrk->lun = vm_srb->lun;
+			INIT_WORK(&wrk->work, storvsc_remove_lun);
+			schedule_work(&wrk->work);
+		}
+	}
+
+	if (scmnd->result) {
+		if (scsi_normalize_sense(scmnd->sense_buffer,
+				SCSI_SENSE_BUFFERSIZE, &sense_hdr))
+			scsi_print_sense_hdr("storvsc", &sense_hdr);
+	}
+
+	scsi_set_resid(scmnd,
+		cmd_request->data_buffer.len -
+		vm_srb->data_transfer_length);
+
+	scsi_done_fn = scmnd->scsi_done;
+
+	scmnd->host_scribble = NULL;
+	scmnd->scsi_done = NULL;
+
+	scsi_done_fn(scmnd);
+
+	mempool_free(cmd_request, memp->request_mempool);
+}
+
+static void storvsc_on_io_completion(struct hv_device *device,
+				  struct vstor_packet *vstor_packet,
+				  struct storvsc_cmd_request *request)
+{
+	struct storvsc_device *stor_device;
+	struct vstor_packet *stor_pkt;
+
+	stor_device = hv_get_drvdata(device);
+	stor_pkt = &request->vstor_packet;
+
+	/*
+	 * The current SCSI handling on the host side does
+	 * not correctly handle:
+	 * INQUIRY command with page code parameter set to 0x80
+	 * MODE_SENSE command with cmd[2] == 0x1c
+	 *
+	 * Set up srb and scsi status so this won't be fatal.
+	 * We do this so we can distinguish truly fatal failures
+	 * (srb status == 0x4) and off-line the device in that case.
+	 */
+
+	if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
+	   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
+		vstor_packet->vm_srb.scsi_status = 0;
+		vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
+	}
+
+
+	/* Copy over the status...etc */
+	stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
+	stor_pkt->vm_srb.srb_status = vstor_packet->vm_srb.srb_status;
+	stor_pkt->vm_srb.sense_info_length =
+		vstor_packet->vm_srb.sense_info_length;
+
+	if (vstor_packet->vm_srb.scsi_status != 0 ||
+		vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS){
+		dev_warn(&device->device,
+			 "cmd 0x%x scsi status 0x%x srb status 0x%x\n",
+			 stor_pkt->vm_srb.cdb[0],
+			 vstor_packet->vm_srb.scsi_status,
+			 vstor_packet->vm_srb.srb_status);
+	}
+
+	if ((vstor_packet->vm_srb.scsi_status & 0xFF) == 0x02) {
+		/* CHECK_CONDITION */
+		if (vstor_packet->vm_srb.srb_status &
+			SRB_STATUS_AUTOSENSE_VALID) {
+			/* autosense data available */
+			dev_warn(&device->device,
+				 "stor pkt %p autosense data valid - len %d\n",
+				 request,
+				 vstor_packet->vm_srb.sense_info_length);
+
+			memcpy(request->sense_buffer,
+			       vstor_packet->vm_srb.sense_data,
+			       vstor_packet->vm_srb.sense_info_length);
+
+		}
+	}
+
+	stor_pkt->vm_srb.data_transfer_length =
+	vstor_packet->vm_srb.data_transfer_length;
+
+	storvsc_command_completion(request);
+
+	if (atomic_dec_and_test(&stor_device->num_outstanding_req) &&
+		stor_device->drain_notify)
+		wake_up(&stor_device->waiting_to_drain);
+
+
+}
+
+static void storvsc_on_receive(struct hv_device *device,
+			     struct vstor_packet *vstor_packet,
+			     struct storvsc_cmd_request *request)
+{
+	struct storvsc_scan_work *work;
+	struct storvsc_device *stor_device;
+
+	switch (vstor_packet->operation) {
+	case VSTOR_OPERATION_COMPLETE_IO:
+		storvsc_on_io_completion(device, vstor_packet, request);
+		break;
+
+	case VSTOR_OPERATION_REMOVE_DEVICE:
+	case VSTOR_OPERATION_ENUMERATE_BUS:
+		stor_device = get_in_stor_device(device);
+		work = kmalloc(sizeof(struct storvsc_scan_work), GFP_ATOMIC);
+		if (!work)
+			return;
+
+		INIT_WORK(&work->work, storvsc_bus_scan);
+		work->host = stor_device->host;
+		schedule_work(&work->work);
+		break;
+
+	default:
+		break;
+	}
+}
+
+static void storvsc_on_channel_callback(void *context)
+{
+	struct hv_device *device = (struct hv_device *)context;
+	struct storvsc_device *stor_device;
+	u32 bytes_recvd;
+	u64 request_id;
+	unsigned char packet[ALIGN(sizeof(struct vstor_packet), 8)];
+	struct storvsc_cmd_request *request;
+	int ret;
+
+
+	stor_device = get_in_stor_device(device);
+	if (!stor_device)
+		return;
+
+	do {
+		ret = vmbus_recvpacket(device->channel, packet,
+				       ALIGN(sizeof(struct vstor_packet), 8),
+				       &bytes_recvd, &request_id);
+		if (ret == 0 && bytes_recvd > 0) {
+
+			request = (struct storvsc_cmd_request *)
+					(unsigned long)request_id;
+
+			if ((request == &stor_device->init_request) ||
+			    (request == &stor_device->reset_request)) {
+
+				memcpy(&request->vstor_packet, packet,
+				       sizeof(struct vstor_packet));
+				complete(&request->wait_event);
+			} else {
+				storvsc_on_receive(device,
+						(struct vstor_packet *)packet,
+						request);
+			}
+		} else {
+			break;
+		}
+	} while (1);
+
+	return;
+}
+
+static int storvsc_connect_to_vsp(struct hv_device *device, u32 ring_size)
+{
+	struct vmstorage_channel_properties props;
+	int ret;
+
+	memset(&props, 0, sizeof(struct vmstorage_channel_properties));
+
+	ret = vmbus_open(device->channel,
+			 ring_size,
+			 ring_size,
+			 (void *)&props,
+			 sizeof(struct vmstorage_channel_properties),
+			 storvsc_on_channel_callback, device);
+
+	if (ret != 0)
+		return ret;
+
+	ret = storvsc_channel_init(device);
+
+	return ret;
+}
+
+static int storvsc_dev_remove(struct hv_device *device)
+{
+	struct storvsc_device *stor_device;
+	unsigned long flags;
+
+	stor_device = hv_get_drvdata(device);
+
+	spin_lock_irqsave(&device->channel->inbound_lock, flags);
+	stor_device->destroy = true;
+	spin_unlock_irqrestore(&device->channel->inbound_lock, flags);
+
+	/*
+	 * At this point, all outbound traffic should be disabled. We
+	 * only allow inbound traffic (responses) to proceed so that
+	 * outstanding requests can be completed.
+	 */
+
+	storvsc_wait_to_drain(stor_device);
+
+	/*
+	 * Since we have already drained, we don't need to busy wait
+	 * as was done in final_release_stor_device()
+	 * Note that we cannot set the ext pointer to NULL until
+	 * we have drained - to drain the outgoing packets, we need to
+	 * allow incoming packets.
+	 */
+	spin_lock_irqsave(&device->channel->inbound_lock, flags);
+	hv_set_drvdata(device, NULL);
+	spin_unlock_irqrestore(&device->channel->inbound_lock, flags);
+
+	/* Close the channel */
+	vmbus_close(device->channel);
+
+	kfree(stor_device);
+	return 0;
+}
+
+static int storvsc_do_io(struct hv_device *device,
+			      struct storvsc_cmd_request *request)
+{
+	struct storvsc_device *stor_device;
+	struct vstor_packet *vstor_packet;
+	int ret = 0;
+
+	vstor_packet = &request->vstor_packet;
+	stor_device = get_out_stor_device(device);
+
+	if (!stor_device)
+		return -ENODEV;
+
+
+	request->device  = device;
+
+
+	vstor_packet->flags |= REQUEST_COMPLETION_FLAG;
+
+	vstor_packet->vm_srb.length = sizeof(struct vmscsi_request);
+
+
+	vstor_packet->vm_srb.sense_info_length = STORVSC_SENSE_BUFFER_SIZE;
+
+
+	vstor_packet->vm_srb.data_transfer_length =
+	request->data_buffer.len;
+
+	vstor_packet->operation = VSTOR_OPERATION_EXECUTE_SRB;
+
+	if (request->data_buffer.len) {
+		ret = vmbus_sendpacket_multipagebuffer(device->channel,
+				&request->data_buffer,
+				vstor_packet,
+				sizeof(struct vstor_packet),
+				(unsigned long)request);
+	} else {
+		ret = vmbus_sendpacket(device->channel, vstor_packet,
+			       sizeof(struct vstor_packet),
+			       (unsigned long)request,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+	}
+
+	if (ret != 0)
+		return ret;
+
+	atomic_inc(&stor_device->num_outstanding_req);
+
+	return ret;
+}
+
+static int storvsc_device_alloc(struct scsi_device *sdevice)
+{
+	struct stor_mem_pools *memp;
+	int number = STORVSC_MIN_BUF_NR;
+
+	memp = kzalloc(sizeof(struct stor_mem_pools), GFP_KERNEL);
+	if (!memp)
+		return -ENOMEM;
+
+	memp->request_pool =
+		kmem_cache_create(dev_name(&sdevice->sdev_dev),
+				sizeof(struct storvsc_cmd_request), 0,
+				SLAB_HWCACHE_ALIGN, NULL);
+
+	if (!memp->request_pool)
+		goto err0;
+
+	memp->request_mempool = mempool_create(number, mempool_alloc_slab,
+						mempool_free_slab,
+						memp->request_pool);
+
+	if (!memp->request_mempool)
+		goto err1;
+
+	sdevice->hostdata = memp;
+
+	return 0;
+
+err1:
+	kmem_cache_destroy(memp->request_pool);
+
+err0:
+	kfree(memp);
+	return -ENOMEM;
+}
+
+static void storvsc_device_destroy(struct scsi_device *sdevice)
+{
+	struct stor_mem_pools *memp = sdevice->hostdata;
+
+	mempool_destroy(memp->request_mempool);
+	kmem_cache_destroy(memp->request_pool);
+	kfree(memp);
+	sdevice->hostdata = NULL;
+}
+
+static int storvsc_device_configure(struct scsi_device *sdevice)
+{
+	scsi_adjust_queue_depth(sdevice, MSG_SIMPLE_TAG,
+				STORVSC_MAX_IO_REQUESTS);
+
+	blk_queue_max_segment_size(sdevice->request_queue, PAGE_SIZE);
+
+	blk_queue_bounce_limit(sdevice->request_queue, BLK_BOUNCE_ANY);
+
+	return 0;
+}
+
+static int storvsc_get_chs(struct scsi_device *sdev, struct block_device * bdev,
+			   sector_t capacity, int *info)
+{
+	sector_t nsect = capacity;
+	sector_t cylinders = nsect;
+	int heads, sectors_pt;
+
+	/*
+	 * We are making up these values; let us keep it simple.
+	 */
+	heads = 0xff;
+	sectors_pt = 0x3f;      /* Sectors per track */
+	sector_div(cylinders, heads * sectors_pt);
+	if ((sector_t)(cylinders + 1) * heads * sectors_pt < nsect)
+		cylinders = 0xffff;
+
+	info[0] = heads;
+	info[1] = sectors_pt;
+	info[2] = (int)cylinders;
+
+	return 0;
+}
+
+static int storvsc_host_reset_handler(struct scsi_cmnd *scmnd)
+{
+	struct hv_host_device *host_dev = shost_priv(scmnd->device->host);
+	struct hv_device *device = host_dev->dev;
+
+	struct storvsc_device *stor_device;
+	struct storvsc_cmd_request *request;
+	struct vstor_packet *vstor_packet;
+	int ret, t;
+
+
+	stor_device = get_out_stor_device(device);
+	if (!stor_device)
+		return FAILED;
+
+	request = &stor_device->reset_request;
+	vstor_packet = &request->vstor_packet;
+
+	init_completion(&request->wait_event);
+
+	vstor_packet->operation = VSTOR_OPERATION_RESET_BUS;
+	vstor_packet->flags = REQUEST_COMPLETION_FLAG;
+	vstor_packet->vm_srb.path_id = stor_device->path_id;
+
+	ret = vmbus_sendpacket(device->channel, vstor_packet,
+			       sizeof(struct vstor_packet),
+			       (unsigned long)&stor_device->reset_request,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+	if (ret != 0)
+		return FAILED;
+
+	t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+	if (t == 0)
+		return TIMEOUT_ERROR;
+
+
+	/*
+	 * At this point, all outstanding requests in the adapter
+	 * should have been flushed out and returned to us
+	 */
+
+	return SUCCESS;
+}
+
+static bool storvsc_scsi_cmd_ok(struct scsi_cmnd *scmnd)
+{
+	bool allowed = true;
+	u8 scsi_op = scmnd->cmnd[0];
+
+	switch (scsi_op) {
+	/*
+	 * smartd sends this command and the host does not handle
+	 * this. So, don't send it.
+	 */
+	case SET_WINDOW:
+		scmnd->result = ILLEGAL_REQUEST << 16;
+		allowed = false;
+		break;
+	default:
+		break;
+	}
+	return allowed;
+}
+
+static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
+{
+	int ret;
+	struct hv_host_device *host_dev = shost_priv(host);
+	struct hv_device *dev = host_dev->dev;
+	struct storvsc_cmd_request *cmd_request;
+	unsigned int request_size = 0;
+	int i;
+	struct scatterlist *sgl;
+	unsigned int sg_count = 0;
+	struct vmscsi_request *vm_srb;
+	struct stor_mem_pools *memp = scmnd->device->hostdata;
+
+	if (!storvsc_scsi_cmd_ok(scmnd)) {
+		scmnd->scsi_done(scmnd);
+		return 0;
+	}
+
+	request_size = sizeof(struct storvsc_cmd_request);
+
+	cmd_request = mempool_alloc(memp->request_mempool,
+				       GFP_ATOMIC);
+
+	/*
+	 * We might be invoked in an interrupt context; hence
+	 * mempool_alloc() can fail.
+	 */
+	if (!cmd_request)
+		return SCSI_MLQUEUE_DEVICE_BUSY;
+
+	memset(cmd_request, 0, sizeof(struct storvsc_cmd_request));
+
+	/* Setup the cmd request */
+	cmd_request->cmd = scmnd;
+
+	scmnd->host_scribble = (unsigned char *)cmd_request;
+
+	vm_srb = &cmd_request->vstor_packet.vm_srb;
+
+
+	/* Build the SRB */
+	switch (scmnd->sc_data_direction) {
+	case DMA_TO_DEVICE:
+		vm_srb->data_in = WRITE_TYPE;
+		break;
+	case DMA_FROM_DEVICE:
+		vm_srb->data_in = READ_TYPE;
+		break;
+	default:
+		vm_srb->data_in = UNKNOWN_TYPE;
+		break;
+	}
+
+
+	vm_srb->port_number = host_dev->port;
+	vm_srb->path_id = scmnd->device->channel;
+	vm_srb->target_id = scmnd->device->id;
+	vm_srb->lun = scmnd->device->lun;
+
+	vm_srb->cdb_length = scmnd->cmd_len;
+
+	memcpy(vm_srb->cdb, scmnd->cmnd, vm_srb->cdb_length);
+
+	cmd_request->sense_buffer = scmnd->sense_buffer;
+
+
+	cmd_request->data_buffer.len = scsi_bufflen(scmnd);
+	if (scsi_sg_count(scmnd)) {
+		sgl = (struct scatterlist *)scsi_sglist(scmnd);
+		sg_count = scsi_sg_count(scmnd);
+
+		/* check if we need to bounce the sgl */
+		if (do_bounce_buffer(sgl, scsi_sg_count(scmnd)) != -1) {
+			cmd_request->bounce_sgl =
+				create_bounce_buffer(sgl, scsi_sg_count(scmnd),
+						     scsi_bufflen(scmnd),
+						     vm_srb->data_in);
+			if (!cmd_request->bounce_sgl) {
+				ret = SCSI_MLQUEUE_HOST_BUSY;
+				goto queue_error;
+			}
+
+			cmd_request->bounce_sgl_count =
+				ALIGN(scsi_bufflen(scmnd), PAGE_SIZE) >>
+					PAGE_SHIFT;
+
+			if (vm_srb->data_in == WRITE_TYPE)
+				copy_to_bounce_buffer(sgl,
+					cmd_request->bounce_sgl,
+					scsi_sg_count(scmnd));
+
+			sgl = cmd_request->bounce_sgl;
+			sg_count = cmd_request->bounce_sgl_count;
+		}
+
+		cmd_request->data_buffer.offset = sgl[0].offset;
+
+		for (i = 0; i < sg_count; i++)
+			cmd_request->data_buffer.pfn_array[i] =
+				page_to_pfn(sg_page((&sgl[i])));
+
+	} else if (scsi_sglist(scmnd)) {
+		cmd_request->data_buffer.offset =
+			virt_to_phys(scsi_sglist(scmnd)) & (PAGE_SIZE-1);
+		cmd_request->data_buffer.pfn_array[0] =
+			virt_to_phys(scsi_sglist(scmnd)) >> PAGE_SHIFT;
+	}
+
+	/* Invokes the vsc to start an IO */
+	ret = storvsc_do_io(dev, cmd_request);
+
+	if (ret == -EAGAIN) {
+		/* no more space */
+
+		/* Release the bounce buffer, if any, before requeueing */
+		if (cmd_request->bounce_sgl_count)
+			destroy_bounce_buffer(cmd_request->bounce_sgl,
+					cmd_request->bounce_sgl_count);
+
+		ret = SCSI_MLQUEUE_DEVICE_BUSY;
+		goto queue_error;
+	}
+
+	return 0;
+
+queue_error:
+	mempool_free(cmd_request, memp->request_mempool);
+	scmnd->host_scribble = NULL;
+	return ret;
+}
+
+static struct scsi_host_template scsi_driver = {
+	.module	=		THIS_MODULE,
+	.name =			"storvsc_host_t",
+	.bios_param =		storvsc_get_chs,
+	.queuecommand =		storvsc_queuecommand,
+	.eh_host_reset_handler =	storvsc_host_reset_handler,
+	.slave_alloc =		storvsc_device_alloc,
+	.slave_destroy =	storvsc_device_destroy,
+	.slave_configure =	storvsc_device_configure,
+	.cmd_per_lun =		1,
+	/* 64 max_queue * 1 target */
+	.can_queue =		STORVSC_MAX_IO_REQUESTS*STORVSC_MAX_TARGETS,
+	.this_id =		-1,
+	/* no use setting to 0 since ll_blk_rw reset it to 1 */
+	/* currently 32 */
+	.sg_tablesize =		MAX_MULTIPAGE_BUFFER_COUNT,
+	.use_clustering =	DISABLE_CLUSTERING,
+	/* Make sure we don't get an sg segment that crosses a page boundary */
+	.dma_boundary =		PAGE_SIZE-1,
+};
+
+enum {
+	SCSI_GUID,
+	IDE_GUID,
+};
+
+static const struct hv_vmbus_device_id id_table[] = {
+	/* SCSI guid */
+	{ VMBUS_DEVICE(0xd9, 0x63, 0x61, 0xba, 0xa1, 0x04, 0x29, 0x4d,
+		       0xb6, 0x05, 0x72, 0xe2, 0xff, 0xb1, 0xdc, 0x7f)
+	  .driver_data = SCSI_GUID },
+	/* IDE guid */
+	{ VMBUS_DEVICE(0x32, 0x26, 0x41, 0x32, 0xcb, 0x86, 0xa2, 0x44,
+		       0x9b, 0x5c, 0x50, 0xd1, 0x41, 0x73, 0x54, 0xf5)
+	  .driver_data = IDE_GUID },
+	{ },
+};
+
+MODULE_DEVICE_TABLE(vmbus, id_table);
+
+static int storvsc_probe(struct hv_device *device,
+			const struct hv_vmbus_device_id *dev_id)
+{
+	int ret;
+	struct Scsi_Host *host;
+	struct hv_host_device *host_dev;
+	bool dev_is_ide = ((dev_id->driver_data == IDE_GUID) ? true : false);
+	int target = 0;
+	struct storvsc_device *stor_device;
+
+	host = scsi_host_alloc(&scsi_driver,
+			       sizeof(struct hv_host_device));
+	if (!host)
+		return -ENOMEM;
+
+	host_dev = shost_priv(host);
+	memset(host_dev, 0, sizeof(struct hv_host_device));
+
+	host_dev->port = host->host_no;
+	host_dev->dev = device;
+
+
+	stor_device = kzalloc(sizeof(struct storvsc_device), GFP_KERNEL);
+	if (!stor_device) {
+		ret = -ENOMEM;
+		goto err_out0;
+	}
+
+	stor_device->destroy = false;
+	init_waitqueue_head(&stor_device->waiting_to_drain);
+	stor_device->device = device;
+	stor_device->host = host;
+	hv_set_drvdata(device, stor_device);
+
+	stor_device->port_number = host->host_no;
+	ret = storvsc_connect_to_vsp(device, storvsc_ringbuffer_size);
+	if (ret)
+		goto err_out1;
+
+	host_dev->path = stor_device->path_id;
+	host_dev->target = stor_device->target_id;
+
+	/* max # of devices per target */
+	host->max_lun = STORVSC_MAX_LUNS_PER_TARGET;
+	/* max # of targets per channel */
+	host->max_id = STORVSC_MAX_TARGETS;
+	/* max # of channels */
+	host->max_channel = STORVSC_MAX_CHANNELS - 1;
+	/* max cmd length */
+	host->max_cmd_len = STORVSC_MAX_CMD_LEN;
+
+	/* Register the HBA and start the scsi bus scan */
+	ret = scsi_add_host(host, &device->device);
+	if (ret != 0)
+		goto err_out2;
+
+	if (!dev_is_ide) {
+		scsi_scan_host(host);
+	} else {
+		target = (device->dev_instance.b[5] << 8 |
+			 device->dev_instance.b[4]);
+		ret = scsi_add_device(host, 0, target, 0);
+		if (ret) {
+			scsi_remove_host(host);
+			goto err_out2;
+		}
+	}
+	return 0;
+
+err_out2:
+	/*
+	 * Once we have connected with the host, we would need to
+	 * invoke storvsc_dev_remove() to roll back this state and
+	 * this call also frees up the stor_device; hence the jump around
+	 * err_out1 label.
+	 */
+	storvsc_dev_remove(device);
+	goto err_out0;
+
+err_out1:
+	kfree(stor_device);
+
+err_out0:
+	scsi_host_put(host);
+	return ret;
+}
+
+static int storvsc_remove(struct hv_device *dev)
+{
+	struct storvsc_device *stor_device = hv_get_drvdata(dev);
+	struct Scsi_Host *host = stor_device->host;
+
+	scsi_remove_host(host);
+	storvsc_dev_remove(dev);
+	scsi_host_put(host);
+
+	return 0;
+}
+
+static struct hv_driver storvsc_drv = {
+	.name = KBUILD_MODNAME,
+	.id_table = id_table,
+	.probe = storvsc_probe,
+	.remove = storvsc_remove,
+};
+
+static int __init storvsc_drv_init(void)
+{
+	u32 max_outstanding_req_per_channel;
+
+	/*
+	 * Divide the ring buffer data size (which is 1 page less
+	 * than the ring buffer size since that page is reserved for
+	 * the ring buffer indices) by the max request size (which is
+	 * vmbus_channel_packet_multipage_buffer + struct vstor_packet + u64)
+	 */
+	max_outstanding_req_per_channel =
+		((storvsc_ringbuffer_size - PAGE_SIZE) /
+		ALIGN(MAX_MULTIPAGE_BUFFER_PACKET +
+		sizeof(struct vstor_packet) + sizeof(u64),
+		sizeof(u64)));
+
+	if (max_outstanding_req_per_channel <
+	    STORVSC_MAX_IO_REQUESTS)
+		return -EINVAL;
+
+	return vmbus_driver_register(&storvsc_drv);
+}
+
+static void __exit storvsc_drv_exit(void)
+{
+	vmbus_driver_unregister(&storvsc_drv);
+}
+
+MODULE_LICENSE("GPL");
+MODULE_VERSION(HV_DRV_VERSION);
+MODULE_DESCRIPTION("Microsoft Hyper-V virtual storage driver");
+module_init(storvsc_drv_init);
+module_exit(storvsc_drv_exit);
-- 
1.7.4.1
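
For readers following the geometry callback in the patch above, here is a
minimal userspace sketch of the arithmetic in storvsc_get_chs(). It is not
part of the patch: the names chs_geometry and fake_chs are hypothetical, and
plain 64-bit division stands in for the kernel's sector_div().

```c
#include <stdint.h>

/* Hypothetical container for the three values the driver stores in info[]. */
struct chs_geometry {
	unsigned int heads;
	unsigned int sectors_per_track;
	unsigned int cylinders;
};

/*
 * Illustrative mirror of storvsc_get_chs(): the head and sector counts are
 * fixed, made-up values, and the cylinder count is derived from the capacity.
 */
static struct chs_geometry fake_chs(uint64_t capacity_sectors)
{
	struct chs_geometry g;
	uint64_t cylinders;

	g.heads = 0xff;			/* 255 heads */
	g.sectors_per_track = 0x3f;	/* 63 sectors per track */

	/* Integer division, rounding down, as sector_div() does in place. */
	cylinders = capacity_sectors / (g.heads * g.sectors_per_track);

	/* Same clamp as the driver code. */
	if ((cylinders + 1) * g.heads * g.sectors_per_track < capacity_sectors)
		cylinders = 0xffff;

	g.cylinders = (unsigned int)cylinders;
	return g;
}
```

With these constants one cylinder covers 255 * 63 = 16065 sectors, so a
capacity of 1606500 sectors comes out as 100 cylinders. As far as I can tell
the clamp cannot trigger for ordinary capacities, because rounding the
quotient down and adding one always overshoots the capacity.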


^ permalink raw reply related

* Re: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the staging area
From: Greg KH @ 2012-02-09 20:04 UTC (permalink / raw)
  To: K. Y. Srinivasan
  Cc: linux-kernel, devel, virtualization, ohering, jbottomley, hch,
	linux-scsi
In-Reply-To: <1328817851-13435-1-git-send-email-kys@microsoft.com>

On Thu, Feb 09, 2012 at 12:04:11PM -0800, K. Y. Srinivasan wrote:
> The storage driver (storvsc_drv.c) handles all block storage devices
> assigned to Linux guests hosted on Hyper-V. This driver has been in the
> staging tree for a while and this patch moves it out of the staging area.
> As per Greg's recommendation, this patch makes no changes to the staging/hv
> directory. Once the driver moves out of staging, we will cleanup the
> staging/hv directory.
> 
> James was willing to apply this patch during the 3.3-rc phase and a decision
> was taken to defer this to 3.4 since Greg had queued up a bunch of storvsc
> patches for 3.4. Now that Greg has applied all of the pending storvsc patches,
> I am re-sending this patch to move this driver out of staging.
> 
> 
> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> ---
>  drivers/scsi/Kconfig       |    7 +
>  drivers/scsi/Makefile      |    3 +
>  drivers/scsi/storvsc_drv.c | 1548 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1558 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/scsi/storvsc_drv.c

James, any objection to me applying this to the staging-next tree, and
at the same time, deleting this version of the driver from the
drivers/staging/hv/ directory?

thanks,

greg k-h

^ permalink raw reply

* RE: [PATCH 1/2] Staging: hv: Applied all the patches already in the staging queue
From: KY Srinivasan @ 2012-02-09 16:42 UTC (permalink / raw)
  To: Greg KH
  Cc: James Bottomley, Greg KH, linux-scsi@vger.kernel.org,
	ohering@suse.com, linux-kernel@vger.kernel.org, hch@infradead.org,
	virtualization@lists.osdl.org, devel@linuxdriverproject.org
In-Reply-To: <20120209162940.GA16395@kroah.com>



> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Thursday, February 09, 2012 11:30 AM
> To: KY Srinivasan
> Cc: James Bottomley; Greg KH; linux-scsi@vger.kernel.org; ohering@suse.com;
> linux-kernel@vger.kernel.org; hch@infradead.org; virtualization@lists.osdl.org;
> devel@linuxdriverproject.org
> Subject: Re: [PATCH 1/2] Staging: hv: Applied all the patches already in the staging
> queue
> 
> On Thu, Feb 09, 2012 at 03:31:50PM +0000, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
> > > Sent: Thursday, January 26, 2012 12:51 PM
> > > To: Greg KH
> > > Cc: KY Srinivasan; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> > > virtualization@lists.osdl.org; ohering@suse.com; hch@infradead.org;
> > > linux-scsi@vger.kernel.org
> > > Subject: Re: [PATCH 1/2] Staging: hv: Applied all the patches already in the
> > > staging queue
> > >
> > > > Yes, I have pending patches for this driver in my "to-apply" queue, but
> > > > that is for 3.4, not now.
> > > >
> > > > So, how about I just apply those patches, and then apply a patch that
> > > > moves it out of staging into drivers/scsi/, and queue all of those up
> > > > for 3.4.
> > > >
> > > > James, any objection to that?
> > >
> > > Well, we were going to try to stage this driver for 3.3-rc.  If everyone
> > > agrees to push it back to 3.4, then this works for me.
> >
> > Greg,
> >
> > Now that you have applied all the pending patches for the Hyper-V storage
> > driver; do you want me to resend the patch that moved this driver out of
> > staging?
> 
> Please do, I need James's ACK on it as well.

I will re-send the patch that James was about to apply a few weeks ago for 3.3-rc.
James, do you need anything else in order to ACK this?

Regards,

K. Y



^ permalink raw reply

* Re: [PATCH 1/2] Staging: hv: Applied all the patches already in the staging queue
From: Greg KH @ 2012-02-09 16:29 UTC (permalink / raw)
  To: KY Srinivasan
  Cc: James Bottomley, Greg KH, linux-scsi@vger.kernel.org,
	ohering@suse.com, linux-kernel@vger.kernel.org, hch@infradead.org,
	virtualization@lists.osdl.org, devel@linuxdriverproject.org
In-Reply-To: <6E21E5352C11B742B20C142EB499E0481B70C8A1@TK5EX14MBXC128.redmond.corp.microsoft.com>

On Thu, Feb 09, 2012 at 03:31:50PM +0000, KY Srinivasan wrote:
> 
> 
> > -----Original Message-----
> > From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
> > Sent: Thursday, January 26, 2012 12:51 PM
> > To: Greg KH
> > Cc: KY Srinivasan; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> > virtualization@lists.osdl.org; ohering@suse.com; hch@infradead.org; linux-
> > scsi@vger.kernel.org
> > Subject: Re: [PATCH 1/2] Staging: hv: Applied all the patches already in the staging
> > queue
> > 
> > > Yes, I have pending patches for this driver in my "to-apply" queue, but
> > > that is for 3.4, not now.
> > >
> > > So, how about I just apply those patches, and then apply a patch that
> > > moves it out of staging into drivers/scsi/, and queue all of those up
> > > for 3.4.
> > >
> > > James, any objection to that?
> > 
> > Well, we were going to try to stage this driver for 3.3-rc.  If everyone
> > agrees to push it back to 3.4, then this works for me.
> 
> Greg,
> 
> Now that you have applied all the pending patches for the Hyper-V storage
> driver; do you want me to resend the patch that moved this driver out of staging?

Please do, I need James's ACK on it as well.

thanks,

greg k-h

^ permalink raw reply

* RE: [PATCH 1/2] Staging: hv: Applied all the patches already in the staging queue
From: KY Srinivasan @ 2012-02-09 15:31 UTC (permalink / raw)
  To: James Bottomley, Greg KH
  Cc: linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
	virtualization@lists.osdl.org, ohering@suse.com,
	hch@infradead.org, linux-scsi@vger.kernel.org
In-Reply-To: <1327600244.6151.5.camel@dabdike.int.hansenpartnership.com>



> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
> Sent: Thursday, January 26, 2012 12:51 PM
> To: Greg KH
> Cc: KY Srinivasan; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; ohering@suse.com; hch@infradead.org; linux-
> scsi@vger.kernel.org
> Subject: Re: [PATCH 1/2] Staging: hv: Applied all the patches already in the staging
> queue
> 
> > Yes, I have pending patches for this driver in my "to-apply" queue, but
> > that is for 3.4, not now.
> >
> > So, how about I just apply those patches, and then apply a patch that
> > moves it out of staging into drivers/scsi/, and queue all of those up
> > for 3.4.
> >
> > James, any objection to that?
> 
> Well, we were going to try to stage this driver for 3.3-rc.  If everyone
> agrees to push it back to 3.4, then this works for me.

Greg,

Now that you have applied all the pending patches for the Hyper-V storage
driver; do you want me to resend the patch that moved this driver out of staging?

Regards,

K. Y


^ permalink raw reply

* Re: [PATCH 42/50] virtio_net:  use dev_hw_addr_random() instead of random_ether_addr()
From: Michael S. Tsirkin @ 2012-02-09  9:35 UTC (permalink / raw)
  To: Danny Kukawka
  Cc: netdev, Danny Kukawka, linux-kernel, virtualization,
	David S. Miller
In-Reply-To: <1328735457-29986-43-git-send-email-danny.kukawka@bisect.de>

On Wed, Feb 08, 2012 at 10:10:49PM +0100, Danny Kukawka wrote:
> Use dev_hw_addr_random() instead of calling random_ether_addr()
> to set addr_assign_type correctly to NET_ADDR_RANDOM.
> 
> Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4880aa8..69d36e1 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1061,7 +1061,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (virtio_config_val_len(vdev, VIRTIO_NET_F_MAC,
>  				  offsetof(struct virtio_net_config, mac),
>  				  dev->dev_addr, dev->addr_len) < 0)
> -		random_ether_addr(dev->dev_addr);
> +		dev_hw_addr_random(dev, dev->dev_addr);
>  
>  	/* Set up our device-specific information */
>  	vi = netdev_priv(dev);
> -- 
> 1.7.7.3

^ permalink raw reply

* [PATCH 42/50] virtio_net: use dev_hw_addr_random() instead of random_ether_addr()
From: Danny Kukawka @ 2012-02-08 21:10 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, netdev, Danny Kukawka, linux-kernel,
	virtualization, David S. Miller
In-Reply-To: <1328735457-29986-1-git-send-email-danny.kukawka@bisect.de>

Use dev_hw_addr_random() instead of calling random_ether_addr()
to set addr_assign_type correctly to NET_ADDR_RANDOM.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
---
 drivers/net/virtio_net.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4880aa8..69d36e1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1061,7 +1061,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_config_val_len(vdev, VIRTIO_NET_F_MAC,
 				  offsetof(struct virtio_net_config, mac),
 				  dev->dev_addr, dev->addr_len) < 0)
-		random_ether_addr(dev->dev_addr);
+		dev_hw_addr_random(dev, dev->dev_addr);
 
 	/* Set up our device-specific information */
 	vi = netdev_priv(dev);
-- 
1.7.7.3

^ permalink raw reply related

* Re: Strange finding about kernel samepage merging
From: fluxion @ 2012-02-07  5:46 UTC (permalink / raw)
  To: Jidong Xiao; +Cc: linux-mm, virtualization
In-Reply-To: <CAG4AFWZGr8SQF0rV+iys04HWmQ5WEGvXNcSZ9qJ7Jj9+FRbjCg@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1518 bytes --]

On Feb 6, 2012 10:14 PM, "Jidong Xiao" <jidong.xiao@gmail.com> wrote:
>
> On Mon, Feb 6, 2012 at 10:35 PM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > My guess is you end up with 2 copies of each page on the guest: the copy in
> > the guest's page cache, and the copy in the buffer you allocated. From the
> > perspective of the host this all looks like anonymous memory, so ksm merges
> > the pages.
>
> Yes, the result definitely shows that there are two copies. But I don't
> understand why there would be two copies. So whenever you allocate
> memory in a guest OS, you will always create two copies of the same
> memory?

Well, not just guests, hosts as well. Most operating systems will, by
default, cache the data read from disks in memory to speed up subsequent
access. In your case you're also creating a copy by allocating a second
buffer and storing the data there as well.

Ksm only merges anonymous pages, not disk/page cache, but since your
guest's pagecache looks like anonymous memory to the host, ksm is able to
merge the dupes.
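
As a hedged aside for readers: on Linux, ksm only scans memory that a process
has registered with madvise(MADV_MERGEABLE), and QEMU/KVM typically does this
for guest RAM. That is why everything inside the guest, page cache included,
is a merge candidate from the host's point of view. A minimal sketch of such a
registration follows; the helper name alloc_mergeable is hypothetical and the
code is illustrative, not taken from QEMU.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory and ask ksm to consider it for merging.
 * Returns NULL if the mapping itself fails.
 */
static void *alloc_mergeable(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	/* Best effort: the call fails on kernels built without ksm. */
	(void)madvise(p, len, MADV_MERGEABLE);
	return p;
}
```

Release the region with munmap(p, len) when done. Pages in it are merged only
while ksmd is running (/sys/kernel/mm/ksm/run set to 1) and only if their
contents are identical to another registered page.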

>
> An interesting thing is, if I replace the posix_memalign() function
> with the malloc() function (See the original program, the commented
> line.) there would be only one copy, i.e., no merging happens,
> however, since I need to have some page-aligned memory, that's why I
> use posix_memalign().

Yup, ksm can only detect duplicate pages, so if your buffer isn't page
aligned it's unable to merge with the copy in the guest's page cache.
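
The alignment point can be checked directly. The sketch below is only an
illustration; the helper names offset_in_page and alloc_page_aligned are
hypothetical. A buffer from posix_memalign() starts at offset 0 within its
page, so file offset 0 lands at the start of a page just as it does in the
page cache. A malloc() result is typically offset a few bytes into a page, so
none of its pages can be byte-for-byte identical to a page-cache page.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Offset of ptr within its page; 0 means the pointer is page-aligned. */
static size_t offset_in_page(const void *ptr)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

	return (size_t)((uintptr_t)ptr % page_size);
}

/* Allocate len bytes starting on a page boundary; NULL on failure. */
static void *alloc_page_aligned(size_t len)
{
	void *buf = NULL;
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

	if (posix_memalign(&buf, page_size, len) != 0)
		return NULL;
	return buf;
}
```

Printing offset_in_page() for both a malloc() buffer and an
alloc_page_aligned() buffer of the same size should show the difference on
most systems. Free either buffer with free().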

>
> Regards
> Jidong
>

[-- Attachment #1.2: Type: text/html, Size: 1889 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: Strange finding about kernel samepage merging
From: Jidong Xiao @ 2012-02-07  4:13 UTC (permalink / raw)
  To: Michael Roth; +Cc: linux-mm, virtualization
In-Reply-To: <CANAOKxs8j2T2b0tKssFX9NeC1wyMqjLMQmgmRwMs9qvokYcW2w@mail.gmail.com>

On Mon, Feb 6, 2012 at 10:35 PM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> My guess is you end up with 2 copies of each page on the guest: the copy in
> the guest's page cache, and the copy in the buffer you allocated. From the
> perspective of the host this all looks like anonymous memory, so ksm merges
> the pages.

Yes, the result definitely shows that there are two copies. But I don't
understand why there would be two copies. So whenever you allocate
memory in a guest OS, you will always create two copies of the same
memory?

An interesting thing is that if I replace the posix_memalign() call
with malloc() (see the commented line in the original program), there
is only one copy, i.e., no merging happens. However, I need
page-aligned memory, which is why I use posix_memalign().

Regards
Jidong

^ permalink raw reply

* Re: Strange finding about kernel samepage merging
From: Michael Roth @ 2012-02-07  3:35 UTC (permalink / raw)
  To: Jidong Xiao; +Cc: linux-mm, virtualization
In-Reply-To: <CAG4AFWaXVEHP+YikRSyt8ky9XsiBnwQ3O94Bgc7-b7nYL_2PZQ@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 238 bytes --]

My guess is you end up with 2 copies of each page on the guest: the copy in
the guest's page cache, and the copy in the buffer you allocated. From the
perspective of the host this all looks like anonymous memory, so ksm merges
the pages.

^ permalink raw reply

* Re: [PATCH v3] Add virtio-scsi to the virtio spec
From: Rusty Russell @ 2012-02-06 23:36 UTC (permalink / raw)
  To: Paolo Bonzini, virtualization
  Cc: linux-scsi, LKML, Stefan Hajnoczi, Michael S. Tsirkin
In-Reply-To: <1328439694-18705-1-git-send-email-pbonzini@redhat.com>

On Sun,  5 Feb 2012 12:01:34 +0100, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Hi Rusty,
> 
> here is the specification for a virtio-based SCSI host (controller, HBA,
> you name it) so that you can apply it to the spec document and publish it.
> 
> I changed the index from 7 to 8 to account for the rpmsg device,
> and added a feature bit to tell the guest in advance whether the host
> supports hotplug.  Otherwise there is no change from v2.
> 
> Paolo

Thanks, applied!

I will release a new draft version shortly.

Cheers,
Rusty.

^ permalink raw reply

* Strange finding about kernel samepage merging
From: Jidong Xiao @ 2012-02-06 22:44 UTC (permalink / raw)
  To: linux-mm; +Cc: virtualization

Hi,

I have run into something very strange in the Linux kernel. I wrote a
simple program; all it does is load a file into memory. The program
runs in a virtual machine with linux-kvm as the hypervisor. I enabled
ksm on the host, which runs openSUSE 11.4, while the guest OS is
Fedora 14. The strange thing is that whenever I run the following
simple program, the number exported by
/sys/kernel/mm/ksm/pages_sharing increases dramatically. No matter
what file I load, the corresponding pages always get merged.

Here is the simple program:

[root@fedora14 kernel]# cat testmkv.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>     /* sleep() */

int ae_load_file_to_memory(const char *filename, char **result)
{
       long size;
       void *buf = NULL;
       FILE *f = fopen(filename, "rb");

       *result = NULL;
       if (f == NULL)
               return -1; // -1 means file opening fail
       fseek(f, 0, SEEK_END);
       size = ftell(f);
       fseek(f, 0, SEEK_SET);
       // page-aligned buffer; posix_memalign() takes a void **
       if (size < 0 || posix_memalign(&buf, 4096, size + 1) != 0)
       {
               fclose(f);
               return -3; // -3 means allocation fail
       }
//        buf = malloc(size+1);
       if ((size_t)size != fread(buf, sizeof(char), size, f))
       {
               free(buf);
               fclose(f);
               return -2; // -2 means file reading fail
       }
       fclose(f);
       ((char *)buf)[size] = 0;
       *result = buf;
       return (int)size;
}

int main(void)
{
       char *content;
       int size;

       size = ae_load_file_to_memory("test.mkv", &content);
       if (size < 0)
       {
               puts("Error loading file");
               return 1;
       }
       sleep(150); // hold the buffer while pages_sharing is sampled on the host
       return 0;
}

Here is my observation, before I run the program:

jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
14539
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
14539
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
14540
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
14540
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
14540
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
14540
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
14540
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
14540

After I run the program (during the sleep period and after the
program exits):

jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
25526
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
32368
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
35066
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
38010
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
40410
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
43012
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
45562
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
47866
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
50072
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
52314
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54010
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54486
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54655
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54969
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54969
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54969
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54968
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54968
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54968
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54968
jxiao@yosemite:~> cat /sys/kernel/mm/ksm/pages_sharing
54968

The increase roughly equals the number of pages in the file being
loaded, i.e. test.mkv (file size 158M). I just cannot understand what
would share pages with test.mkv: it is a one-off file with unique
contents. Moreover, I replaced test.mkv with many other files,
including some Windows-specific files such as *.exe files, but I
still saw the same result. How could that happen?

If you need more information, just let me know. Thank you.
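
As a quick check of that claim, here is a small sketch that uses only the
counter values listed above. It assumes 4 KiB pages and that "158M" means
158 MiB; the helper names are made up for this example.

```c
/* Assumption: 4 KiB pages, as on x86. */
#define KSM_PAGE_SIZE 4096L

/* Number of pages needed to hold `bytes` bytes, rounding up. */
long pages_for_bytes(long bytes)
{
    return (bytes + KSM_PAGE_SIZE - 1) / KSM_PAGE_SIZE;
}

/* Growth of the pages_sharing counter between two samples. */
long pages_gained(long before, long after)
{
    return after - before;
}
```

With the values from the log, pages_gained(14540, 54968) gives 40428, and
pages_for_bytes(158L * 1024 * 1024) gives 40448, so the two figures differ
by only 20 pages.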

Regards

^ permalink raw reply

* [PATCH v3] Add virtio-scsi to the virtio spec
From: Paolo Bonzini @ 2012-02-05 11:01 UTC (permalink / raw)
  To: Rusty Russell, virtualization
  Cc: linux-scsi, LKML, Stefan Hajnoczi, Michael S. Tsirkin

Hi Rusty,

here is the specification for a virtio-based SCSI host (controller, HBA,
you name it) so that you can apply it to the spec document and publish it.

I changed the index from 7 to 8 to account for the rpmsg device,
and added a feature bit to tell the guest in advance whether the host
supports hotplug.  Otherwise there is no change from v2.
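
To illustrate the new feature bit: a driver could test for it in the
negotiated feature word before expecting hotplug events. This is only a
sketch. The bit numbers 0 and 1 come from the appendix in the patch, while
the helper names and the plain 32-bit feature word are assumptions made for
the example, not part of the spec or of any existing driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as listed in the proposed appendix. */
#define VIRTIO_SCSI_F_INOUT   0
#define VIRTIO_SCSI_F_HOTPLUG 1

/* Hypothetical helper: true if bit `fbit` (0..31) is set in `features`. */
static bool virtio_scsi_has_feature(uint32_t features, unsigned int fbit)
{
    return (features & (UINT32_C(1) << fbit)) != 0;
}

/* Hypothetical helper: should the driver expect hotplug notifications? */
static bool virtio_scsi_expect_hotplug(uint32_t negotiated)
{
    return virtio_scsi_has_feature(negotiated, VIRTIO_SCSI_F_HOTPLUG);
}
```

For example, a negotiated word of 0x2 has only the hotplug bit set, while
0x1 has only the INOUT bit set.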

Paolo


--- virtio-spec.lyx.saved	2011-11-29 14:00:59.782659120 +0100
+++ virtio-spec.lyx	2012-02-05 11:57:34.691711427 +0100
@@ -56,6 +56,7 @@
 \html_math_output 0
 \html_css_as_file 0
 \html_be_strict false
+\author 1531152142 "pbonzini" 
 \end_header
 
 \begin_body
@@ -321,7 +322,7 @@
 
 \begin_layout Standard
 \begin_inset Tabular
-<lyxtabular version="3" rows="8" columns="3">
+<lyxtabular version="3" rows="9" columns="3">
 <features tabularvalignment="middle">
 <column alignment="center" valignment="top" width="0">
 <column alignment="center" valignment="top" width="0">
@@ -530,8 +531,43 @@
 </cell>
 </row>
 <row>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1328438958
+8
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322650855
+SCSI host
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322650861
+Appendix H
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
 <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
 \begin_inset Text
 
 \begin_layout Plain Layout
 9
@@ -6427,6 +6463,2188 @@
 \end_layout
 
 \begin_layout Chapter*
+
+\change_inserted 1531152142 1322571716
+Appendix H: SCSI Host Device
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1323090161
+The virtio SCSI host device groups together one or more virtual logical
+ units (such as disks), and allows communicating with them using the SCSI
+ protocol.
+ An instance of the device represents a SCSI host to which many targets
+ and LUNs are attached.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322571726
+The virtio SCSI device services two kinds of requests:
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322571726
+command requests for a logical unit;
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322571726
+task management functions related to a logical unit, target or command.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322571726
+The device is also able to send out notifications about added and removed
+ logical units.
+ Together, these capabilities provide a SCSI transport protocol that uses
+ virtqueues as the transfer medium.
+ In the transport protocol, the virtio driver acts as the initiator, while
+ the virtio SCSI host provides one or more targets that receive and process
+ the requests.
+ 
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted 1531152142 1322571697
+Configuration
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322651166
+Subsystem
+\begin_inset space ~
+\end_inset
+
+Device
+\begin_inset space ~
+\end_inset
+
+ID 8
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322571777
+Virtqueues 0:controlq; 1:eventq; 2..n:request queues.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322571813
+Feature
+\begin_inset space ~
+\end_inset
+
+bits
+\end_layout
+
+\begin_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1328438975
+VIRTIO_SCSI_F_INOUT
+\begin_inset space ~
+\end_inset
+
+(0) A single request can include both read-only and write-only data buffers.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1328439056
+VIRTIO_SCSI_F_HOTPLUG
+\begin_inset space ~
+\end_inset
+
+(1) The host should enable hot-plug/hot-unplug of new LUNs and targets on
+ the SCSI bus.
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322651190
+Device
+\begin_inset space ~
+\end_inset
+
+configuration
+\begin_inset space ~
+\end_inset
+
+layout All fields of this configuration are always available.
+ 
+\series bold
+sense_size
+\series default
+ and 
+\series bold
+cdb_size
+\series default
+ are writable by the guest.
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322571919
+
+struct virtio_scsi_config {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575810
+
+    u32 num_queues;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322934205
+
+    u32 seg_max;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322934257
+
+    u32 max_sectors;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322934350
+
+    u32 cmd_per_lun;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575811
+
+    u32 event_info_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575811
+
+    u32 sense_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575812
+
+    u32 cdb_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322576412
+
+    u16 max_channel;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322576413
+
+    u16 max_target;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322576414
+
+    u32 max_lun;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322571878
+
+};
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322724976
+num_queues is the total number of request virtqueues exposed by the device.
+ The driver is free to use only one request queue, or it can use more to
+ achieve better performance.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322934245
+seg_max is the maximum number of segments that can be in a command.
+ A bidirectional command can include 
+\series bold
+seg_max
+\series default
+ input segments and 
+\series bold
+seg_max 
+\series default
+output segments.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322934739
+max_sectors is a hint to the guest about the maximum transfer size it should
+ use.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322934845
+cmd_per_lun is a hint to the guest about the maximum number of linked commands
+ it should send to one LUN.
+ The actual value to be used is the minimum of 
+\series bold
+cmd_per_lun
+\series default
+ and the virtqueue size.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322571959
+event_info_size is the maximum size that the device will fill for buffers
+ that the driver places in the eventq.
+ The driver should always put buffers at least of this size.
+ It is written by the device depending on the set of negotiated features.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322571997
+sense_size is the maximum size of the sense data that the device will write.
+ The default value is written by the device and will always be 96, but the
+ driver can modify it.
+ It is restored to the default when the device is reset.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322575599
+cdb_size is the maximum size of the CDB that the driver will write.
+ The default value is written by the device and will always be 32, but the
+ driver can likewise modify it.
+ It is restored to the default when the device is reset.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322724997
+max_channel,
+\begin_inset space \space{}
+\end_inset
+
+max_target
+\series medium
+
+\begin_inset space ~
+\end_inset
+
+and
+\begin_inset space \space{}
+\end_inset
+
+
+\series default
+max_lun can be used by the driver as hints to constrain scanning the logical
+ units on the host.
+\change_unchanged
+
+\end_layout
+
+\end_deeper
+\begin_layout Section*
+
+\change_inserted 1531152142 1322571959
+Device Initialization
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572042
+The initialization routine should first of all discover the device's virtqueues.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572054
+If the driver uses the eventq, it should then place at least one buffer in
+ the eventq.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572042
+The driver can immediately issue requests (for example, INQUIRY or REPORT
+ LUNS) or task management functions (for example, I_T RESET).
+ 
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted 1531152142 1322572348
+Device Operation: request queues
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322725031
+The driver queues requests to an arbitrary request queue, and they are used
+ by the device on that same queue.
+ It is the responsibility of the driver to ensure strict request ordering
+ for commands placed on different queues, because they will be consumed
+ with no order constraints.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572395
+Requests have the following format: 
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572526
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725766
+
+struct virtio_scsi_req_cmd {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725783
+
+    // Read-only
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572417
+
+    u8 lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572419
+
+    u64 id;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572420
+
+    u8 task_attr;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572422
+
+    u8 prio;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572425
+
+    u8 crn;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572426
+
+    char cdb[cdb_size];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572410
+
+    char dataout[];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725797
+
+    // Write-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572429
+
+    u32 sense_len;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572430
+
+    u32 residual;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572432
+
+    u16 status_qualifier;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572434
+
+    u8 status;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572435
+
+    u8 response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572437
+
+    u8 sense[sense_size];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572439
+
+    char datain[];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572471
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572410
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572476
+
+/* command-specific response values */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572480
+
+#define VIRTIO_SCSI_S_OK                0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572483
+
+#define VIRTIO_SCSI_S_OVERRUN           1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572489
+
+#define VIRTIO_SCSI_S_ABORTED           2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572491
+
+#define VIRTIO_SCSI_S_BAD_TARGET        3
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572494
+
+#define VIRTIO_SCSI_S_RESET             4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665324
+
+#define VIRTIO_SCSI_S_BUSY              5
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665325
+
+#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665326
+
+#define VIRTIO_SCSI_S_TARGET_FAILURE    7
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665326
+
+#define VIRTIO_SCSI_S_NEXUS_FAILURE     8
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665326
+
+#define VIRTIO_SCSI_S_FAILURE           9
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572502
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572507
+
+/* task_attr */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572510
+
+#define VIRTIO_SCSI_S_SIMPLE            0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572513
+
+#define VIRTIO_SCSI_S_ORDERED           1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572516
+
+#define VIRTIO_SCSI_S_HEAD              2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572504
+
+#define VIRTIO_SCSI_S_ACA               3
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322725062
+The 
+\series bold
+lun
+\series default
+ field addresses a target and logical unit in the virtio-scsi device's SCSI
+ domain.
+ The only supported format for the LUN field is: first byte set to 1, second
+ byte set to target, third and fourth byte representing a single level LUN
+ structure, followed by four zero bytes.
+ With this representation, a virtio-scsi device can serve up to 256 targets
+ and 16384 LUNs per target.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572562
+The 
+\series bold
+id
+\series default
+ field is the command identifier (
+\begin_inset Quotes eld
+\end_inset
+
+tag
+\begin_inset Quotes erd
+\end_inset
+
+).
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322725122
+
+\series bold
+task_attr
+\series default
+, 
+\series bold
+prio
+\series default
+ and 
+\series bold
+crn
+\series default
+ should be left to zero.
+ 
+\series bold
+task_attr
+\series default
+ defines the task attribute as in the table above, but all task attributes
+ may be mapped to SIMPLE by the device; 
+\series bold
+crn
+\series default
+ may also be provided by clients, but is generally expected to be 0.
+ The maximum CRN value defined by the protocol is 255, since CRN is stored
+ in an 8-bit integer.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572647
+All of these fields are defined in SAM.
+ They are always read-only, as are the 
+\series bold
+cdb
+\series default
+ and 
+\series bold
+dataout
+\series default
+ field.
+ The 
+\series bold
+cdb_size
+\series default
+ is taken from the configuration space.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572919
+
+\series bold
+sense
+\series default
+ and subsequent fields are always write-only.
+ The 
+\series bold
+sense_len
+\series default
+ field indicates the number of bytes actually written to the sense buffer.
+ The 
+\series bold
+residual
+\series default
+ field indicates the residual size, calculated as 
+\begin_inset Quotes eld
+\end_inset
+
+data_length - number_of_transferred_bytes
+\begin_inset Quotes erd
+\end_inset
+
+, for read or write operations.
+ For bidirectional commands, the number_of_transferred_bytes includes both
+ read and written bytes.
+ A residual field that is less than the size of datain means that the dataout
+ field was processed entirely.
+ A residual field that exceeds the size of datain means that the dataout
+ field was processed partially and the datain field was not processed at
+ all.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572971
+The 
+\series bold
+status
+\series default
+ byte is written by the device to be the status code as defined in SAM.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572971
+The 
+\series bold
+response
+\series default
+ byte is written by the device to be one of the following:
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_OK when the request was completed and the status byte is filled
+ with a SCSI status code (not necessarily "GOOD").
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_OVERRUN if the content of the CDB requires transferring more
+ data than is available in the data buffers.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322652973
+VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an ABORT TASK
+ or ABORT TASK SET task management function.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322573041
+VIRTIO_SCSI_S_BAD_TARGET if the request was never processed because the
+ target indicated by the 
+\series bold
+lun
+\series default
+ field does not exist.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322653176
+VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus or device
+ reset (including a task management function).
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a problem in
+ the connection between the host and the target (severed link).
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a failure and the
+ guest should not retry on other paths.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure but retrying
+ on other paths might yield a different result.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322664259
+VIRTIO_SCSI_S_BUSY if the request failed but retrying on the same path should
+ work.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322664258
+VIRTIO_SCSI_S_FAILURE for other host or guest error.
+ In particular, if neither dataout nor datain is empty, and the VIRTIO_SCSI_F_IN
+OUT feature has not been negotiated, the request will be immediately returned
+ with a response equal to VIRTIO_SCSI_S_FAILURE.
+ 
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted 1531152142 1322573130
+Device Operation: controlq
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322573193
+The controlq is used for other SCSI transport operations.
+ Requests have the following format:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322573233
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573243
+
+struct virtio_scsi_ctrl {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573246
+
+    u32 type;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573248
+
+    ...
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573250
+
+    u8 response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574229
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574230
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574236
+
+/* response values valid for all commands */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574310
+
+#define VIRTIO_SCSI_S_OK                       0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665338
+
+#define VIRTIO_SCSI_S_BAD_TARGET               3
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665342
+
+#define VIRTIO_SCSI_S_BUSY                     5
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665355
+
+#define VIRTIO_SCSI_S_TRANSPORT_FAILURE        6
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665357
+
+#define VIRTIO_SCSI_S_TARGET_FAILURE           7
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665358
+
+#define VIRTIO_SCSI_S_NEXUS_FAILURE            8
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665359
+
+#define VIRTIO_SCSI_S_FAILURE                  9
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665363
+
+#define VIRTIO_SCSI_S_INCORRECT_LUN            12
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322573193
+The 
+\series bold
+type
+\series default
+ identifies the remaining fields.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322573193
+The following commands are defined:
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322576973
+Task
+\begin_inset space \space{}
+\end_inset
+
+management
+\begin_inset space \space{}
+\end_inset
+
+function 
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF                      0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_ABORT_TASK           0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET       1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_CLEAR_ACA            2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET       3
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET      4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_QUERY_TASK           6
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET       7
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+struct virtio_scsi_ctrl_tmf
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+{
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725821
+
+    // Read-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725810
+
+    u32 type;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+    u32 subtype;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+    u8 lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+    u64 id;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725832
+
+    // Write-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+    u8 response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+/* command-specific response values */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_S_FUNCTION_COMPLETE        0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665370
+
+#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED       10
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322665370
+
+#define VIRTIO_SCSI_S_FUNCTION_REJECTED        11
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322725968
+The type is VIRTIO_SCSI_T_TMF.
+ All fields except 
+\series bold
+response
+\series default
+ are filled by the driver.
+ The 
+\series bold
+subtype
+\series default
+ field must always be specified and identifies the requested task management
+ function.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322725982
+Other fields may be irrelevant for the requested TMF; if so, they are ignored
+ but they should still be present.
+ The 
+\series bold
+lun
+\series default
+ field is in the same format specified for request queues; the single level
+ LUN is ignored when the task management function addresses a whole I_T
+ nexus.
+ When relevant, the value of the 
+\series bold
+id
+\series default
+ field is matched against the id values passed on the requestq.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574270
+The outcome of the task management function is written by the device in
+ the response field.
+ The command-specific response values map 1-to-1 with those defined in SAM.
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322576979
+Asynchronous
+\begin_inset space \space{}
+\end_inset
+
+notification
+\begin_inset space \space{}
+\end_inset
+
+query 
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_T_AN_QUERY                    1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+struct virtio_scsi_ctrl_an {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725848
+
+    // Read-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725843
+
+    u32 type;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+    u8  lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+    u32 event_requested;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725838
+
+    // Write-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725838
+
+    u32 event_actual;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+    u8  response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+}
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE  2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT          4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST    8
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST          32
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY         64
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574687
+By sending this command, the driver asks the device which events the given
+ LUN can report, as described in paragraphs 6.6 and A.6 of the SCSI MMC specificat
+ion.
+ The driver writes the events it is interested in into event_requested;
+ the device responds by writing the events that it supports into event_actual.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574688
+The 
+\series bold
+type
+\series default
+ is VIRTIO_SCSI_T_AN_QUERY.
+ The 
+\series bold
+lun
+\series default
+ and 
+\series bold
+event_requested
+\series default
+ fields are written by the driver.
+ The 
+\series bold
+event_actual
+\series default
+ and 
+\series bold
+response
+\series default
+ fields are written by the device.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574345
+No command-specific values are defined for the response byte.
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322576981
+Asynchronous
+\begin_inset space \space{}
+\end_inset
+
+notification
+\begin_inset space \space{}
+\end_inset
+
+subscription 
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574354
+
+#define VIRTIO_SCSI_T_AN_SUBSCRIBE                2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+struct virtio_scsi_ctrl_an {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725858
+
+    // Read-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+    u32 type;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+    u8  lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+    u32 event_requested;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725864
+
+    // Write-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+    u32 event_actual;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+    u8  response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+}
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574708
+By sending this command, the driver asks the specified LUN to report events
+ for its physical interface, again as described in the SCSI MMC specification.
+ The driver writes the events it is interested in into event_requested;
+ the device responds by writing the events that it supports into event_actual.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574709
+Event types are the same as for the asynchronous notification query message.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574710
+The 
+\series bold
+type
+\series default
+ is VIRTIO_SCSI_T_AN_SUBSCRIBE.
+ The 
+\series bold
+lun
+\series default
+ and 
+\series bold
+event_requested
+\series default
+ fields are written by the driver.
+ The 
+\series bold
+event_actual
+\series default
+ and 
+\series bold
+response
+\series default
+ fields are written by the device.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574419
+No command-specific values are defined for the response byte.
+\end_layout
+
+\end_deeper
+\begin_layout Section*
+
+\change_inserted 1531152142 1322574433
+Device Operation: eventq
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322653610
+The eventq is used by the device to report information on logical units
+ that are attached to it.
+ The driver should always leave a few buffers ready in the eventq.
+ In general, the device will not queue events to cope with an empty eventq,
+ and will end up dropping events if it finds no buffer ready.
+ However, when reporting events for many LUNs (e.g.
+ when a whole target disappears), the device can throttle events to avoid
+ dropping them.
+ For this reason, placing 10-15 buffers on the event queue should be enough.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574442
+Buffers are placed in the eventq and filled by the device when interesting
+ events occur.
+ The buffers should be strictly write-only (device-filled) and the size
+ of the buffers should be at least the value given in the device's configuration
+ information.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574487
+Buffers returned by the device on the eventq will be referred to as "events"
+ in the rest of this section.
+ Events have the following format: 
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574508
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+#define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+struct virtio_scsi_event {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725871
+
+    // Write-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+    u32 event;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+    ...
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+}
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574516
+If bit 31 is set in the event field, the device failed to report an event
+ due to missing buffers.
+ In this case, the driver should poll the logical units for unit attention
+ conditions, and/or do whatever form of bus scan is appropriate for the
+ guest operating system.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574521
+Other data that the device writes to the buffer depends on the contents
+ of the event field.
+ The following events are defined:
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322653652
+No
+\begin_inset space \space{}
+\end_inset
+
+event 
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574545
+
+#define VIRTIO_SCSI_T_NO_EVENT         0
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322576984
+This event is fired in the following cases: 
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322574588
+When the device detects in the eventq a buffer that is shorter than what
+ is indicated in the configuration field, it might use it immediately and
+ put this dummy value in the event field.
+ A well-written driver will never observe this situation.
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322574604
+When events are dropped, the device may signal this event as soon as the
+ driver makes a buffer available, in order to request action from the driver.
+ In this case, of course, this event will be reported with the VIRTIO_SCSI_T_EVE
+NTS_MISSED flag.
+ 
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322576985
+Transport
+\begin_inset space \space{}
+\end_inset
+
+reset 
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+#define VIRTIO_SCSI_T_TRANSPORT_RESET  1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725908
+
+struct virtio_scsi_event_reset {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725876
+
+    // Write-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+    u32 event;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+    u8  lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+    u32 reason;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+}
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+#define VIRTIO_SCSI_EVT_RESET_HARD         0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+#define VIRTIO_SCSI_EVT_RESET_RESCAN       1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+#define VIRTIO_SCSI_EVT_RESET_REMOVED      2
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574756
+By sending this event, the device signals that a logical unit on a target
+ has been reset, including the case of a new device appearing or disappearing
+ on the bus. The device fills in all fields.
+ The 
+\series bold
+event
+\series default
+ field is set to VIRTIO_SCSI_T_TRANSPORT_RESET.
+ The 
+\series bold
+lun
+\series default
+ field addresses a logical unit in the SCSI host.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322577082
+The 
+\series bold
+reason
+\series default
+ value is one of the three #define values appearing above:
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577449
+
+\series bold
+VIRTIO_SCSI_EVT_RESET_REMOVED
+\series default
+ (
+\begin_inset Quotes eld
+\end_inset
+
+LUN/target removed
+\begin_inset Quotes erd
+\end_inset
+
+) is used if the target or logical unit is no longer able to receive commands.
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577452
+
+\series bold
+VIRTIO_SCSI_EVT_RESET_HARD
+\series default
+ (
+\begin_inset Quotes eld
+\end_inset
+
+LUN hard reset
+\begin_inset Quotes erd
+\end_inset
+
+) is used if the logical unit has been reset, but is still present.
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577446
+
+\series bold
+VIRTIO_SCSI_EVT_RESET_RESCAN
+\series default
+ (
+\begin_inset Quotes eld
+\end_inset
+
+rescan LUN/target
+\begin_inset Quotes erd
+\end_inset
+
+) is used if a target or logical unit has just appeared on the device.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1328439419
+The 
+\begin_inset Quotes eld
+\end_inset
+
+removed
+\begin_inset Quotes erd
+\end_inset
+
+ and 
+\begin_inset Quotes eld
+\end_inset
+
+rescan
+\begin_inset Quotes erd
+\end_inset
+
+ events, when sent for LUN 0, may apply to the entire target.
+ After receiving them the driver should ask the initiator to rescan the
+ target, in order to detect the case when an entire target has appeared
+ or disappeared.
+ These two events will never be reported unless the 
+\series bold
+VIRTIO_SCSI_F_HOTPLUG
+\series default
+ feature was negotiated between the host and the guest.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322577057
+Events will also be reported via sense codes (this obviously does not apply
+ to newly appeared buses or targets, since the application has never discovered
+ them):
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577457
+\begin_inset Quotes eld
+\end_inset
+
+LUN/target removed
+\begin_inset Quotes erd
+\end_inset
+
+ maps to sense key ILLEGAL REQUEST, asc 0x25, ascq 0x00 (LOGICAL UNIT NOT
+ SUPPORTED)
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577460
+\begin_inset Quotes eld
+\end_inset
+
+LUN hard reset
+\begin_inset Quotes erd
+\end_inset
+
+ maps to sense key UNIT ATTENTION, asc 0x29 (POWER ON, RESET OR BUS DEVICE
+ RESET OCCURRED)
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577462
+\begin_inset Quotes eld
+\end_inset
+
+rescan LUN/target
+\begin_inset Quotes erd
+\end_inset
+
+ maps to sense key UNIT ATTENTION, asc 0x3f, ascq 0x0e (REPORTED LUNS DATA
+ HAS CHANGED)
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322575482
+The preferred way to detect transport reset is always to use events, because
+ sense codes are only seen by the driver when it sends a SCSI command to
+ the logical unit or target.
+ However, in case events are dropped, the initiator will still be able to
+ synchronize with the actual state of the controller if the driver asks
+ the initiator to rescan the SCSI bus.
+ During the rescan, the initiator will be able to observe the above sense
+ codes, and it will process them as if the driver had received the equivalent
+ event.
+ 
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322576987
+Asynchronous
+\begin_inset space \space{}
+\end_inset
+
+notification 
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+#define VIRTIO_SCSI_T_ASYNC_NOTIFY     2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725913
+
+struct virtio_scsi_event_an {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322725880
+
+    // Write-only part
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+    u32 event;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+    u8  lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+    u32 reason;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+}
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322575520
+By sending this event, the device signals that an asynchronous event was
+ fired from a physical interface.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322575546
+All fields are written by the device.
+ The 
+\series bold
+event
+\series default
+ field is set to VIRTIO_SCSI_T_ASYNC_NOTIFY.
+ The 
+\series bold
+lun
+\series default
+ field addresses a logical unit in the SCSI host.
+ The 
+\series bold
+reason
+\series default
+ field is a subset of the events that the driver has subscribed to via the
+ "Asynchronous notification subscription" command.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322575520
+When dropped events are reported, the driver should poll for asynchronous
+ events manually using SCSI commands.
+\change_unchanged
+
+\end_layout
+
+\end_deeper
+\begin_layout Chapter*
 Appendix X: virtio-mmio
 \end_layout
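
The eventq rules in the patch above (bit 31 marks dropped events, and the
remaining bits select the event type and, for transport reset, a reason) can
be sketched in C roughly as follows. This is only an illustration under
assumptions: the struct name virtio_scsi_event_common, the enum evt_action and
the two helper functions are hypothetical names that do not appear in the
proposed spec text, and the fields are assumed to already be in host byte
order.

```c
#include <stdint.h>

/* Constants as listed in the proposed spec text. */
#define VIRTIO_SCSI_T_EVENTS_MISSED    0x80000000u
#define VIRTIO_SCSI_T_NO_EVENT         0
#define VIRTIO_SCSI_T_TRANSPORT_RESET  1
#define VIRTIO_SCSI_T_ASYNC_NOTIFY     2

#define VIRTIO_SCSI_EVT_RESET_HARD     0
#define VIRTIO_SCSI_EVT_RESET_RESCAN   1
#define VIRTIO_SCSI_EVT_RESET_REMOVED  2

/* Hypothetical name: the shared layout of the reset and notify events. */
struct virtio_scsi_event_common {
    uint32_t event;
    uint8_t  lun[8];
    uint32_t reason;
};

/* Hypothetical driver-side actions, for illustration only. */
enum evt_action {
    EVT_NONE,       /* dummy event, nothing to do for this buffer */
    EVT_LUN_RESET,  /* logical unit was reset but is still present */
    EVT_RESCAN,     /* a target or logical unit has appeared */
    EVT_REMOVED,    /* a target or logical unit can no longer take commands */
    EVT_ASYNC,      /* asynchronous notification; reason holds the event bits */
    EVT_UNKNOWN
};

/* Nonzero if bit 31 is set, i.e. the device dropped at least one event. */
int evt_missed(uint32_t event)
{
    return (event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;
}

/* Classify an event after masking off the "events missed" flag. */
enum evt_action evt_classify(const struct virtio_scsi_event_common *e)
{
    switch (e->event & ~VIRTIO_SCSI_T_EVENTS_MISSED) {
    case VIRTIO_SCSI_T_NO_EVENT:
        return EVT_NONE;
    case VIRTIO_SCSI_T_TRANSPORT_RESET:
        switch (e->reason) {
        case VIRTIO_SCSI_EVT_RESET_HARD:    return EVT_LUN_RESET;
        case VIRTIO_SCSI_EVT_RESET_RESCAN:  return EVT_RESCAN;
        case VIRTIO_SCSI_EVT_RESET_REMOVED: return EVT_REMOVED;
        default:                            return EVT_UNKNOWN;
        }
    case VIRTIO_SCSI_T_ASYNC_NOTIFY:
        return EVT_ASYNC;
    default:
        return EVT_UNKNOWN;
    }
}
```

A driver would presumably check evt_missed() first and schedule a rescan or
poll when it is set, then still act on the result of evt_classify() for the
same buffer, since the flag can accompany any event type.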

* Re: [PATCH 3/6] drivers: hv: Cleanup the kvp related state in hyperv.h
From: Joe Perches @ 2012-02-03  2:39 UTC (permalink / raw)
  To: Greg KH
  Cc: KY Srinivasan, Haiyang Zhang, gregkh@suse.de, ohering@suse.com,
	linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
	zbr@ioremap.net, devel@linuxdriverproject.org
In-Reply-To: <20120202234848.GA26647@kroah.com>

On Thu, 2012-02-02 at 15:48 -0800, Greg KH wrote:
> On Thu, Feb 02, 2012 at 11:41:29PM +0000, KY Srinivasan wrote:
> > > > +#define __packed __attribute__((packed))
[]
> > When I ran the checkpatch script against these patches I got a warning that the
> > preferred directive was to use "__packed".
> Well, checkpatch is stupid sometimes.

True.

I suppose the checkpatch warning could be improved to
say that it's preferred for "kernel only" files
and should be ignored for public/user cases.

* [PATCH 4/4] drivers: hv: Increase the number of VCPUs supported in the guest
From: K. Y. Srinivasan @ 2012-02-03  0:56 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering
  Cc: K. Y. Srinivasan, Haiyang Zhang
In-Reply-To: <1328230611-3315-1-git-send-email-kys@microsoft.com>

The current code arbitrarily limited the number of CPUs the guest could have.
Change that so that we can support the maximum number of CPUs the guest can
support. While we use NR_CPUS to size the per-cpu state, all we are allocating
based on NR_CPUS are the pointers to per-cpu state that will be allocated in
the context of the initializing CPU. This patch triggers a checkpatch warning
for the usage of NR_CPUS, but since all we are allocating is a couple of
pointers per CPU, it should be ok.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/hv/hv.c           |    4 ++--
 drivers/hv/hyperv_vmbus.h |    5 ++---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 12aa97f..15956bd 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -155,9 +155,9 @@ int hv_init(void)
 	union hv_x64_msr_hypercall_contents hypercall_msr;
 	void *virtaddr = NULL;
 
-	memset(hv_context.synic_event_page, 0, sizeof(void *) * MAX_NUM_CPUS);
+	memset(hv_context.synic_event_page, 0, sizeof(void *) * NR_CPUS);
 	memset(hv_context.synic_message_page, 0,
-	       sizeof(void *) * MAX_NUM_CPUS);
+	       sizeof(void *) * NR_CPUS);
 
 	if (!query_hypervisor_presence())
 		goto cleanup;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 6d7d286..699f0d8 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -457,7 +457,6 @@ static const uuid_le VMBUS_SERVICE_ID = {
 	},
 };
 
-#define MAX_NUM_CPUS	32
 
 
 struct hv_input_signal_event_buffer {
@@ -483,8 +482,8 @@ struct hv_context {
 	/* 8-bytes aligned of the buffer above */
 	struct hv_input_signal_event *signal_event_param;
 
-	void *synic_message_page[MAX_NUM_CPUS];
-	void *synic_event_page[MAX_NUM_CPUS];
+	void *synic_message_page[NR_CPUS];
+	void *synic_event_page[NR_CPUS];
 };
 
 extern struct hv_context hv_context;
-- 
1.7.4.1
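
The cost argument in the changelog above (only two pointers per possible CPU
are sized by NR_CPUS) can be checked with a small user-space sketch. The
value 4096 for NR_CPUS is an arbitrary stand-in for the kernel's build-time
constant, and hv_context_sketch and both helper functions are hypothetical
names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's build-time NR_CPUS; 4096 is an arbitrary value. */
#define NR_CPUS 4096

/* Only the two pointer arrays from struct hv_context are modelled here. */
struct hv_context_sketch {
    void *synic_message_page[NR_CPUS];
    void *synic_event_page[NR_CPUS];
};

/* Bytes spent on the pointer arrays, regardless of how many CPUs are online. */
size_t hv_context_pointer_bytes(void)
{
    return 2 * NR_CPUS * sizeof(void *);
}

/* Mirrors the two memset() calls in hv_init(): clear every slot. */
void hv_context_clear(struct hv_context_sketch *ctx)
{
    memset(ctx->synic_event_page, 0, sizeof(void *) * NR_CPUS);
    memset(ctx->synic_message_page, 0, sizeof(void *) * NR_CPUS);
}
```

With 8-byte pointers and this stand-in value, the arrays take 64 KiB in
total; the pages they point to are still allocated per CPU at initialization
time, which is the point the changelog makes.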

* [PATCH 3/4] drivers: hv: kvp: Cleanup the kernel/user protocol
From: K. Y. Srinivasan @ 2012-02-03  0:56 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering
  Cc: K. Y. Srinivasan, Haiyang Zhang
In-Reply-To: <1328230611-3315-1-git-send-email-kys@microsoft.com>

Now, clean up the user/kernel KVP protocol by using the same structure
definition that is used for the host/guest KVP protocol. This simplifies the code.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/hv/hv_kvp.c      |   41 +++++++++++++++++++++++++----------------
 include/linux/hyperv.h   |   30 +++++-------------------------
 tools/hv/hv_kvp_daemon.c |   30 +++++++++++++++---------------
 3 files changed, 45 insertions(+), 56 deletions(-)

diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 4a6971e..0ef4c1f 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -71,15 +71,20 @@ kvp_register(void)
 {
 
 	struct cn_msg *msg;
+	struct hv_kvp_msg *kvp_msg;
+	char *version;
 
-	msg = kzalloc(sizeof(*msg) + strlen(HV_DRV_VERSION) + 1 , GFP_ATOMIC);
+	msg = kzalloc(sizeof(*msg) + sizeof(struct hv_kvp_msg), GFP_ATOMIC);
 
 	if (msg) {
+		kvp_msg = (struct hv_kvp_msg *)msg->data;
+		version = kvp_msg->body.kvp_version;
 		msg->id.idx =  CN_KVP_IDX;
 		msg->id.val = CN_KVP_VAL;
-		msg->seq = KVP_REGISTER;
-		strcpy(msg->data, HV_DRV_VERSION);
-		msg->len = strlen(HV_DRV_VERSION) + 1;
+
+		kvp_msg->kvp_hdr.operation = KVP_OP_REGISTER;
+		strcpy(version, HV_DRV_VERSION);
+		msg->len = sizeof(struct hv_kvp_msg);
 		cn_netlink_send(msg, 0, GFP_ATOMIC);
 		kfree(msg);
 	}
@@ -101,23 +106,24 @@ kvp_work_func(struct work_struct *dummy)
 static void
 kvp_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
 {
-	struct hv_ku_msg *message;
+	struct hv_kvp_msg *message;
+	struct hv_kvp_msg_enumerate *data;
 
-	message = (struct hv_ku_msg *)msg->data;
-	if (msg->seq == KVP_REGISTER) {
+	message = (struct hv_kvp_msg *)msg->data;
+	if (message->kvp_hdr.operation == KVP_OP_REGISTER) {
 		pr_info("KVP: user-mode registering done.\n");
 		kvp_register();
 	}
 
-	if (msg->seq == KVP_USER_SET) {
+	if (message->kvp_hdr.operation == KVP_OP_ENUMERATE) {
+		data = &message->body.kvp_enum_data;
 		/*
 		 * Complete the transaction by forwarding the key value
 		 * to the host. But first, cancel the timeout.
 		 */
 		if (cancel_delayed_work_sync(&kvp_work))
-			kvp_respond_to_host(message->kvp_key,
-						message->kvp_value,
-						!strlen(message->kvp_key));
+			kvp_respond_to_host(data->data.key, data->data.value,
+					!strlen(data->data.key));
 	}
 }
 
@@ -125,6 +131,7 @@ static void
 kvp_send_key(struct work_struct *dummy)
 {
 	struct cn_msg *msg;
+	struct hv_kvp_msg *message;
 	int index = kvp_transaction.index;
 
 	msg = kzalloc(sizeof(*msg) + sizeof(struct hv_kvp_msg) , GFP_ATOMIC);
@@ -132,9 +139,11 @@ kvp_send_key(struct work_struct *dummy)
 	if (msg) {
 		msg->id.idx =  CN_KVP_IDX;
 		msg->id.val = CN_KVP_VAL;
-		msg->seq = KVP_KERNEL_GET;
-		((struct hv_ku_msg *)msg->data)->kvp_index = index;
-		msg->len = sizeof(struct hv_ku_msg);
+
+		message = (struct hv_kvp_msg *)msg->data;
+		message->kvp_hdr.operation = KVP_OP_ENUMERATE;
+		message->body.kvp_enum_data.index = index;
+		msg->len = sizeof(struct hv_kvp_msg);
 		cn_netlink_send(msg, 0, GFP_ATOMIC);
 		kfree(msg);
 	}
@@ -191,7 +200,7 @@ kvp_respond_to_host(char *key, char *value, int error)
 	kvp_msg = (struct hv_kvp_msg *)
 			&recv_buffer[sizeof(struct vmbuspipe_hdr) +
 			sizeof(struct icmsg_hdr)];
-	kvp_data = &kvp_msg->kvp_data;
+	kvp_data = &kvp_msg->body.kvp_enum_data;
 	key_name = key;
 
 	/*
@@ -266,7 +275,7 @@ void hv_kvp_onchannelcallback(void *context)
 				sizeof(struct vmbuspipe_hdr) +
 				sizeof(struct icmsg_hdr)];
 
-			kvp_data = &kvp_msg->kvp_data;
+			kvp_data = &kvp_msg->body.kvp_enum_data;
 
 			/*
 			 * We only support the "get" operation on
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b822978..75aee67 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -113,30 +113,6 @@
  * (not supported), a NULL key string is returned.
  */
 
-/*
- *
- * The following definitions are shared with the user-mode component; do not
- * change any of this without making the corresponding changes in
- * the KVP user-mode component.
- */
-
-enum hv_ku_op {
-	KVP_REGISTER = 0, /* Register the user mode component */
-	KVP_KERNEL_GET, /* Kernel is requesting the value */
-	KVP_KERNEL_SET, /* Kernel is providing the value */
-	KVP_USER_GET,  /* User is requesting the value */
-	KVP_USER_SET  /* User is providing the value */
-};
-
-struct hv_ku_msg {
-	__u32 kvp_index; /* Key index */
-	__u8  kvp_key[HV_KVP_EXCHANGE_MAX_KEY_SIZE]; /* Key name */
-	__u8  kvp_value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE]; /* Key  value */
-};
-
-
-
-
 
 /*
  * Registry value types.
@@ -149,6 +125,7 @@ enum hv_kvp_exchg_op {
 	KVP_OP_SET,
 	KVP_OP_DELETE,
 	KVP_OP_ENUMERATE,
+	KVP_OP_REGISTER,
 	KVP_OP_COUNT /* Number of operations, must be last. */
 };
 
@@ -182,7 +159,10 @@ struct hv_kvp_msg_enumerate {
 
 struct hv_kvp_msg {
 	struct hv_kvp_hdr	kvp_hdr;
-	struct hv_kvp_msg_enumerate	kvp_data;
+	union {
+		struct hv_kvp_msg_enumerate     kvp_enum_data;
+		char    kvp_version[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
+	} body;
 } __attribute__((packed));
 
 #ifdef __KERNEL__
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index b75523c..4ebf703 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -302,7 +302,7 @@ int main(void)
 	struct pollfd pfd;
 	struct nlmsghdr *incoming_msg;
 	struct cn_msg	*incoming_cn_msg;
-	struct hv_ku_msg *hv_msg;
+	struct hv_kvp_msg *hv_msg;
 	char	*p;
 	char	*key_value;
 	char	*key_name;
@@ -340,9 +340,11 @@ int main(void)
 	message = (struct cn_msg *)kvp_send_buffer;
 	message->id.idx = CN_KVP_IDX;
 	message->id.val = CN_KVP_VAL;
-	message->seq = KVP_REGISTER;
+
+	hv_msg = (struct hv_kvp_msg *)message->data;
+	hv_msg->kvp_hdr.operation = KVP_OP_REGISTER;
 	message->ack = 0;
-	message->len = 0;
+	message->len = sizeof(struct hv_kvp_msg);
 
 	len = netlink_send(fd, message);
 	if (len < 0) {
@@ -368,14 +370,15 @@ int main(void)
 
 		incoming_msg = (struct nlmsghdr *)kvp_recv_buffer;
 		incoming_cn_msg = (struct cn_msg *)NLMSG_DATA(incoming_msg);
+		hv_msg = (struct hv_kvp_msg *)incoming_cn_msg->data;
 
-		switch (incoming_cn_msg->seq) {
-		case KVP_REGISTER:
+		switch (hv_msg->kvp_hdr.operation) {
+		case KVP_OP_REGISTER:
 			/*
 			 * Driver is registering with us; stash away the version
 			 * information.
 			 */
-			p = (char *)incoming_cn_msg->data;
+			p = (char *)hv_msg->body.kvp_version;
 			lic_version = malloc(strlen(p) + 1);
 			if (lic_version) {
 				strcpy(lic_version, p);
@@ -386,17 +389,15 @@ int main(void)
 			}
 			continue;
 
-		case KVP_KERNEL_GET:
-			break;
 		default:
-			continue;
+			break;
 		}
 
-		hv_msg = (struct hv_ku_msg *)incoming_cn_msg->data;
-		key_name = (char *)hv_msg->kvp_key;
-		key_value = (char *)hv_msg->kvp_value;
+		hv_msg = (struct hv_kvp_msg *)incoming_cn_msg->data;
+		key_name = (char *)hv_msg->body.kvp_enum_data.data.key;
+		key_value = (char *)hv_msg->body.kvp_enum_data.data.value;
 
-		switch (hv_msg->kvp_index) {
+		switch (hv_msg->body.kvp_enum_data.index) {
 		case FullyQualifiedDomainName:
 			kvp_get_domain_name(key_value,
 					HV_KVP_EXCHANGE_MAX_VALUE_SIZE);
@@ -456,9 +457,8 @@ int main(void)
 
 		incoming_cn_msg->id.idx = CN_KVP_IDX;
 		incoming_cn_msg->id.val = CN_KVP_VAL;
-		incoming_cn_msg->seq = KVP_USER_SET;
 		incoming_cn_msg->ack = 0;
-		incoming_cn_msg->len = sizeof(struct hv_ku_msg);
+		incoming_cn_msg->len = sizeof(struct hv_kvp_msg);
 
 		len = netlink_send(fd, incoming_cn_msg);
 		if (len < 0) {
-- 
1.7.4.1
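
The protocol change in the patch above replaces the cn_msg seq field with an
operation code in the message header, and carries either the version string
or the enumerate data in a union. A minimal user-space sketch of that idea
follows. Assumptions are marked in the comments: uint8_t/uint32_t stand in
for __u8/__u32, KVP_OP_GET is assumed to be the first enumerator (the hunk
does not show it), and kvp_fill_register() and kvp_payload() are hypothetical
helpers. The sketch also uses a bounded strncpy() where the patch uses
strcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HV_KVP_EXCHANGE_MAX_KEY_SIZE   512
#define HV_KVP_EXCHANGE_MAX_VALUE_SIZE 2048

/* KVP_OP_GET = 0 is an assumption; the patch hunk starts at KVP_OP_SET. */
enum hv_kvp_exchg_op {
    KVP_OP_GET = 0,
    KVP_OP_SET,
    KVP_OP_DELETE,
    KVP_OP_ENUMERATE,
    KVP_OP_REGISTER,
    KVP_OP_COUNT /* Number of operations, must be last. */
};

struct hv_kvp_hdr {
    uint8_t  operation;
    uint8_t  pool;
    uint16_t pad;
} __attribute__((packed));

struct hv_kvp_exchg_msg_value {
    uint32_t value_type;
    uint32_t key_size;
    uint32_t value_size;
    uint8_t  key[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
    uint8_t  value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE];
} __attribute__((packed));

struct hv_kvp_msg_enumerate {
    uint32_t index;
    struct hv_kvp_exchg_msg_value data;
} __attribute__((packed));

struct hv_kvp_msg {
    struct hv_kvp_hdr kvp_hdr;
    union {
        struct hv_kvp_msg_enumerate kvp_enum_data;
        char kvp_version[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
    } body;
} __attribute__((packed));

/* Hypothetical helper: build a registration message with a version string. */
void kvp_fill_register(struct hv_kvp_msg *m, const char *version)
{
    memset(m, 0, sizeof(*m));
    m->kvp_hdr.operation = KVP_OP_REGISTER;
    /* The buffer was zeroed above, so the copy stays NUL-terminated. */
    strncpy(m->body.kvp_version, version, sizeof(m->body.kvp_version) - 1);
}

/* Hypothetical helper: pick the union member that matches the operation. */
const char *kvp_payload(const struct hv_kvp_msg *m)
{
    switch (m->kvp_hdr.operation) {
    case KVP_OP_REGISTER:
        return m->body.kvp_version;
    case KVP_OP_ENUMERATE:
        return (const char *)m->body.kvp_enum_data.data.key;
    default:
        return NULL;
    }
}
```

Keeping the operation code inside the payload means both sides can use one
fixed message length, sizeof(struct hv_kvp_msg), for every message, which is
what the patch does when it sets msg->len.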

* [PATCH 2/4] tools: hv: Use hyperv.h to get the KVP definitions
From: K. Y. Srinivasan @ 2012-02-03  0:56 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering
  Cc: K. Y. Srinivasan, Haiyang Zhang
In-Reply-To: <1328230611-3315-1-git-send-email-kys@microsoft.com>

Now use hyperv.h to get the KVP defines in the KVP user-mode code.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |   28 +---------------------------
 1 files changed, 1 insertions(+), 27 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 2b6a2d9..b75523c 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -34,17 +34,12 @@
 #include <errno.h>
 #include <arpa/inet.h>
 #include <linux/connector.h>
+#include <linux/hyperv.h>
 #include <linux/netlink.h>
 #include <ifaddrs.h>
 #include <netdb.h>
 #include <syslog.h>
 
-/*
- *
- * The following definitions are shared with the in-kernel component; do not
- * change any of this without making the corresponding changes in
- * the KVP kernel component.
- */
 
 /*
  * KVP protocol: The user mode component first registers with the
@@ -56,25 +51,8 @@
  * We use this infrastructure for also supporting queries from user mode
  * application for state that may be maintained in the KVP kernel component.
  *
- * XXXKYS: Have a shared header file between the user and kernel (TODO)
  */
 
-enum kvp_op {
-	KVP_REGISTER = 0, /* Register the user mode component*/
-	KVP_KERNEL_GET, /*Kernel is requesting the value for the specified key*/
-	KVP_KERNEL_SET, /*Kernel is providing the value for the specified key*/
-	KVP_USER_GET, /*User is requesting the value for the specified key*/
-	KVP_USER_SET /*User is providing the value for the specified key*/
-};
-
-#define HV_KVP_EXCHANGE_MAX_KEY_SIZE	512
-#define HV_KVP_EXCHANGE_MAX_VALUE_SIZE	2048
-
-struct hv_ku_msg {
-	__u32	kvp_index;
-	__u8  kvp_key[HV_KVP_EXCHANGE_MAX_KEY_SIZE]; /* Key name */
-	__u8  kvp_value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE]; /* Key  value */
-};
 
 enum key_index {
 	FullyQualifiedDomainName = 0,
@@ -89,10 +67,6 @@ enum key_index {
 	ProcessorArchitecture
 };
 
-/*
- * End of shared definitions.
- */
-
 static char kvp_send_buffer[4096];
 static char kvp_recv_buffer[4096];
 static struct sockaddr_nl addr;
-- 
1.7.4.1

* [PATCH 1/4] drivers: hv: Cleanup the kvp related state in hyperv.h
From: K. Y. Srinivasan @ 2012-02-03  0:56 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering
  Cc: K. Y. Srinivasan, Haiyang Zhang
In-Reply-To: <1328230577-3263-1-git-send-email-kys@microsoft.com>

Now clean up hyperv.h with regard to the KVP definitions.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 include/linux/hyperv.h |   27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 7332b3f..b822978 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -137,7 +137,6 @@ struct hv_ku_msg {
 
 
 
-#ifdef __KERNEL__
 
 /*
  * Registry value types.
@@ -163,28 +162,30 @@ enum hv_kvp_exchg_pool {
 };
 
 struct hv_kvp_hdr {
-	u8 operation;
-	u8 pool;
-};
+	__u8 operation;
+	__u8 pool;
+	__u16 pad;
+} __attribute__((packed));
 
 struct hv_kvp_exchg_msg_value {
-	u32 value_type;
-	u32 key_size;
-	u32 value_size;
-	u8 key[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
-	u8 value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE];
-};
+	__u32 value_type;
+	__u32 key_size;
+	__u32 value_size;
+	__u8 key[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
+	__u8 value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE];
+} __attribute__((packed));
 
 struct hv_kvp_msg_enumerate {
-	u32 index;
+	__u32 index;
 	struct hv_kvp_exchg_msg_value data;
-};
+} __attribute__((packed));
 
 struct hv_kvp_msg {
 	struct hv_kvp_hdr	kvp_hdr;
 	struct hv_kvp_msg_enumerate	kvp_data;
-};
+} __attribute__((packed));
 
+#ifdef __KERNEL__
 #include <linux/scatterlist.h>
 #include <linux/list.h>
 #include <linux/uuid.h>
-- 
1.7.4.1
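
Since the structures in the patch above are now shared between the kernel and
the user-mode daemon, both sides must agree on the exact byte layout. The
following sketch shows one way to pin that layout down at compile time. It
assumes GCC or Clang, uses uint8_t/uint16_t/uint32_t as stand-ins for the
__u8/__u16/__u32 types, and kvp_enumerate_wire_size() is a hypothetical
helper added only so the size can be inspected at run time.

```c
#include <stddef.h>
#include <stdint.h>

#define HV_KVP_EXCHANGE_MAX_KEY_SIZE   512
#define HV_KVP_EXCHANGE_MAX_VALUE_SIZE 2048

struct hv_kvp_hdr {
    uint8_t  operation;
    uint8_t  pool;
    uint16_t pad;      /* explicit padding keeps the header at 4 bytes */
} __attribute__((packed));

struct hv_kvp_exchg_msg_value {
    uint32_t value_type;
    uint32_t key_size;
    uint32_t value_size;
    uint8_t  key[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
    uint8_t  value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE];
} __attribute__((packed));

struct hv_kvp_msg_enumerate {
    uint32_t index;
    struct hv_kvp_exchg_msg_value data;
} __attribute__((packed));

/* Compile-time layout checks; the build fails if the layout ever drifts. */
_Static_assert(sizeof(struct hv_kvp_hdr) == 4, "header must be 4 bytes");
_Static_assert(offsetof(struct hv_kvp_exchg_msg_value, key) == 12,
               "key must follow the three 32-bit fields");
_Static_assert(offsetof(struct hv_kvp_msg_enumerate, data) == 4,
               "data must follow the 32-bit index");

/* Hypothetical helper: number of bytes an enumerate body occupies. */
size_t kvp_enumerate_wire_size(void)
{
    return sizeof(struct hv_kvp_msg_enumerate);
}
```

With the sizes given in the patch series, the enumerate body comes to
4 + 12 + 512 + 2048 = 2576 bytes on any compiler that honours the packed
attribute.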

* [PATCH 0000/0004] drivers: hv
From: K. Y. Srinivasan @ 2012-02-03  0:56 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering; +Cc: K. Y. Srinivasan


This patch-set does some cleanup of the KVP component. Also included is a
patch that will remove the artificial limitation on the number of VCPUs that
can be assigned to the Linux guest on Hyper-V. This is a resend of a subset
of the patches that were sent earlier. Of the original 6-patch set,
the first two were applied. I am re-sending the remaining four with
the changes Greg wanted.

Regards,

K. Y


* Re: [PATCH 3/6] drivers: hv: Cleanup the kvp related state in hyperv.h
From: Greg KH @ 2012-02-02 23:48 UTC (permalink / raw)
  To: KY Srinivasan
  Cc: gregkh@suse.de, linux-kernel@vger.kernel.org,
	devel@linuxdriverproject.org, virtualization@lists.osdl.org,
	ohering@suse.com, zbr@ioremap.net, Haiyang Zhang
In-Reply-To: <6E21E5352C11B742B20C142EB499E0481B70927D@TK5EX14MBXC128.redmond.corp.microsoft.com>

On Thu, Feb 02, 2012 at 11:41:29PM +0000, KY Srinivasan wrote:
> > > +#ifndef __packed
> > > +#define __packed __attribute__((packed))
> > > +#endif
> > 
> > Why do this?
> > 
> > If you are so worried about this in userspace, then just change the
> > values below to __attribute__((packed)), like all of the other public .h
> > files do.
> 
> Greg,
> 
> When I ran the checkpatch script against these patches I got a warning that the
> preferred directive was to use "__packed".

Well, checkpatch is stupid sometimes.

> So, for the header file that will be
> consumed in the kernel, I chose to go with the __packed. For inclusion of this
> header file in the user space daemon, I put in this definition here. So rather
> than having numerous warnings, I now have a single warning. If you prefer,
> I can move this definition to the daemon code where it is really needed.

Please do it like all other public kernel header files do.

greg k-h
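
Editorial sketch, not part of the thread: the two spellings under
discussion produce the same layout, and the difference from an unpacked
struct is what matters for a header shared between the kernel and a
user-space daemon. DEMO_PACKED and the demo_ names are hypothetical; the
kernel's own shorthand is __packed, which user space does not define by
default. This assumes GCC or Clang in C11 mode.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* size_t */
#include <stdint.h>

/* Hypothetical local shorthand, playing the role of the kernel's __packed. */
#define DEMO_PACKED __attribute__((packed))

/* Natural alignment: the compiler may pad after 'op' to align 'len'. */
struct demo_natural {
	uint8_t op;
	uint32_t len;
};

/* Spelled out in full, as public kernel headers do. */
struct demo_attr_packed {
	uint8_t op;
	uint32_t len;
} __attribute__((packed));

/* Same thing through the shorthand macro. */
struct demo_macro_packed {
	uint8_t op;
	uint32_t len;
} DEMO_PACKED;

/* Both packed spellings give one byte plus four bytes, nothing else. */
static_assert(sizeof(struct demo_attr_packed) == 5, "no padding expected");
static_assert(sizeof(struct demo_macro_packed) ==
	      sizeof(struct demo_attr_packed),
	      "macro and attribute spellings must agree");

size_t demo_natural_size(void)
{
	return sizeof(struct demo_natural);
}

size_t demo_packed_size(void)
{
	return sizeof(struct demo_attr_packed);
}

size_t demo_macro_packed_size(void)
{
	return sizeof(struct demo_macro_packed);
}
```

The unpacked size depends on the ABI, so the sketch only asserts the packed
sizes. Writing the attribute out in full keeps the header usable in user
space without any extra macro definition.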


* RE: [PATCH 3/6] drivers: hv: Cleanup the kvp related state in hyperv.h
From: KY Srinivasan @ 2012-02-02 23:41 UTC (permalink / raw)
  To: Greg KH
  Cc: gregkh@suse.de, linux-kernel@vger.kernel.org,
	devel@linuxdriverproject.org, virtualization@lists.osdl.org,
	ohering@suse.com, zbr@ioremap.net, Haiyang Zhang
In-Reply-To: <20120202232936.GA9614@kroah.com>



> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Thursday, February 02, 2012 6:30 PM
> To: KY Srinivasan
> Cc: gregkh@suse.de; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.osdl.org; ohering@suse.com;
> zbr@ioremap.net; Haiyang Zhang
> Subject: Re: [PATCH 3/6] drivers: hv: Cleanup the kvp related state in hyperv.h
> 
> On Fri, Jan 27, 2012 at 03:55:59PM -0800, K. Y. Srinivasan wrote:
> > Now cleanup the hyperv.h with regards to KVP definitions.
> >
> > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> >  include/linux/hyperv.h |   32 +++++++++++++++++++-------------
> >  1 files changed, 19 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> > index 7332b3f..802ece8 100644
> > --- a/include/linux/hyperv.h
> > +++ b/include/linux/hyperv.h
> > @@ -137,7 +137,6 @@ struct hv_ku_msg {
> >
> >
> >
> > -#ifdef __KERNEL__
> >
> >  /*
> >   * Registry value types.
> > @@ -162,29 +161,36 @@ enum hv_kvp_exchg_pool {
> >  	KVP_POOL_COUNT /* Number of pools, must be last. */
> >  };
> >
> > +#ifndef __packed
> > +#define __packed __attribute__((packed))
> > +#endif
> 
> Why do this?
> 
> If you are so worried about this in userspace, then just change the
> values below to __attribute__((packed)), like all of the other public .h
> files do.

Greg,

When I ran the checkpatch script against these patches I got a warning that the
preferred directive was to use "__packed".  So, for the header file that will be
consumed in the kernel, I chose to go with the __packed. For inclusion of this
header file in the user space daemon, I put in this definition here. So rather
than having numerous warnings, I now have a single warning. If you prefer,
I can move this definition to the daemon code where it is really needed.

Regards,

K. Y 


* Re: [PATCH 0000/0006] drivers: hv
From: Greg KH @ 2012-02-02 23:31 UTC (permalink / raw)
  To: K. Y. Srinivasan
  Cc: gregkh, linux-kernel, devel, virtualization, ohering, zbr
In-Reply-To: <1327708522-26914-1-git-send-email-kys@microsoft.com>

On Fri, Jan 27, 2012 at 03:55:22PM -0800, K. Y. Srinivasan wrote:
> 
> This patch-set does some cleanup of the KVP component. Also included is a
> patch that will remove artificial limitation on the number of VCPUs that
> can be assigned to the Linux guest on Hyper-V.

I have applied the first 2, please fix patch 3, and resend the rest.

thanks,

greg k-h


