* [Qemu-devel] [PATCH 0/1 V6] VMWare PVSCSI paravirtual device implementation
@ 2013-04-08 18:39 Dmitry Fleytman
2013-04-08 18:39 ` [Qemu-devel] [PATCH 1/1 " Dmitry Fleytman
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Fleytman @ 2013-04-08 18:39 UTC (permalink / raw)
To: qemu-devel
Cc: Dmitry Fleytman, Yan Vugenfirer, Deep Debroy, Anthony Liguori,
Paolo Bonzini
Below is the implementation of the VMWare PVSCSI device.
The PVSCSI implementation is based on Paolo Bonzini's code, submitted
some time ago but never applied.
See commit messages and file headers for details.
This patch contains changes made by Deep Debroy, see here:
http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03585.html
Cc: Deep Debroy <ddebroy@gmail.com>
The implementation supports all of the device features.
Code was tested on different OSes:
Fedora 15
Ubuntu 10.04
CentOS 6.2
Windows 2008R2
Windows 2008 64bit
Windows 2008 32bit
Windows 2003 64bit
Windows 2003 32bit
Changes since V5:
1. SCSI hotplug support added
2. Code rebase for mainline
Changes since V4:
Array access checks and minor beautification as suggested by Blue Swirl.
Reported-by: Blue Swirl <blauwirbel@gmail.com>
Changes since V3:
1. Utility function strpadcpy() and structure changes in SCSI devices removed from v4 since they are already applied to scsi-next from v3 by Paolo.
2. Logging ported to use tracepoints. All ifdef based custom macros for logging removed.
3. vmware_utils.h is no longer present; the necessary macros are inlined.
4. pvscsi.h replaced by vmw_pvscsi.h from the Linux kernel, with some minor modifications to build in QEMU.
5. Various fixes and beautification as suggested by Blue Swirl.
Reported-by: Blue Swirl <blauwirbel@gmail.com>
Changes since V1:
Various fixes and beautification as suggested by Paolo Bonzini
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
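Note for reviewers (illustration only, not part of the patch): the rings may span several guest pages, so the device keeps free-running indices and maps each one to a page and an in-page slot, as pvscsi_rings_mgr_pop_req_descr() does. A minimal standalone sketch of that arithmetic follows. The SKETCH_* constants (4 KiB pages, 128-byte descriptors) and the sketch_* names are assumptions made for this example; the real values come from vmw_pvscsi.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only; the real ones live in vmw_pvscsi.h. */
#define SKETCH_PAGE_SHIFT       12u
#define SKETCH_DESC_SIZE        128u
#define SKETCH_ENTRIES_PER_PAGE ((1u << SKETCH_PAGE_SHIFT) / SKETCH_DESC_SIZE)

/*
 * Smallest n such that (1 << n) > input. Called with (ring size - 1),
 * this yields log2(ring size) for power-of-two ring sizes.
 */
static unsigned sketch_log2(uint32_t input)
{
    unsigned log = 0;

    assert(input > 0);
    /* Widen before shifting so a shift by 32 stays well defined. */
    while ((uint64_t)input >> ++log) {
    }
    return log;
}

/* Map a free-running index to the guest-physical address of its descriptor. */
static uint64_t sketch_desc_pa(const uint64_t *pages_pa, uint32_t len_mask,
                               uint64_t free_running_idx)
{
    uint32_t slot = (uint32_t)free_running_idx & len_mask;
    uint32_t page = slot / SKETCH_ENTRIES_PER_PAGE;
    uint32_t in_page = slot % SKETCH_ENTRIES_PER_PAGE;

    return pages_pa[page] + (uint64_t)in_page * SKETCH_DESC_SIZE;
}
```

Under these assumptions a two-page ring holds 64 entries, sketch_log2(63) gives 6 and the mask is 63. Index 33 then lands in the second page at slot 1, and index 64 wraps back to the first descriptor of the first page.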
Dmitry Fleytman (1):
VMWare PVSCSI paravirtual device implementation
default-configs/pci.mak | 1 +
docs/specs/pvscsi-spec.txt | 92 ++++
hw/Makefile.objs | 1 +
hw/pci/pci.h | 1 +
hw/pvscsi.c | 1194 ++++++++++++++++++++++++++++++++++++++++++++
hw/vmw_pvscsi.h | 434 ++++++++++++++++
trace-events | 36 ++
7 files changed, 1759 insertions(+)
create mode 100644 docs/specs/pvscsi-spec.txt
create mode 100644 hw/pvscsi.c
create mode 100644 hw/vmw_pvscsi.h
--
1.8.1.4
* [Qemu-devel] [PATCH 1/1 V6] VMWare PVSCSI paravirtual device implementation
2013-04-08 18:39 [Qemu-devel] [PATCH 0/1 V6] VMWare PVSCSI paravirtual device implementation Dmitry Fleytman
@ 2013-04-08 18:39 ` Dmitry Fleytman
2013-04-10 9:33 ` Paolo Bonzini
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Fleytman @ 2013-04-08 18:39 UTC (permalink / raw)
To: qemu-devel
Cc: Dmitry Fleytman, Yan Vugenfirer, Anthony Liguori, Paolo Bonzini
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Yan Vugenfirer <yan@daynix.com>
---
default-configs/pci.mak | 1 +
docs/specs/pvscsi-spec.txt | 92 ++++
hw/Makefile.objs | 1 +
hw/pci/pci.h | 1 +
hw/pvscsi.c | 1194 ++++++++++++++++++++++++++++++++++++++++++++
hw/vmw_pvscsi.h | 434 ++++++++++++++++
trace-events | 36 ++
7 files changed, 1759 insertions(+)
create mode 100644 docs/specs/pvscsi-spec.txt
create mode 100644 hw/pvscsi.c
create mode 100644 hw/vmw_pvscsi.h
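Note for reviewers (illustration only, not part of the patch): with legacy interrupts, the logic in pvscsi_update_irq_status() and in the PVSCSI_REG_OFFSET_INTR_STATUS write path reduces to a status register with write-1-to-clear semantics, gated by the mask register. A minimal standalone sketch, assuming hypothetical bit positions and sketch_* names that are not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed for illustration; see vmw_pvscsi.h for the real ones. */
#define SKETCH_INTR_CMPL_0 (1u << 0)
#define SKETCH_INTR_MSG_0  (1u << 2)

typedef struct SketchIntrRegs {
    uint64_t status;   /* pending interrupt causes */
    uint64_t enabled;  /* mask register: a set bit means enabled */
} SketchIntrRegs;

/* The line is asserted while any enabled cause is pending. */
static bool sketch_line_asserted(const SketchIntrRegs *r)
{
    return (r->enabled & r->status) != 0;
}

/* Device side: record a new interrupt cause. */
static void sketch_raise(SketchIntrRegs *r, uint64_t cause)
{
    r->status |= cause;
}

/* Guest write to the status register: written bits are cleared. */
static void sketch_status_write(SketchIntrRegs *r, uint64_t val)
{
    r->status &= ~val;
}

/* Guest write to the mask register: the value replaces the mask. */
static void sketch_mask_write(SketchIntrRegs *r, uint64_t val)
{
    r->enabled = val;
}
```

A cause raised while masked stays pending and asserts the line as soon as the guest unmasks it. Acknowledging one cause leaves the line asserted if another enabled cause is still pending.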
diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index ce56d58..3f8375c 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -10,6 +10,7 @@ CONFIG_EEPRO100_PCI=y
CONFIG_PCNET_PCI=y
CONFIG_PCNET_COMMON=y
CONFIG_LSI_SCSI_PCI=y
+CONFIG_PVSCSI_SCSI_PCI=y
CONFIG_MEGASAS_SCSI_PCI=y
CONFIG_RTL8139_PCI=y
CONFIG_E1000_PCI=y
diff --git a/docs/specs/pvscsi-spec.txt b/docs/specs/pvscsi-spec.txt
new file mode 100644
index 0000000..b2c3a55
--- /dev/null
+++ b/docs/specs/pvscsi-spec.txt
@@ -0,0 +1,92 @@
+General Description
+===================
+
+This document describes the VMWare PVSCSI device interface specification.
+Created by Dmitry Fleytman (dmitry@daynix.com), Daynix Computing LTD.
+Based on the source code of the PVSCSI Linux driver from kernel 3.0.4.
+
+PVSCSI Device Interface Overview
+================================
+
+The interface is based on a memory area shared between hypervisor and VM.
+The driver obtains this area as a device IO memory resource of
+PVSCSI_MEM_SPACE_SIZE length.
+The shared memory consists of a registers area and a rings area.
+The registers area is used to raise hypervisor interrupts and issue device
+commands. The rings area is used to transfer data descriptors and SCSI
+commands from VM to hypervisor and to transfer messages produced by the
+hypervisor to VM. Data itself is transferred via virtual scatter-gather DMA.
+
+PVSCSI Device Registers
+=======================
+
+The registers area is 1 page long (PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES).
+Its layout is described by the PVSCSIRegOffset enumeration.
+There are registers to issue a device command (with optional short data),
+to issue a device interrupt, and to control interrupt masking.
+
+PVSCSI Device Rings
+===================
+
+There are three rings in shared memory:
+
+    1. Request ring (struct PVSCSIRingReqDesc *req_ring)
+        - ring for OS to device requests
+    2. Completion ring (struct PVSCSIRingCmpDesc *cmp_ring)
+        - ring for device request completions
+    3. Message ring (struct PVSCSIRingMsgDesc *msg_ring)
+        - ring for messages from the device.
+          This ring is optional and may not be configured.
+There is a control area (struct PVSCSIRingsState *rings_state) used to control
+ring operation.
+
+PVSCSI Device to Host Interrupts
+================================
+The following interrupt types are supported by the PVSCSI device:
+    1. Completion interrupts (completion ring notifications):
+        PVSCSI_INTR_CMPL_0
+        PVSCSI_INTR_CMPL_1
+    2. Message interrupts (message ring notifications):
+        PVSCSI_INTR_MSG_0
+        PVSCSI_INTR_MSG_1
+
+Interrupts are controlled via the PVSCSI_REG_OFFSET_INTR_MASK register.
+A set bit means the interrupt is enabled; a cleared bit means it is disabled.
+
+The supported interrupt modes are legacy, MSI and MSI-X.
+With legacy interrupts, the PVSCSI_REG_OFFSET_INTR_STATUS register is
+used to verify interrupt arrival and to clear interrupt state.
+Interrupts are cleared by writing the processed bits back
+to the interrupt status register.
+
+PVSCSI Device Operation Sequences
+=================================
+
+1. Startup sequence:
+    a. Issue PVSCSI_CMD_ADAPTER_RESET command;
+    aa. Windows driver reads interrupt status register here;
+    b. Issue PVSCSI_CMD_SETUP_MSG_RING command with no additional data,
+       check status and disable device messages if error returned;
+       (Omitted if device messages disabled by driver configuration)
+    c. Issue PVSCSI_CMD_SETUP_RINGS command, provide rings configuration
+       as struct PVSCSICmdDescSetupRings;
+    d. Issue PVSCSI_CMD_SETUP_MSG_RING command again, provide
+       rings configuration as struct PVSCSICmdDescSetupMsgRing;
+    e. Unmask completion and message (if device messages enabled) interrupts.
+
+2. Shutdown sequence:
+    a. Mask interrupts;
+    b. Flush request ring using PVSCSI_REG_OFFSET_KICK_NON_RW_IO;
+    c. Issue PVSCSI_CMD_ADAPTER_RESET command.
+
+3. Send request:
+    a. Fill next free request ring descriptor;
+    b. Issue PVSCSI_REG_OFFSET_KICK_RW_IO for R/W operations
+       or PVSCSI_REG_OFFSET_KICK_NON_RW_IO for other operations.
+
+4. Abort command:
+    a. Issue PVSCSI_CMD_ABORT_CMD command.
+
+5. Request completion processing:
+    a. Upon completion interrupt arrival, process the completion
+       and message (if enabled) rings.
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index d0b2ecb..6e43763 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -130,6 +130,7 @@ common-obj-$(CONFIG_OPENCORES_ETH) += opencores_eth.o
# SCSI layer
common-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
common-obj-$(CONFIG_MEGASAS_SCSI_PCI) += megasas.o
+common-obj-$(CONFIG_PVSCSI_SCSI_PCI) += pvscsi.o
common-obj-$(CONFIG_ESP) += esp.o
common-obj-$(CONFIG_ESP_PCI) += esp-pci.o
diff --git a/hw/pci/pci.h b/hw/pci/pci.h
index 9ea67a3..1767fe5 100644
--- a/hw/pci/pci.h
+++ b/hw/pci/pci.h
@@ -59,6 +59,7 @@
#define PCI_DEVICE_ID_VMWARE_SVGA 0x0710
#define PCI_DEVICE_ID_VMWARE_NET 0x0720
#define PCI_DEVICE_ID_VMWARE_SCSI 0x0730
+#define PCI_DEVICE_ID_VMWARE_PVSCSI 0x07C0
#define PCI_DEVICE_ID_VMWARE_IDE 0x1729
#define PCI_DEVICE_ID_VMWARE_VMXNET3 0x07B0
diff --git a/hw/pvscsi.c b/hw/pvscsi.c
new file mode 100644
index 0000000..4c66671
--- /dev/null
+++ b/hw/pvscsi.c
@@ -0,0 +1,1194 @@
+/*
+ * QEMU VMWARE PVSCSI paravirtual SCSI bus
+ *
+ * Copyright (c) 2012 Ravello Systems LTD (http://ravellosystems.com)
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Based on implementation by Paolo Bonzini
+ * http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg00729.html
+ *
+ * Authors:
+ * Paolo Bonzini <pbonzini@redhat.com>
+ * Dmitry Fleytman <dmitry@daynix.com>
+ * Yan Vugenfirer <yan@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ *
+ * NOTE about MSI-X:
+ * MSI-X support has been removed for the moment because it leads Windows OS
+ * to crash on startup. The crash happens because Windows driver requires
+ * MSI-X shared memory to be part of the same BAR used for rings state
+ * registers, etc. This is not supported by QEMU infrastructure, so a separate
+ * BAR is created for MSI-X purposes. Windows driver fails to deal with 2 BARs.
+ *
+ */
+
+#include "scsi-defs.h"
+#include "hw/scsi.h"
+#include "hw/pci/msi.h"
+#include "vmw_pvscsi.h"
+#include "trace.h"
+
+
+#define PVSCSI_MSI_OFFSET (0x50)
+#define PVSCSI_USE_64BIT (true)
+#define PVSCSI_PER_VECTOR_MASK (false)
+
+#define PVSCSI_MAX_DEVS (64)
+#define PVSCSI_MSIX_NUM_VECTORS (1)
+
+#define PVSCSI_MAX_CMD_DATA_WORDS \
+ (sizeof(PVSCSICmdDescSetupRings)/sizeof(uint32_t))
+
+#define RS_GET_FIELD(rs_pa, field) \
+ (ldl_le_phys(rs_pa + offsetof(struct PVSCSIRingsState, field)))
+#define RS_SET_FIELD(rs_pa, field, val) \
+ (stl_le_phys(rs_pa + offsetof(struct PVSCSIRingsState, field), val))
+
+#define TYPE_PVSCSI "pvscsi"
+#define PVSCSI(obj) OBJECT_CHECK(PVSCSIState, (obj), TYPE_PVSCSI)
+
+typedef struct PVSCSIRingsMgr {
+ uint64_t rs_pa;
+ uint32_t txr_len_mask;
+ uint32_t rxr_len_mask;
+ uint32_t msg_len_mask;
+ uint64_t req_ring_pages_pa[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
+ uint64_t cmp_ring_pages_pa[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
+ uint64_t msg_ring_pages_pa[PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES];
+ uint64_t consumed_ptr;
+ uint64_t filled_cmp_ptr;
+ uint64_t filled_msg_ptr;
+} PVSCSIRingsMgr;
+
+typedef struct PVSCSISGState {
+ hwaddr elemAddr;
+ hwaddr dataAddr;
+ uint32_t resid;
+} PVSCSISGState;
+
+typedef QTAILQ_HEAD(, PVSCSIRequest) PVSCSIRequestList;
+
+typedef struct {
+ PCIDevice parent_obj;
+ MemoryRegion io_space;
+ SCSIBus bus;
+ QEMUBH *completion_worker;
+ PVSCSIRequestList pending_queue;
+ PVSCSIRequestList completion_queue;
+
+ uint64_t reg_interrupt_status; /* Interrupt status register value */
+ uint64_t reg_interrupt_enabled; /* Interrupt mask register value */
+ uint64_t reg_command_status; /* Command status register value */
+
+ /* Command data adoption mechanism */
+ uint64_t curr_cmd; /* Last command arrived */
+ uint32_t curr_cmd_data_cntr; /* Amount of data for last command */
+
+ /* Collector for current command data */
+ uint32_t curr_cmd_data[PVSCSI_MAX_CMD_DATA_WORDS];
+
+ uint8_t rings_info_valid; /* Whether data rings initialized */
+ uint8_t msg_ring_info_valid; /* Whether message ring initialized */
+
+ uint8_t msi_used; /* Whether MSI support was installed successfully */
+
+ PVSCSIRingsMgr rings; /* Data transfer rings manager */
+} PVSCSIState;
+
+typedef struct PVSCSIRequest {
+ SCSIRequest *sreq;
+ PVSCSIState *dev;
+ uint8_t sense_key;
+ uint8_t completed;
+ int lun;
+ QEMUSGList sgl;
+ PVSCSISGState sg;
+ struct PVSCSIRingReqDesc req;
+ struct PVSCSIRingCmpDesc cmp;
+ QTAILQ_ENTRY(PVSCSIRequest) next;
+} PVSCSIRequest;
+
+/* Integer binary logarithm */
+static int
+pvscsi_log2(uint32_t input)
+{
+ int log = 0;
+ assert(input > 0);
+ while (input >> ++log) {
+ }
+ return log;
+}
+
+static void
+pvscsi_rings_mgr_init_data(PVSCSIRingsMgr *m, PVSCSICmdDescSetupRings *ri)
+{
+ int i;
+ uint32_t txr_len_log2, rxr_len_log2;
+ uint32_t req_ring_size, cmp_ring_size;
+ m->rs_pa = ri->ringsStatePPN << VMW_PAGE_SHIFT;
+
+ req_ring_size = ri->reqRingNumPages * PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
+ cmp_ring_size = ri->cmpRingNumPages * PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
+ txr_len_log2 = pvscsi_log2(req_ring_size - 1);
+ rxr_len_log2 = pvscsi_log2(cmp_ring_size - 1);
+
+ m->txr_len_mask = MASK(txr_len_log2);
+ m->rxr_len_mask = MASK(rxr_len_log2);
+
+ m->consumed_ptr = 0;
+ m->filled_cmp_ptr = 0;
+
+ for (i = 0; i < ri->reqRingNumPages; i++) {
+ m->req_ring_pages_pa[i] = ri->reqRingPPNs[i] << VMW_PAGE_SHIFT;
+ }
+
+ for (i = 0; i < ri->cmpRingNumPages; i++) {
+ m->cmp_ring_pages_pa[i] = ri->cmpRingPPNs[i] << VMW_PAGE_SHIFT;
+ }
+
+ RS_SET_FIELD(m->rs_pa, reqProdIdx, 0);
+ RS_SET_FIELD(m->rs_pa, reqConsIdx, 0);
+ RS_SET_FIELD(m->rs_pa, reqNumEntriesLog2, txr_len_log2);
+
+ RS_SET_FIELD(m->rs_pa, cmpProdIdx, 0);
+ RS_SET_FIELD(m->rs_pa, cmpConsIdx, 0);
+ RS_SET_FIELD(m->rs_pa, cmpNumEntriesLog2, rxr_len_log2);
+
+ trace_pvscsi_rings_mgr_init_data(txr_len_log2, rxr_len_log2);
+
+ /* Flush ring state page changes */
+ smp_wmb();
+}
+
+static void
+pvscsi_rings_mgr_init_msg(PVSCSIRingsMgr *m, PVSCSICmdDescSetupMsgRing *ri)
+{
+ int i;
+ uint32_t len_log2;
+ uint32_t ring_size;
+
+ ring_size = ri->numPages * PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE;
+ len_log2 = pvscsi_log2(ring_size - 1);
+
+ m->msg_len_mask = MASK(len_log2);
+
+ m->filled_msg_ptr = 0;
+
+ for (i = 0; i < ri->numPages; i++) {
+ m->msg_ring_pages_pa[i] = ri->ringPPNs[i] << VMW_PAGE_SHIFT;
+ }
+
+ RS_SET_FIELD(m->rs_pa, msgProdIdx, 0);
+ RS_SET_FIELD(m->rs_pa, msgConsIdx, 0);
+ RS_SET_FIELD(m->rs_pa, msgNumEntriesLog2, len_log2);
+
+ trace_pvscsi_rings_mgr_init_msg(len_log2);
+
+ /* Flush ring state page changes */
+ smp_wmb();
+}
+
+static void
+pvscsi_rings_mgr_cleanup(PVSCSIRingsMgr *mgr)
+{
+ mgr->rs_pa = 0;
+ mgr->txr_len_mask = 0;
+ mgr->rxr_len_mask = 0;
+ mgr->msg_len_mask = 0;
+ mgr->consumed_ptr = 0;
+ mgr->filled_cmp_ptr = 0;
+ mgr->filled_msg_ptr = 0;
+ memset(mgr->req_ring_pages_pa, 0, sizeof(mgr->req_ring_pages_pa));
+ memset(mgr->cmp_ring_pages_pa, 0, sizeof(mgr->cmp_ring_pages_pa));
+ memset(mgr->msg_ring_pages_pa, 0, sizeof(mgr->msg_ring_pages_pa));
+}
+
+static hwaddr
+pvscsi_rings_mgr_pop_req_descr(PVSCSIRingsMgr *mgr)
+{
+ uint32_t ready_ptr = RS_GET_FIELD(mgr->rs_pa, reqProdIdx);
+
+ if (ready_ptr != mgr->consumed_ptr) {
+ uint32_t next_ready_ptr =
+ mgr->consumed_ptr++ & mgr->txr_len_mask;
+ uint32_t next_ready_page =
+ next_ready_ptr / PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
+ uint32_t inpage_idx =
+ next_ready_ptr % PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
+
+ return mgr->req_ring_pages_pa[next_ready_page] +
+ inpage_idx * sizeof(PVSCSIRingReqDesc);
+ } else {
+ return 0;
+ }
+}
+
+static void
+pvscsi_rings_mgr_flush_req_ring(PVSCSIRingsMgr *mgr)
+{
+ RS_SET_FIELD(mgr->rs_pa, reqConsIdx, mgr->consumed_ptr);
+}
+
+static hwaddr
+pvscsi_rings_mgr_pop_cmp_descr(PVSCSIRingsMgr *mgr)
+{
+    /*
+     * According to Linux driver code it explicitly verifies that number
+     * of requests being processed by device is less than the size of
+     * completion queue, so device may omit completion queue overflow
+     * conditions check. We assume that this is true for other (Windows)
+     * drivers as well.
+     */
+
+    uint32_t free_cmp_ptr =
+        mgr->filled_cmp_ptr++ & mgr->rxr_len_mask;
+    uint32_t free_cmp_page =
+        free_cmp_ptr / PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
+    uint32_t inpage_idx =
+        free_cmp_ptr % PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
+    return mgr->cmp_ring_pages_pa[free_cmp_page] +
+           inpage_idx * sizeof(PVSCSIRingCmpDesc);
+}
+
+static hwaddr
+pvscsi_rings_mgr_pop_msg_descr(PVSCSIRingsMgr *mgr)
+{
+ uint32_t free_msg_ptr =
+ mgr->filled_msg_ptr++ & mgr->msg_len_mask;
+ uint32_t free_msg_page =
+ free_msg_ptr / PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE;
+ uint32_t inpage_idx =
+ free_msg_ptr % PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE;
+ return mgr->msg_ring_pages_pa[free_msg_page] +
+ inpage_idx * sizeof(PVSCSIRingMsgDesc);
+}
+
+static void
+pvscsi_rings_mgr_flush_cmp_ring(PVSCSIRingsMgr *mgr)
+{
+ /* Flush descriptor changes */
+ smp_wmb();
+
+ trace_pvscsi_rings_mgr_flush_cmp_ring(mgr->filled_cmp_ptr);
+
+ RS_SET_FIELD(mgr->rs_pa, cmpProdIdx, mgr->filled_cmp_ptr);
+}
+
+static bool
+pvscsi_rings_mgr_msg_has_room(PVSCSIRingsMgr *mgr)
+{
+ uint32_t prodIdx = RS_GET_FIELD(mgr->rs_pa, msgProdIdx);
+ uint32_t consIdx = RS_GET_FIELD(mgr->rs_pa, msgConsIdx);
+
+ return (prodIdx - consIdx) < (mgr->msg_len_mask + 1);
+}
+
+static void
+pvscsi_rings_mgr_flush_msg_ring(PVSCSIRingsMgr *mgr)
+{
+ /* Flush descriptor changes */
+ smp_wmb();
+
+ trace_pvscsi_rings_mgr_flush_msg_ring(mgr->filled_msg_ptr);
+
+ RS_SET_FIELD(mgr->rs_pa, msgProdIdx, mgr->filled_msg_ptr);
+}
+
+static void
+pvscsi_reset_state(PVSCSIState *s)
+{
+ s->curr_cmd = PVSCSI_CMD_FIRST;
+ s->curr_cmd_data_cntr = 0;
+ s->reg_command_status = PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
+ s->reg_interrupt_status = 0;
+ pvscsi_rings_mgr_cleanup(&s->rings);
+ s->rings_info_valid = FALSE;
+ s->msg_ring_info_valid = FALSE;
+ QTAILQ_INIT(&s->pending_queue);
+ QTAILQ_INIT(&s->completion_queue);
+}
+
+static void
+pvscsi_free_queue(PVSCSIRequestList *req_list)
+{
+ PVSCSIRequest *pvscsi_req;
+
+ while (!QTAILQ_EMPTY(req_list)) {
+ pvscsi_req = QTAILQ_FIRST(req_list);
+ QTAILQ_REMOVE(req_list, pvscsi_req, next);
+ g_free(pvscsi_req);
+ }
+}
+
+static void
+pvscsi_reset_adapter(PVSCSIState *s)
+{
+ qbus_reset_all_fn(&s->bus);
+ pvscsi_free_queue(&s->completion_queue);
+ assert(QTAILQ_EMPTY(&s->pending_queue));
+ pvscsi_reset_state(s);
+}
+
+static void
+pvscsi_update_irq_status(PVSCSIState *s)
+{
+ PCIDevice *d = PCI_DEVICE(s);
+ bool should_raise = s->reg_interrupt_enabled & s->reg_interrupt_status;
+
+ trace_pvscsi_update_irq_level(should_raise, s->reg_interrupt_enabled,
+ s->reg_interrupt_status);
+
+ if (s->msi_used && msi_enabled(d)) {
+ if (should_raise) {
+ trace_pvscsi_update_irq_msi();
+ msi_notify(d, PVSCSI_VECTOR_COMPLETION);
+ }
+ return;
+ }
+
+ qemu_set_irq(d->irq[0], !!should_raise);
+}
+
+static void
+pvscsi_raise_completion_interrupt(PVSCSIState *s)
+{
+    s->reg_interrupt_status |= PVSCSI_INTR_CMPL_0;
+
+    /* Memory barrier to flush interrupt status register changes */
+    smp_wmb();
+
+    pvscsi_update_irq_status(s);
+}
+
+static void
+pvscsi_raise_message_interrupt(PVSCSIState *s)
+{
+    s->reg_interrupt_status |= PVSCSI_INTR_MSG_0;
+
+    /* Memory barrier to flush interrupt status register changes */
+    smp_wmb();
+
+    pvscsi_update_irq_status(s);
+}
+
+static void
+pvscsi_cmp_ring_put(PVSCSIState *s, struct PVSCSIRingCmpDesc *cmp_desc)
+{
+ hwaddr cmp_descr_pa;
+
+ cmp_descr_pa = pvscsi_rings_mgr_pop_cmp_descr(&s->rings);
+ trace_pvscsi_cmp_ring_put(cmp_descr_pa);
+ cpu_physical_memory_write(cmp_descr_pa, (void *)cmp_desc,
+ sizeof(*cmp_desc));
+}
+
+static void
+pvscsi_msg_ring_put(PVSCSIState *s, struct PVSCSIRingMsgDesc *msg_desc)
+{
+ hwaddr msg_descr_pa;
+
+ msg_descr_pa = pvscsi_rings_mgr_pop_msg_descr(&s->rings);
+ trace_pvscsi_msg_ring_put(msg_descr_pa);
+ cpu_physical_memory_write(msg_descr_pa, (void *)msg_desc,
+ sizeof(*msg_desc));
+}
+
+static void
+pvscsi_process_completion_queue(void *opaque)
+{
+    PVSCSIState *s = opaque;
+    PVSCSIRequest *pvscsi_req;
+    bool has_completed = false;
+
+    while (!QTAILQ_EMPTY(&s->completion_queue)) {
+        pvscsi_req = QTAILQ_FIRST(&s->completion_queue);
+        QTAILQ_REMOVE(&s->completion_queue, pvscsi_req, next);
+        pvscsi_cmp_ring_put(s, &pvscsi_req->cmp);
+        g_free(pvscsi_req);
+        has_completed = true;
+    }
+
+    if (has_completed) {
+        pvscsi_rings_mgr_flush_cmp_ring(&s->rings);
+        pvscsi_raise_completion_interrupt(s);
+    }
+}
+
+static void
+pvscsi_schedule_completion_processing(PVSCSIState *s)
+{
+ /* Try putting more complete requests on the ring. */
+ if (!QTAILQ_EMPTY(&s->completion_queue)) {
+ qemu_bh_schedule(s->completion_worker);
+ }
+}
+
+static void
+pvscsi_complete_request(PVSCSIState *s, PVSCSIRequest *r)
+{
+ assert(!r->completed);
+
+ trace_pvscsi_complete_request(r->cmp.context, r->cmp.dataLen,
+ r->sense_key);
+ if (r->sreq != NULL) {
+ scsi_req_unref(r->sreq);
+ r->sreq = NULL;
+ }
+ r->completed = 1;
+ QTAILQ_REMOVE(&s->pending_queue, r, next);
+ QTAILQ_INSERT_TAIL(&s->completion_queue, r, next);
+ pvscsi_schedule_completion_processing(s);
+}
+
+static QEMUSGList *pvscsi_get_sg_list(SCSIRequest *r)
+{
+ PVSCSIRequest *req = r->hba_private;
+
+ trace_pvscsi_get_sg_list(req->sgl.nsg, req->sgl.size);
+
+ return &req->sgl;
+}
+
+static void
+pvscsi_get_next_sg_elem(PVSCSISGState *sg)
+{
+ struct PVSCSISGElement elem;
+
+ for (;; sg->elemAddr = elem.addr) {
+ cpu_physical_memory_read(sg->elemAddr, (void *)&elem, sizeof(elem));
+ if ((elem.flags & ~PVSCSI_KNOWN_FLAGS) != 0) {
+ /*
+ * There is PVSCSI_SGE_FLAG_CHAIN_ELEMENT flag described in
+ * header file but its value is unknown. This flag requires
+ * additional processing, so we put warning here to catch it
+ * some day and make proper implementation
+ */
+ trace_pvscsi_get_next_sg_elem(elem.flags);
+ }
+ break;
+ }
+
+ sg->elemAddr += sizeof(elem);
+ sg->dataAddr = elem.addr;
+ sg->resid = elem.length;
+}
+
+static void
+pvscsi_write_sense(PVSCSIRequest *r, uint8_t *sense, int len)
+{
+ r->cmp.senseLen = MIN(r->req.senseLen, len);
+ r->sense_key = sense[2];
+ cpu_physical_memory_write(r->req.senseAddr, sense, r->cmp.senseLen);
+}
+
+static void
+pvscsi_command_complete(SCSIRequest *req, uint32_t status, size_t resid)
+{
+    PVSCSIRequest *pvscsi_req = req->hba_private;
+    PVSCSIState *s;
+
+    if (!pvscsi_req) {
+        trace_pvscsi_command_complete_not_found(req->tag);
+        return;
+    }
+    s = pvscsi_req->dev;
+    if (resid) {
+        /* Short transfer. */
+        trace_pvscsi_command_complete_data_run();
+        pvscsi_req->cmp.hostStatus = BTSTAT_DATARUN;
+    }
+
+    pvscsi_req->cmp.scsiStatus = status;
+    if (pvscsi_req->cmp.scsiStatus == CHECK_CONDITION) {
+        uint8_t sense[SCSI_SENSE_BUF_SIZE];
+        int sense_len =
+            scsi_req_get_sense(pvscsi_req->sreq, sense, sizeof(sense));
+
+        trace_pvscsi_command_complete_sense_len(sense_len);
+        pvscsi_write_sense(pvscsi_req, sense, sense_len);
+    }
+    qemu_sglist_destroy(&pvscsi_req->sgl);
+    pvscsi_complete_request(s, pvscsi_req);
+}
+
+static void
+pvscsi_send_msg(PVSCSIState *s, SCSIDevice *dev, uint32_t msg_type)
+{
+ if (s->msg_ring_info_valid && pvscsi_rings_mgr_msg_has_room(&s->rings)) {
+ PVSCSIMsgDescDevStatusChanged msg = {0};
+
+ msg.type = msg_type;
+ msg.bus = dev->channel;
+ msg.target = dev->id;
+ msg.lun[1] = dev->lun;
+
+ pvscsi_msg_ring_put(s, (PVSCSIRingMsgDesc *)&msg);
+ pvscsi_rings_mgr_flush_msg_ring(&s->rings);
+ pvscsi_raise_message_interrupt(s);
+ }
+}
+
+static void
+pvscsi_hotplug(SCSIBus *bus, SCSIDevice *dev)
+{
+ PVSCSIState *s = container_of(bus, PVSCSIState, bus);
+ pvscsi_send_msg(s, dev, PVSCSI_MSG_DEV_ADDED);
+}
+
+static void
+pvscsi_hot_unplug(SCSIBus *bus, SCSIDevice *dev)
+{
+ PVSCSIState *s = container_of(bus, PVSCSIState, bus);
+ pvscsi_send_msg(s, dev, PVSCSI_MSG_DEV_REMOVED);
+}
+
+static void
+pvscsi_request_cancelled(SCSIRequest *req)
+{
+ PVSCSIRequest *pvscsi_req = req->hba_private;
+ PVSCSIState *s = pvscsi_req->dev;
+
+ if (pvscsi_req->cmp.hostStatus == BTSTAT_SUCCESS) {
+ pvscsi_req->cmp.hostStatus = BTSTAT_ABORTQUEUE;
+ }
+ pvscsi_complete_request(s, pvscsi_req);
+}
+
+static SCSIDevice*
+pvscsi_device_find(PVSCSIState *s, int channel, int target,
+ uint8_t *requested_lun, uint8_t *target_lun)
+{
+ if (requested_lun[0] || requested_lun[2] || requested_lun[3] ||
+ requested_lun[4] || requested_lun[5] || requested_lun[6] ||
+ requested_lun[7] || (target > PVSCSI_MAX_DEVS)) {
+ return NULL;
+ } else {
+ *target_lun = requested_lun[1];
+ return scsi_device_find(&s->bus, channel, target, *target_lun);
+ }
+}
+
+static PVSCSIRequest *
+pvscsi_queue_pending_descriptor(PVSCSIState *s, SCSIDevice **d,
+ struct PVSCSIRingReqDesc *descr)
+{
+ PVSCSIRequest *pvscsi_req;
+ uint8_t lun;
+
+ pvscsi_req = g_malloc0(sizeof(*pvscsi_req));
+ pvscsi_req->dev = s;
+ pvscsi_req->req = *descr;
+ pvscsi_req->cmp.context = pvscsi_req->req.context;
+ QTAILQ_INSERT_TAIL(&s->pending_queue, pvscsi_req, next);
+
+ *d = pvscsi_device_find(s, descr->bus, descr->target, descr->lun, &lun);
+ if (!*d) {
+ return pvscsi_req;
+ }
+
+ pvscsi_req->lun = lun;
+ return pvscsi_req;
+}
+
+static void
+pvscsi_convert_sglist(PVSCSIRequest *r)
+{
+    int chunk_size;
+    uint64_t data_length = r->req.dataLen;
+    PVSCSISGState sg = r->sg;
+    while (data_length) {
+        while (!sg.resid) {
+            pvscsi_get_next_sg_elem(&sg);
+            trace_pvscsi_convert_sglist(r->req.context, sg.dataAddr,
+                                        sg.resid);
+        }
+        assert(data_length > 0);
+        chunk_size = MIN((unsigned) data_length, sg.resid);
+        if (chunk_size) {
+            qemu_sglist_add(&r->sgl, sg.dataAddr, chunk_size);
+        }
+
+        sg.dataAddr += chunk_size;
+        data_length -= chunk_size;
+        sg.resid -= chunk_size;
+    }
+}
+
+static void
+pvscsi_build_sglist(PVSCSIState *s, PVSCSIRequest *r)
+{
+ PCIDevice *d = PCI_DEVICE(s);
+
+ qemu_sglist_init(&r->sgl, 1, pci_dma_context(d));
+ if (r->req.flags & PVSCSI_FLAG_CMD_WITH_SG_LIST) {
+ pvscsi_convert_sglist(r);
+ } else {
+ qemu_sglist_add(&r->sgl, r->req.dataAddr, r->req.dataLen);
+ }
+}
+
+static void
+pvscsi_process_request_descriptor(PVSCSIState *s,
+ struct PVSCSIRingReqDesc *descr)
+{
+ SCSIDevice *d;
+ PVSCSIRequest *r = pvscsi_queue_pending_descriptor(s, &d, descr);
+ int64_t n;
+
+ trace_pvscsi_process_req_descr(descr->cdb[0], descr->context);
+
+ if (!d) {
+ r->cmp.hostStatus = BTSTAT_SELTIMEO;
+ trace_pvscsi_process_req_descr_unknown_device();
+ pvscsi_complete_request(s, r);
+ return;
+ }
+
+ if (descr->flags & PVSCSI_FLAG_CMD_WITH_SG_LIST) {
+ r->sg.elemAddr = descr->dataAddr;
+ }
+
+ r->sreq = scsi_req_new(d, descr->context, r->lun, descr->cdb, r);
+ if (r->sreq->cmd.mode == SCSI_XFER_FROM_DEV &&
+ (descr->flags & PVSCSI_FLAG_CMD_DIR_TODEVICE)) {
+ r->cmp.hostStatus = BTSTAT_BADMSG;
+ trace_pvscsi_process_req_descr_invalid_dir();
+ scsi_req_cancel(r->sreq);
+ return;
+ }
+ if (r->sreq->cmd.mode == SCSI_XFER_TO_DEV &&
+ (descr->flags & PVSCSI_FLAG_CMD_DIR_TOHOST)) {
+ r->cmp.hostStatus = BTSTAT_BADMSG;
+ trace_pvscsi_process_req_descr_invalid_dir();
+ scsi_req_cancel(r->sreq);
+ return;
+ }
+
+ pvscsi_build_sglist(s, r);
+ n = scsi_req_enqueue(r->sreq);
+
+ if (n) {
+ scsi_req_continue(r->sreq);
+ }
+}
+
+static void
+pvscsi_process_io(PVSCSIState *s)
+{
+ PVSCSIRingReqDesc descr;
+ hwaddr next_descr_pa;
+
+ assert(s->rings_info_valid);
+ while ((next_descr_pa = pvscsi_rings_mgr_pop_req_descr(&s->rings)) != 0) {
+
+ /* Only read after production index verification */
+ smp_rmb();
+
+ trace_pvscsi_process_io(next_descr_pa);
+ cpu_physical_memory_read(next_descr_pa, &descr, sizeof(descr));
+ pvscsi_process_request_descriptor(s, &descr);
+ }
+
+ pvscsi_rings_mgr_flush_req_ring(&s->rings);
+}
+
+static void
+pvscsi_dbg_dump_tx_rings_config(PVSCSICmdDescSetupRings *rc)
+{
+    int i;
+    trace_pvscsi_tx_rings_ppn("Rings State", rc->ringsStatePPN);
+
+    trace_pvscsi_tx_rings_num_pages("Request Ring", rc->reqRingNumPages);
+    for (i = 0; i < rc->reqRingNumPages; i++) {
+        trace_pvscsi_tx_rings_ppn("Request Ring", rc->reqRingPPNs[i]);
+    }
+
+    trace_pvscsi_tx_rings_num_pages("Confirm Ring", rc->cmpRingNumPages);
+    for (i = 0; i < rc->cmpRingNumPages; i++) {
+        trace_pvscsi_tx_rings_ppn("Confirm Ring", rc->cmpRingPPNs[i]);
+    }
+}
+
+static uint64_t
+pvscsi_on_cmd_config(PVSCSIState *s)
+{
+ trace_pvscsi_on_cmd_noimpl("PVSCSI_CMD_CONFIG");
+ return PVSCSI_COMMAND_PROCESSING_FAILED;
+}
+
+static uint64_t
+pvscsi_on_cmd_unplug(PVSCSIState *s)
+{
+ trace_pvscsi_on_cmd_noimpl("PVSCSI_CMD_DEVICE_UNPLUG");
+ return PVSCSI_COMMAND_PROCESSING_FAILED;
+}
+
+static uint64_t
+pvscsi_on_issue_scsi(PVSCSIState *s)
+{
+ trace_pvscsi_on_cmd_noimpl("PVSCSI_CMD_ISSUE_SCSI");
+ return PVSCSI_COMMAND_PROCESSING_FAILED;
+}
+
+static uint64_t
+pvscsi_on_cmd_setup_rings(PVSCSIState *s)
+{
+ PVSCSICmdDescSetupRings *rc =
+ (PVSCSICmdDescSetupRings *) s->curr_cmd_data;
+
+ trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_SETUP_RINGS");
+
+ pvscsi_dbg_dump_tx_rings_config(rc);
+ pvscsi_rings_mgr_init_data(&s->rings, rc);
+ s->rings_info_valid = TRUE;
+ return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
+}
+
+static uint64_t
+pvscsi_on_cmd_abort(PVSCSIState *s)
+{
+ trace_pvscsi_on_cmd_abort(
+ ((struct PVSCSICmdDescAbortCmd *) s->curr_cmd_data)->context,
+ ((struct PVSCSICmdDescAbortCmd *) s->curr_cmd_data)->target);
+ return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
+}
+
+static uint64_t
+pvscsi_on_cmd_unknown(PVSCSIState *s)
+{
+ trace_pvscsi_on_cmd_unknown_data(s->curr_cmd_data[0]);
+ return PVSCSI_COMMAND_PROCESSING_FAILED;
+}
+
+static uint64_t
+pvscsi_on_cmd_reset_device(PVSCSIState *s)
+{
+ uint8_t target_lun = 0;
+ struct PVSCSICmdDescResetDevice *cmd =
+ (struct PVSCSICmdDescResetDevice *) s->curr_cmd_data;
+ SCSIDevice *sdev;
+
+ sdev = pvscsi_device_find(s, 0, cmd->target, cmd->lun, &target_lun);
+
+ trace_pvscsi_on_cmd_reset_dev(cmd->target, (int) target_lun, sdev);
+
+ if (sdev != NULL) {
+ device_reset(&sdev->qdev);
+ return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
+ }
+
+ return PVSCSI_COMMAND_PROCESSING_FAILED;
+}
+
+static uint64_t
+pvscsi_on_cmd_reset_bus(PVSCSIState *s)
+{
+ trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_RESET_BUS");
+
+ qbus_reset_all_fn(&s->bus);
+ return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
+}
+
+static uint64_t
+pvscsi_on_cmd_setup_msg_ring(PVSCSIState *s)
+{
+ PVSCSICmdDescSetupMsgRing *rc =
+ (PVSCSICmdDescSetupMsgRing *) s->curr_cmd_data;
+
+ trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_SETUP_MSG_RING");
+
+ if (s->rings_info_valid) {
+ pvscsi_rings_mgr_init_msg(&s->rings, rc);
+ s->msg_ring_info_valid = TRUE;
+ }
+ return sizeof(PVSCSICmdDescSetupMsgRing) / sizeof(uint32_t);
+}
+
+static uint64_t
+pvscsi_on_cmd_adapter_reset(PVSCSIState *s)
+{
+ trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_ADAPTER_RESET");
+
+ pvscsi_reset_adapter(s);
+ return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
+}
+
+static const struct {
+ int data_size;
+ uint64_t (*handler_fn)(PVSCSIState *s);
+} pvscsi_commands[] = {
+ [PVSCSI_CMD_FIRST] = {
+ .data_size = 0,
+ .handler_fn = pvscsi_on_cmd_unknown,
+ },
+
+ /* Not implemented, data size defined based on what arrives on windows */
+ [PVSCSI_CMD_CONFIG] = {
+ .data_size = 6 * sizeof(uint32_t),
+ .handler_fn = pvscsi_on_cmd_config,
+ },
+
+ /* Command not implemented, data size is unknown */
+ [PVSCSI_CMD_ISSUE_SCSI] = {
+ .data_size = 0,
+ .handler_fn = pvscsi_on_issue_scsi,
+ },
+
+ /* Command not implemented, data size is unknown */
+ [PVSCSI_CMD_DEVICE_UNPLUG] = {
+ .data_size = 0,
+ .handler_fn = pvscsi_on_cmd_unplug,
+ },
+
+ [PVSCSI_CMD_SETUP_RINGS] = {
+ .data_size = sizeof(PVSCSICmdDescSetupRings),
+ .handler_fn = pvscsi_on_cmd_setup_rings,
+ },
+
+ [PVSCSI_CMD_RESET_DEVICE] = {
+ .data_size = sizeof(struct PVSCSICmdDescResetDevice),
+ .handler_fn = pvscsi_on_cmd_reset_device,
+ },
+
+ [PVSCSI_CMD_RESET_BUS] = {
+ .data_size = 0,
+ .handler_fn = pvscsi_on_cmd_reset_bus,
+ },
+
+ [PVSCSI_CMD_SETUP_MSG_RING] = {
+ .data_size = sizeof(PVSCSICmdDescSetupMsgRing),
+ .handler_fn = pvscsi_on_cmd_setup_msg_ring,
+ },
+
+ [PVSCSI_CMD_ADAPTER_RESET] = {
+ .data_size = 0,
+ .handler_fn = pvscsi_on_cmd_adapter_reset,
+ },
+
+ [PVSCSI_CMD_ABORT_CMD] = {
+ .data_size = sizeof(struct PVSCSICmdDescAbortCmd),
+ .handler_fn = pvscsi_on_cmd_abort,
+ },
+};
+
+static void
+pvscsi_do_command_processing(PVSCSIState *s)
+{
+ size_t bytes_arrived = s->curr_cmd_data_cntr * sizeof(uint32_t);
+
+ assert(s->curr_cmd < PVSCSI_CMD_LAST);
+ if (bytes_arrived >= pvscsi_commands[s->curr_cmd].data_size) {
+ s->reg_command_status = pvscsi_commands[s->curr_cmd].handler_fn(s);
+ s->curr_cmd = PVSCSI_CMD_FIRST;
+ s->curr_cmd_data_cntr = 0;
+ }
+}
+
+static void
+pvscsi_on_command_data(PVSCSIState *s, uint32_t value)
+{
+ size_t bytes_arrived = s->curr_cmd_data_cntr * sizeof(uint32_t);
+
+ assert(bytes_arrived < sizeof(s->curr_cmd_data));
+ s->curr_cmd_data[s->curr_cmd_data_cntr++] = value;
+
+ pvscsi_do_command_processing(s);
+}
+
+static void
+pvscsi_on_command(PVSCSIState *s, uint64_t cmd_id)
+{
+ if ((cmd_id > PVSCSI_CMD_FIRST) && (cmd_id < PVSCSI_CMD_LAST)) {
+ s->curr_cmd = cmd_id;
+ } else {
+ s->curr_cmd = PVSCSI_CMD_FIRST;
+ trace_pvscsi_on_cmd_unknown(cmd_id);
+ }
+
+ s->curr_cmd_data_cntr = 0;
+ s->reg_command_status = PVSCSI_COMMAND_NOT_ENOUGH_DATA;
+
+ pvscsi_do_command_processing(s);
+}
+
+static void
+pvscsi_io_write(void *opaque, hwaddr addr,
+ uint64_t val, unsigned size)
+{
+ PVSCSIState *s = opaque;
+
+ switch (addr) {
+ case PVSCSI_REG_OFFSET_COMMAND:
+ pvscsi_on_command(s, val);
+ break;
+
+ case PVSCSI_REG_OFFSET_COMMAND_DATA:
+ pvscsi_on_command_data(s, (uint32_t) val);
+ break;
+
+ case PVSCSI_REG_OFFSET_INTR_STATUS:
+ trace_pvscsi_io_write("PVSCSI_REG_OFFSET_INTR_STATUS", val);
+ s->reg_interrupt_status &= ~val;
+ pvscsi_update_irq_status(s);
+ pvscsi_schedule_completion_processing(s);
+ break;
+
+ case PVSCSI_REG_OFFSET_INTR_MASK:
+ trace_pvscsi_io_write("PVSCSI_REG_OFFSET_INTR_MASK", val);
+ s->reg_interrupt_enabled = val;
+ pvscsi_update_irq_status(s);
+ break;
+
+ case PVSCSI_REG_OFFSET_KICK_NON_RW_IO:
+ trace_pvscsi_io_write("PVSCSI_REG_OFFSET_KICK_NON_RW_IO", val);
+ pvscsi_process_io(s);
+ break;
+
+ case PVSCSI_REG_OFFSET_KICK_RW_IO:
+ trace_pvscsi_io_write("PVSCSI_REG_OFFSET_KICK_RW_IO", val);
+ pvscsi_process_io(s);
+ break;
+
+ case PVSCSI_REG_OFFSET_DEBUG:
+ trace_pvscsi_io_write("PVSCSI_REG_OFFSET_DEBUG", val);
+ break;
+
+ default:
+ trace_pvscsi_io_write_unknown(addr, size, val);
+ break;
+ }
+
+}
+
+static uint64_t
+pvscsi_io_read(void *opaque, hwaddr addr, unsigned size)
+{
+ PVSCSIState *s = opaque;
+
+ switch (addr) {
+ case PVSCSI_REG_OFFSET_INTR_STATUS:
+ trace_pvscsi_io_read("PVSCSI_REG_OFFSET_INTR_STATUS",
+ s->reg_interrupt_status);
+ return s->reg_interrupt_status;
+
+ case PVSCSI_REG_OFFSET_INTR_MASK:
+ trace_pvscsi_io_read("PVSCSI_REG_OFFSET_INTR_MASK",
+ s->reg_interrupt_enabled);
+ return s->reg_interrupt_enabled;
+
+ case PVSCSI_REG_OFFSET_COMMAND_STATUS:
+ trace_pvscsi_io_read("PVSCSI_REG_OFFSET_COMMAND_STATUS",
+ s->reg_command_status);
+ return s->reg_command_status;
+
+ default:
+ trace_pvscsi_io_read_unknown(addr, size);
+ return 0;
+ }
+}
+
+
+static bool
+pvscsi_init_msi(PVSCSIState *s)
+{
+ int res;
+ PCIDevice *d = PCI_DEVICE(s);
+
+ res = msi_init(d, PVSCSI_MSI_OFFSET, PVSCSI_MSIX_NUM_VECTORS,
+ PVSCSI_USE_64BIT, PVSCSI_PER_VECTOR_MASK);
+ if (res < 0) {
+ trace_pvscsi_init_msi_fail(res);
+ s->msi_used = false;
+ } else {
+ s->msi_used = true;
+ }
+
+ return s->msi_used;
+}
+
+static void
+pvscsi_cleanup_msi(PVSCSIState *s)
+{
+ PCIDevice *d = PCI_DEVICE(s);
+
+ if (s->msi_used) {
+ msi_uninit(d);
+ }
+}
+
+static const MemoryRegionOps pvscsi_ops = {
+ .read = pvscsi_io_read,
+ .write = pvscsi_io_write,
+ .endianness = DEVICE_LITTLE_ENDIAN,
+ .impl = {
+ .min_access_size = 4,
+ .max_access_size = 4,
+ },
+};
+
+static const struct SCSIBusInfo pvscsi_scsi_info = {
+ .tcq = true,
+ .max_target = PVSCSI_MAX_DEVS,
+ .max_channel = 0,
+ .max_lun = 0,
+
+ .get_sg_list = pvscsi_get_sg_list,
+ .complete = pvscsi_command_complete,
+ .cancel = pvscsi_request_cancelled,
+ .hotplug = pvscsi_hotplug,
+ .hot_unplug = pvscsi_hot_unplug,
+};
+
+static int
+pvscsi_init(PCIDevice *pci_dev)
+{
+ PVSCSIState *s = PVSCSI(pci_dev);
+
+ trace_pvscsi_state("init");
+
+ /* PCI subsystem ID */
+ pci_dev->config[PCI_SUBSYSTEM_ID] = 0x00;
+ pci_dev->config[PCI_SUBSYSTEM_ID + 1] = 0x10;
+
+ /* PCI latency timer = 255 */
+ pci_dev->config[PCI_LATENCY_TIMER] = 0xff;
+
+ /* Interrupt pin A */
+ pci_config_set_interrupt_pin(pci_dev->config, 1);
+
+ memory_region_init_io(&s->io_space, &pvscsi_ops, s,
+ "pvscsi-io", PVSCSI_MEM_SPACE_SIZE);
+ pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->io_space);
+
+ pvscsi_init_msi(s);
+
+ s->completion_worker = qemu_bh_new(pvscsi_process_completion_queue, s);
+ if (!s->completion_worker) {
+ pvscsi_cleanup_msi(s);
+ memory_region_destroy(&s->io_space);
+ return -ENOMEM;
+ }
+
+ scsi_bus_new(&s->bus, &pci_dev->qdev, &pvscsi_scsi_info);
+ pvscsi_reset_state(s);
+
+ return 0;
+}
+
+static void
+pvscsi_uninit(PCIDevice *pci_dev)
+{
+ PVSCSIState *s = PVSCSI(pci_dev);
+
+ trace_pvscsi_state("uninit");
+ qemu_bh_delete(s->completion_worker);
+
+ pvscsi_cleanup_msi(s);
+
+ memory_region_destroy(&s->io_space);
+}
+
+static void
+pvscsi_reset(DeviceState *dev)
+{
+ PCIDevice *d = PCI_DEVICE(dev);
+ PVSCSIState *s = PVSCSI(d);
+
+ trace_pvscsi_state("reset");
+ pvscsi_reset_adapter(s);
+}
+
+static void
+pvscsi_pre_save(void *opaque)
+{
+ PVSCSIState *s = (PVSCSIState *) opaque;
+
+ trace_pvscsi_state("presave");
+
+ assert(QTAILQ_EMPTY(&s->pending_queue));
+ assert(QTAILQ_EMPTY(&s->completion_queue));
+}
+
+static int
+pvscsi_post_load(void *opaque, int version_id)
+{
+ trace_pvscsi_state("postload");
+ return 0;
+}
+
+static const VMStateDescription vmstate_pvscsi = {
+ .name = TYPE_PVSCSI,
+ .version_id = 0,
+ .minimum_version_id = 0,
+ .minimum_version_id_old = 0,
+ .pre_save = pvscsi_pre_save,
+ .post_load = pvscsi_post_load,
+ .fields = (VMStateField[]) {
+ VMSTATE_PCI_DEVICE(parent_obj, PVSCSIState),
+ VMSTATE_UINT8(msi_used, PVSCSIState),
+ VMSTATE_UINT64(reg_interrupt_status, PVSCSIState),
+ VMSTATE_UINT64(reg_interrupt_enabled, PVSCSIState),
+ VMSTATE_UINT64(reg_command_status, PVSCSIState),
+ VMSTATE_UINT64(curr_cmd, PVSCSIState),
+ VMSTATE_UINT32(curr_cmd_data_cntr, PVSCSIState),
+ VMSTATE_UINT32_ARRAY(curr_cmd_data, PVSCSIState,
+ ARRAY_SIZE(((PVSCSIState *)NULL)->curr_cmd_data)),
+ VMSTATE_UINT8(rings_info_valid, PVSCSIState),
+ VMSTATE_UINT8(msg_ring_info_valid, PVSCSIState),
+
+ VMSTATE_UINT64(rings.rs_pa, PVSCSIState),
+ VMSTATE_UINT32(rings.txr_len_mask, PVSCSIState),
+ VMSTATE_UINT32(rings.rxr_len_mask, PVSCSIState),
+ VMSTATE_UINT64_ARRAY(rings.req_ring_pages_pa, PVSCSIState,
+ PVSCSI_SETUP_RINGS_MAX_NUM_PAGES),
+ VMSTATE_UINT64_ARRAY(rings.cmp_ring_pages_pa, PVSCSIState,
+ PVSCSI_SETUP_RINGS_MAX_NUM_PAGES),
+ VMSTATE_UINT64(rings.consumed_ptr, PVSCSIState),
+ VMSTATE_UINT64(rings.filled_cmp_ptr, PVSCSIState),
+
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+static void
+pvscsi_write_config(PCIDevice *pci, uint32_t addr, uint32_t val, int len)
+{
+ pci_default_write_config(pci, addr, val, len);
+ msi_write_config(pci, addr, val, len);
+}
+
+static void pvscsi_class_init(ObjectClass *klass, void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(klass);
+ PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+ k->init = pvscsi_init;
+ k->exit = pvscsi_uninit;
+ k->vendor_id = PCI_VENDOR_ID_VMWARE;
+ k->device_id = PCI_DEVICE_ID_VMWARE_PVSCSI;
+ k->class_id = PCI_CLASS_STORAGE_SCSI;
+ k->subsystem_id = 0x1000;
+ dc->reset = pvscsi_reset;
+ dc->vmsd = &vmstate_pvscsi;
+ k->config_write = pvscsi_write_config;
+}
+
+static const TypeInfo pvscsi_info = {
+ .name = "pvscsi",
+ .parent = TYPE_PCI_DEVICE,
+ .instance_size = sizeof(PVSCSIState),
+ .class_init = pvscsi_class_init,
+};
+
+static void
+pvscsi_register_types(void)
+{
+ type_register_static(&pvscsi_info);
+
+ trace_pvscsi_register();
+}
+
+type_init(pvscsi_register_types);
diff --git a/hw/vmw_pvscsi.h b/hw/vmw_pvscsi.h
new file mode 100644
index 0000000..17fcf66
--- /dev/null
+++ b/hw/vmw_pvscsi.h
@@ -0,0 +1,434 @@
+/*
+ * VMware PVSCSI header file
+ *
+ * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Maintained by: Arvind Kumar <arvindkumar@vmware.com>
+ *
+ */
+
+#ifndef VMW_PVSCSI_H
+#define VMW_PVSCSI_H
+
+#define VMW_PAGE_SIZE (4096)
+#define VMW_PAGE_SHIFT (12)
+
+#define MASK(n) ((1 << (n)) - 1) /* make an n-bit mask */
+
+/*
+ * host adapter status/error codes
+ */
+enum HostBusAdapterStatus {
+ BTSTAT_SUCCESS = 0x00, /* CCB complete normally with no errors */
+ BTSTAT_LINKED_COMMAND_COMPLETED = 0x0a,
+ BTSTAT_LINKED_COMMAND_COMPLETED_WITH_FLAG = 0x0b,
+ BTSTAT_DATA_UNDERRUN = 0x0c,
+ BTSTAT_SELTIMEO = 0x11, /* SCSI selection timeout */
+ BTSTAT_DATARUN = 0x12, /* data overrun/underrun */
+ BTSTAT_BUSFREE = 0x13, /* unexpected bus free */
+ BTSTAT_INVPHASE = 0x14, /* invalid bus phase or sequence */
+ /* requested by target */
+ BTSTAT_LUNMISMATCH = 0x17, /* linked CCB has different LUN */
+ /* from first CCB */
+ BTSTAT_SENSFAILED = 0x1b, /* auto request sense failed */
+ BTSTAT_TAGREJECT = 0x1c, /* SCSI II tagged queueing message */
+ /* rejected by target */
+ BTSTAT_BADMSG = 0x1d, /* unsupported message received by */
+ /* the host adapter */
+ BTSTAT_HAHARDWARE = 0x20, /* host adapter hardware failed */
+ BTSTAT_NORESPONSE = 0x21, /* target did not respond to SCSI ATN, */
+ /* sent a SCSI RST */
+ BTSTAT_SENTRST = 0x22, /* host adapter asserted a SCSI RST */
+ BTSTAT_RECVRST = 0x23, /* other SCSI devices asserted a SCSI RST */
+ BTSTAT_DISCONNECT = 0x24, /* target device reconnected improperly */
+ /* (w/o tag) */
+ BTSTAT_BUSRESET = 0x25, /* host adapter issued BUS device reset */
+ BTSTAT_ABORTQUEUE = 0x26, /* abort queue generated */
+ BTSTAT_HASOFTWARE = 0x27, /* host adapter software error */
+ BTSTAT_HATIMEOUT = 0x30, /* host adapter hardware timeout error */
+ BTSTAT_SCSIPARITY = 0x34, /* SCSI parity error detected */
+};
+
+/*
+ * Register offsets.
+ *
+ * These registers are accessible both via i/o space and mm i/o.
+ */
+
+enum PVSCSIRegOffset {
+ PVSCSI_REG_OFFSET_COMMAND = 0x0,
+ PVSCSI_REG_OFFSET_COMMAND_DATA = 0x4,
+ PVSCSI_REG_OFFSET_COMMAND_STATUS = 0x8,
+ PVSCSI_REG_OFFSET_LAST_STS_0 = 0x100,
+ PVSCSI_REG_OFFSET_LAST_STS_1 = 0x104,
+ PVSCSI_REG_OFFSET_LAST_STS_2 = 0x108,
+ PVSCSI_REG_OFFSET_LAST_STS_3 = 0x10c,
+ PVSCSI_REG_OFFSET_INTR_STATUS = 0x100c,
+ PVSCSI_REG_OFFSET_INTR_MASK = 0x2010,
+ PVSCSI_REG_OFFSET_KICK_NON_RW_IO = 0x3014,
+ PVSCSI_REG_OFFSET_DEBUG = 0x3018,
+ PVSCSI_REG_OFFSET_KICK_RW_IO = 0x4018,
+};
+
+/*
+ * Virtual h/w commands.
+ */
+
+enum PVSCSICommands {
+ PVSCSI_CMD_FIRST = 0, /* has to be first */
+
+ PVSCSI_CMD_ADAPTER_RESET = 1,
+ PVSCSI_CMD_ISSUE_SCSI = 2,
+ PVSCSI_CMD_SETUP_RINGS = 3,
+ PVSCSI_CMD_RESET_BUS = 4,
+ PVSCSI_CMD_RESET_DEVICE = 5,
+ PVSCSI_CMD_ABORT_CMD = 6,
+ PVSCSI_CMD_CONFIG = 7,
+ PVSCSI_CMD_SETUP_MSG_RING = 8,
+ PVSCSI_CMD_DEVICE_UNPLUG = 9,
+
+ PVSCSI_CMD_LAST = 10 /* has to be last */
+};
+
+#define PVSCSI_COMMAND_PROCESSING_SUCCEEDED (0)
+#define PVSCSI_COMMAND_PROCESSING_FAILED (-1)
+#define PVSCSI_COMMAND_NOT_ENOUGH_DATA (-2)
+
+/*
+ * Command descriptor for PVSCSI_CMD_RESET_DEVICE --
+ */
+
+struct PVSCSICmdDescResetDevice {
+ uint32_t target;
+ uint8_t lun[8];
+} QEMU_PACKED;
+
+typedef struct PVSCSICmdDescResetDevice PVSCSICmdDescResetDevice;
+
+/*
+ * Command descriptor for PVSCSI_CMD_ABORT_CMD --
+ *
+ * - currently does not support specifying the LUN.
+ * - pad should be 0.
+ */
+
+struct PVSCSICmdDescAbortCmd {
+ uint64_t context;
+ uint32_t target;
+ uint32_t pad;
+} QEMU_PACKED;
+
+typedef struct PVSCSICmdDescAbortCmd PVSCSICmdDescAbortCmd;
+
+/*
+ * Command descriptor for PVSCSI_CMD_SETUP_RINGS --
+ *
+ * Notes:
+ * - reqRingNumPages and cmpRingNumPages need to be power of two.
+ * - reqRingNumPages and cmpRingNumPages need to be different from 0,
+ * - reqRingNumPages and cmpRingNumPages need to be inferior to
+ * PVSCSI_SETUP_RINGS_MAX_NUM_PAGES.
+ */
+
+#define PVSCSI_SETUP_RINGS_MAX_NUM_PAGES 32
+struct PVSCSICmdDescSetupRings {
+ uint32_t reqRingNumPages;
+ uint32_t cmpRingNumPages;
+ uint64_t ringsStatePPN;
+ uint64_t reqRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
+ uint64_t cmpRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
+} QEMU_PACKED;
+
+typedef struct PVSCSICmdDescSetupRings PVSCSICmdDescSetupRings;
+
+/*
+ * Command descriptor for PVSCSI_CMD_SETUP_MSG_RING --
+ *
+ * Notes:
+ * - this command was not supported in the initial revision of the h/w
+ * interface. Before using it, you need to check that it is supported by
+ * writing PVSCSI_CMD_SETUP_MSG_RING to the 'command' register, then
+ * immediately after read the 'command status' register:
+ * * a value of -1 means that the cmd is NOT supported,
+ * * a value != -1 means that the cmd IS supported.
+ * If it's supported the 'command status' register should return:
+ * sizeof(PVSCSICmdDescSetupMsgRing) / sizeof(uint32_t).
+ * - this command should be issued _after_ the usual SETUP_RINGS so that the
+ * RingsState page is already setup. If not, the command is a nop.
+ * - numPages needs to be a power of two,
+ * - numPages needs to be different from 0,
+ * - pad should be zero.
+ */
+
+#define PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES 16
+
+struct PVSCSICmdDescSetupMsgRing {
+ uint32_t numPages;
+ uint32_t pad;
+ uint64_t ringPPNs[PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES];
+} QEMU_PACKED;
+
+typedef struct PVSCSICmdDescSetupMsgRing PVSCSICmdDescSetupMsgRing;
+
+enum PVSCSIMsgType {
+ PVSCSI_MSG_DEV_ADDED = 0,
+ PVSCSI_MSG_DEV_REMOVED = 1,
+ PVSCSI_MSG_LAST = 2,
+};
+
+/*
+ * Msg descriptor.
+ *
+ * sizeof(struct PVSCSIRingMsgDesc) == 128.
+ *
+ * - type is of type enum PVSCSIMsgType.
+ * - the content of args depends on the type of event being delivered.
+ */
+
+struct PVSCSIRingMsgDesc {
+ uint32_t type;
+ uint32_t args[31];
+} QEMU_PACKED;
+
+typedef struct PVSCSIRingMsgDesc PVSCSIRingMsgDesc;
+
+struct PVSCSIMsgDescDevStatusChanged {
+ uint32_t type; /* PVSCSI_MSG_DEV _ADDED / _REMOVED */
+ uint32_t bus;
+ uint32_t target;
+ uint8_t lun[8];
+ uint32_t pad[27];
+} QEMU_PACKED;
+
+typedef struct PVSCSIMsgDescDevStatusChanged PVSCSIMsgDescDevStatusChanged;
+
+/*
+ * Rings state.
+ *
+ * - the fields:
+ * . msgProdIdx,
+ * . msgConsIdx,
+ * . msgNumEntriesLog2,
+ * .. are only used once the SETUP_MSG_RING cmd has been issued.
+ * - 'pad' helps to ensure that the msg related fields are on their own
+ * cache-line.
+ */
+
+struct PVSCSIRingsState {
+ uint32_t reqProdIdx;
+ uint32_t reqConsIdx;
+ uint32_t reqNumEntriesLog2;
+
+ uint32_t cmpProdIdx;
+ uint32_t cmpConsIdx;
+ uint32_t cmpNumEntriesLog2;
+
+ uint8_t pad[104];
+
+ uint32_t msgProdIdx;
+ uint32_t msgConsIdx;
+ uint32_t msgNumEntriesLog2;
+} QEMU_PACKED;
+
+typedef struct PVSCSIRingsState PVSCSIRingsState;
+
+/*
+ * Request descriptor.
+ *
+ * sizeof(RingReqDesc) = 128
+ *
+ * - context: is a unique identifier of a command. It could normally be any
+ * 64bit value, however we currently store it in the serialNumber variable
+ * of struct SCSI_Command, so we have the following restrictions due to the
+ * way this field is handled in the vmkernel storage stack:
+ * * this value can't be 0,
+ * * the upper 32bit need to be 0 since serialNumber is a uint32_t.
+ * Currently tracked as PR 292060.
+ * - dataLen: contains the total number of bytes that need to be transferred.
+ * - dataAddr:
+ * * if PVSCSI_FLAG_CMD_WITH_SG_LIST is set: dataAddr is the PA of the first
+ * s/g table segment, each s/g segment is entirely contained on a single
+ * page of physical memory,
+ * * if PVSCSI_FLAG_CMD_WITH_SG_LIST is NOT set, then dataAddr is the PA of
+ * the buffer used for the DMA transfer,
+ * - flags:
+ * * PVSCSI_FLAG_CMD_WITH_SG_LIST: see dataAddr above,
+ * * PVSCSI_FLAG_CMD_DIR_NONE: no DMA involved,
+ * * PVSCSI_FLAG_CMD_DIR_TOHOST: transfer from device to main memory,
+ * * PVSCSI_FLAG_CMD_DIR_TODEVICE: transfer from main memory to device,
+ * * PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB: reserved to handle CDBs larger than
+ * 16bytes. To be specified.
+ * - vcpuHint: vcpuId of the processor that will be most likely waiting for the
+ * completion of the i/o. For guest OSes that use lowest priority message
+ * delivery mode (such as windows), we use this "hint" to deliver the
+ * completion action to the proper vcpu. For now, we can use the vcpuId of
+ * the processor that initiated the i/o as a likely candidate for the vcpu
+ * that will be waiting for the completion.
+ * - bus should be 0: only bus 0 is currently supported.
+ * - unused should be zero'd.
+ */
+
+#define PVSCSI_FLAG_CMD_WITH_SG_LIST (1 << 0)
+#define PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB (1 << 1)
+#define PVSCSI_FLAG_CMD_DIR_NONE (1 << 2)
+#define PVSCSI_FLAG_CMD_DIR_TOHOST (1 << 3)
+#define PVSCSI_FLAG_CMD_DIR_TODEVICE (1 << 4)
+
+#define PVSCSI_KNOWN_FLAGS \
+ (PVSCSI_FLAG_CMD_WITH_SG_LIST | \
+ PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB | \
+ PVSCSI_FLAG_CMD_DIR_NONE | \
+ PVSCSI_FLAG_CMD_DIR_TOHOST | \
+ PVSCSI_FLAG_CMD_DIR_TODEVICE)
+
+struct PVSCSIRingReqDesc {
+ uint64_t context;
+ uint64_t dataAddr;
+ uint64_t dataLen;
+ uint64_t senseAddr;
+ uint32_t senseLen;
+ uint32_t flags;
+ uint8_t cdb[16];
+ uint8_t cdbLen;
+ uint8_t lun[8];
+ uint8_t tag;
+ uint8_t bus;
+ uint8_t target;
+ uint8_t vcpuHint;
+ uint8_t unused[59];
+} QEMU_PACKED;
+
+typedef struct PVSCSIRingReqDesc PVSCSIRingReqDesc;
+
+/*
+ * Scatter-gather list management.
+ *
+ * As described above, when PVSCSI_FLAG_CMD_WITH_SG_LIST is set in the
+ * RingReqDesc.flags, then RingReqDesc.dataAddr is the PA of the first s/g
+ * table segment.
+ *
+ * - each segment of the s/g table contains a succession of struct
+ * PVSCSISGElement.
+ * - each segment is entirely contained on a single physical page of memory.
+ * - a "chain" s/g element has the flag PVSCSI_SGE_FLAG_CHAIN_ELEMENT set in
+ * PVSCSISGElement.flags and in this case:
+ * * addr is the PA of the next s/g segment,
+ * * length is undefined, assumed to be 0.
+ */
+
+struct PVSCSISGElement {
+ uint64_t addr;
+ uint32_t length;
+ uint32_t flags;
+} QEMU_PACKED;
+
+typedef struct PVSCSISGElement PVSCSISGElement;
+
+/*
+ * Completion descriptor.
+ *
+ * sizeof(RingCmpDesc) = 32
+ *
+ * - context: identifier of the command. The same thing that was specified
+ * under "context" as part of struct RingReqDesc at initiation time,
+ * - dataLen: number of bytes transferred for the actual i/o operation,
+ * - senseLen: number of bytes written into the sense buffer,
+ * - hostStatus: adapter status,
+ * - scsiStatus: device status,
+ * - pad should be zero.
+ */
+
+struct PVSCSIRingCmpDesc {
+ uint64_t context;
+ uint64_t dataLen;
+ uint32_t senseLen;
+ uint16_t hostStatus;
+ uint16_t scsiStatus;
+ uint32_t pad[2];
+} QEMU_PACKED;
+
+typedef struct PVSCSIRingCmpDesc PVSCSIRingCmpDesc;
+
+/*
+ * Interrupt status / IRQ bits.
+ */
+
+#define PVSCSI_INTR_CMPL_0 (1 << 0)
+#define PVSCSI_INTR_CMPL_1 (1 << 1)
+#define PVSCSI_INTR_CMPL_MASK MASK(2)
+
+#define PVSCSI_INTR_MSG_0 (1 << 2)
+#define PVSCSI_INTR_MSG_1 (1 << 3)
+#define PVSCSI_INTR_MSG_MASK (MASK(2) << 2)
+
+#define PVSCSI_INTR_ALL_SUPPORTED MASK(4)
+
+/*
+ * Number of MSI-X vectors supported.
+ */
+#define PVSCSI_MAX_INTRS 24
+
+/*
+ * Enumeration of supported MSI-X vectors
+ */
+#define PVSCSI_VECTOR_COMPLETION 0
+
+/*
+ * Misc constants for the rings.
+ */
+
+#define PVSCSI_MAX_NUM_PAGES_REQ_RING PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
+#define PVSCSI_MAX_NUM_PAGES_CMP_RING PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
+#define PVSCSI_MAX_NUM_PAGES_MSG_RING PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES
+
+#define PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE \
+ (VMW_PAGE_SIZE / sizeof(struct PVSCSIRingReqDesc))
+
+#define PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE \
+ (VMW_PAGE_SIZE / sizeof(PVSCSIRingCmpDesc))
+
+#define PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE \
+ (VMW_PAGE_SIZE / sizeof(PVSCSIRingMsgDesc))
+
+#define PVSCSI_MAX_REQ_QUEUE_DEPTH \
+ (PVSCSI_MAX_NUM_PAGES_REQ_RING * PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE)
+
+#define PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES 1
+#define PVSCSI_MEM_SPACE_INTR_STATUS_NUM_PAGES 1
+#define PVSCSI_MEM_SPACE_MISC_NUM_PAGES 2
+#define PVSCSI_MEM_SPACE_KICK_IO_NUM_PAGES 2
+#define PVSCSI_MEM_SPACE_MSIX_NUM_PAGES 2
+
+enum PVSCSIMemSpace {
+ PVSCSI_MEM_SPACE_COMMAND_PAGE = 0,
+ PVSCSI_MEM_SPACE_INTR_STATUS_PAGE = 1,
+ PVSCSI_MEM_SPACE_MISC_PAGE = 2,
+ PVSCSI_MEM_SPACE_KICK_IO_PAGE = 4,
+ PVSCSI_MEM_SPACE_MSIX_TABLE_PAGE = 6,
+ PVSCSI_MEM_SPACE_MSIX_PBA_PAGE = 7,
+};
+
+#define PVSCSI_MEM_SPACE_NUM_PAGES \
+ (PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES + \
+ PVSCSI_MEM_SPACE_INTR_STATUS_NUM_PAGES + \
+ PVSCSI_MEM_SPACE_MISC_NUM_PAGES + \
+ PVSCSI_MEM_SPACE_KICK_IO_NUM_PAGES + \
+ PVSCSI_MEM_SPACE_MSIX_NUM_PAGES)
+
+#define PVSCSI_MEM_SPACE_SIZE (PVSCSI_MEM_SPACE_NUM_PAGES * VMW_PAGE_SIZE)
+
+#endif /* VMW_PVSCSI_H */
diff --git a/trace-events b/trace-events
index 412f7e4..66037a1 100644
--- a/trace-events
+++ b/trace-events
@@ -761,6 +761,42 @@ pc87312_info_ide(uint32_t base) "base 0x%x"
pc87312_info_parallel(uint32_t base, uint32_t irq) "base 0x%x, irq %u"
pc87312_info_serial(int n, uint32_t base, uint32_t irq) "id=%d, base 0x%x, irq %u"
+# hw/pvscsi.c
+pvscsi_rings_mgr_init_data(uint32_t txr_len_log2, uint32_t rxr_len_log2) "TX/RX rings logarithms set to %d/%d"
+pvscsi_rings_mgr_init_msg(uint32_t len_log2) "MSG ring logarithm set to %d"
+pvscsi_rings_mgr_flush_cmp_ring(uint64_t filled_cmp_ptr) "new production counter of completion ring is 0x%"PRIx64""
+pvscsi_rings_mgr_flush_msg_ring(uint64_t filled_cmp_ptr) "new production counter of message ring is 0x%"PRIx64""
+pvscsi_update_irq_level(bool raise, uint64_t mask, uint64_t status) "interrupt level set to %d (MASK: 0x%"PRIx64", STATUS: 0x%"PRIx64")"
+pvscsi_update_irq_msi(void) "sending MSI notification"
+pvscsi_cmp_ring_put(unsigned long addr) "got completion descriptor 0x%lx"
+pvscsi_msg_ring_put(unsigned long addr) "got message descriptor 0x%lx"
+pvscsi_complete_request(uint64_t context, uint64_t len, uint8_t sense_key) "completion: ctx: 0x%"PRIx64", len: 0x%"PRIx64", sense key: %u"
+pvscsi_get_sg_list(int nsg, size_t size) "get SG list: depth: %d, size: %zu"
+pvscsi_get_next_sg_elem(uint32_t flags) "unknown flags in SG element (val: 0x%x)"
+pvscsi_command_complete_not_found(uint32_t tag) "can't find request for tag 0x%x"
+pvscsi_command_complete_data_run(void) "not all data required for command transferred"
+pvscsi_command_complete_sense_len(int len) "sense information length is %d bytes"
+pvscsi_convert_sglist(uint64_t context, unsigned long addr, uint32_t resid) "element: ctx: 0x%"PRIx64" addr: 0x%lx, len: %u"
+pvscsi_process_req_descr(uint8_t cmd, uint64_t ctx) "SCSI cmd 0x%x, ctx: 0x%"PRIx64""
+pvscsi_process_req_descr_unknown_device(void) "command directed to unknown device rejected"
+pvscsi_process_req_descr_invalid_dir(void) "command with invalid transfer direction rejected"
+pvscsi_process_io(unsigned long addr) "got descriptor 0x%lx"
+pvscsi_on_cmd_noimpl(const char* cmd) "unimplemented command %s ignored"
+pvscsi_on_cmd_reset_dev(uint32_t tgt, int lun, void* dev) "PVSCSI_CMD_RESET_DEVICE[target %u lun %d (dev %p)]"
+pvscsi_on_cmd_arrived(const char* cmd) "command %s arrived"
+pvscsi_on_cmd_abort(uint64_t ctx, uint32_t tgt) "command PVSCSI_CMD_ABORT_CMD for ctx 0x%"PRIx64", target %u"
+pvscsi_on_cmd_unknown(uint64_t cmd_id) "unknown command %"PRIx64""
+pvscsi_on_cmd_unknown_data(uint32_t data) "data for unknown command 0x%x"
+pvscsi_io_write(const char* cmd, uint64_t val) "%s write: %"PRIx64""
+pvscsi_io_write_unknown(unsigned long addr, unsigned sz, uint64_t val) "unknown write address: 0x%lx size: %u bytes value: 0x%"PRIx64""
+pvscsi_io_read(const char* cmd, uint64_t status) "%s read: 0x%"PRIx64""
+pvscsi_io_read_unknown(unsigned long addr, unsigned sz) "unknown read address: 0x%lx size: %u bytes"
+pvscsi_init_msi_fail(int res) "failed to initialize MSI, error %d"
+pvscsi_state(const char* state) "starting %s ..."
+pvscsi_register(void) "PVSCSI QEMU device emulation registered"
+pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) "%s page: %"PRIx64""
+pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages: %u"
+
# xen-all.c
xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
--
1.8.1.4
* Re: [Qemu-devel] [PATCH 1/1 V6] VMWare PVSCSI paravirtual device implementation
2013-04-08 18:39 ` [Qemu-devel] [PATCH 1/1 " Dmitry Fleytman
@ 2013-04-10 9:33 ` Paolo Bonzini
2013-04-18 9:38 ` Dmitry Fleytman
0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2013-04-10 9:33 UTC (permalink / raw)
To: Dmitry Fleytman; +Cc: Yan Vugenfirer, qemu-devel, Anthony Liguori
Just a few comments, many of them aesthetic.
On 08/04/2013 20:39, Dmitry Fleytman wrote:
> Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
> Signed-off-by: Yan Vugenfirer <yan@daynix.com>
> ---
> default-configs/pci.mak | 1 +
> docs/specs/pvscsi-spec.txt | 92 ++++
> hw/Makefile.objs | 1 +
> hw/pci/pci.h | 1 +
> hw/pvscsi.c | 1194 ++++++++++++++++++++++++++++++++++++++++++++
> hw/vmw_pvscsi.h | 434 ++++++++++++++++
> trace-events | 36 ++
> 7 files changed, 1759 insertions(+)
> create mode 100644 docs/specs/pvscsi-spec.txt
> create mode 100644 hw/pvscsi.c
> create mode 100644 hw/vmw_pvscsi.h
>
> diff --git a/default-configs/pci.mak b/default-configs/pci.mak
> index ce56d58..3f8375c 100644
> --- a/default-configs/pci.mak
> +++ b/default-configs/pci.mak
> @@ -10,6 +10,7 @@ CONFIG_EEPRO100_PCI=y
> CONFIG_PCNET_PCI=y
> CONFIG_PCNET_COMMON=y
> CONFIG_LSI_SCSI_PCI=y
> +CONFIG_PVSCSI_SCSI_PCI=y
> CONFIG_MEGASAS_SCSI_PCI=y
> CONFIG_RTL8139_PCI=y
> CONFIG_E1000_PCI=y
> diff --git a/docs/specs/pvscsi-spec.txt b/docs/specs/pvscsi-spec.txt
> new file mode 100644
> index 0000000..b2c3a55
> --- /dev/null
> +++ b/docs/specs/pvscsi-spec.txt
> @@ -0,0 +1,92 @@
> +General Description
> +===================
> +
> +This document describes VMWare PVSCSI device interface specification.
> +Created by Dmitry Fleytman (dmitry@daynix.com), Daynix Computing LTD.
> +Based on source code of PVSCSI Linux driver from kernel 3.0.4
> +
> +PVSCSI Device Interface Overview
> +================================
> +
> +The interface is based on memory area shared between hypervisor and VM.
> +Memory area is obtained by driver as device IO memory resource of
> +PVSCSI_MEM_SPACE_SIZE length.
> +The shared memory consists of registers area and rings area.
> +The registers area is used to raise hypervisor interrupts and issue device
> +commands. The rings area is used to transfer data descriptors and SCSI
> +commands from VM to hypervisor and to transfer messages produced by
> +hypervisor to VM. Data itself is transferred via virtual scatter-gather DMA.
> +
> +PVSCSI Device Registers
> +=======================
> +
> +Registers area length is 1 page (PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES).
> +Registers area structure is described by PVSCSIRegOffset enumeration.
The length of the registers area is ... The structure of the registers
area is described by the PVSCSIRegOffset enum.
> +There are registers to issue device command (with optional short data),
> +issue device interrupt, control interrupts masking.
> +
> +PVSCSI Device Rings
> +===================
> +
> +There are three rings in shared memory:
> +
> + 1. Request ring (struct PVSCSIRingReqDesc *req_ring)
> + - ring for OS to device requests
> + 2. Completion ring (struct PVSCSIRingCmpDesc *cmp_ring)
> + - ring for device request completions
> + 3. Message ring (struct PVSCSIRingMsgDesc *msg_ring)
> + - ring for messages from device.
> + This ring is optional and may be not configured.
and the guest might not configure it.
> +There is a control area (struct PVSCSIRingsState *rings_state) used to control
> +rings operation.
> +
> +PVSCSI Device to Host Interrupts
> +================================
> +There are following interrupt types supported by PVSCSI device:
> + 1. Completion interrupts (completion ring notifications):
> + PVSCSI_INTR_CMPL_0
> + PVSCSI_INTR_CMPL_1
> + 2. Message interrupts (message ring notifications):
> + PVSCSI_INTR_MSG_0
> + PVSCSI_INTR_MSG_1
> +
> +Interrupts are controlled via PVSCSI_REG_OFFSET_INTR_MASK register
> +Bit set means interrupt enabled, bit cleared - disabled
> +
> +Interrupt modes supported are legacy, MSI and MSI-X
> +In case of legacy interrupts register PVSCSI_REG_OFFSET_INTR_STATUS
> +used to verify interrupt arrival and to clear interrupt state
> +Interrupts are cleared by writing processed bits back
> +to interrupt status register.
In case of legacy interrupts, register PVSCSI_REG_OFFSET_INTR_STATUS
is used to check which interrupt has arrived. Interrupts are
acknowledged when the corresponding bit is written to the interrupt
status register.
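To make the acknowledge semantics concrete, here is a minimal standalone
sketch. It is not the device code: IntrRegs and the intr_* helpers are
made-up names for this example, and it assumes the legacy interrupt level
is simply (status & mask). The write handler mirrors the
"s->reg_interrupt_status &= ~val" statement in pvscsi_io_write().

```c
#include <stdint.h>

/* Interrupt bits, same values as in vmw_pvscsi.h. */
#define PVSCSI_INTR_CMPL_0 (1 << 0)
#define PVSCSI_INTR_CMPL_1 (1 << 1)
#define PVSCSI_INTR_MSG_0  (1 << 2)
#define PVSCSI_INTR_MSG_1  (1 << 3)

/* Hypothetical stand-in for the two interrupt registers. */
typedef struct {
    uint64_t status;   /* pending interrupt bits */
    uint64_t enabled;  /* interrupt mask: bit set means enabled */
} IntrRegs;

/* Device side: latch a new interrupt cause. */
static void intr_raise(IntrRegs *r, uint64_t bits)
{
    r->status |= bits;
}

/* Guest write to the status register: every bit written is acknowledged. */
static void intr_status_write(IntrRegs *r, uint64_t val)
{
    r->status &= ~val;
}

/* Assumed legacy level: asserted while any enabled bit is still pending. */
static int intr_level(const IntrRegs *r)
{
    return (r->status & r->enabled) != 0;
}
```

With both a completion and a message bit pending, writing only
PVSCSI_INTR_CMPL_0 leaves the message bit set, so the line stays asserted
until the guest acknowledges that one too.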
> +
> +PVSCSI Device Operation Sequences
> +=================================
> +
> +1. Startup sequence:
> + a. Issue PVSCSI_CMD_ADAPTER_RESET command;
> + aa. Windows driver reads interrupt status register here;
> + b. Issue PVSCSI_CMD_SETUP_MSG_RING command with no additional data,
> + check status and disable device messages if error returned;
> + (Omitted if device messages disabled by driver configuration)
Can you add a boolean property to enable/disable the message ring?
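Roughly what I have in mind, as a standalone sketch rather than real qdev
code. MsgRingModel, use_msg and on_cmd_setup_msg_ring are hypothetical
names, and the return values are an assumption taken from the
PVSCSI_CMD_SETUP_MSG_RING comment in vmw_pvscsi.h (-1 means "not
supported", otherwise the descriptor size in 32-bit words), not from the
handler in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PVSCSI_COMMAND_PROCESSING_FAILED (-1)

/*
 * Size of PVSCSICmdDescSetupMsgRing in 32-bit words:
 * numPages (4 bytes) + pad (4 bytes) + 16 PPNs of 8 bytes each.
 */
#define SETUP_MSG_RING_WORDS ((2 * 4 + 16 * 8) / 4)

/* Hypothetical device state; use_msg models the requested property. */
typedef struct {
    bool use_msg;              /* property: expose the message ring? */
    bool rings_info_valid;     /* SETUP_RINGS already processed */
    bool msg_ring_info_valid;  /* message ring configured */
} MsgRingModel;

static uint64_t on_cmd_setup_msg_ring(MsgRingModel *s)
{
    if (!s->use_msg) {
        /* The guest reads -1 and falls back to running without messages. */
        return PVSCSI_COMMAND_PROCESSING_FAILED;
    }

    /* The header says the command is a nop before SETUP_RINGS. */
    if (s->rings_info_valid) {
        s->msg_ring_info_valid = true;
    }

    return SETUP_MSG_RING_WORDS;
}
```

That way the probe in step b of the startup sequence doubles as the
feature switch, and no guest-visible register has to change.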
> + c. Issue PVSCSI_CMD_SETUP_RINGS command, provide rings configuration
> + as struct PVSCSICmdDescSetupRings;
> + d. Issue PVSCSI_CMD_SETUP_MSG_RING command again, provide
> + rings configuration as struct PVSCSICmdDescSetupMsgRing;
> + e. Unmask completion and message (if device messages enabled) interrupts.
> +
> +2. Shutdown sequences
> + a. Mask interrupts;
> + b. Flush request ring using PVSCSI_REG_OFFSET_KICK_NON_RW_IO;
> + c. Issue PVSCSI_CMD_ADAPTER_RESET command.
> +
> +3. Send request
> + a. Fill next free request ring descriptor;
> + b. Issue PVSCSI_REG_OFFSET_KICK_RW_IO for R/W operations;
> + or PVSCSI_REG_OFFSET_KICK_NON_RW_IO for other operations.
> +
> +4. Abort command
> + a. Issue PVSCSI_CMD_ABORT_CMD command;
> +
> +5. Request completion processing
> + a. Upon completion interrupt arrival process completion
> + and message (if enabled) rings.
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index d0b2ecb..6e43763 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -130,6 +130,7 @@ common-obj-$(CONFIG_OPENCORES_ETH) += opencores_eth.o
> # SCSI layer
> common-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
> common-obj-$(CONFIG_MEGASAS_SCSI_PCI) += megasas.o
> +common-obj-$(CONFIG_PVSCSI_SCSI_PCI) += pvscsi.o
> common-obj-$(CONFIG_ESP) += esp.o
> common-obj-$(CONFIG_ESP_PCI) += esp-pci.o
>
> diff --git a/hw/pci/pci.h b/hw/pci/pci.h
> index 9ea67a3..1767fe5 100644
> --- a/hw/pci/pci.h
> +++ b/hw/pci/pci.h
> @@ -59,6 +59,7 @@
> #define PCI_DEVICE_ID_VMWARE_SVGA 0x0710
> #define PCI_DEVICE_ID_VMWARE_NET 0x0720
> #define PCI_DEVICE_ID_VMWARE_SCSI 0x0730
> +#define PCI_DEVICE_ID_VMWARE_PVSCSI 0x07C0
> #define PCI_DEVICE_ID_VMWARE_IDE 0x1729
> #define PCI_DEVICE_ID_VMWARE_VMXNET3 0x07B0
>
> diff --git a/hw/pvscsi.c b/hw/pvscsi.c
> new file mode 100644
> index 0000000..4c66671
> --- /dev/null
> +++ b/hw/pvscsi.c
> @@ -0,0 +1,1194 @@
> +/*
> + * QEMU VMWARE PVSCSI paravirtual SCSI bus
> + *
> + * Copyright (c) 2012 Ravello Systems LTD (http://ravellosystems.com)
> + *
> + * Developed by Daynix Computing LTD (http://www.daynix.com)
> + *
> + * Based on implementation by Paolo Bonzini
> + * http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg00729.html
> + *
> + * Authors:
> + * Paolo Bonzini <pbonzini@redhat.com>
> + * Dmitry Fleytman <dmitry@daynix.com>
> + * Yan Vugenfirer <yan@daynix.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + * NOTE about MSI-X:
> + * MSI-X support has been removed for the moment because it leads Windows OS
> + * to crash on startup. The crash happens because Windows driver requires
> + * MSI-X shared memory to be part of the same BAR used for rings state
> + * registers, etc. This is not supported by QEMU infrastructure so separate
> + * BAR created from MSI-X purposes. Windows driver fails to deal with 2 BARs.
> + *
> + */
> +
> +#include "scsi-defs.h"
> +#include "hw/scsi.h"
> +#include "hw/pci/msi.h"
> +#include "vmw_pvscsi.h"
> +#include "trace.h"
> +
> +
> +#define PVSCSI_MSI_OFFSET (0x50)
> +#define PVSCSI_USE_64BIT (true)
> +#define PVSCSI_PER_VECTOR_MASK (false)
> +
> +#define PVSCSI_MAX_DEVS (64)
> +#define PVSCSI_MSIX_NUM_VECTORS (1)
> +
> +#define PVSCSI_MAX_CMD_DATA_WORDS \
> + (sizeof(PVSCSICmdDescSetupRings)/sizeof(uint32_t))
> +
> +#define RS_GET_FIELD(rs_pa, field) \
> + (ldl_le_phys(rs_pa + offsetof(struct PVSCSIRingsState, field)))
> +#define RS_SET_FIELD(rs_pa, field, val) \
> + (stl_le_phys(rs_pa + offsetof(struct PVSCSIRingsState, field), val))
> +
> +#define TYPE_PVSCSI "pvscsi"
> +#define PVSCSI(obj) OBJECT_CHECK(PVSCSIState, (obj), TYPE_PVSCSI)
> +
> +typedef struct PVSCSIRingsMgr {
Please rename this to PVSCSIRingInfo, or just put the fields in PVSCSIState.
> + uint64_t rs_pa;
> + uint32_t txr_len_mask;
> + uint32_t rxr_len_mask;
> + uint32_t msg_len_mask;
> + uint64_t req_ring_pages_pa[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
> + uint64_t cmp_ring_pages_pa[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
> + uint64_t msg_ring_pages_pa[PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES];
> + uint64_t consumed_ptr;
> + uint64_t filled_cmp_ptr;
> + uint64_t filled_msg_ptr;
> +} PVSCSIRingsMgr;
> +
> +typedef struct PVSCSISGState {
> + hwaddr elemAddr;
> + hwaddr dataAddr;
> + uint32_t resid;
> +} PVSCSISGState;
> +
> +typedef QTAILQ_HEAD(, PVSCSIRequest) PVSCSIRequestList;
> +
> +typedef struct {
> + PCIDevice parent_obj;
> + MemoryRegion io_space;
> + SCSIBus bus;
> + QEMUBH *completion_worker;
> + PVSCSIRequestList pending_queue;
> + PVSCSIRequestList completion_queue;
> +
> + uint64_t reg_interrupt_status; /* Interrupt status register value */
> + uint64_t reg_interrupt_enabled; /* Interrupt mask register value */
> + uint64_t reg_command_status; /* Command status register value */
> +
> + /* Command data adoption mechanism */
> + uint64_t curr_cmd; /* Last command arrived */
> + uint32_t curr_cmd_data_cntr; /* Amount of data for last command */
> +
> + /* Collector for current command data */
> + uint32_t curr_cmd_data[PVSCSI_MAX_CMD_DATA_WORDS];
> +
> + uint8_t rings_info_valid; /* Whether data rings initialized */
> + uint8_t msg_ring_info_valid; /* Whether message ring initialized */
> +
> + uint8_t msi_used; /* Whether MSI support was installed successfully */
> +
> + PVSCSIRingsMgr rings; /* Data transfer rings manager */
> +} PVSCSIState;
> +
> +typedef struct PVSCSIRequest {
> + SCSIRequest *sreq;
> + PVSCSIState *dev;
> + uint8_t sense_key;
> + uint8_t completed;
> + int lun;
> + QEMUSGList sgl;
> + PVSCSISGState sg;
> + struct PVSCSIRingReqDesc req;
> + struct PVSCSIRingCmpDesc cmp;
> + QTAILQ_ENTRY(PVSCSIRequest) next;
> +} PVSCSIRequest;
This needs to be serialized and loaded back if you want to support
migration with rerror=stop/werror=stop. See how it's done in
virtio-scsi. To test it, you can make a disk readonly with blockdev
while QEMU is running, migrate to file, make it read-write again and
migrate back from the file.
This is not blocking the patch though.
> +/* Integer binary logarithm */
> +static int
> +pvscsi_log2(uint32_t input)
> +{
> + int log = 0;
> + assert(input > 0);
> + while (input >> ++log) {
> + }
> + return log;
> +}
> +
> +static void
> +pvscsi_rings_mgr_init_data(PVSCSIRingsMgr *m, PVSCSICmdDescSetupRings *ri)
s/pvscsi_rings_mgr/pvscsi_ring/g
> +{
> + int i;
> + uint32_t txr_len_log2, rxr_len_log2;
> + uint32_t req_ring_size, cmp_ring_size;
> + m->rs_pa = ri->ringsStatePPN << VMW_PAGE_SHIFT;
> +
> + req_ring_size = ri->reqRingNumPages * PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
> + cmp_ring_size = ri->cmpRingNumPages * PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
> + txr_len_log2 = pvscsi_log2(req_ring_size - 1);
> + rxr_len_log2 = pvscsi_log2(cmp_ring_size - 1);
> +
> + m->txr_len_mask = MASK(txr_len_log2);
> + m->rxr_len_mask = MASK(rxr_len_log2);
> +
> + m->consumed_ptr = 0;
> + m->filled_cmp_ptr = 0;
> +
> + for (i = 0; i < ri->reqRingNumPages; i++) {
> + m->req_ring_pages_pa[i] = ri->reqRingPPNs[i] << VMW_PAGE_SHIFT;
> + }
> +
> + for (i = 0; i < ri->cmpRingNumPages; i++) {
> + m->cmp_ring_pages_pa[i] = ri->cmpRingPPNs[i] << VMW_PAGE_SHIFT;
> + }
> +
> + RS_SET_FIELD(m->rs_pa, reqProdIdx, 0);
> + RS_SET_FIELD(m->rs_pa, reqConsIdx, 0);
> + RS_SET_FIELD(m->rs_pa, reqNumEntriesLog2, txr_len_log2);
> +
> + RS_SET_FIELD(m->rs_pa, cmpProdIdx, 0);
> + RS_SET_FIELD(m->rs_pa, cmpConsIdx, 0);
> + RS_SET_FIELD(m->rs_pa, cmpNumEntriesLog2, rxr_len_log2);
> +
> + trace_pvscsi_rings_mgr_init_data(txr_len_log2, rxr_len_log2);
> +
> + /* Flush ring state page changes */
> + smp_wmb();
> +}
> +
> +static void
> +pvscsi_rings_mgr_init_msg(PVSCSIRingsMgr *m, PVSCSICmdDescSetupMsgRing *ri)
> +{
> + int i;
> + uint32_t len_log2;
> + uint32_t ring_size;
> +
> + ring_size = ri->numPages * PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE;
> + len_log2 = pvscsi_log2(ring_size - 1);
> +
> + m->msg_len_mask = MASK(len_log2);
> +
> + m->filled_msg_ptr = 0;
> +
> + for (i = 0; i < ri->numPages; i++) {
> + m->msg_ring_pages_pa[i] = ri->ringPPNs[i] << VMW_PAGE_SHIFT;
> + }
> +
> + RS_SET_FIELD(m->rs_pa, msgProdIdx, 0);
> + RS_SET_FIELD(m->rs_pa, msgConsIdx, 0);
> + RS_SET_FIELD(m->rs_pa, msgNumEntriesLog2, len_log2);
> +
> + trace_pvscsi_rings_mgr_init_msg(len_log2);
> +
> + /* Flush ring state page changes */
> + smp_wmb();
> +}
> +
> +static void
> +pvscsi_rings_mgr_cleanup(PVSCSIRingsMgr *mgr)
> +{
> + mgr->rs_pa = 0;
> + mgr->txr_len_mask = 0;
> + mgr->rxr_len_mask = 0;
> + mgr->msg_len_mask = 0;
> + mgr->consumed_ptr = 0;
> + mgr->filled_cmp_ptr = 0;
> + mgr->filled_msg_ptr = 0;
> + memset(mgr->req_ring_pages_pa, 0, sizeof(mgr->req_ring_pages_pa));
> + memset(mgr->cmp_ring_pages_pa, 0, sizeof(mgr->cmp_ring_pages_pa));
> + memset(mgr->msg_ring_pages_pa, 0, sizeof(mgr->msg_ring_pages_pa));
> +}
> +
> +static hwaddr
> +pvscsi_rings_mgr_pop_req_descr(PVSCSIRingsMgr *mgr)
> +{
> + uint32_t ready_ptr = RS_GET_FIELD(mgr->rs_pa, reqProdIdx);
> +
> + if (ready_ptr != mgr->consumed_ptr) {
> + uint32_t next_ready_ptr =
> + mgr->consumed_ptr++ & mgr->txr_len_mask;
> + uint32_t next_ready_page =
> + next_ready_ptr / PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
> + uint32_t inpage_idx =
> + next_ready_ptr % PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
> +
> + return mgr->req_ring_pages_pa[next_ready_page] +
> + inpage_idx * sizeof(PVSCSIRingReqDesc);
> + } else {
> + return 0;
> + }
> +}
> +
> +static void
> +pvscsi_rings_mgr_flush_req_ring(PVSCSIRingsMgr *mgr)
pvscsi_ring_flush_req, and likewise for other functions that have _ring
at the end.
> +{
> + RS_SET_FIELD(mgr->rs_pa, reqConsIdx, mgr->consumed_ptr);
> +}
> +
> +static hwaddr
> +pvscsi_rings_mgr_pop_cmp_descr(PVSCSIRingsMgr *mgr)
> +{
> + /*
> + * According to Linux driver code it explicitly verifies that number
> + * of requests being processed by device is less then the size of
> + * completion queue, so device may omit completion queue overflow
> + * conditions check. We assume that this is true for other (Windows)
> + * drivers as well.
> + */
> +
> + uint32_t free_cmp_ptr =
> + mgr->filled_cmp_ptr++ & mgr->rxr_len_mask;
> + uint32_t free_cmp_page =
> + free_cmp_ptr / PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
> + uint32_t inpage_idx =
> + free_cmp_ptr % PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
> + return mgr->cmp_ring_pages_pa[free_cmp_page] +
> + inpage_idx * sizeof(PVSCSIRingCmpDesc);
> +}
> +
> +static hwaddr
> +pvscsi_rings_mgr_pop_msg_descr(PVSCSIRingsMgr *mgr)
> +{
> + uint32_t free_msg_ptr =
> + mgr->filled_msg_ptr++ & mgr->msg_len_mask;
> + uint32_t free_msg_page =
> + free_msg_ptr / PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE;
> + uint32_t inpage_idx =
> + free_msg_ptr % PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE;
> + return mgr->msg_ring_pages_pa[free_msg_page] +
> + inpage_idx * sizeof(PVSCSIRingMsgDesc);
> +}
> +
> +static void
> +pvscsi_rings_mgr_flush_cmp_ring(PVSCSIRingsMgr *mgr)
> +{
> + /* Flush descriptor changes */
> + smp_wmb();
> +
> + trace_pvscsi_rings_mgr_flush_cmp_ring(mgr->filled_cmp_ptr);
> +
> + RS_SET_FIELD(mgr->rs_pa, cmpProdIdx, mgr->filled_cmp_ptr);
> +}
> +
> +static bool
> +pvscsi_rings_mgr_msg_has_room(PVSCSIRingsMgr *mgr)
> +{
> + uint32_t prodIdx = RS_GET_FIELD(mgr->rs_pa, msgProdIdx);
> + uint32_t consIdx = RS_GET_FIELD(mgr->rs_pa, msgConsIdx);
> +
> + return (prodIdx - consIdx) < (mgr->msg_len_mask + 1);
> +}
> +
> +static void
> +pvscsi_rings_mgr_flush_msg_ring(PVSCSIRingsMgr *mgr)
> +{
> + /* Flush descriptor changes */
> + smp_wmb();
> +
> + trace_pvscsi_rings_mgr_flush_msg_ring(mgr->filled_msg_ptr);
> +
> + RS_SET_FIELD(mgr->rs_pa, msgProdIdx, mgr->filled_msg_ptr);
> +}
> +
> +static void
> +pvscsi_reset_state(PVSCSIState *s)
> +{
> + s->curr_cmd = PVSCSI_CMD_FIRST;
> + s->curr_cmd_data_cntr = 0;
> + s->reg_command_status = PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> + s->reg_interrupt_status = 0;
> + pvscsi_rings_mgr_cleanup(&s->rings);
> + s->rings_info_valid = FALSE;
> + s->msg_ring_info_valid = FALSE;
> + QTAILQ_INIT(&s->pending_queue);
> + QTAILQ_INIT(&s->completion_queue);
> +}
> +
> +static void
> +pvscsi_free_queue(PVSCSIRequestList *req_list)
This shouldn't be needed.
> +{
> + PVSCSIRequest *pvscsi_req;
> +
> + while (!QTAILQ_EMPTY(req_list)) {
> + pvscsi_req = QTAILQ_FIRST(req_list);
> + QTAILQ_REMOVE(req_list, pvscsi_req, next);
> + g_free(pvscsi_req);
> + }
> +}
> +
> +static void
> +pvscsi_reset_adapter(PVSCSIState *s)
> +{
> + qbus_reset_all_fn(&s->bus);
> + pvscsi_free_queue(&s->completion_queue);
> + assert(QTAILQ_EMPTY(&s->pending_queue));
> + pvscsi_reset_state(s);
> +}
> +
> +static void
> +pvscsi_update_irq_status(PVSCSIState *s)
> +{
> + PCIDevice *d = PCI_DEVICE(s);
> + bool should_raise = s->reg_interrupt_enabled & s->reg_interrupt_status;
> +
> + trace_pvscsi_update_irq_level(should_raise, s->reg_interrupt_enabled,
> + s->reg_interrupt_status);
> +
> + if (s->msi_used && msi_enabled(d)) {
> + if (should_raise) {
> + trace_pvscsi_update_irq_msi();
> + msi_notify(d, PVSCSI_VECTOR_COMPLETION);
> + }
> + return;
> + }
> +
> + qemu_set_irq(d->irq[0], !!should_raise);
> +}
> +
> +static void
> +pvscsi_raise_completion_interrupt(PVSCSIState *s)
> +{
> + s->reg_interrupt_status |= PVSCSI_INTR_CMPL_0;
Did you find out how you're supposed to use PVSCSI_INTR_CMPL_1?
> + /* Memory barrier to flush interrupt status register changes*/
> + smp_wmb();
> +
> + pvscsi_update_irq_status(s);
> +}
> +
> +static void
> +pvscsi_raise_message_interrupt(PVSCSIState *s)
> +{
> + s->reg_interrupt_status |= PVSCSI_INTR_MSG_0;
> +
> + /* Memory barrier to flush interrupt status register changes*/
> + smp_wmb();
> +
> + pvscsi_update_irq_status(s);
> +}
> +
> +static void
> +pvscsi_cmp_ring_put(PVSCSIState *s, struct PVSCSIRingCmpDesc *cmp_desc)
> +{
> + hwaddr cmp_descr_pa;
> +
> + cmp_descr_pa = pvscsi_rings_mgr_pop_cmp_descr(&s->rings);
> + trace_pvscsi_cmp_ring_put(cmp_descr_pa);
> + cpu_physical_memory_write(cmp_descr_pa, (void *)cmp_desc,
> + sizeof(*cmp_desc));
> +}
> +
> +static void
> +pvscsi_msg_ring_put(PVSCSIState *s, struct PVSCSIRingMsgDesc *msg_desc)
> +{
> + hwaddr msg_descr_pa;
> +
> + msg_descr_pa = pvscsi_rings_mgr_pop_msg_descr(&s->rings);
> + trace_pvscsi_msg_ring_put(msg_descr_pa);
> + cpu_physical_memory_write(msg_descr_pa, (void *)msg_desc,
> + sizeof(*msg_desc));
> +}
> +
> +static void
> +pvscsi_process_completion_queue(void *opaque)
> +{
> + PVSCSIState *s = opaque;
> + PVSCSIRequest *pvscsi_req;
> + bool has_completed = false;
> +
> + while (!QTAILQ_EMPTY(&s->completion_queue)) {
> + pvscsi_req = QTAILQ_FIRST(&s->completion_queue);
> + QTAILQ_REMOVE(&s->completion_queue, pvscsi_req, next);
> + pvscsi_cmp_ring_put(s, &pvscsi_req->cmp);
> + g_free(pvscsi_req);
> + has_completed++;
> + }
> +
> + if (has_completed) {
> + pvscsi_rings_mgr_flush_cmp_ring(&s->rings);
> + pvscsi_raise_completion_interrupt(s);
> + }
> +}
> +
> +static void
> +pvscsi_schedule_completion_processing(PVSCSIState *s)
> +{
> + /* Try putting more complete requests on the ring. */
> + if (!QTAILQ_EMPTY(&s->completion_queue)) {
> + qemu_bh_schedule(s->completion_worker);
> + }
> +}
> +
> +static void
> +pvscsi_complete_request(PVSCSIState *s, PVSCSIRequest *r)
> +{
> + assert(!r->completed);
> +
> + trace_pvscsi_complete_request(r->cmp.context, r->cmp.dataLen,
> + r->sense_key);
> + if (r->sreq != NULL) {
> + scsi_req_unref(r->sreq);
> + r->sreq = NULL;
> + }
> + r->completed = 1;
> + QTAILQ_REMOVE(&s->pending_queue, r, next);
> + QTAILQ_INSERT_TAIL(&s->completion_queue, r, next);
> + pvscsi_schedule_completion_processing(s);
> +}
> +
> +static QEMUSGList *pvscsi_get_sg_list(SCSIRequest *r)
> +{
> + PVSCSIRequest *req = r->hba_private;
> +
> + trace_pvscsi_get_sg_list(req->sgl.nsg, req->sgl.size);
> +
> + return &req->sgl;
> +}
> +
> +static void
> +pvscsi_get_next_sg_elem(PVSCSISGState *sg)
> +{
> + struct PVSCSISGElement elem;
> +
> + for (;; sg->elemAddr = elem.addr) {
Please remove the for loop altogether.
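Since the loop body always ends in "break", the function is really straight-line code. A rough, untested sketch of what I mean follows. The struct layouts and the guest_read() helper backed by a fake memory array are stand-ins I made up so the fragment builds outside QEMU; in the patch it would remain cpu_physical_memory_read() on the real PVSCSISGElement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the patch's types; layouts assumed for illustration only. */
struct SGElement { uint64_t addr; uint32_t length; uint32_t flags; };
struct SGState { uint64_t elemAddr; uint64_t dataAddr; uint32_t resid; };

/* Fake guest memory so the sketch is self-contained (hypothetical). */
static uint8_t guest_mem[256];

static void guest_read(uint64_t pa, void *buf, size_t len)
{
    assert(pa + len <= sizeof(guest_mem));
    memcpy(buf, &guest_mem[pa], len);
}

/* Read one S/G element and advance; no loop is needed. */
static void get_next_sg_elem(struct SGState *sg)
{
    struct SGElement elem;

    guest_read(sg->elemAddr, &elem, sizeof(elem));
    /* Unknown flags (e.g. a chain element) would be traced here. */

    sg->elemAddr += sizeof(elem);
    sg->dataAddr = elem.addr;
    sg->resid = elem.length;
}
```

The behaviour is the same as the quoted version, because the for-loop increment expression is never reached.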
> + cpu_physical_memory_read(sg->elemAddr, (void *)&elem, sizeof(elem));
> + if ((elem.flags & ~PVSCSI_KNOWN_FLAGS) != 0) {
> + /*
> + * There is PVSCSI_SGE_FLAG_CHAIN_ELEMENT flag described in
> + * header file but its value is unknown. This flag requires
> + * additional processing, so we put warning here to catch it
> + * some day and make proper implementation
> + */
> + trace_pvscsi_get_next_sg_elem(elem.flags);
> + }
> + break;
> + }
> +
> + sg->elemAddr += sizeof(elem);
> + sg->dataAddr = elem.addr;
> + sg->resid = elem.length;
> +}
> +
> +static void
> +pvscsi_write_sense(PVSCSIRequest *r, uint8_t *sense, int len)
> +{
> + r->cmp.senseLen = MIN(r->req.senseLen, len);
> + r->sense_key = sense[2];
The key is in sense[1] if bit 1 of sense[0] is 1 (descriptor-format sense
data); sense[2] is only right for fixed format. See scsi_build_sense
in hw/scsi/scsi-bus.c.
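For illustration, a small helper along these lines would handle both layouts. This is only a sketch: the name pvscsi_sense_key is hypothetical and not part of the patch, and the response codes in the comments (0x70/0x71 for fixed format, 0x72/0x73 for descriptor format) are the ones I know from SPC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: extract the sense key from either sense data
 * format. Returns 0 (NO SENSE) when the buffer is too short to hold it.
 */
static uint8_t pvscsi_sense_key(const uint8_t *sense, size_t len)
{
    assert(sense != NULL || len == 0);

    if (len < 1) {
        return 0;
    }
    if (sense[0] & 0x02) {
        /* Descriptor format (0x72/0x73): key is the low nibble of byte 1 */
        return len > 1 ? (sense[1] & 0x0f) : 0;
    }
    /* Fixed format (0x70/0x71): key is the low nibble of byte 2 */
    return len > 2 ? (sense[2] & 0x0f) : 0;
}
```

pvscsi_write_sense could then set r->sense_key from such a helper instead of reading sense[2] unconditionally.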
> + cpu_physical_memory_write(r->req.senseAddr, sense, r->cmp.senseLen);
> +}
> +
> +static void
> +pvscsi_command_complete(SCSIRequest *req, uint32_t status, size_t resid)
> +{
> + PVSCSIRequest *pvscsi_req = req->hba_private;
> + PVSCSIState *s = pvscsi_req->dev;
> +
> + if (!pvscsi_req) {
> + trace_pvscsi_command_complete_not_found(req->tag);
> + return;
> + }
> +
> + if (resid) {
> + /* Short transfer. */
> + trace_pvscsi_command_complete_data_run();
> + pvscsi_req->cmp.hostStatus = BTSTAT_DATARUN;
> + }
> +
> + pvscsi_req->cmp.scsiStatus = status;
> + if (pvscsi_req->cmp.scsiStatus == CHECK_CONDITION) {
> + uint8_t sense[SCSI_SENSE_BUF_SIZE];
> + int sense_len =
> + scsi_req_get_sense(pvscsi_req->sreq, sense, sizeof(sense));
> +
> + trace_pvscsi_command_complete_sense_len(sense_len);
> + pvscsi_write_sense(pvscsi_req, sense, sense_len);
> + }
> + qemu_sglist_destroy(&pvscsi_req->sgl);
> + pvscsi_complete_request(s, pvscsi_req);
> +}
> +
> +static void
> +pvscsi_send_msg(PVSCSIState *s, SCSIDevice *dev, uint32_t msg_type)
> +{
> + if (s->msg_ring_info_valid && pvscsi_rings_mgr_msg_has_room(&s->rings)) {
> + PVSCSIMsgDescDevStatusChanged msg = {0};
> +
> + msg.type = msg_type;
> + msg.bus = dev->channel;
> + msg.target = dev->id;
> + msg.lun[1] = dev->lun;
> +
> + pvscsi_msg_ring_put(s, (PVSCSIRingMsgDesc *)&msg);
> + pvscsi_rings_mgr_flush_msg_ring(&s->rings);
> + pvscsi_raise_message_interrupt(s);
> + }
> +}
> +
> +static void
> +pvscsi_hotplug(SCSIBus *bus, SCSIDevice *dev)
> +{
> + PVSCSIState *s = container_of(bus, PVSCSIState, bus);
> + pvscsi_send_msg(s, dev, PVSCSI_MSG_DEV_ADDED);
> +}
> +
> +static void
> +pvscsi_hot_unplug(SCSIBus *bus, SCSIDevice *dev)
> +{
> + PVSCSIState *s = container_of(bus, PVSCSIState, bus);
> + pvscsi_send_msg(s, dev, PVSCSI_MSG_DEV_REMOVED);
> +}
> +
> +static void
> +pvscsi_request_cancelled(SCSIRequest *req)
> +{
> + PVSCSIRequest *pvscsi_req = req->hba_private;
> + PVSCSIState *s = pvscsi_req->dev;
> +
> + if (pvscsi_req->cmp.hostStatus == BTSTAT_SUCCESS) {
> + pvscsi_req->cmp.hostStatus = BTSTAT_ABORTQUEUE;
> + }
virtio-scsi has a "resetting" field and reports a reset status instead of
an abort while it is nonzero. Perhaps you can do the same here with
BTSTAT_BUSRESET.
> + pvscsi_complete_request(s, pvscsi_req);
> +}
> +
> +static SCSIDevice*
> +pvscsi_device_find(PVSCSIState *s, int channel, int target,
> + uint8_t *requested_lun, uint8_t *target_lun)
> +{
> + if (requested_lun[0] || requested_lun[2] || requested_lun[3] ||
> + requested_lun[4] || requested_lun[5] || requested_lun[6] ||
> + requested_lun[7] || (target > PVSCSI_MAX_DEVS)) {
> + return NULL;
> + } else {
> + *target_lun = requested_lun[1];
> + return scsi_device_find(&s->bus, channel, target, *target_lun);
> + }
> +}
> +
> +static PVSCSIRequest *
> +pvscsi_queue_pending_descriptor(PVSCSIState *s, SCSIDevice **d,
> + struct PVSCSIRingReqDesc *descr)
> +{
> + PVSCSIRequest *pvscsi_req;
> + uint8_t lun;
> +
> + pvscsi_req = g_malloc0(sizeof(*pvscsi_req));
> + pvscsi_req->dev = s;
> + pvscsi_req->req = *descr;
> + pvscsi_req->cmp.context = pvscsi_req->req.context;
> + QTAILQ_INSERT_TAIL(&s->pending_queue, pvscsi_req, next);
> +
> + *d = pvscsi_device_find(s, descr->bus, descr->target, descr->lun, &lun);
> + if (!*d) {
> + return pvscsi_req;
> + }
> +
> + pvscsi_req->lun = lun;
> + return pvscsi_req;
> +}
> +
> +static void
> +pvscsi_convert_sglist(PVSCSIRequest *r)
> +{
> + int chunk_size;
> + uint64_t data_length = r->req.dataLen;
> + PVSCSISGState sg = r->sg;
> + while (data_length) {
> + while (!sg.resid) {
> + pvscsi_get_next_sg_elem(&sg);
> + trace_pvscsi_convert_sglist(r->req.context, r->sg.dataAddr,
> + r->sg.resid);
> + }
> + assert(data_length > 0);
> + chunk_size = MIN((unsigned) data_length, sg.resid);
> + if (chunk_size) {
> + qemu_sglist_add(&r->sgl, sg.dataAddr, chunk_size);
> + }
> +
> + sg.dataAddr += chunk_size;
> + data_length -= chunk_size;
> + sg.resid -= chunk_size;
> + }
> +}
> +
> +static void
> +pvscsi_build_sglist(PVSCSIState *s, PVSCSIRequest *r)
> +{
> + PCIDevice *d = PCI_DEVICE(s);
> +
> + qemu_sglist_init(&r->sgl, 1, pci_dma_context(d));
> + if (r->req.flags & PVSCSI_FLAG_CMD_WITH_SG_LIST) {
> + pvscsi_convert_sglist(r);
> + } else {
> + qemu_sglist_add(&r->sgl, r->req.dataAddr, r->req.dataLen);
> + }
> +}
> +
> +static void
> +pvscsi_process_request_descriptor(PVSCSIState *s,
> + struct PVSCSIRingReqDesc *descr)
> +{
> + SCSIDevice *d;
> + PVSCSIRequest *r = pvscsi_queue_pending_descriptor(s, &d, descr);
> + int64_t n;
> +
> + trace_pvscsi_process_req_descr(descr->cdb[0], descr->context);
> +
> + if (!d) {
> + r->cmp.hostStatus = BTSTAT_SELTIMEO;
> + trace_pvscsi_process_req_descr_unknown_device();
> + pvscsi_complete_request(s, r);
> + return;
> + }
> +
> + if (descr->flags & PVSCSI_FLAG_CMD_WITH_SG_LIST) {
> + r->sg.elemAddr = descr->dataAddr;
> + }
> +
> + r->sreq = scsi_req_new(d, descr->context, r->lun, descr->cdb, r);
> + if (r->sreq->cmd.mode == SCSI_XFER_FROM_DEV &&
> + (descr->flags & PVSCSI_FLAG_CMD_DIR_TODEVICE)) {
> + r->cmp.hostStatus = BTSTAT_BADMSG;
> + trace_pvscsi_process_req_descr_invalid_dir();
> + scsi_req_cancel(r->sreq);
> + return;
> + }
> + if (r->sreq->cmd.mode == SCSI_XFER_TO_DEV &&
> + (descr->flags & PVSCSI_FLAG_CMD_DIR_TOHOST)) {
> + r->cmp.hostStatus = BTSTAT_BADMSG;
> + trace_pvscsi_process_req_descr_invalid_dir();
> + scsi_req_cancel(r->sreq);
> + return;
> + }
> +
> + pvscsi_build_sglist(s, r);
> + n = scsi_req_enqueue(r->sreq);
> +
> + if (n) {
> + scsi_req_continue(r->sreq);
> + }
> +}
> +
> +static void
> +pvscsi_process_io(PVSCSIState *s)
> +{
> + PVSCSIRingReqDesc descr;
> + hwaddr next_descr_pa;
> +
> + assert(s->rings_info_valid);
> + while ((next_descr_pa = pvscsi_rings_mgr_pop_req_descr(&s->rings)) != 0) {
> +
> + /* Only read after production index verification */
> + smp_rmb();
> +
> + trace_pvscsi_process_io(next_descr_pa);
> + cpu_physical_memory_read(next_descr_pa, &descr, sizeof(descr));
> + pvscsi_process_request_descriptor(s, &descr);
> + }
> +
> + pvscsi_rings_mgr_flush_req_ring(&s->rings);
> +}
> +
> +static void
> +pvscsi_dbg_dump_tx_rings_config(PVSCSICmdDescSetupRings *rc)
> +{
> + int i;
> + trace_pvscsi_tx_rings_ppn("Rings State", rc->ringsStatePPN);
> +
> + trace_pvscsi_tx_rings_num_pages("Request Ring", rc->reqRingNumPages);
> + for (i = 0; i < rc->reqRingNumPages; i++) {
> + trace_pvscsi_tx_rings_ppn("Request Ring", rc->reqRingPPNs[i]);
> + }
> +
> + trace_pvscsi_tx_rings_num_pages("Confirm Ring", rc->cmpRingNumPages);
> + for (i = 0; i < rc->cmpRingNumPages; i++) {
> + trace_pvscsi_tx_rings_ppn("Confirm Ring", rc->reqRingPPNs[i]);
> + }
> +}
> +
> +static uint64_t
> +pvscsi_on_cmd_config(PVSCSIState *s)
> +{
> + trace_pvscsi_on_cmd_noimpl("PVSCSI_CMD_CONFIG");
> + return PVSCSI_COMMAND_PROCESSING_FAILED;
> +}
> +
> +static uint64_t
> +pvscsi_on_cmd_unplug(PVSCSIState *s)
> +{
> + trace_pvscsi_on_cmd_noimpl("PVSCSI_CMD_DEVICE_UNPLUG");
> + return PVSCSI_COMMAND_PROCESSING_FAILED;
> +}
> +
> +static uint64_t
> +pvscsi_on_issue_scsi(PVSCSIState *s)
> +{
> + trace_pvscsi_on_cmd_noimpl("PVSCSI_CMD_ISSUE_SCSI");
> + return PVSCSI_COMMAND_PROCESSING_FAILED;
> +}
> +
> +static uint64_t
> +pvscsi_on_cmd_setup_rings(PVSCSIState *s)
> +{
> + PVSCSICmdDescSetupRings *rc =
> + (PVSCSICmdDescSetupRings *) s->curr_cmd_data;
> +
> + trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_SETUP_RINGS");
> +
> + pvscsi_dbg_dump_tx_rings_config(rc);
> + pvscsi_rings_mgr_init_data(&s->rings, rc);
> + s->rings_info_valid = TRUE;
> + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> +}
> +
> +static uint64_t
> +pvscsi_on_cmd_abort(PVSCSIState *s)
> +{
> + trace_pvscsi_on_cmd_abort(
> + ((struct PVSCSICmdDescAbortCmd *) s->curr_cmd_data)->context,
> + ((struct PVSCSICmdDescAbortCmd *) s->curr_cmd_data)->target);
> + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
You need to look up the request being aborted and call scsi_req_cancel
on it here before returning.
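One possible shape for this, assuming the request to abort is identified by the "context" field of the abort descriptor and is still on the pending queue. The sketch is standalone: the singly linked list and cancel_request() are hypothetical stand-ins for the QTAILQ pending queue and for scsi_req_cancel(r->sreq).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for PVSCSIRequest on the pending queue. */
struct Request {
    uint64_t context;
    int cancelled;
    struct Request *next;
};

/* Stands in for scsi_req_cancel(r->sreq) in the real device. */
static void cancel_request(struct Request *r)
{
    assert(!r->cancelled);
    r->cancelled = 1;
}

/* Returns 1 if a matching pending request was found and cancelled. */
static int abort_by_context(struct Request *pending, uint64_t context)
{
    struct Request *r;

    for (r = pending; r != NULL; r = r->next) {
        if (r->context == context) {
            cancel_request(r);
            return 1;
        }
    }
    return 0;
}
```

Returning straight after the cancel matters in the real code, since the cancel callback moves the request off the pending queue.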
> +}
> +
> +static uint64_t
> +pvscsi_on_cmd_unknown(PVSCSIState *s)
> +{
> + trace_pvscsi_on_cmd_unknown_data(s->curr_cmd_data[0]);
> + return PVSCSI_COMMAND_PROCESSING_FAILED;
> +}
> +
> +static uint64_t
> +pvscsi_on_cmd_reset_device(PVSCSIState *s)
> +{
> + uint8_t target_lun = 0;
> + struct PVSCSICmdDescResetDevice *cmd =
> + (struct PVSCSICmdDescResetDevice *) s->curr_cmd_data;
> + SCSIDevice *sdev;
> +
> + sdev = pvscsi_device_find(s, 0, cmd->target, cmd->lun, &target_lun);
> +
> + trace_pvscsi_on_cmd_reset_dev(cmd->target, (int) target_lun, sdev);
> +
> + if (sdev != NULL) {
> + device_reset(&sdev->qdev);
> + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> + }
> +
> + return PVSCSI_COMMAND_PROCESSING_FAILED;
> +}
> +
> +static uint64_t
> +pvscsi_on_cmd_reset_bus(PVSCSIState *s)
> +{
> + trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_RESET_BUS");
> +
> + qbus_reset_all_fn(&s->bus);
> + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> +}
> +
> +static uint64_t
> +pvscsi_on_cmd_setup_msg_ring(PVSCSIState *s)
> +{
> + PVSCSICmdDescSetupMsgRing *rc =
> + (PVSCSICmdDescSetupMsgRing *) s->curr_cmd_data;
> +
> + trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_SETUP_MSG_RING");
> +
> + if (s->rings_info_valid) {
> + pvscsi_rings_mgr_init_msg(&s->rings, rc);
> + s->msg_ring_info_valid = TRUE;
> + }
> + return sizeof(PVSCSICmdDescSetupMsgRing) / sizeof(uint32_t);
> +}
> +
> +static uint64_t
> +pvscsi_on_cmd_adapter_reset(PVSCSIState *s)
> +{
> + trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_ADAPTER_RESET");
> +
> + pvscsi_reset_adapter(s);
> + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> +}
> +
> +static const struct {
> + int data_size;
> + uint64_t (*handler_fn)(PVSCSIState *s);
> +} pvscsi_commands[] = {
> + [PVSCSI_CMD_FIRST] = {
> + .data_size = 0,
> + .handler_fn = pvscsi_on_cmd_unknown,
> + },
> +
> + /* Not implemented, data size defined based on what arrives on windows */
> + [PVSCSI_CMD_CONFIG] = {
> + .data_size = 6 * sizeof(uint32_t),
> + .handler_fn = pvscsi_on_cmd_config,
> + },
> +
> + /* Command not implemented, data size is unknown */
> + [PVSCSI_CMD_ISSUE_SCSI] = {
> + .data_size = 0,
> + .handler_fn = pvscsi_on_issue_scsi,
> + },
> +
> + /* Command not implemented, data size is unknown */
> + [PVSCSI_CMD_DEVICE_UNPLUG] = {
> + .data_size = 0,
> + .handler_fn = pvscsi_on_cmd_unplug,
> + },
> +
> + [PVSCSI_CMD_SETUP_RINGS] = {
> + .data_size = sizeof(PVSCSICmdDescSetupRings),
> + .handler_fn = pvscsi_on_cmd_setup_rings,
> + },
> +
> + [PVSCSI_CMD_RESET_DEVICE] = {
> + .data_size = sizeof(struct PVSCSICmdDescResetDevice),
> + .handler_fn = pvscsi_on_cmd_reset_device,
> + },
> +
> + [PVSCSI_CMD_RESET_BUS] = {
> + .data_size = 0,
> + .handler_fn = pvscsi_on_cmd_reset_bus,
> + },
> +
> + [PVSCSI_CMD_SETUP_MSG_RING] = {
> + .data_size = sizeof(PVSCSICmdDescSetupMsgRing),
> + .handler_fn = pvscsi_on_cmd_setup_msg_ring,
> + },
> +
> + [PVSCSI_CMD_ADAPTER_RESET] = {
> + .data_size = 0,
> + .handler_fn = pvscsi_on_cmd_adapter_reset,
> + },
> +
> + [PVSCSI_CMD_ABORT_CMD] = {
> + .data_size = sizeof(struct PVSCSICmdDescAbortCmd),
> + .handler_fn = pvscsi_on_cmd_abort,
> + },
> +};
> +
> +static void
> +pvscsi_do_command_processing(PVSCSIState *s)
> +{
> + size_t bytes_arrived = s->curr_cmd_data_cntr * sizeof(uint32_t);
> +
> + assert(s->curr_cmd < PVSCSI_CMD_LAST);
> + if (bytes_arrived >= pvscsi_commands[s->curr_cmd].data_size) {
> + s->reg_command_status = pvscsi_commands[s->curr_cmd].handler_fn(s);
> + s->curr_cmd = PVSCSI_CMD_FIRST;
> + s->curr_cmd_data_cntr = 0;
> + }
> +}
> +
> +static void
> +pvscsi_on_command_data(PVSCSIState *s, uint32_t value)
> +{
> + size_t bytes_arrived = s->curr_cmd_data_cntr * sizeof(uint32_t);
> +
> + assert(bytes_arrived < sizeof(s->curr_cmd_data));
> + s->curr_cmd_data[s->curr_cmd_data_cntr++] = value;
> +
> + pvscsi_do_command_processing(s);
> +}
> +
> +static void
> +pvscsi_on_command(PVSCSIState *s, uint64_t cmd_id)
> +{
> + if ((cmd_id > PVSCSI_CMD_FIRST) && (cmd_id < PVSCSI_CMD_LAST)) {
> + s->curr_cmd = cmd_id;
> + } else {
> + s->curr_cmd = PVSCSI_CMD_FIRST;
> + trace_pvscsi_on_cmd_unknown(cmd_id);
> + }
> +
> + s->curr_cmd_data_cntr = 0;
> + s->reg_command_status = PVSCSI_COMMAND_NOT_ENOUGH_DATA;
> +
> + pvscsi_do_command_processing(s);
> +}
> +
> +static void
> +pvscsi_io_write(void *opaque, hwaddr addr,
> + uint64_t val, unsigned size)
> +{
> + PVSCSIState *s = opaque;
> +
> + switch (addr) {
> + case PVSCSI_REG_OFFSET_COMMAND:
> + pvscsi_on_command(s, val);
> + break;
> +
> + case PVSCSI_REG_OFFSET_COMMAND_DATA:
> + pvscsi_on_command_data(s, (uint32_t) val);
> + break;
> +
> + case PVSCSI_REG_OFFSET_INTR_STATUS:
> + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_INTR_STATUS", val);
> + s->reg_interrupt_status &= ~val;
> + pvscsi_update_irq_status(s);
> + pvscsi_schedule_completion_processing(s);
> + break;
> +
> + case PVSCSI_REG_OFFSET_INTR_MASK:
> + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_INTR_MASK", val);
> + s->reg_interrupt_enabled = val;
> + pvscsi_update_irq_status(s);
> + break;
> +
> + case PVSCSI_REG_OFFSET_KICK_NON_RW_IO:
> + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_KICK_NON_RW_IO", val);
> + pvscsi_process_io(s);
> + break;
> +
> + case PVSCSI_REG_OFFSET_KICK_RW_IO:
> + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_KICK_RW_IO", val);
> + pvscsi_process_io(s);
> + break;
> +
> + case PVSCSI_REG_OFFSET_DEBUG:
> + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_DEBUG", val);
> + break;
> +
> + default:
> + trace_pvscsi_io_write_unknown(addr, size, val);
> + break;
> + }
> +
> +}
> +
> +static uint64_t
> +pvscsi_io_read(void *opaque, hwaddr addr, unsigned size)
> +{
> + PVSCSIState *s = opaque;
> +
> + switch (addr) {
> + case PVSCSI_REG_OFFSET_INTR_STATUS:
> + trace_pvscsi_io_read("PVSCSI_REG_OFFSET_INTR_STATUS",
> + s->reg_interrupt_status);
> + return s->reg_interrupt_status;
> +
> + case PVSCSI_REG_OFFSET_INTR_MASK:
> + trace_pvscsi_io_read("PVSCSI_REG_OFFSET_INTR_MASK",
> + s->reg_interrupt_status);
> + return s->reg_interrupt_enabled;
> +
> + case PVSCSI_REG_OFFSET_COMMAND_STATUS:
> + trace_pvscsi_io_read("PVSCSI_REG_OFFSET_COMMAND_STATUS",
> + s->reg_interrupt_status);
> + return s->reg_command_status;
> +
> + default:
> + trace_pvscsi_io_read_unknown(addr, size);
> + return 0;
> + }
> +}
> +
> +
> +static bool
> +pvscsi_init_msi(PVSCSIState *s)
> +{
> + int res;
> + PCIDevice *d = PCI_DEVICE(s);
> +
> + res = msi_init(d, PVSCSI_MSI_OFFSET, PVSCSI_MSIX_NUM_VECTORS,
> + PVSCSI_USE_64BIT, PVSCSI_PER_VECTOR_MASK);
> + if (res < 0) {
> + trace_pvscsi_init_msi_fail(res);
> + s->msi_used = false;
> + } else {
> + s->msi_used = true;
> + }
> +
> + return s->msi_used;
> +}
> +
> +static void
> +pvscsi_cleanup_msi(PVSCSIState *s)
> +{
> + PCIDevice *d = PCI_DEVICE(s);
> +
> + if (s->msi_used) {
> + msi_uninit(d);
> + }
> +}
> +
> +static const MemoryRegionOps pvscsi_ops = {
> + .read = pvscsi_io_read,
> + .write = pvscsi_io_write,
> + .endianness = DEVICE_LITTLE_ENDIAN,
> + .impl = {
> + .min_access_size = 4,
> + .max_access_size = 4,
> + },
> +};
> +
> +static const struct SCSIBusInfo pvscsi_scsi_info = {
> + .tcq = true,
> + .max_target = PVSCSI_MAX_DEVS,
> + .max_channel = 0,
> + .max_lun = 0,
> +
> + .get_sg_list = pvscsi_get_sg_list,
> + .complete = pvscsi_command_complete,
> + .cancel = pvscsi_request_cancelled,
> + .hotplug = pvscsi_hotplug,
> + .hot_unplug = pvscsi_hot_unplug,
> +};
> +
> +static int
> +pvscsi_init(PCIDevice *pci_dev)
> +{
> + PVSCSIState *s = PVSCSI(pci_dev);
> +
> + trace_pvscsi_state("init");
> +
> + /* PCI subsystem ID */
> + pci_dev->config[PCI_SUBSYSTEM_ID] = 0x00;
> + pci_dev->config[PCI_SUBSYSTEM_ID + 1] = 0x10;
> +
> + /* PCI latency timer = 255 */
> + pci_dev->config[PCI_LATENCY_TIMER] = 0xff;
> +
> + /* Interrupt pin A */
> + pci_config_set_interrupt_pin(pci_dev->config, 1);
> +
> + memory_region_init_io(&s->io_space, &pvscsi_ops, s,
> + "pvscsi-io", PVSCSI_MEM_SPACE_SIZE);
> + pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->io_space);
> +
> + pvscsi_init_msi(s);
> +
> + s->completion_worker = qemu_bh_new(pvscsi_process_completion_queue, s);
> + if (!s->completion_worker) {
> + pvscsi_cleanup_msi(s);
> + memory_region_destroy(&s->io_space);
> + return -ENOMEM;
> + }
> +
> + scsi_bus_new(&s->bus, &pci_dev->qdev, &pvscsi_scsi_info);
> + pvscsi_reset_state(s);
> +
> + return 0;
> +}
> +
> +static void
> +pvscsi_uninit(PCIDevice *pci_dev)
> +{
> + PVSCSIState *s = PVSCSI(pci_dev);
> +
> + trace_pvscsi_state("uninit");
> + qemu_bh_delete(s->completion_worker);
> +
> + pvscsi_cleanup_msi(s);
> +
> + memory_region_destroy(&s->io_space);
> +}
> +
> +static void
> +pvscsi_reset(DeviceState *dev)
> +{
> + PCIDevice *d = PCI_DEVICE(dev);
> + PVSCSIState *s = PVSCSI(d);
> +
> + trace_pvscsi_state("reset");
> + pvscsi_reset_adapter(s);
> +}
> +
> +static void
> +pvscsi_pre_save(void *opaque)
> +{
> + PVSCSIState *s = (PVSCSIState *) opaque;
> +
> + trace_pvscsi_state("presave");
> +
> + assert(QTAILQ_EMPTY(&s->pending_queue));
> + assert(QTAILQ_EMPTY(&s->completion_queue));
If you implement request serialization, you can still assert that the
completion queue is empty. The pending queue will be reconstructed by
the load_request callbacks.
> +}
> +
> +static int
> +pvscsi_post_load(void *opaque, int version_id)
> +{
> + trace_pvscsi_state("postload");
> + return 0;
> +}
> +
> +static const VMStateDescription vmstate_pvscsi = {
> + .name = TYPE_PVSCSI,
> + .version_id = 0,
> + .minimum_version_id = 0,
> + .minimum_version_id_old = 0,
> + .pre_save = pvscsi_pre_save,
> + .post_load = pvscsi_post_load,
> + .fields = (VMStateField[]) {
> + VMSTATE_PCI_DEVICE(parent_obj, PVSCSIState),
> + VMSTATE_UINT8(msi_used, PVSCSIState),
> + VMSTATE_UINT64(reg_interrupt_status, PVSCSIState),
> + VMSTATE_UINT64(reg_interrupt_enabled, PVSCSIState),
> + VMSTATE_UINT64(reg_command_status, PVSCSIState),
> + VMSTATE_UINT64(curr_cmd, PVSCSIState),
> + VMSTATE_UINT32(curr_cmd_data_cntr, PVSCSIState),
> + VMSTATE_UINT32_ARRAY(curr_cmd_data, PVSCSIState,
> + ARRAY_SIZE(((PVSCSIState *)NULL)->curr_cmd_data)),
> + VMSTATE_UINT8(rings_info_valid, PVSCSIState),
> + VMSTATE_UINT8(msg_ring_info_valid, PVSCSIState),
> +
> + VMSTATE_UINT64(rings.rs_pa, PVSCSIState),
> + VMSTATE_UINT32(rings.txr_len_mask, PVSCSIState),
> + VMSTATE_UINT32(rings.rxr_len_mask, PVSCSIState),
> + VMSTATE_UINT64_ARRAY(rings.req_ring_pages_pa, PVSCSIState,
> + PVSCSI_SETUP_RINGS_MAX_NUM_PAGES),
> + VMSTATE_UINT64_ARRAY(rings.cmp_ring_pages_pa, PVSCSIState,
> + PVSCSI_SETUP_RINGS_MAX_NUM_PAGES),
> + VMSTATE_UINT64(rings.consumed_ptr, PVSCSIState),
> + VMSTATE_UINT64(rings.filled_cmp_ptr, PVSCSIState),
> +
> + VMSTATE_END_OF_LIST()
> + }
> +};
> +
> +static void
> +pvscsi_write_config(PCIDevice *pci, uint32_t addr, uint32_t val, int len)
> +{
> + pci_default_write_config(pci, addr, val, len);
> + msi_write_config(pci, addr, val, len);
> +}
> +
> +static void pvscsi_class_init(ObjectClass *klass, void *data)
> +{
> + DeviceClass *dc = DEVICE_CLASS(klass);
> + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> + k->init = pvscsi_init;
> + k->exit = pvscsi_uninit;
> + k->vendor_id = PCI_VENDOR_ID_VMWARE;
> + k->device_id = PCI_DEVICE_ID_VMWARE_PVSCSI;
> + k->class_id = PCI_CLASS_STORAGE_SCSI;
> + k->subsystem_id = 0x1000;
> + dc->reset = pvscsi_reset;
> + dc->vmsd = &vmstate_pvscsi;
> + k->config_write = pvscsi_write_config;
> +}
> +
> +static const TypeInfo pvscsi_info = {
> + .name = "pvscsi",
> + .parent = TYPE_PCI_DEVICE,
> + .instance_size = sizeof(PVSCSIState),
> + .class_init = pvscsi_class_init,
> +};
> +
> +static void
> +pvscsi_register_types(void)
> +{
> + type_register_static(&pvscsi_info);
> +
> + trace_pvscsi_register();
> +}
> +
> +type_init(pvscsi_register_types);
> diff --git a/hw/vmw_pvscsi.h b/hw/vmw_pvscsi.h
> new file mode 100644
> index 0000000..17fcf66
> --- /dev/null
> +++ b/hw/vmw_pvscsi.h
> @@ -0,0 +1,434 @@
> +/*
> + * VMware PVSCSI header file
> + *
> + * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; version 2 of the License and no later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT. See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> + *
> + * Maintained by: Arvind Kumar <arvindkumar@vmware.com>
> + *
> + */
> +
> +#ifndef VMW_PVSCSI_H
> +#define VMW_PVSCSI_H
> +
> +#define VMW_PAGE_SIZE (4096)
> +#define VMW_PAGE_SHIFT (12)
> +
> +#define MASK(n) ((1 << (n)) - 1) /* make an n-bit mask */
> +
> +/*
> + * host adapter status/error codes
> + */
> +enum HostBusAdapterStatus {
> + BTSTAT_SUCCESS = 0x00, /* CCB complete normally with no errors */
> + BTSTAT_LINKED_COMMAND_COMPLETED = 0x0a,
> + BTSTAT_LINKED_COMMAND_COMPLETED_WITH_FLAG = 0x0b,
> + BTSTAT_DATA_UNDERRUN = 0x0c,
> + BTSTAT_SELTIMEO = 0x11, /* SCSI selection timeout */
> + BTSTAT_DATARUN = 0x12, /* data overrun/underrun */
> + BTSTAT_BUSFREE = 0x13, /* unexpected bus free */
> + BTSTAT_INVPHASE = 0x14, /* invalid bus phase or sequence */
> + /* requested by target */
> + BTSTAT_LUNMISMATCH = 0x17, /* linked CCB has different LUN */
> + /* from first CCB */
> + BTSTAT_SENSFAILED = 0x1b, /* auto request sense failed */
> + BTSTAT_TAGREJECT = 0x1c, /* SCSI II tagged queueing message */
> + /* rejected by target */
> + BTSTAT_BADMSG = 0x1d, /* unsupported message received by */
> + /* the host adapter */
> + BTSTAT_HAHARDWARE = 0x20, /* host adapter hardware failed */
> + BTSTAT_NORESPONSE = 0x21, /* target did not respond to SCSI ATN, */
> + /* sent a SCSI RST */
> + BTSTAT_SENTRST = 0x22, /* host adapter asserted a SCSI RST */
> + BTSTAT_RECVRST = 0x23, /* other SCSI devices asserted a SCSI RST */
> + BTSTAT_DISCONNECT = 0x24, /* target device reconnected improperly */
> + /* (w/o tag) */
> + BTSTAT_BUSRESET = 0x25, /* host adapter issued BUS device reset */
> + BTSTAT_ABORTQUEUE = 0x26, /* abort queue generated */
> + BTSTAT_HASOFTWARE = 0x27, /* host adapter software error */
> + BTSTAT_HATIMEOUT = 0x30, /* host adapter hardware timeout error */
> + BTSTAT_SCSIPARITY = 0x34, /* SCSI parity error detected */
> +};
> +
> +/*
> + * Register offsets.
> + *
> + * These registers are accessible both via i/o space and mm i/o.
> + */
> +
> +enum PVSCSIRegOffset {
> + PVSCSI_REG_OFFSET_COMMAND = 0x0,
> + PVSCSI_REG_OFFSET_COMMAND_DATA = 0x4,
> + PVSCSI_REG_OFFSET_COMMAND_STATUS = 0x8,
> + PVSCSI_REG_OFFSET_LAST_STS_0 = 0x100,
> + PVSCSI_REG_OFFSET_LAST_STS_1 = 0x104,
> + PVSCSI_REG_OFFSET_LAST_STS_2 = 0x108,
> + PVSCSI_REG_OFFSET_LAST_STS_3 = 0x10c,
> + PVSCSI_REG_OFFSET_INTR_STATUS = 0x100c,
> + PVSCSI_REG_OFFSET_INTR_MASK = 0x2010,
> + PVSCSI_REG_OFFSET_KICK_NON_RW_IO = 0x3014,
> + PVSCSI_REG_OFFSET_DEBUG = 0x3018,
> + PVSCSI_REG_OFFSET_KICK_RW_IO = 0x4018,
> +};
> +
> +/*
> + * Virtual h/w commands.
> + */
> +
> +enum PVSCSICommands {
> + PVSCSI_CMD_FIRST = 0, /* has to be first */
> +
> + PVSCSI_CMD_ADAPTER_RESET = 1,
> + PVSCSI_CMD_ISSUE_SCSI = 2,
> + PVSCSI_CMD_SETUP_RINGS = 3,
> + PVSCSI_CMD_RESET_BUS = 4,
> + PVSCSI_CMD_RESET_DEVICE = 5,
> + PVSCSI_CMD_ABORT_CMD = 6,
> + PVSCSI_CMD_CONFIG = 7,
> + PVSCSI_CMD_SETUP_MSG_RING = 8,
> + PVSCSI_CMD_DEVICE_UNPLUG = 9,
> +
> + PVSCSI_CMD_LAST = 10 /* has to be last */
> +};
> +
> +#define PVSCSI_COMMAND_PROCESSING_SUCCEEDED (0)
> +#define PVSCSI_COMMAND_PROCESSING_FAILED (-1)
> +#define PVSCSI_COMMAND_NOT_ENOUGH_DATA (-2)
> +
> +/*
> + * Command descriptor for PVSCSI_CMD_RESET_DEVICE --
> + */
> +
> +struct PVSCSICmdDescResetDevice {
> + uint32_t target;
> + uint8_t lun[8];
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSICmdDescResetDevice PVSCSICmdDescResetDevice;
> +
> +/*
> + * Command descriptor for PVSCSI_CMD_ABORT_CMD --
> + *
> + * - currently does not support specifying the LUN.
> + * - pad should be 0.
> + */
> +
> +struct PVSCSICmdDescAbortCmd {
> + uint64_t context;
> + uint32_t target;
> + uint32_t pad;
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSICmdDescAbortCmd PVSCSICmdDescAbortCmd;
> +
> +/*
> + * Command descriptor for PVSCSI_CMD_SETUP_RINGS --
> + *
> + * Notes:
> + * - reqRingNumPages and cmpRingNumPages need to be power of two.
> + * - reqRingNumPages and cmpRingNumPages need to be different from 0,
> + * - reqRingNumPages and cmpRingNumPages need to be inferior to
> + * PVSCSI_SETUP_RINGS_MAX_NUM_PAGES.
> + */
> +
> +#define PVSCSI_SETUP_RINGS_MAX_NUM_PAGES 32
> +struct PVSCSICmdDescSetupRings {
> + uint32_t reqRingNumPages;
> + uint32_t cmpRingNumPages;
> + uint64_t ringsStatePPN;
> + uint64_t reqRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
> + uint64_t cmpRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSICmdDescSetupRings PVSCSICmdDescSetupRings;
> +
> +/*
> + * Command descriptor for PVSCSI_CMD_SETUP_MSG_RING --
> + *
> + * Notes:
> + * - this command was not supported in the initial revision of the h/w
> + * interface. Before using it, you need to check that it is supported by
> + * writing PVSCSI_CMD_SETUP_MSG_RING to the 'command' register, then
> + * immediately after read the 'command status' register:
> + * * a value of -1 means that the cmd is NOT supported,
> + * * a value != -1 means that the cmd IS supported.
> + * If it's supported the 'command status' register should return:
> + * sizeof(PVSCSICmdDescSetupMsgRing) / sizeof(uint32_t).
> + * - this command should be issued _after_ the usual SETUP_RINGS so that the
> + * RingsState page is already setup. If not, the command is a nop.
> + * - numPages needs to be a power of two,
> + * - numPages needs to be different from 0,
> + * - pad should be zero.
> + */
> +
> +#define PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES 16
> +
> +struct PVSCSICmdDescSetupMsgRing {
> + uint32_t numPages;
> + uint32_t pad;
> + uint64_t ringPPNs[PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES];
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSICmdDescSetupMsgRing PVSCSICmdDescSetupMsgRing;
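The probing rule in the comment above can be sketched in a few lines of standalone C. This is only an illustration, not code from the patch: the Sketch-prefixed names are made up, the packed attribute assumes a GCC/Clang-compatible compiler, and the upper bound on numPages is my assumption (the notes only require a nonzero power of two; the ringPPNs array simply has 16 slots).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSG_RING_MAX_PAGES 16

/* Same layout as the quoted PVSCSICmdDescSetupMsgRing. */
struct SketchCmdDescSetupMsgRing {
    uint32_t numPages;
    uint32_t pad;
    uint64_t ringPPNs[SKETCH_MSG_RING_MAX_PAGES];
} __attribute__((packed));

/* Value the 'command status' register should return when the command
 * is supported: the descriptor size in 32-bit words. */
static uint32_t sketch_msg_ring_cmd_words(void)
{
    return sizeof(struct SketchCmdDescSetupMsgRing) / sizeof(uint32_t);
}

/* A status of -1 means "not supported"; anything else means supported. */
static bool sketch_msg_ring_supported(uint32_t command_status)
{
    return command_status != (uint32_t)-1;
}

/* numPages must be nonzero and a power of two. The upper bound is an
 * assumption derived from the size of the ringPPNs array. */
static bool sketch_msg_ring_pages_valid(uint32_t num_pages)
{
    if (num_pages == 0 || num_pages > SKETCH_MSG_RING_MAX_PAGES) {
        return false;
    }
    return (num_pages & (num_pages - 1)) == 0;
}
```

With this layout the descriptor is 136 bytes, so a device that supports the command would be expected to report 34 words.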
> +
> +enum PVSCSIMsgType {
> + PVSCSI_MSG_DEV_ADDED = 0,
> + PVSCSI_MSG_DEV_REMOVED = 1,
> + PVSCSI_MSG_LAST = 2,
> +};
> +
> +/*
> + * Msg descriptor.
> + *
> + * sizeof(struct PVSCSIRingMsgDesc) == 128.
> + *
> + * - type is of type enum PVSCSIMsgType.
> + * - the content of args depend on the type of event being delivered.
> + */
> +
> +struct PVSCSIRingMsgDesc {
> + uint32_t type;
> + uint32_t args[31];
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSIRingMsgDesc PVSCSIRingMsgDesc;
> +
> +struct PVSCSIMsgDescDevStatusChanged {
> + uint32_t type; /* PVSCSI_MSG_DEV _ADDED / _REMOVED */
> + uint32_t bus;
> + uint32_t target;
> + uint8_t lun[8];
> + uint32_t pad[27];
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSIMsgDescDevStatusChanged PVSCSIMsgDescDevStatusChanged;
> +
> +/*
> + * Rings state.
> + *
> + * - the fields:
> + * . msgProdIdx,
> + * . msgConsIdx,
> + * . msgNumEntriesLog2,
> + * .. are only used once the SETUP_MSG_RING cmd has been issued.
> + * - 'pad' helps to ensure that the msg related fields are on their own
> + * cache-line.
> + */
> +
> +struct PVSCSIRingsState {
> + uint32_t reqProdIdx;
> + uint32_t reqConsIdx;
> + uint32_t reqNumEntriesLog2;
> +
> + uint32_t cmpProdIdx;
> + uint32_t cmpConsIdx;
> + uint32_t cmpNumEntriesLog2;
> +
> + uint8_t pad[104];
> +
> + uint32_t msgProdIdx;
> + uint32_t msgConsIdx;
> + uint32_t msgNumEntriesLog2;
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSIRingsState PVSCSIRingsState;
> +
> +/*
> + * Request descriptor.
> + *
> + * sizeof(RingReqDesc) = 128
> + *
> + * - context: is a unique identifier of a command. It could normally be any
> + * 64bit value, however we currently store it in the serialNumber variable
> + * of struct SCSI_Command, so we have the following restrictions due to the
> + * way this field is handled in the vmkernel storage stack:
> + * * this value can't be 0,
> + * * the upper 32bit need to be 0 since serialNumber is as a uint32_t.
> + * Currently tracked as PR 292060.
> + * - dataLen: contains the total number of bytes that need to be transferred.
> + * - dataAddr:
> + * * if PVSCSI_FLAG_CMD_WITH_SG_LIST is set: dataAddr is the PA of the first
> + * s/g table segment, each s/g segment is entirely contained on a single
> + * page of physical memory,
> + * * if PVSCSI_FLAG_CMD_WITH_SG_LIST is NOT set, then dataAddr is the PA of
> + * the buffer used for the DMA transfer,
> + * - flags:
> + * * PVSCSI_FLAG_CMD_WITH_SG_LIST: see dataAddr above,
> + * * PVSCSI_FLAG_CMD_DIR_NONE: no DMA involved,
> + * * PVSCSI_FLAG_CMD_DIR_TOHOST: transfer from device to main memory,
> + * * PVSCSI_FLAG_CMD_DIR_TODEVICE: transfer from main memory to device,
> + * * PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB: reserved to handle CDBs larger than
> + * 16bytes. To be specified.
> + * - vcpuHint: vcpuId of the processor that will be most likely waiting for the
> + * completion of the i/o. For guest OSes that use lowest priority message
> + * delivery mode (such as windows), we use this "hint" to deliver the
> + * completion action to the proper vcpu. For now, we can use the vcpuId of
> + * the processor that initiated the i/o as a likely candidate for the vcpu
> + * that will be waiting for the completion..
> + * - bus should be 0: we currently only support bus 0 for now.
> + * - unused should be zero'd.
> + */
> +
> +#define PVSCSI_FLAG_CMD_WITH_SG_LIST (1 << 0)
> +#define PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB (1 << 1)
> +#define PVSCSI_FLAG_CMD_DIR_NONE (1 << 2)
> +#define PVSCSI_FLAG_CMD_DIR_TOHOST (1 << 3)
> +#define PVSCSI_FLAG_CMD_DIR_TODEVICE (1 << 4)
> +
> +#define PVSCSI_KNOWN_FLAGS \
> + (PVSCSI_FLAG_CMD_WITH_SG_LIST | \
> + PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB | \
> + PVSCSI_FLAG_CMD_DIR_NONE | \
> + PVSCSI_FLAG_CMD_DIR_TOHOST | \
> + PVSCSI_FLAG_CMD_DIR_TODEVICE)
> +
> +struct PVSCSIRingReqDesc {
> + uint64_t context;
> + uint64_t dataAddr;
> + uint64_t dataLen;
> + uint64_t senseAddr;
> + uint32_t senseLen;
> + uint32_t flags;
> + uint8_t cdb[16];
> + uint8_t cdbLen;
> + uint8_t lun[8];
> + uint8_t tag;
> + uint8_t bus;
> + uint8_t target;
> + uint8_t vcpuHint;
> + uint8_t unused[59];
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSIRingReqDesc PVSCSIRingReqDesc;
> +
> +/*
> + * Scatter-gather list management.
> + *
> + * As described above, when PVSCSI_FLAG_CMD_WITH_SG_LIST is set in the
> + * RingReqDesc.flags, then RingReqDesc.dataAddr is the PA of the first s/g
> + * table segment.
> + *
> + * - each segment of the s/g table contain a succession of struct
> + * PVSCSISGElement.
> + * - each segment is entirely contained on a single physical page of memory.
> + * - a "chain" s/g element has the flag PVSCSI_SGE_FLAG_CHAIN_ELEMENT set in
> + * PVSCSISGElement.flags and in this case:
> + * * addr is the PA of the next s/g segment,
> + * * length is undefined, assumed to be 0.
> + */
> +
> +struct PVSCSISGElement {
> + uint64_t addr;
> + uint32_t length;
> + uint32_t flags;
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSISGElement PVSCSISGElement;
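For readers following the s/g description, here is a rough standalone sketch of walking a chained list. Several things are assumptions on my side: the value of the chain flag (it is not defined in the quoted part of the header), the use of host pointers in place of guest physical addresses, the hop limit, and the idea of stopping once dataLen bytes are covered (the resid field in PVSCSISGState suggests something similar, but this is not the patch's code).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value; the quoted header text does not define it. */
#define SKETCH_SGE_FLAG_CHAIN_ELEMENT (1u << 0)

/* Same layout as the quoted PVSCSISGElement. */
struct SketchSGElement {
    uint64_t addr;
    uint32_t length;
    uint32_t flags;
} __attribute__((packed));

/*
 * Count the data elements needed to cover data_len bytes.
 * In this sketch a chain element's addr holds a host pointer to the
 * next segment; a real device would read guest physical memory.
 * Returns -1 for a malformed list (too many chain hops or a
 * zero-length data element).
 */
static int sketch_count_sg_elements(const struct SketchSGElement *seg,
                                    uint64_t data_len, unsigned max_hops)
{
    int count = 0;

    assert(seg != NULL);
    while (data_len > 0) {
        if (seg->flags & SKETCH_SGE_FLAG_CHAIN_ELEMENT) {
            if (max_hops-- == 0) {
                return -1;
            }
            seg = (const struct SketchSGElement *)(uintptr_t)seg->addr;
            continue;
        }
        if (seg->length == 0) {
            return -1;
        }
        /* Consume at most the remaining byte count. */
        data_len -= seg->length < data_len ? seg->length : data_len;
        count++;
        seg++;
    }
    return count;
}
```

The hop limit is there only so that a guest-controlled chain cannot loop forever in the sketch.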
> +
> +/*
> + * Completion descriptor.
> + *
> + * sizeof(RingCmpDesc) = 32
> + *
> + * - context: identifier of the command. The same thing that was specified
> + * under "context" as part of struct RingReqDesc at initiation time,
> + * - dataLen: number of bytes transferred for the actual i/o operation,
> + * - senseLen: number of bytes written into the sense buffer,
> + * - hostStatus: adapter status,
> + * - scsiStatus: device status,
> + * - pad should be zero.
> + */
> +
> +struct PVSCSIRingCmpDesc {
> + uint64_t context;
> + uint64_t dataLen;
> + uint32_t senseLen;
> + uint16_t hostStatus;
> + uint16_t scsiStatus;
> + uint32_t pad[2];
> +} QEMU_PACKED;
> +
> +typedef struct PVSCSIRingCmpDesc PVSCSIRingCmpDesc;
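The size comments in this header (128 bytes for request and message descriptors, 32 for completion descriptors) could be checked at compile time. A minimal sketch, assuming a C11 compiler with GCC/Clang-style packed attributes; the Sketch-prefixed names are placeholders that mirror the quoted layouts and are not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PACKED    __attribute__((packed)) /* stand-in for QEMU_PACKED */
#define SKETCH_PAGE_SIZE 4096u

struct SketchRingReqDesc {
    uint64_t context;
    uint64_t dataAddr;
    uint64_t dataLen;
    uint64_t senseAddr;
    uint32_t senseLen;
    uint32_t flags;
    uint8_t  cdb[16];
    uint8_t  cdbLen;
    uint8_t  lun[8];
    uint8_t  tag;
    uint8_t  bus;
    uint8_t  target;
    uint8_t  vcpuHint;
    uint8_t  unused[59];
} SKETCH_PACKED;

struct SketchRingCmpDesc {
    uint64_t context;
    uint64_t dataLen;
    uint32_t senseLen;
    uint16_t hostStatus;
    uint16_t scsiStatus;
    uint32_t pad[2];
} SKETCH_PACKED;

struct SketchRingMsgDesc {
    uint32_t type;
    uint32_t args[31];
} SKETCH_PACKED;

/* Fail the build if a layout drifts from the documented size. */
_Static_assert(sizeof(struct SketchRingReqDesc) == 128, "req desc size");
_Static_assert(sizeof(struct SketchRingCmpDesc) == 32, "cmp desc size");
_Static_assert(sizeof(struct SketchRingMsgDesc) == 128, "msg desc size");

/* Descriptors per 4 KiB ring page, as in the *_ENTRIES_PER_PAGE macros. */
static inline size_t sketch_entries_per_page(size_t desc_size)
{
    assert(desc_size > 0 && desc_size <= SKETCH_PAGE_SIZE);
    return SKETCH_PAGE_SIZE / desc_size;
}
```

With these sizes a page holds 32 request or message descriptors and 128 completion descriptors.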
> +
> +/*
> + * Interrupt status / IRQ bits.
> + */
> +
> +#define PVSCSI_INTR_CMPL_0 (1 << 0)
> +#define PVSCSI_INTR_CMPL_1 (1 << 1)
> +#define PVSCSI_INTR_CMPL_MASK MASK(2)
> +
> +#define PVSCSI_INTR_MSG_0 (1 << 2)
> +#define PVSCSI_INTR_MSG_1 (1 << 3)
> +#define PVSCSI_INTR_MSG_MASK (MASK(2) << 2)
> +
> +#define PVSCSI_INTR_ALL_SUPPORTED MASK(4)
> +
> +/*
> + * Number of MSI-X vectors supported.
> + */
> +#define PVSCSI_MAX_INTRS 24
> +
> +/*
> + * Enumeration of supported MSI-X vectors
> + */
> +#define PVSCSI_VECTOR_COMPLETION 0
> +
> +/*
> + * Misc constants for the rings.
> + */
> +
> +#define PVSCSI_MAX_NUM_PAGES_REQ_RING PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
> +#define PVSCSI_MAX_NUM_PAGES_CMP_RING PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
> +#define PVSCSI_MAX_NUM_PAGES_MSG_RING PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES
> +
> +#define PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE \
> + (VMW_PAGE_SIZE / sizeof(struct PVSCSIRingReqDesc))
> +
> +#define PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE \
> + (VMW_PAGE_SIZE / sizeof(PVSCSIRingCmpDesc))
> +
> +#define PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE \
> + (VMW_PAGE_SIZE / sizeof(PVSCSIRingMsgDesc))
> +
> +#define PVSCSI_MAX_REQ_QUEUE_DEPTH \
> + (PVSCSI_MAX_NUM_PAGES_REQ_RING * PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE)
> +
> +#define PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES 1
> +#define PVSCSI_MEM_SPACE_INTR_STATUS_NUM_PAGES 1
> +#define PVSCSI_MEM_SPACE_MISC_NUM_PAGES 2
> +#define PVSCSI_MEM_SPACE_KICK_IO_NUM_PAGES 2
> +#define PVSCSI_MEM_SPACE_MSIX_NUM_PAGES 2
> +
> +enum PVSCSIMemSpace {
> + PVSCSI_MEM_SPACE_COMMAND_PAGE = 0,
> + PVSCSI_MEM_SPACE_INTR_STATUS_PAGE = 1,
> + PVSCSI_MEM_SPACE_MISC_PAGE = 2,
> + PVSCSI_MEM_SPACE_KICK_IO_PAGE = 4,
> + PVSCSI_MEM_SPACE_MSIX_TABLE_PAGE = 6,
> + PVSCSI_MEM_SPACE_MSIX_PBA_PAGE = 7,
> +};
> +
> +#define PVSCSI_MEM_SPACE_NUM_PAGES \
> + (PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES + \
> + PVSCSI_MEM_SPACE_INTR_STATUS_NUM_PAGES + \
> + PVSCSI_MEM_SPACE_MISC_NUM_PAGES + \
> + PVSCSI_MEM_SPACE_KICK_IO_NUM_PAGES + \
> + PVSCSI_MEM_SPACE_MSIX_NUM_PAGES)
> +
> +#define PVSCSI_MEM_SPACE_SIZE (PVSCSI_MEM_SPACE_NUM_PAGES * VMW_PAGE_SIZE)
> +
> +#endif /* VMW_PVSCSI_H */
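As a cross-check of the memory space constants against the register offsets, the arithmetic can be written out as below. This is a standalone sketch with placeholder names, not something the patch contains; it only restates the numbers from the header.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Page counts copied from the PVSCSI_MEM_SPACE_*_NUM_PAGES defines. */
enum {
    SKETCH_COMMAND_NUM_PAGES     = 1,
    SKETCH_INTR_STATUS_NUM_PAGES = 1,
    SKETCH_MISC_NUM_PAGES        = 2,
    SKETCH_KICK_IO_NUM_PAGES     = 2,
    SKETCH_MSIX_NUM_PAGES        = 2,
};

/* Total number of pages in the device memory space. */
static uint32_t sketch_mem_space_pages(void)
{
    return SKETCH_COMMAND_NUM_PAGES + SKETCH_INTR_STATUS_NUM_PAGES +
           SKETCH_MISC_NUM_PAGES + SKETCH_KICK_IO_NUM_PAGES +
           SKETCH_MSIX_NUM_PAGES;
}

/* Size of the memory space in bytes. */
static uint32_t sketch_mem_space_size(void)
{
    return sketch_mem_space_pages() * SKETCH_PAGE_SIZE;
}

/* Index of the page that a register offset falls into. */
static uint32_t sketch_reg_page(uint32_t reg_offset)
{
    assert(reg_offset < sketch_mem_space_size());
    return reg_offset >> SKETCH_PAGE_SHIFT;
}
```

The result is 8 pages (32 KiB). INTR_STATUS (0x100c) lands in page 1, INTR_MASK (0x2010) and KICK_NON_RW_IO (0x3014) in the two misc pages starting at 2, and KICK_RW_IO (0x4018) in page 4, which agrees with enum PVSCSIMemSpace.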
> diff --git a/trace-events b/trace-events
> index 412f7e4..66037a1 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -761,6 +761,42 @@ pc87312_info_ide(uint32_t base) "base 0x%x"
> pc87312_info_parallel(uint32_t base, uint32_t irq) "base 0x%x, irq %u"
> pc87312_info_serial(int n, uint32_t base, uint32_t irq) "id=%d, base 0x%x, irq %u"
>
> +# hw/pvscsi.c
> +pvscsi_rings_mgr_init_data(uint32_t txr_len_log2, uint32_t rxr_len_log2) "TX/RX rings logarithms set to %d/%d"
> +pvscsi_rings_mgr_init_msg(uint32_t len_log2) "MSG ring logarithm set to %d"
> +pvscsi_rings_mgr_flush_cmp_ring(uint64_t filled_cmp_ptr) "new production counter of completion ring is 0x%"PRIx64""
> +pvscsi_rings_mgr_flush_msg_ring(uint64_t filled_cmp_ptr) "new production counter of message ring is 0x%"PRIx64""
> +pvscsi_update_irq_level(bool raise, uint64_t mask, uint64_t status) "interrupt level set to %d (MASK: 0x%"PRIx64", STATUS: 0x%"PRIx64")"
> +pvscsi_update_irq_msi(void) "sending MSI notification"
> +pvscsi_cmp_ring_put(unsigned long addr) "got completion descriptor 0x%lx"
> +pvscsi_msg_ring_put(unsigned long addr) "got message descriptor 0x%lx"
> +pvscsi_complete_request(uint64_t context, uint64_t len, uint8_t sense_key) "completion: ctx: 0x%"PRIx64", len: 0x%"PRIx64", sense key: %u"
> +pvscsi_get_sg_list(int nsg, size_t size) "get SG list: depth: %u, size: %lu"
> +pvscsi_get_next_sg_elem(uint32_t flags) "unknown flags in SG element (val: 0x%x)"
> +pvscsi_command_complete_not_found(uint32_t tag) "can't find request for tag 0x%x"
> +pvscsi_command_complete_data_run(void) "not all data required for command transferred"
> +pvscsi_command_complete_sense_len(int len) "sense information length is %d bytes"
> +pvscsi_convert_sglist(uint64_t context, unsigned long addr, uint32_t resid) "element: ctx: 0x%"PRIx64" addr: 0x%lx, len: %ul"
> +pvscsi_process_req_descr(uint8_t cmd, uint64_t ctx) "SCSI cmd 0x%x, ctx: 0x%"PRIx64""
> +pvscsi_process_req_descr_unknown_device(void) "command directed to unknown device rejected"
> +pvscsi_process_req_descr_invalid_dir(void) "command with invalid transfer direction rejected"
> +pvscsi_process_io(unsigned long addr) "got descriptor 0x%lx"
> +pvscsi_on_cmd_noimpl(const char* cmd) "unimplemented command %s ignored"
> +pvscsi_on_cmd_reset_dev(uint32_t tgt, int lun, void* dev) "PVSCSI_CMD_RESET_DEVICE[target %u lun %d (dev 0x%p)]"
> +pvscsi_on_cmd_arrived(const char* cmd) "command %s arrived"
> +pvscsi_on_cmd_abort(uint64_t ctx, uint32_t tgt) "command PVSCSI_CMD_ABORT_CMD for ctx 0x%"PRIx64", target %u"
> +pvscsi_on_cmd_unknown(uint64_t cmd_id) "unknown command %"PRIx64""
> +pvscsi_on_cmd_unknown_data(uint32_t data) "data for unknown command 0x:%x"
> +pvscsi_io_write(const char* cmd, uint64_t val) "%s write: %"PRIx64""
> +pvscsi_io_write_unknown(unsigned long addr, unsigned sz, uint64_t val) "unknown write address: 0x%lx size: %u bytes value: 0x%"PRIx64""
> +pvscsi_io_read(const char* cmd, uint64_t status) "%s read: 0x%"PRIx64""
> +pvscsi_io_read_unknown(unsigned long addr, unsigned sz) "unknown read address: 0x%lx size: %u bytes"
> +pvscsi_init_msi_fail(int res) "failed to initialize MSI, error %d"
> +pvscsi_state(const char* state) "starting %s ..."
> +pvscsi_register(void) "PVSCSI QEMU device emulation registered"
> +pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) "%s page: %"PRIx64""
> +pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages: %u"
> +
> # xen-all.c
> xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
> xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>
* Re: [Qemu-devel] [PATCH 1/1 V6] VMWare PVSCSI paravirtual device implementation
2013-04-10 9:33 ` Paolo Bonzini
@ 2013-04-18 9:38 ` Dmitry Fleytman
2013-04-18 10:54 ` Paolo Bonzini
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Fleytman @ 2013-04-18 9:38 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Yan Vugenfirer, qemu-devel, Anthony Liguori
On Wed, Apr 10, 2013 at 12:33 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Just a few comments, many of them aesthetic.
>
> On 08/04/2013 20:39, Dmitry Fleytman wrote:
> > Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
> > Signed-off-by: Yan Vugenfirer <yan@daynix.com>
> > ---
> > default-configs/pci.mak | 1 +
> > docs/specs/pvscsi-spec.txt | 92 ++++
> > hw/Makefile.objs | 1 +
> > hw/pci/pci.h | 1 +
> > hw/pvscsi.c | 1194 ++++++++++++++++++++++++++++++++++++++++++++
> > hw/vmw_pvscsi.h | 434 ++++++++++++++++
> > trace-events | 36 ++
> > 7 files changed, 1759 insertions(+)
> > create mode 100644 docs/specs/pvscsi-spec.txt
> > create mode 100644 hw/pvscsi.c
> > create mode 100644 hw/vmw_pvscsi.h
> >
> > diff --git a/default-configs/pci.mak b/default-configs/pci.mak
> > index ce56d58..3f8375c 100644
> > --- a/default-configs/pci.mak
> > +++ b/default-configs/pci.mak
> > @@ -10,6 +10,7 @@ CONFIG_EEPRO100_PCI=y
> > CONFIG_PCNET_PCI=y
> > CONFIG_PCNET_COMMON=y
> > CONFIG_LSI_SCSI_PCI=y
> > +CONFIG_PVSCSI_SCSI_PCI=y
> > CONFIG_MEGASAS_SCSI_PCI=y
> > CONFIG_RTL8139_PCI=y
> > CONFIG_E1000_PCI=y
> > diff --git a/docs/specs/pvscsi-spec.txt b/docs/specs/pvscsi-spec.txt
> > new file mode 100644
> > index 0000000..b2c3a55
> > --- /dev/null
> > +++ b/docs/specs/pvscsi-spec.txt
> > @@ -0,0 +1,92 @@
> > +General Description
> > +===================
> > +
> > +This document describes VMWare PVSCSI device interface specification.
> > +Created by Dmitry Fleytman (dmitry@daynix.com), Daynix Computing LTD.
> > +Based on source code of PVSCSI Linux driver from kernel 3.0.4
> > +
> > +PVSCSI Device Interface Overview
> > +================================
> > +
> > +The interface is based on memory area shared between hypervisor and VM.
> > +Memory area is obtained by driver as device IO memory resource of
> > +PVSCSI_MEM_SPACE_SIZE length.
> > +The shared memory consists of registers area and rings area.
> > +The registers area is used to raise hypervisor interrupts and issue device
> > +commands. The rings area is used to transfer data descriptors and SCSI
> > +commands from VM to hypervisor and to transfer messages produced by
> > +hypervisor to VM. Data itself is transferred via virtual scatter-gather DMA.
> > +
> > +PVSCSI Device Registers
> > +=======================
> > +
> > +Registers area length is 1 page (PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES).
> > +Registers area structure is described by PVSCSIRegOffset enumeration.
>
> The length of the registers area is ... The structure of the registers
> area is described by the PVSCSIRegOffset enum.
>
Fixed.
>
> > +There are registers to issue device command (with optional short data),
> > +issue device interrupt, control interrupts masking.
> > +
> > +PVSCSI Device Rings
> > +===================
> > +
> > +There are three rings in shared memory:
> > +
> > + 1. Request ring (struct PVSCSIRingReqDesc *req_ring)
> > + - ring for OS to device requests
> > + 2. Completion ring (struct PVSCSIRingCmpDesc *cmp_ring)
> > + - ring for device request completions
> > + 3. Message ring (struct PVSCSIRingMsgDesc *msg_ring)
> > + - ring for messages from device.
> > + This ring is optional and may be not configured.
>
> and the guest might not configure it.
>
>
Fixed.
> > +There is a control area (struct PVSCSIRingsState *rings_state) used to control
> > +rings operation.
> > +
> > +PVSCSI Device to Host Interrupts
> > +================================
> > +There are following interrupt types supported by PVSCSI device:
> > + 1. Completion interrupts (completion ring notifications):
> > + PVSCSI_INTR_CMPL_0
> > + PVSCSI_INTR_CMPL_1
> > + 2. Message interrupts (message ring notifications):
> > + PVSCSI_INTR_MSG_0
> > + PVSCSI_INTR_MSG_1
> > +
> > +Interrupts are controlled via PVSCSI_REG_OFFSET_INTR_MASK register
> > +Bit set means interrupt enabled, bit cleared - disabled
> > +
> > +Interrupt modes supported are legacy, MSI and MSI-X
> > +In case of legacy interrupts register PVSCSI_REG_OFFSET_INTR_STATUS
> > +used to verify interrupt arrival and to clear interrupt state
> > +Interrupts are cleared by writing processed bits back
> > +to interrupt status register.
>
> In case of legacy interrupts, register PVSCSI_REG_OFFSET_INTR_STATUS
> is used to check which interrupt has arrived. Interrupts are
> acknowledged when the corresponding bit is written to the interrupt
> status register.
>
>
Fixed.
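To make the mask/acknowledge semantics concrete, here is a small standalone model. It is a sketch under assumptions and not the device code: the names are made up, masking writes with the supported bits is my choice, and the level rule (status AND mask) is inferred from "bit set means interrupt enabled".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_INTR_CMPL_0        (1u << 0)
#define SKETCH_INTR_CMPL_1        (1u << 1)
#define SKETCH_INTR_MSG_0         (1u << 2)
#define SKETCH_INTR_MSG_1         (1u << 3)
#define SKETCH_INTR_ALL_SUPPORTED 0xFu

struct SketchIntrRegs {
    uint64_t status;  /* models PVSCSI_REG_OFFSET_INTR_STATUS */
    uint64_t enabled; /* models PVSCSI_REG_OFFSET_INTR_MASK */
};

/* Device side: latch an interrupt cause in the status register. */
static void sketch_intr_raise(struct SketchIntrRegs *r, uint64_t bits)
{
    assert((bits & ~(uint64_t)SKETCH_INTR_ALL_SUPPORTED) == 0);
    r->status |= bits;
}

/* Guest write to the mask register: a set bit enables that interrupt. */
static void sketch_intr_write_mask(struct SketchIntrRegs *r, uint64_t val)
{
    r->enabled = val & SKETCH_INTR_ALL_SUPPORTED;
}

/* Guest write to the status register: written bits are acknowledged. */
static void sketch_intr_write_status(struct SketchIntrRegs *r, uint64_t val)
{
    r->status &= ~val;
}

/* Legacy interrupt line level: asserted while an enabled cause is pending. */
static bool sketch_intr_level(const struct SketchIntrRegs *r)
{
    return (r->status & r->enabled) != 0;
}
```

In this model a pending but masked completion keeps the line low until the guest unmasks it, and writing the same bit to the status register drops the line again.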
> > +
> > +PVSCSI Device Operation Sequences
> > +=================================
> > +
> > +1. Startup sequence:
> > + a. Issue PVSCSI_CMD_ADAPTER_RESET command;
> > + aa. Windows driver reads interrupt status register here;
> > + b. Issue PVSCSI_CMD_SETUP_MSG_RING command with no additional data,
> > + check status and disable device messages if error returned;
> > + (Omitted if device messages disabled by driver configuration)
>
> Can you add a boolean property to enable/disable the message ring?
>
>
Done.
> > + c. Issue PVSCSI_CMD_SETUP_RINGS command, provide rings configuration
> > + as struct PVSCSICmdDescSetupRings;
> > + d. Issue PVSCSI_CMD_SETUP_MSG_RING command again, provide
> > + rings configuration as struct PVSCSICmdDescSetupMsgRing;
> > + e. Unmask completion and message (if device messages enabled) interrupts.
> > +
> > +2. Shutdown sequences
> > + a. Mask interrupts;
> > + b. Flush request ring using PVSCSI_REG_OFFSET_KICK_NON_RW_IO;
> > + c. Issue PVSCSI_CMD_ADAPTER_RESET command.
> > +
> > +3. Send request
> > + a. Fill next free request ring descriptor;
> > + b. Issue PVSCSI_REG_OFFSET_KICK_RW_IO for R/W operations;
> > + or PVSCSI_REG_OFFSET_KICK_NON_RW_IO for other operations.
> > +
> > +4. Abort command
> > + a. Issue PVSCSI_CMD_ABORT_CMD command;
> > +
> > +5. Request completion processing
> > + a. Upon completion interrupt arrival process completion
> > + and message (if enabled) rings.
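The startup sequence above could look roughly like this from the guest side. This is a hedged sketch against a mock device that only logs register writes: the Sketch names and the log structure are invented, and the convention of writing the command to the COMMAND register followed by its descriptor words to COMMAND_DATA is my understanding of the interface, not something spelled out in the spec text. Step b (probing SETUP_MSG_RING) is left out because it needs a status read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets and command numbers copied from the quoted header. */
enum {
    SKETCH_REG_COMMAND      = 0x0,
    SKETCH_REG_COMMAND_DATA = 0x4,
    SKETCH_REG_INTR_MASK    = 0x2010,
};
enum {
    SKETCH_CMD_ADAPTER_RESET  = 1,
    SKETCH_CMD_SETUP_RINGS    = 3,
    SKETCH_CMD_SETUP_MSG_RING = 8,
};

#define SKETCH_LOG_MAX 512

struct SketchWrite { uint32_t reg; uint32_t val; };
struct SketchDev   { struct SketchWrite log[SKETCH_LOG_MAX]; size_t n; };

/* Mock MMIO write: just record what would be written. */
static void sketch_write_reg(struct SketchDev *d, uint32_t reg, uint32_t val)
{
    assert(d->n < SKETCH_LOG_MAX);
    d->log[d->n].reg = reg;
    d->log[d->n].val = val;
    d->n++;
}

/* Issue a command followed by its descriptor as 32-bit data words. */
static void sketch_issue_cmd(struct SketchDev *d, uint32_t cmd,
                             const uint32_t *data, size_t words)
{
    size_t i;

    sketch_write_reg(d, SKETCH_REG_COMMAND, cmd);
    for (i = 0; i < words; i++) {
        sketch_write_reg(d, SKETCH_REG_COMMAND_DATA, data[i]);
    }
}

/* Steps a, c, d and e of the startup sequence; msg_desc may be NULL
 * when device messages are disabled. */
static void sketch_startup(struct SketchDev *d,
                           const uint32_t *rings_desc, size_t rings_words,
                           const uint32_t *msg_desc, size_t msg_words,
                           uint32_t intr_mask)
{
    sketch_issue_cmd(d, SKETCH_CMD_ADAPTER_RESET, NULL, 0);
    sketch_issue_cmd(d, SKETCH_CMD_SETUP_RINGS, rings_desc, rings_words);
    if (msg_desc != NULL) {
        sketch_issue_cmd(d, SKETCH_CMD_SETUP_MSG_RING, msg_desc, msg_words);
    }
    sketch_write_reg(d, SKETCH_REG_INTR_MASK, intr_mask);
}
```

A real driver would pass the full PVSCSICmdDescSetupRings and PVSCSICmdDescSetupMsgRing structures as the data words; the sketch accepts any word array so that it stays short.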
> > diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> > index d0b2ecb..6e43763 100644
> > --- a/hw/Makefile.objs
> > +++ b/hw/Makefile.objs
> > @@ -130,6 +130,7 @@ common-obj-$(CONFIG_OPENCORES_ETH) += opencores_eth.o
> > # SCSI layer
> > common-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
> > common-obj-$(CONFIG_MEGASAS_SCSI_PCI) += megasas.o
> > +common-obj-$(CONFIG_PVSCSI_SCSI_PCI) += pvscsi.o
> > common-obj-$(CONFIG_ESP) += esp.o
> > common-obj-$(CONFIG_ESP_PCI) += esp-pci.o
> >
> > diff --git a/hw/pci/pci.h b/hw/pci/pci.h
> > index 9ea67a3..1767fe5 100644
> > --- a/hw/pci/pci.h
> > +++ b/hw/pci/pci.h
> > @@ -59,6 +59,7 @@
> > #define PCI_DEVICE_ID_VMWARE_SVGA 0x0710
> > #define PCI_DEVICE_ID_VMWARE_NET 0x0720
> > #define PCI_DEVICE_ID_VMWARE_SCSI 0x0730
> > +#define PCI_DEVICE_ID_VMWARE_PVSCSI 0x07C0
> > #define PCI_DEVICE_ID_VMWARE_IDE 0x1729
> > #define PCI_DEVICE_ID_VMWARE_VMXNET3 0x07B0
> >
> > diff --git a/hw/pvscsi.c b/hw/pvscsi.c
> > new file mode 100644
> > index 0000000..4c66671
> > --- /dev/null
> > +++ b/hw/pvscsi.c
> > @@ -0,0 +1,1194 @@
> > +/*
> > + * QEMU VMWARE PVSCSI paravirtual SCSI bus
> > + *
> > + * Copyright (c) 2012 Ravello Systems LTD (http://ravellosystems.com)
> > + *
> > + * Developed by Daynix Computing LTD (http://www.daynix.com)
> > + *
> > + * Based on implementation by Paolo Bonzini
> > + * http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg00729.html
> > + *
> > + * Authors:
> > + * Paolo Bonzini <pbonzini@redhat.com>
> > + * Dmitry Fleytman <dmitry@daynix.com>
> > + * Yan Vugenfirer <yan@daynix.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + * NOTE about MSI-X:
> > + * MSI-X support has been removed for the moment because it leads Windows OS
> > + * to crash on startup. The crash happens because Windows driver requires
> > + * MSI-X shared memory to be part of the same BAR used for rings state
> > + * registers, etc. This is not supported by QEMU infrastructure so separate
> > + * BAR created from MSI-X purposes. Windows driver fails to deal with 2 BARs.
> > + *
> > + */
> > +
> > +#include "scsi-defs.h"
> > +#include "hw/scsi.h"
> > +#include "hw/pci/msi.h"
> > +#include "vmw_pvscsi.h"
> > +#include "trace.h"
> > +
> > +
> > +#define PVSCSI_MSI_OFFSET (0x50)
> > +#define PVSCSI_USE_64BIT (true)
> > +#define PVSCSI_PER_VECTOR_MASK (false)
> > +
> > +#define PVSCSI_MAX_DEVS (64)
> > +#define PVSCSI_MSIX_NUM_VECTORS (1)
> > +
> > +#define PVSCSI_MAX_CMD_DATA_WORDS \
> > + (sizeof(PVSCSICmdDescSetupRings)/sizeof(uint32_t))
> > +
> > +#define RS_GET_FIELD(rs_pa, field) \
> > + (ldl_le_phys(rs_pa + offsetof(struct PVSCSIRingsState, field)))
> > +#define RS_SET_FIELD(rs_pa, field, val) \
> > + (stl_le_phys(rs_pa + offsetof(struct PVSCSIRingsState, field), val))
> > +
> > +#define TYPE_PVSCSI "pvscsi"
> > +#define PVSCSI(obj) OBJECT_CHECK(PVSCSIState, (obj), TYPE_PVSCSI)
> > +
> > +typedef struct PVSCSIRingsMgr {
>
> PVSCSIRingInfo, or just put it in PVSCSIState.
>
>
Renamed.
> > + uint64_t rs_pa;
> > + uint32_t txr_len_mask;
> > + uint32_t rxr_len_mask;
> > + uint32_t msg_len_mask;
> > + uint64_t req_ring_pages_pa[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
> > + uint64_t cmp_ring_pages_pa[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
> > + uint64_t msg_ring_pages_pa[PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES];
> > + uint64_t consumed_ptr;
> > + uint64_t filled_cmp_ptr;
> > + uint64_t filled_msg_ptr;
> > +} PVSCSIRingsMgr;
> > +
> > +typedef struct PVSCSISGState {
> > + hwaddr elemAddr;
> > + hwaddr dataAddr;
> > + uint32_t resid;
> > +} PVSCSISGState;
> > +
> > +typedef QTAILQ_HEAD(, PVSCSIRequest) PVSCSIRequestList;
> > +
> > +typedef struct {
> > + PCIDevice parent_obj;
> > + MemoryRegion io_space;
> > + SCSIBus bus;
> > + QEMUBH *completion_worker;
> > + PVSCSIRequestList pending_queue;
> > + PVSCSIRequestList completion_queue;
> > +
> > + uint64_t reg_interrupt_status; /* Interrupt status register value */
> > + uint64_t reg_interrupt_enabled; /* Interrupt mask register value */
> > + uint64_t reg_command_status; /* Command status register value */
> > +
> > + /* Command data adoption mechanism */
> > + uint64_t curr_cmd; /* Last command arrived */
> > + uint32_t curr_cmd_data_cntr; /* Amount of data for last command */
> > +
> > + /* Collector for current command data */
> > + uint32_t curr_cmd_data[PVSCSI_MAX_CMD_DATA_WORDS];
> > +
> > + uint8_t rings_info_valid; /* Whether data rings initialized */
> > + uint8_t msg_ring_info_valid; /* Whether message ring initialized */
> > +
> > + uint8_t msi_used; /* Whether MSI support was installed successfully */
> > +
> > + PVSCSIRingsMgr rings; /* Data transfer rings manager */
> > +} PVSCSIState;
> > +
> > +typedef struct PVSCSIRequest {
> > + SCSIRequest *sreq;
> > + PVSCSIState *dev;
> > + uint8_t sense_key;
> > + uint8_t completed;
> > + int lun;
> > + QEMUSGList sgl;
> > + PVSCSISGState sg;
> > + struct PVSCSIRingReqDesc req;
> > + struct PVSCSIRingCmpDesc cmp;
> > + QTAILQ_ENTRY(PVSCSIRequest) next;
> > +} PVSCSIRequest;
>
> This needs to be serialized and loaded back if you want to support
> migration with rerror=stop/werror=stop. See how it's done in
> virtio-scsi. To test it, you can make a disk readonly with blockdev
> while QEMU is running, migrate to file, make it read-write again and
> migrate back from the file.
>
> This is not blocking the patch though.
>
>
>
We'd prefer to leave this out of scope at the current stage.
> > +/* Integer binary logarithm */
> > +static int
> > +pvscsi_log2(uint32_t input)
> > +{
> > + int log = 0;
> > + assert(input > 0);
> > + while (input >> ++log) {
> > + }
> > + return log;
> > +}
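For reference, the helper above returns the number of significant bits of its input, so calling it with (ring size - 1) gives log2 of a power-of-two ring size. A standalone sketch of the same loop follows; the upper-bound assert and the sketch_ring_len_log2() wrapper are my additions for illustration and are not in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same loop as the quoted pvscsi_log2(): counts the significant bits
 * of input. The extra upper bound keeps the shift count below 32,
 * since shifting a uint32_t by 32 is undefined in C.
 */
static int sketch_log2(uint32_t input)
{
    int log = 0;

    assert(input > 0 && input < (1u << 31));
    while (input >> ++log) {
    }
    return log;
}

/* log2 of the ring length in entries, computed as the patch does:
 * from (size - 1), so a power-of-two size N yields exactly log2(N). */
static int sketch_ring_len_log2(uint32_t num_pages, uint32_t entries_per_page)
{
    uint32_t ring_size = num_pages * entries_per_page;

    assert(ring_size > 1);
    return sketch_log2(ring_size - 1);
}
```

A one-page request ring (32 entries) gives 5, and the largest request ring (32 pages) gives 10, so the ring sizes used here stay well inside the safe range.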
> > +
> > +static void
> > +pvscsi_rings_mgr_init_data(PVSCSIRingsMgr *m, PVSCSICmdDescSetupRings *ri)
>
> s/pvscsi_rings_mgr/pvscsi_ring/g
>
>
Done :)
> > +{
> > + int i;
> > + uint32_t txr_len_log2, rxr_len_log2;
> > + uint32_t req_ring_size, cmp_ring_size;
> > + m->rs_pa = ri->ringsStatePPN << VMW_PAGE_SHIFT;
> > +
> > + req_ring_size = ri->reqRingNumPages * PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
> > + cmp_ring_size = ri->cmpRingNumPages * PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
> > + txr_len_log2 = pvscsi_log2(req_ring_size - 1);
> > + rxr_len_log2 = pvscsi_log2(cmp_ring_size - 1);
> > +
> > + m->txr_len_mask = MASK(txr_len_log2);
> > + m->rxr_len_mask = MASK(rxr_len_log2);
> > +
> > + m->consumed_ptr = 0;
> > + m->filled_cmp_ptr = 0;
> > +
> > + for (i = 0; i < ri->reqRingNumPages; i++) {
> > + m->req_ring_pages_pa[i] = ri->reqRingPPNs[i] << VMW_PAGE_SHIFT;
> > + }
> > +
> > + for (i = 0; i < ri->cmpRingNumPages; i++) {
> > + m->cmp_ring_pages_pa[i] = ri->cmpRingPPNs[i] << VMW_PAGE_SHIFT;
> > + }
> > +
> > + RS_SET_FIELD(m->rs_pa, reqProdIdx, 0);
> > + RS_SET_FIELD(m->rs_pa, reqConsIdx, 0);
> > + RS_SET_FIELD(m->rs_pa, reqNumEntriesLog2, txr_len_log2);
> > +
> > + RS_SET_FIELD(m->rs_pa, cmpProdIdx, 0);
> > + RS_SET_FIELD(m->rs_pa, cmpConsIdx, 0);
> > + RS_SET_FIELD(m->rs_pa, cmpNumEntriesLog2, rxr_len_log2);
> > +
> > + trace_pvscsi_rings_mgr_init_data(txr_len_log2, rxr_len_log2);
> > +
> > + /* Flush ring state page changes */
> > + smp_wmb();
> > +}
> > +
> > +static void
> > +pvscsi_rings_mgr_init_msg(PVSCSIRingsMgr *m, PVSCSICmdDescSetupMsgRing *ri)
> > +{
> > + int i;
> > + uint32_t len_log2;
> > + uint32_t ring_size;
> > +
> > + ring_size = ri->numPages * PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE;
> > + len_log2 = pvscsi_log2(ring_size - 1);
> > +
> > + m->msg_len_mask = MASK(len_log2);
> > +
> > + m->filled_msg_ptr = 0;
> > +
> > + for (i = 0; i < ri->numPages; i++) {
> > + m->msg_ring_pages_pa[i] = ri->ringPPNs[i] << VMW_PAGE_SHIFT;
> > + }
> > +
> > + RS_SET_FIELD(m->rs_pa, msgProdIdx, 0);
> > + RS_SET_FIELD(m->rs_pa, msgConsIdx, 0);
> > + RS_SET_FIELD(m->rs_pa, msgNumEntriesLog2, len_log2);
> > +
> > + trace_pvscsi_rings_mgr_init_msg(len_log2);
> > +
> > + /* Flush ring state page changes */
> > + smp_wmb();
> > +}
> > +
> > +static void
> > +pvscsi_rings_mgr_cleanup(PVSCSIRingsMgr *mgr)
> > +{
> > + mgr->rs_pa = 0;
> > + mgr->txr_len_mask = 0;
> > + mgr->rxr_len_mask = 0;
> > + mgr->msg_len_mask = 0;
> > + mgr->consumed_ptr = 0;
> > + mgr->filled_cmp_ptr = 0;
> > + mgr->filled_msg_ptr = 0;
> > + memset(mgr->req_ring_pages_pa, 0, sizeof(mgr->req_ring_pages_pa));
> > + memset(mgr->cmp_ring_pages_pa, 0, sizeof(mgr->cmp_ring_pages_pa));
> > + memset(mgr->msg_ring_pages_pa, 0, sizeof(mgr->msg_ring_pages_pa));
> > +}
> > +
> > +static hwaddr
> > +pvscsi_rings_mgr_pop_req_descr(PVSCSIRingsMgr *mgr)
> > +{
> > + uint32_t ready_ptr = RS_GET_FIELD(mgr->rs_pa, reqProdIdx);
> > +
> > + if (ready_ptr != mgr->consumed_ptr) {
> > + uint32_t next_ready_ptr =
> > + mgr->consumed_ptr++ & mgr->txr_len_mask;
> > + uint32_t next_ready_page =
> > + next_ready_ptr / PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
> > + uint32_t inpage_idx =
> > + next_ready_ptr % PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
> > +
> > + return mgr->req_ring_pages_pa[next_ready_page] +
> > + inpage_idx * sizeof(PVSCSIRingReqDesc);
> > + } else {
> > + return 0;
> > + }
> > +}
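The index-to-address step in the function above can be sketched on its own as
follows. `ENTRIES_PER_PAGE` and `ENTRY_SIZE` are illustrative stand-ins for
PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE and sizeof(PVSCSIRingReqDesc), and
`desc_addr` is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES_PER_PAGE 32u  /* illustrative stand-in */
#define ENTRY_SIZE 128u       /* illustrative stand-in */

/*
 * Map a free-running ring index to the guest physical address of its
 * descriptor: mask to wrap around the ring, then split the slot number
 * into a page number and an offset within that page.
 */
static uint64_t desc_addr(const uint64_t *pages_pa, uint32_t idx,
                          uint32_t len_mask)
{
    uint32_t slot = idx & len_mask;
    uint32_t page = slot / ENTRIES_PER_PAGE;
    uint32_t in_page = slot % ENTRIES_PER_PAGE;

    return pages_pa[page] + (uint64_t)in_page * ENTRY_SIZE;
}
```

Because the index is only masked when it is used, the stored counter can keep
incrementing and still compares directly against the producer index written by
the guest.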
> > +
> > +static void
> > +pvscsi_rings_mgr_flush_req_ring(PVSCSIRingsMgr *mgr)
>
> pvscsi_ring_flush_req, and likewise for other functions that have _ring
> at the end.
>
>
Renamed.
> > +{
> > + RS_SET_FIELD(mgr->rs_pa, reqConsIdx, mgr->consumed_ptr);
> > +}
> > +
> > +static hwaddr
> > +pvscsi_rings_mgr_pop_cmp_descr(PVSCSIRingsMgr *mgr)
> > +{
> > + /*
> > + * According to Linux driver code it explicitly verifies that number
> > + * of requests being processed by device is less then the size of
> > + * completion queue, so device may omit completion queue overflow
> > + * conditions check. We assume that this is true for other (Windows)
> > + * drivers as well.
> > + */
> > +
> > + uint32_t free_cmp_ptr =
> > + mgr->filled_cmp_ptr++ & mgr->rxr_len_mask;
> > + uint32_t free_cmp_page =
> > + free_cmp_ptr / PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
> > + uint32_t inpage_idx =
> > + free_cmp_ptr % PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
> > + return mgr->cmp_ring_pages_pa[free_cmp_page] +
> > + inpage_idx * sizeof(PVSCSIRingCmpDesc);
> > +}
> > +
> > +static hwaddr
> > +pvscsi_rings_mgr_pop_msg_descr(PVSCSIRingsMgr *mgr)
> > +{
> > + uint32_t free_msg_ptr =
> > + mgr->filled_msg_ptr++ & mgr->msg_len_mask;
> > + uint32_t free_msg_page =
> > + free_msg_ptr / PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE;
> > + uint32_t inpage_idx =
> > + free_msg_ptr % PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE;
> > + return mgr->msg_ring_pages_pa[free_msg_page] +
> > + inpage_idx * sizeof(PVSCSIRingMsgDesc);
> > +}
> > +
> > +static void
> > +pvscsi_rings_mgr_flush_cmp_ring(PVSCSIRingsMgr *mgr)
> > +{
> > + /* Flush descriptor changes */
> > + smp_wmb();
> > +
> > + trace_pvscsi_rings_mgr_flush_cmp_ring(mgr->filled_cmp_ptr);
> > +
> > + RS_SET_FIELD(mgr->rs_pa, cmpProdIdx, mgr->filled_cmp_ptr);
> > +}
> > +
> > +static bool
> > +pvscsi_rings_mgr_msg_has_room(PVSCSIRingsMgr *mgr)
> > +{
> > + uint32_t prodIdx = RS_GET_FIELD(mgr->rs_pa, msgProdIdx);
> > + uint32_t consIdx = RS_GET_FIELD(mgr->rs_pa, msgConsIdx);
> > +
> > + return (prodIdx - consIdx) < (mgr->msg_len_mask + 1);
> > +}
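The room check above relies on unsigned subtraction staying correct when the
indices wrap. A minimal sketch of that property, with the hypothetical name
`ring_has_room` and plain integers in place of the RS_GET_FIELD reads:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Free-running 32-bit producer/consumer indices: their difference modulo
 * 2^32 is the number of entries in flight, even after prod_idx wraps.
 */
static bool ring_has_room(uint32_t prod_idx, uint32_t cons_idx,
                          uint32_t len_mask)
{
    return (uint32_t)(prod_idx - cons_idx) < len_mask + 1;
}
```

For an 8-entry ring (mask 7), a producer at 2 and a consumer at 0xFFFFFFFE
have 4 entries in flight, so there is still room.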
> > +
> > +static void
> > +pvscsi_rings_mgr_flush_msg_ring(PVSCSIRingsMgr *mgr)
> > +{
> > + /* Flush descriptor changes */
> > + smp_wmb();
> > +
> > + trace_pvscsi_rings_mgr_flush_msg_ring(mgr->filled_msg_ptr);
> > +
> > + RS_SET_FIELD(mgr->rs_pa, msgProdIdx, mgr->filled_msg_ptr);
> > +}
> > +
> > +static void
> > +pvscsi_reset_state(PVSCSIState *s)
> > +{
> > + s->curr_cmd = PVSCSI_CMD_FIRST;
> > + s->curr_cmd_data_cntr = 0;
> > + s->reg_command_status = PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> > + s->reg_interrupt_status = 0;
> > + pvscsi_rings_mgr_cleanup(&s->rings);
> > + s->rings_info_valid = FALSE;
> > + s->msg_ring_info_valid = FALSE;
> > + QTAILQ_INIT(&s->pending_queue);
> > + QTAILQ_INIT(&s->completion_queue);
> > +}
> > +
> > +static void
> > +pvscsi_free_queue(PVSCSIRequestList *req_list)
>
> This shouldn't be needed.
>
>
Doesn't one need to clear the completion queue when a reset command arrives
from the driver?
> > +{
> > + PVSCSIRequest *pvscsi_req;
> > +
> > + while (!QTAILQ_EMPTY(req_list)) {
> > + pvscsi_req = QTAILQ_FIRST(req_list);
> > + QTAILQ_REMOVE(req_list, pvscsi_req, next);
> > + g_free(pvscsi_req);
> > + }
> > +}
> > +
> > +static void
> > +pvscsi_reset_adapter(PVSCSIState *s)
> > +{
> > + qbus_reset_all_fn(&s->bus);
> > + pvscsi_free_queue(&s->completion_queue);
> > + assert(QTAILQ_EMPTY(&s->pending_queue));
> > + pvscsi_reset_state(s);
> > +}
> > +
> > +static void
> > +pvscsi_update_irq_status(PVSCSIState *s)
> > +{
> > + PCIDevice *d = PCI_DEVICE(s);
> > + bool should_raise = s->reg_interrupt_enabled & s->reg_interrupt_status;
> > +
> > + trace_pvscsi_update_irq_level(should_raise, s->reg_interrupt_enabled,
> > + s->reg_interrupt_status);
> > +
> > + if (s->msi_used && msi_enabled(d)) {
> > + if (should_raise) {
> > + trace_pvscsi_update_irq_msi();
> > + msi_notify(d, PVSCSI_VECTOR_COMPLETION);
> > + }
> > + return;
> > + }
> > +
> > + qemu_set_irq(d->irq[0], !!should_raise);
> > +}
> > +
> > +static void
> > +pvscsi_raise_completion_interrupt(PVSCSIState *s)
> > +{
> > + s->reg_interrupt_status |= PVSCSI_INTR_CMPL_0;
>
> Did you find out how you're supposed to use PVSCSI_INTR_CMPL_1?
>
>
No :(
> > + /* Memory barrier to flush interrupt status register changes*/
> > + smp_wmb();
> > +
> > + pvscsi_update_irq_status(s);
> > +}
> > +
> > +static void
> > +pvscsi_raise_message_interrupt(PVSCSIState *s)
> > +{
> > + s->reg_interrupt_status |= PVSCSI_INTR_MSG_0;
> > +
> > + /* Memory barrier to flush interrupt status register changes*/
> > + smp_wmb();
> > +
> > + pvscsi_update_irq_status(s);
> > +}
> > +
> > +static void
> > +pvscsi_cmp_ring_put(PVSCSIState *s, struct PVSCSIRingCmpDesc *cmp_desc)
> > +{
> > + hwaddr cmp_descr_pa;
> > +
> > + cmp_descr_pa = pvscsi_rings_mgr_pop_cmp_descr(&s->rings);
> > + trace_pvscsi_cmp_ring_put(cmp_descr_pa);
> > + cpu_physical_memory_write(cmp_descr_pa, (void *)cmp_desc,
> > + sizeof(*cmp_desc));
> > +}
> > +
> > +static void
> > +pvscsi_msg_ring_put(PVSCSIState *s, struct PVSCSIRingMsgDesc *msg_desc)
> > +{
> > + hwaddr msg_descr_pa;
> > +
> > + msg_descr_pa = pvscsi_rings_mgr_pop_msg_descr(&s->rings);
> > + trace_pvscsi_msg_ring_put(msg_descr_pa);
> > + cpu_physical_memory_write(msg_descr_pa, (void *)msg_desc,
> > + sizeof(*msg_desc));
> > +}
> > +
> > +static void
> > +pvscsi_process_completion_queue(void *opaque)
> > +{
> > + PVSCSIState *s = opaque;
> > + PVSCSIRequest *pvscsi_req;
> > + bool has_completed = false;
> > +
> > + while (!QTAILQ_EMPTY(&s->completion_queue)) {
> > + pvscsi_req = QTAILQ_FIRST(&s->completion_queue);
> > + QTAILQ_REMOVE(&s->completion_queue, pvscsi_req, next);
> > + pvscsi_cmp_ring_put(s, &pvscsi_req->cmp);
> > + g_free(pvscsi_req);
> > + has_completed++;
> > + }
> > +
> > + if (has_completed) {
> > + pvscsi_rings_mgr_flush_cmp_ring(&s->rings);
> > + pvscsi_raise_completion_interrupt(s);
> > + }
> > +}
> > +
> > +static void
> > +pvscsi_schedule_completion_processing(PVSCSIState *s)
> > +{
> > + /* Try putting more complete requests on the ring. */
> > + if (!QTAILQ_EMPTY(&s->completion_queue)) {
> > + qemu_bh_schedule(s->completion_worker);
> > + }
> > +}
> > +
> > +static void
> > +pvscsi_complete_request(PVSCSIState *s, PVSCSIRequest *r)
> > +{
> > + assert(!r->completed);
> > +
> > + trace_pvscsi_complete_request(r->cmp.context, r->cmp.dataLen,
> > + r->sense_key);
> > + if (r->sreq != NULL) {
> > + scsi_req_unref(r->sreq);
> > + r->sreq = NULL;
> > + }
> > + r->completed = 1;
> > + QTAILQ_REMOVE(&s->pending_queue, r, next);
> > + QTAILQ_INSERT_TAIL(&s->completion_queue, r, next);
> > + pvscsi_schedule_completion_processing(s);
> > +}
> > +
> > +static QEMUSGList *pvscsi_get_sg_list(SCSIRequest *r)
> > +{
> > + PVSCSIRequest *req = r->hba_private;
> > +
> > + trace_pvscsi_get_sg_list(req->sgl.nsg, req->sgl.size);
> > +
> > + return &req->sgl;
> > +}
> > +
> > +static void
> > +pvscsi_get_next_sg_elem(PVSCSISGState *sg)
> > +{
> > + struct PVSCSISGElement elem;
> > +
> > + for (;; sg->elemAddr = elem.addr) {
>
> Please remove the for loop altogether.
>
Removed.
>
> > + cpu_physical_memory_read(sg->elemAddr, (void *)&elem, sizeof(elem));
> > + if ((elem.flags & ~PVSCSI_KNOWN_FLAGS) != 0) {
> > + /*
> > + * There is PVSCSI_SGE_FLAG_CHAIN_ELEMENT flag described in
> > + * header file but its value is unknown. This flag requires
> > + * additional processing, so we put warning here to catch it
> > + * some day and make proper implementation
> > + */
> > + trace_pvscsi_get_next_sg_elem(elem.flags);
> > + }
> > + break;
> > + }
> > +
> > + sg->elemAddr += sizeof(elem);
> > + sg->dataAddr = elem.addr;
> > + sg->resid = elem.length;
> > +}
> > +
> > +static void
> > +pvscsi_write_sense(PVSCSIRequest *r, uint8_t *sense, int len)
> > +{
> > + r->cmp.senseLen = MIN(r->req.senseLen, len);
> > + r->sense_key = sense[2];
>
> The key is in sense[1] if bit 1 of sense[0] is 1. See scsi_build_sense
> in hw/scsi/scsi-bus.c.
>
>
Thanks for the clarification. Done.
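A minimal sketch of the rule described in the comment above, assuming the
standard SCSI layouts (fixed format for response codes 0x70/0x71, descriptor
format for 0x72/0x73). The helper name `sense_key_of` is hypothetical and this
is not necessarily the code that went into the next revision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the sense key from a sense buffer of len bytes. */
static uint8_t sense_key_of(const uint8_t *sense, size_t len)
{
    assert(sense != NULL);

    if (len < 3) {
        return 0; /* too short to carry a key; 0 is NO SENSE */
    }
    if (sense[0] & 0x02) {
        /* Descriptor format: key is the low nibble of byte 1 */
        return sense[1] & 0x0f;
    }
    /* Fixed format: key is the low nibble of byte 2 */
    return sense[2] & 0x0f;
}
```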
> > + cpu_physical_memory_write(r->req.senseAddr, sense, r->cmp.senseLen);
> > +}
> > +
> > +static void
> > +pvscsi_command_complete(SCSIRequest *req, uint32_t status, size_t resid)
> > +{
> > + PVSCSIRequest *pvscsi_req = req->hba_private;
> > + PVSCSIState *s = pvscsi_req->dev;
> > +
> > + if (!pvscsi_req) {
> > + trace_pvscsi_command_complete_not_found(req->tag);
> > + return;
> > + }
> > +
> > + if (resid) {
> > + /* Short transfer. */
> > + trace_pvscsi_command_complete_data_run();
> > + pvscsi_req->cmp.hostStatus = BTSTAT_DATARUN;
> > + }
> > +
> > + pvscsi_req->cmp.scsiStatus = status;
> > + if (pvscsi_req->cmp.scsiStatus == CHECK_CONDITION) {
> > + uint8_t sense[SCSI_SENSE_BUF_SIZE];
> > + int sense_len =
> > + scsi_req_get_sense(pvscsi_req->sreq, sense, sizeof(sense));
> > +
> > + trace_pvscsi_command_complete_sense_len(sense_len);
> > + pvscsi_write_sense(pvscsi_req, sense, sense_len);
> > + }
> > + qemu_sglist_destroy(&pvscsi_req->sgl);
> > + pvscsi_complete_request(s, pvscsi_req);
> > +}
> > +
> > +static void
> > +pvscsi_send_msg(PVSCSIState *s, SCSIDevice *dev, uint32_t msg_type)
> > +{
> > + if (s->msg_ring_info_valid && pvscsi_rings_mgr_msg_has_room(&s->rings)) {
> > + PVSCSIMsgDescDevStatusChanged msg = {0};
> > +
> > + msg.type = msg_type;
> > + msg.bus = dev->channel;
> > + msg.target = dev->id;
> > + msg.lun[1] = dev->lun;
> > +
> > + pvscsi_msg_ring_put(s, (PVSCSIRingMsgDesc *)&msg);
> > + pvscsi_rings_mgr_flush_msg_ring(&s->rings);
> > + pvscsi_raise_message_interrupt(s);
> > + }
> > +}
> > +
> > +static void
> > +pvscsi_hotplug(SCSIBus *bus, SCSIDevice *dev)
> > +{
> > + PVSCSIState *s = container_of(bus, PVSCSIState, bus);
> > + pvscsi_send_msg(s, dev, PVSCSI_MSG_DEV_ADDED);
> > +}
> > +
> > +static void
> > +pvscsi_hot_unplug(SCSIBus *bus, SCSIDevice *dev)
> > +{
> > + PVSCSIState *s = container_of(bus, PVSCSIState, bus);
> > + pvscsi_send_msg(s, dev, PVSCSI_MSG_DEV_REMOVED);
> > +}
> > +
> > +static void
> > +pvscsi_request_cancelled(SCSIRequest *req)
> > +{
> > + PVSCSIRequest *pvscsi_req = req->hba_private;
> > + PVSCSIState *s = pvscsi_req->dev;
> > +
> > + if (pvscsi_req->cmp.hostStatus == BTSTAT_SUCCESS) {
> > + pvscsi_req->cmp.hostStatus = BTSTAT_ABORTQUEUE;
> > + }
>
> virtio-scsi has a "resetting" field and sets BTSTAT_BUSRESET if it is
> one. Perhaps you can do the same.
>
>
Good idea! Done.
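One way the suggestion could look, as a sketch only. The enum values are
placeholders rather than the real BTSTAT_* codes from vmw_pvscsi.h, and
`cancel_host_status` and the `resetting` flag are assumptions modelled on the
virtio-scsi approach mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder status codes for the sketch; not the real BTSTAT_* values. */
enum SketchHostStatus {
    SKETCH_SUCCESS,
    SKETCH_ABORTQUEUE,
    SKETCH_BUSRESET,
};

/*
 * Pick the host status for a cancelled request: keep an error that was
 * already recorded, otherwise report a bus reset if one is in progress
 * and a plain abort if not.
 */
static enum SketchHostStatus
cancel_host_status(enum SketchHostStatus current, bool resetting)
{
    if (current != SKETCH_SUCCESS) {
        return current;
    }
    return resetting ? SKETCH_BUSRESET : SKETCH_ABORTQUEUE;
}
```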
> > + pvscsi_complete_request(s, pvscsi_req);
> > +}
> > +
> > +static SCSIDevice*
> > +pvscsi_device_find(PVSCSIState *s, int channel, int target,
> > + uint8_t *requested_lun, uint8_t *target_lun)
> > +{
> > + if (requested_lun[0] || requested_lun[2] || requested_lun[3] ||
> > + requested_lun[4] || requested_lun[5] || requested_lun[6] ||
> > + requested_lun[7] || (target > PVSCSI_MAX_DEVS)) {
> > + return NULL;
> > + } else {
> > + *target_lun = requested_lun[1];
> > + return scsi_device_find(&s->bus, channel, target, *target_lun);
> > + }
> > +}
> > +
> > +static PVSCSIRequest *
> > +pvscsi_queue_pending_descriptor(PVSCSIState *s, SCSIDevice **d,
> > + struct PVSCSIRingReqDesc *descr)
> > +{
> > + PVSCSIRequest *pvscsi_req;
> > + uint8_t lun;
> > +
> > + pvscsi_req = g_malloc0(sizeof(*pvscsi_req));
> > + pvscsi_req->dev = s;
> > + pvscsi_req->req = *descr;
> > + pvscsi_req->cmp.context = pvscsi_req->req.context;
> > + QTAILQ_INSERT_TAIL(&s->pending_queue, pvscsi_req, next);
> > +
> > + *d = pvscsi_device_find(s, descr->bus, descr->target, descr->lun, &lun);
> > + if (!*d) {
> > + return pvscsi_req;
> > + }
> > +
> > + pvscsi_req->lun = lun;
> > + return pvscsi_req;
> > +}
> > +
> > +static void
> > +pvscsi_convert_sglist(PVSCSIRequest *r)
> > +{
> > + int chunk_size;
> > + uint64_t data_length = r->req.dataLen;
> > + PVSCSISGState sg = r->sg;
> > + while (data_length) {
> > + while (!sg.resid) {
> > + pvscsi_get_next_sg_elem(&sg);
> > + trace_pvscsi_convert_sglist(r->req.context, r->sg.dataAddr,
> > + r->sg.resid);
> > + }
> > + assert(data_length > 0);
> > + chunk_size = MIN((unsigned) data_length, sg.resid);
> > + if (chunk_size) {
> > + qemu_sglist_add(&r->sgl, sg.dataAddr, chunk_size);
> > + }
> > +
> > + sg.dataAddr += chunk_size;
> > + data_length -= chunk_size;
> > + sg.resid -= chunk_size;
> > + }
> > +}
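The chunking logic above can be sketched without guest memory access as
follows. `SgElem`, `Chunk` and `build_chunks` are hypothetical names; unlike
the patch, the sketch takes the element list as an in-memory array and stops
when it runs out of elements or output slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t addr; uint32_t length; } SgElem; /* hypothetical */
typedef struct { uint64_t addr; uint32_t length; } Chunk;  /* hypothetical */

/*
 * Walk the scatter-gather elements and emit one chunk per element until
 * data_length bytes are covered. The last chunk is truncated so that the
 * total never exceeds the length stated in the request.
 */
static size_t build_chunks(const SgElem *elems, size_t n_elems,
                           uint64_t data_length, Chunk *out, size_t max_out)
{
    size_t n = 0;
    size_t i = 0;

    while (data_length > 0 && i < n_elems && n < max_out) {
        uint32_t take = elems[i].length;

        if (take > data_length) {
            take = (uint32_t)data_length;
        }
        if (take > 0) {
            out[n].addr = elems[i].addr;
            out[n].length = take;
            n++;
        }
        data_length -= take;
        i++;
    }
    return n;
}
```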
> > +
> > +static void
> > +pvscsi_build_sglist(PVSCSIState *s, PVSCSIRequest *r)
> > +{
> > + PCIDevice *d = PCI_DEVICE(s);
> > +
> > + qemu_sglist_init(&r->sgl, 1, pci_dma_context(d));
> > + if (r->req.flags & PVSCSI_FLAG_CMD_WITH_SG_LIST) {
> > + pvscsi_convert_sglist(r);
> > + } else {
> > + qemu_sglist_add(&r->sgl, r->req.dataAddr, r->req.dataLen);
> > + }
> > +}
> > +
> > +static void
> > +pvscsi_process_request_descriptor(PVSCSIState *s,
> > + struct PVSCSIRingReqDesc *descr)
> > +{
> > + SCSIDevice *d;
> > + PVSCSIRequest *r = pvscsi_queue_pending_descriptor(s, &d, descr);
> > + int64_t n;
> > +
> > + trace_pvscsi_process_req_descr(descr->cdb[0], descr->context);
> > +
> > + if (!d) {
> > + r->cmp.hostStatus = BTSTAT_SELTIMEO;
> > + trace_pvscsi_process_req_descr_unknown_device();
> > + pvscsi_complete_request(s, r);
> > + return;
> > + }
> > +
> > + if (descr->flags & PVSCSI_FLAG_CMD_WITH_SG_LIST) {
> > + r->sg.elemAddr = descr->dataAddr;
> > + }
> > +
> > + r->sreq = scsi_req_new(d, descr->context, r->lun, descr->cdb, r);
> > + if (r->sreq->cmd.mode == SCSI_XFER_FROM_DEV &&
> > + (descr->flags & PVSCSI_FLAG_CMD_DIR_TODEVICE)) {
> > + r->cmp.hostStatus = BTSTAT_BADMSG;
> > + trace_pvscsi_process_req_descr_invalid_dir();
> > + scsi_req_cancel(r->sreq);
> > + return;
> > + }
> > + if (r->sreq->cmd.mode == SCSI_XFER_TO_DEV &&
> > + (descr->flags & PVSCSI_FLAG_CMD_DIR_TOHOST)) {
> > + r->cmp.hostStatus = BTSTAT_BADMSG;
> > + trace_pvscsi_process_req_descr_invalid_dir();
> > + scsi_req_cancel(r->sreq);
> > + return;
> > + }
> > +
> > + pvscsi_build_sglist(s, r);
> > + n = scsi_req_enqueue(r->sreq);
> > +
> > + if (n) {
> > + scsi_req_continue(r->sreq);
> > + }
> > +}
> > +
> > +static void
> > +pvscsi_process_io(PVSCSIState *s)
> > +{
> > + PVSCSIRingReqDesc descr;
> > + hwaddr next_descr_pa;
> > +
> > + assert(s->rings_info_valid);
> > + while ((next_descr_pa = pvscsi_rings_mgr_pop_req_descr(&s->rings)) != 0) {
> > +
> > + /* Only read after production index verification */
> > + smp_rmb();
> > +
> > + trace_pvscsi_process_io(next_descr_pa);
> > + cpu_physical_memory_read(next_descr_pa, &descr, sizeof(descr));
> > + pvscsi_process_request_descriptor(s, &descr);
> > + }
> > +
> > + pvscsi_rings_mgr_flush_req_ring(&s->rings);
> > +}
> > +
> > +static void
> > +pvscsi_dbg_dump_tx_rings_config(PVSCSICmdDescSetupRings *rc)
> > +{
> > + int i;
> > + trace_pvscsi_tx_rings_ppn("Rings State", rc->ringsStatePPN);
> > +
> > + trace_pvscsi_tx_rings_num_pages("Request Ring", rc->reqRingNumPages);
> > + for (i = 0; i < rc->reqRingNumPages; i++) {
> > + trace_pvscsi_tx_rings_ppn("Request Ring", rc->reqRingPPNs[i]);
> > + }
> > +
> > + trace_pvscsi_tx_rings_num_pages("Confirm Ring", rc->cmpRingNumPages);
> > + for (i = 0; i < rc->cmpRingNumPages; i++) {
> > + trace_pvscsi_tx_rings_ppn("Confirm Ring", rc->reqRingPPNs[i]);
> > + }
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_cmd_config(PVSCSIState *s)
> > +{
> > + trace_pvscsi_on_cmd_noimpl("PVSCSI_CMD_CONFIG");
> > + return PVSCSI_COMMAND_PROCESSING_FAILED;
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_cmd_unplug(PVSCSIState *s)
> > +{
> > + trace_pvscsi_on_cmd_noimpl("PVSCSI_CMD_DEVICE_UNPLUG");
> > + return PVSCSI_COMMAND_PROCESSING_FAILED;
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_issue_scsi(PVSCSIState *s)
> > +{
> > + trace_pvscsi_on_cmd_noimpl("PVSCSI_CMD_ISSUE_SCSI");
> > + return PVSCSI_COMMAND_PROCESSING_FAILED;
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_cmd_setup_rings(PVSCSIState *s)
> > +{
> > + PVSCSICmdDescSetupRings *rc =
> > + (PVSCSICmdDescSetupRings *) s->curr_cmd_data;
> > +
> > + trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_SETUP_RINGS");
> > +
> > + pvscsi_dbg_dump_tx_rings_config(rc);
> > + pvscsi_rings_mgr_init_data(&s->rings, rc);
> > + s->rings_info_valid = TRUE;
> > + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_cmd_abort(PVSCSIState *s)
> > +{
> > + trace_pvscsi_on_cmd_abort(
> > + ((struct PVSCSICmdDescAbortCmd *) s->curr_cmd_data)->context,
> > + ((struct PVSCSICmdDescAbortCmd *) s->curr_cmd_data)->target);
> > + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
>
> You need to call scsi_req_cancel here before returning.
>
>
It turned out that this callback was never implemented. Fixed.
Thanks.
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_cmd_unknown(PVSCSIState *s)
> > +{
> > + trace_pvscsi_on_cmd_unknown_data(s->curr_cmd_data[0]);
> > + return PVSCSI_COMMAND_PROCESSING_FAILED;
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_cmd_reset_device(PVSCSIState *s)
> > +{
> > + uint8_t target_lun = 0;
> > + struct PVSCSICmdDescResetDevice *cmd =
> > + (struct PVSCSICmdDescResetDevice *) s->curr_cmd_data;
> > + SCSIDevice *sdev;
> > +
> > + sdev = pvscsi_device_find(s, 0, cmd->target, cmd->lun, &target_lun);
> > +
> > + trace_pvscsi_on_cmd_reset_dev(cmd->target, (int) target_lun, sdev);
> > +
> > + if (sdev != NULL) {
> > + device_reset(&sdev->qdev);
> > + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> > + }
> > +
> > + return PVSCSI_COMMAND_PROCESSING_FAILED;
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_cmd_reset_bus(PVSCSIState *s)
> > +{
> > + trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_RESET_BUS");
> > +
> > + qbus_reset_all_fn(&s->bus);
> > + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_cmd_setup_msg_ring(PVSCSIState *s)
> > +{
> > + PVSCSICmdDescSetupMsgRing *rc =
> > + (PVSCSICmdDescSetupMsgRing *) s->curr_cmd_data;
> > +
> > + trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_SETUP_MSG_RING");
> > +
> > + if (s->rings_info_valid) {
> > + pvscsi_rings_mgr_init_msg(&s->rings, rc);
> > + s->msg_ring_info_valid = TRUE;
> > + }
> > + return sizeof(PVSCSICmdDescSetupMsgRing) / sizeof(uint32_t);
> > +}
> > +
> > +static uint64_t
> > +pvscsi_on_cmd_adapter_reset(PVSCSIState *s)
> > +{
> > + trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_ADAPTER_RESET");
> > +
> > + pvscsi_reset_adapter(s);
> > + return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> > +}
> > +
> > +static const struct {
> > + int data_size;
> > + uint64_t (*handler_fn)(PVSCSIState *s);
> > +} pvscsi_commands[] = {
> > + [PVSCSI_CMD_FIRST] = {
> > + .data_size = 0,
> > + .handler_fn = pvscsi_on_cmd_unknown,
> > + },
> > +
> > + /* Not implemented, data size defined based on what arrives on windows */
> > + [PVSCSI_CMD_CONFIG] = {
> > + .data_size = 6 * sizeof(uint32_t),
> > + .handler_fn = pvscsi_on_cmd_config,
> > + },
> > +
> > + /* Command not implemented, data size is unknown */
> > + [PVSCSI_CMD_ISSUE_SCSI] = {
> > + .data_size = 0,
> > + .handler_fn = pvscsi_on_issue_scsi,
> > + },
> > +
> > + /* Command not implemented, data size is unknown */
> > + [PVSCSI_CMD_DEVICE_UNPLUG] = {
> > + .data_size = 0,
> > + .handler_fn = pvscsi_on_cmd_unplug,
> > + },
> > +
> > + [PVSCSI_CMD_SETUP_RINGS] = {
> > + .data_size = sizeof(PVSCSICmdDescSetupRings),
> > + .handler_fn = pvscsi_on_cmd_setup_rings,
> > + },
> > +
> > + [PVSCSI_CMD_RESET_DEVICE] = {
> > + .data_size = sizeof(struct PVSCSICmdDescResetDevice),
> > + .handler_fn = pvscsi_on_cmd_reset_device,
> > + },
> > +
> > + [PVSCSI_CMD_RESET_BUS] = {
> > + .data_size = 0,
> > + .handler_fn = pvscsi_on_cmd_reset_bus,
> > + },
> > +
> > + [PVSCSI_CMD_SETUP_MSG_RING] = {
> > + .data_size = sizeof(PVSCSICmdDescSetupMsgRing),
> > + .handler_fn = pvscsi_on_cmd_setup_msg_ring,
> > + },
> > +
> > + [PVSCSI_CMD_ADAPTER_RESET] = {
> > + .data_size = 0,
> > + .handler_fn = pvscsi_on_cmd_adapter_reset,
> > + },
> > +
> > + [PVSCSI_CMD_ABORT_CMD] = {
> > + .data_size = sizeof(struct PVSCSICmdDescAbortCmd),
> > + .handler_fn = pvscsi_on_cmd_abort,
> > + },
> > +};
> > +
> > +static void
> > +pvscsi_do_command_processing(PVSCSIState *s)
> > +{
> > + size_t bytes_arrived = s->curr_cmd_data_cntr * sizeof(uint32_t);
> > +
> > + assert(s->curr_cmd < PVSCSI_CMD_LAST);
> > + if (bytes_arrived >= pvscsi_commands[s->curr_cmd].data_size) {
> > + s->reg_command_status = pvscsi_commands[s->curr_cmd].handler_fn(s);
> > + s->curr_cmd = PVSCSI_CMD_FIRST;
> > + s->curr_cmd_data_cntr = 0;
> > + }
> > +}
> > +
> > +static void
> > +pvscsi_on_command_data(PVSCSIState *s, uint32_t value)
> > +{
> > + size_t bytes_arrived = s->curr_cmd_data_cntr * sizeof(uint32_t);
> > +
> > + assert(bytes_arrived < sizeof(s->curr_cmd_data));
> > + s->curr_cmd_data[s->curr_cmd_data_cntr++] = value;
> > +
> > + pvscsi_do_command_processing(s);
> > +}
> > +
> > +static void
> > +pvscsi_on_command(PVSCSIState *s, uint64_t cmd_id)
> > +{
> > + if ((cmd_id > PVSCSI_CMD_FIRST) && (cmd_id < PVSCSI_CMD_LAST)) {
> > + s->curr_cmd = cmd_id;
> > + } else {
> > + s->curr_cmd = PVSCSI_CMD_FIRST;
> > + trace_pvscsi_on_cmd_unknown(cmd_id);
> > + }
> > +
> > + s->curr_cmd_data_cntr = 0;
> > + s->reg_command_status = PVSCSI_COMMAND_NOT_ENOUGH_DATA;
> > +
> > + pvscsi_do_command_processing(s);
> > +}
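The command register protocol in the three functions above (write a command
ID, then feed data words until the expected payload has arrived) can be
modelled roughly as below. `CmdCollector` and the function names are
hypothetical, the handler call is reduced to a counter, and the sketch does
not model the fallback to PVSCSI_CMD_FIRST after dispatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORDS 8 /* illustrative buffer size */

typedef struct {
    size_t expected_bytes;    /* payload size of the current command */
    uint32_t data[MAX_WORDS]; /* collected payload words */
    uint32_t count;           /* words collected so far */
    int dispatched;           /* how many times a handler would have run */
} CmdCollector;

/* Run the handler once enough payload has arrived, then start over. */
static void try_dispatch(CmdCollector *c)
{
    if (c->count * sizeof(uint32_t) >= c->expected_bytes) {
        c->dispatched++;
        c->count = 0;
    }
}

/* A write to the command register selects a command and its payload size. */
static void on_command(CmdCollector *c, size_t expected_bytes)
{
    c->expected_bytes = expected_bytes;
    c->count = 0;
    try_dispatch(c); /* commands without payload run immediately */
}

/* A write to the command data register appends one 32-bit word. */
static void on_command_data(CmdCollector *c, uint32_t word)
{
    assert(c->count < MAX_WORDS);
    c->data[c->count++] = word;
    try_dispatch(c);
}
```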
> > +
> > +static void
> > +pvscsi_io_write(void *opaque, hwaddr addr,
> > + uint64_t val, unsigned size)
> > +{
> > + PVSCSIState *s = opaque;
> > +
> > + switch (addr) {
> > + case PVSCSI_REG_OFFSET_COMMAND:
> > + pvscsi_on_command(s, val);
> > + break;
> > +
> > + case PVSCSI_REG_OFFSET_COMMAND_DATA:
> > + pvscsi_on_command_data(s, (uint32_t) val);
> > + break;
> > +
> > + case PVSCSI_REG_OFFSET_INTR_STATUS:
> > + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_INTR_STATUS", val);
> > + s->reg_interrupt_status &= ~val;
> > + pvscsi_update_irq_status(s);
> > + pvscsi_schedule_completion_processing(s);
> > + break;
> > +
> > + case PVSCSI_REG_OFFSET_INTR_MASK:
> > + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_INTR_MASK", val);
> > + s->reg_interrupt_enabled = val;
> > + pvscsi_update_irq_status(s);
> > + break;
> > +
> > + case PVSCSI_REG_OFFSET_KICK_NON_RW_IO:
> > + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_KICK_NON_RW_IO", val);
> > + pvscsi_process_io(s);
> > + break;
> > +
> > + case PVSCSI_REG_OFFSET_KICK_RW_IO:
> > + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_KICK_RW_IO", val);
> > + pvscsi_process_io(s);
> > + break;
> > +
> > + case PVSCSI_REG_OFFSET_DEBUG:
> > + trace_pvscsi_io_write("PVSCSI_REG_OFFSET_DEBUG", val);
> > + break;
> > +
> > + default:
> > + trace_pvscsi_io_write_unknown(addr, size, val);
> > + break;
> > + }
> > +
> > +}
> > +
> > +static uint64_t
> > +pvscsi_io_read(void *opaque, hwaddr addr, unsigned size)
> > +{
> > + PVSCSIState *s = opaque;
> > +
> > + switch (addr) {
> > + case PVSCSI_REG_OFFSET_INTR_STATUS:
> > + trace_pvscsi_io_read("PVSCSI_REG_OFFSET_INTR_STATUS",
> > + s->reg_interrupt_status);
> > + return s->reg_interrupt_status;
> > +
> > + case PVSCSI_REG_OFFSET_INTR_MASK:
> > + trace_pvscsi_io_read("PVSCSI_REG_OFFSET_INTR_MASK",
> > + s->reg_interrupt_status);
> > + return s->reg_interrupt_enabled;
> > +
> > + case PVSCSI_REG_OFFSET_COMMAND_STATUS:
> > + trace_pvscsi_io_read("PVSCSI_REG_OFFSET_COMMAND_STATUS",
> > + s->reg_interrupt_status);
> > + return s->reg_command_status;
> > +
> > + default:
> > + trace_pvscsi_io_read_unknown(addr, size);
> > + return 0;
> > + }
> > +}
> > +
> > +
> > +static bool
> > +pvscsi_init_msi(PVSCSIState *s)
> > +{
> > + int res;
> > + PCIDevice *d = PCI_DEVICE(s);
> > +
> > + res = msi_init(d, PVSCSI_MSI_OFFSET, PVSCSI_MSIX_NUM_VECTORS,
> > + PVSCSI_USE_64BIT, PVSCSI_PER_VECTOR_MASK);
> > + if (res < 0) {
> > + trace_pvscsi_init_msi_fail(res);
> > + s->msi_used = false;
> > + } else {
> > + s->msi_used = true;
> > + }
> > +
> > + return s->msi_used;
> > +}
> > +
> > +static void
> > +pvscsi_cleanup_msi(PVSCSIState *s)
> > +{
> > + PCIDevice *d = PCI_DEVICE(s);
> > +
> > + if (s->msi_used) {
> > + msi_uninit(d);
> > + }
> > +}
> > +
> > +static const MemoryRegionOps pvscsi_ops = {
> > + .read = pvscsi_io_read,
> > + .write = pvscsi_io_write,
> > + .endianness = DEVICE_LITTLE_ENDIAN,
> > + .impl = {
> > + .min_access_size = 4,
> > + .max_access_size = 4,
> > + },
> > +};
> > +
> > +static const struct SCSIBusInfo pvscsi_scsi_info = {
> > + .tcq = true,
> > + .max_target = PVSCSI_MAX_DEVS,
> > + .max_channel = 0,
> > + .max_lun = 0,
> > +
> > + .get_sg_list = pvscsi_get_sg_list,
> > + .complete = pvscsi_command_complete,
> > + .cancel = pvscsi_request_cancelled,
> > + .hotplug = pvscsi_hotplug,
> > + .hot_unplug = pvscsi_hot_unplug,
> > +};
> > +
> > +static int
> > +pvscsi_init(PCIDevice *pci_dev)
> > +{
> > + PVSCSIState *s = PVSCSI(pci_dev);
> > +
> > + trace_pvscsi_state("init");
> > +
> > + /* PCI subsystem ID */
> > + pci_dev->config[PCI_SUBSYSTEM_ID] = 0x00;
> > + pci_dev->config[PCI_SUBSYSTEM_ID + 1] = 0x10;
> > +
> > + /* PCI latency timer = 255 */
> > + pci_dev->config[PCI_LATENCY_TIMER] = 0xff;
> > +
> > + /* Interrupt pin A */
> > + pci_config_set_interrupt_pin(pci_dev->config, 1);
> > +
> > + memory_region_init_io(&s->io_space, &pvscsi_ops, s,
> > + "pvscsi-io", PVSCSI_MEM_SPACE_SIZE);
> > + pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->io_space);
> > +
> > + pvscsi_init_msi(s);
> > +
> > + s->completion_worker = qemu_bh_new(pvscsi_process_completion_queue, s);
> > + if (!s->completion_worker) {
> > + pvscsi_cleanup_msi(s);
> > + memory_region_destroy(&s->io_space);
> > + return -ENOMEM;
> > + }
> > +
> > + scsi_bus_new(&s->bus, &pci_dev->qdev, &pvscsi_scsi_info);
> > + pvscsi_reset_state(s);
> > +
> > + return 0;
> > +}
> > +
> > +static void
> > +pvscsi_uninit(PCIDevice *pci_dev)
> > +{
> > + PVSCSIState *s = PVSCSI(pci_dev);
> > +
> > + trace_pvscsi_state("uninit");
> > + qemu_bh_delete(s->completion_worker);
> > +
> > + pvscsi_cleanup_msi(s);
> > +
> > + memory_region_destroy(&s->io_space);
> > +}
> > +
> > +static void
> > +pvscsi_reset(DeviceState *dev)
> > +{
> > + PCIDevice *d = PCI_DEVICE(dev);
> > + PVSCSIState *s = PVSCSI(d);
> > +
> > + trace_pvscsi_state("reset");
> > + pvscsi_reset_adapter(s);
> > +}
> > +
> > +static void
> > +pvscsi_pre_save(void *opaque)
> > +{
> > + PVSCSIState *s = (PVSCSIState *) opaque;
> > +
> > + trace_pvscsi_state("presave");
> > +
> > + assert(QTAILQ_EMPTY(&s->pending_queue));
> > + assert(QTAILQ_EMPTY(&s->completion_queue));
>
> If you implement request serialization, you can still assert that the
> completion queue is empty. The pending queue will be reconstructed by
> the load_request callbacks.
>
> > +}
> > +
> > +static int
> > +pvscsi_post_load(void *opaque, int version_id)
> > +{
> > + trace_pvscsi_state("postload");
> > + return 0;
> > +}
> > +
> > +static const VMStateDescription vmstate_pvscsi = {
> > + .name = TYPE_PVSCSI,
> > + .version_id = 0,
> > + .minimum_version_id = 0,
> > + .minimum_version_id_old = 0,
> > + .pre_save = pvscsi_pre_save,
> > + .post_load = pvscsi_post_load,
> > + .fields = (VMStateField[]) {
> > + VMSTATE_PCI_DEVICE(parent_obj, PVSCSIState),
> > + VMSTATE_UINT8(msi_used, PVSCSIState),
> > + VMSTATE_UINT64(reg_interrupt_status, PVSCSIState),
> > + VMSTATE_UINT64(reg_interrupt_enabled, PVSCSIState),
> > + VMSTATE_UINT64(reg_command_status, PVSCSIState),
> > + VMSTATE_UINT64(curr_cmd, PVSCSIState),
> > + VMSTATE_UINT32(curr_cmd_data_cntr, PVSCSIState),
> > + VMSTATE_UINT32_ARRAY(curr_cmd_data, PVSCSIState,
> > + ARRAY_SIZE(((PVSCSIState *)NULL)->curr_cmd_data)),
> > + VMSTATE_UINT8(rings_info_valid, PVSCSIState),
> > + VMSTATE_UINT8(msg_ring_info_valid, PVSCSIState),
> > +
> > + VMSTATE_UINT64(rings.rs_pa, PVSCSIState),
> > + VMSTATE_UINT32(rings.txr_len_mask, PVSCSIState),
> > + VMSTATE_UINT32(rings.rxr_len_mask, PVSCSIState),
> > + VMSTATE_UINT64_ARRAY(rings.req_ring_pages_pa, PVSCSIState,
> > + PVSCSI_SETUP_RINGS_MAX_NUM_PAGES),
> > + VMSTATE_UINT64_ARRAY(rings.cmp_ring_pages_pa, PVSCSIState,
> > + PVSCSI_SETUP_RINGS_MAX_NUM_PAGES),
> > + VMSTATE_UINT64(rings.consumed_ptr, PVSCSIState),
> > + VMSTATE_UINT64(rings.filled_cmp_ptr, PVSCSIState),
> > +
> > + VMSTATE_END_OF_LIST()
> > + }
> > +};
> > +
> > +static void
> > +pvscsi_write_config(PCIDevice *pci, uint32_t addr, uint32_t val, int len)
> > +{
> > + pci_default_write_config(pci, addr, val, len);
> > + msi_write_config(pci, addr, val, len);
> > +}
> > +
> > +static void pvscsi_class_init(ObjectClass *klass, void *data)
> > +{
> > + DeviceClass *dc = DEVICE_CLASS(klass);
> > + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > +
> > + k->init = pvscsi_init;
> > + k->exit = pvscsi_uninit;
> > + k->vendor_id = PCI_VENDOR_ID_VMWARE;
> > + k->device_id = PCI_DEVICE_ID_VMWARE_PVSCSI;
> > + k->class_id = PCI_CLASS_STORAGE_SCSI;
> > + k->subsystem_id = 0x1000;
> > + dc->reset = pvscsi_reset;
> > + dc->vmsd = &vmstate_pvscsi;
> > + k->config_write = pvscsi_write_config;
> > +}
> > +
> > +static const TypeInfo pvscsi_info = {
> > + .name = "pvscsi",
> > + .parent = TYPE_PCI_DEVICE,
> > + .instance_size = sizeof(PVSCSIState),
> > + .class_init = pvscsi_class_init,
> > +};
> > +
> > +static void
> > +pvscsi_register_types(void)
> > +{
> > + type_register_static(&pvscsi_info);
> > +
> > + trace_pvscsi_register();
> > +}
> > +
> > +type_init(pvscsi_register_types);
> > diff --git a/hw/vmw_pvscsi.h b/hw/vmw_pvscsi.h
> > new file mode 100644
> > index 0000000..17fcf66
> > --- /dev/null
> > +++ b/hw/vmw_pvscsi.h
> > @@ -0,0 +1,434 @@
> > +/*
> > + * VMware PVSCSI header file
> > + *
> > + * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License as published by the
> > + * Free Software Foundation; version 2 of the License and no later version.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> > + * NON INFRINGEMENT. See the GNU General Public License for more
> > + * details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> > + *
> > + * Maintained by: Arvind Kumar <arvindkumar@vmware.com>
> > + *
> > + */
> > +
> > +#ifndef VMW_PVSCSI_H
> > +#define VMW_PVSCSI_H
> > +
> > +#define VMW_PAGE_SIZE (4096)
> > +#define VMW_PAGE_SHIFT (12)
> > +
> > +#define MASK(n) ((1 << (n)) - 1) /* make an n-bit mask */
> > +
> > +/*
> > + * host adapter status/error codes
> > + */
> > +enum HostBusAdapterStatus {
> > + BTSTAT_SUCCESS = 0x00, /* CCB complete normally with no errors */
> > + BTSTAT_LINKED_COMMAND_COMPLETED = 0x0a,
> > + BTSTAT_LINKED_COMMAND_COMPLETED_WITH_FLAG = 0x0b,
> > + BTSTAT_DATA_UNDERRUN = 0x0c,
> > + BTSTAT_SELTIMEO = 0x11, /* SCSI selection timeout */
> > + BTSTAT_DATARUN = 0x12, /* data overrun/underrun */
> > + BTSTAT_BUSFREE = 0x13, /* unexpected bus free */
> > + BTSTAT_INVPHASE = 0x14, /* invalid bus phase or sequence */
> > + /* requested by target */
> > + BTSTAT_LUNMISMATCH = 0x17, /* linked CCB has different LUN */
> > + /* from first CCB */
> > + BTSTAT_SENSFAILED = 0x1b, /* auto request sense failed */
> > + BTSTAT_TAGREJECT = 0x1c, /* SCSI II tagged queueing message */
> > + /* rejected by target */
> > + BTSTAT_BADMSG = 0x1d, /* unsupported message received by */
> > + /* the host adapter */
> > + BTSTAT_HAHARDWARE = 0x20, /* host adapter hardware failed */
> > + BTSTAT_NORESPONSE = 0x21, /* target did not respond to SCSI ATN, */
> > + /* sent a SCSI RST */
> > + BTSTAT_SENTRST = 0x22, /* host adapter asserted a SCSI RST */
> > + BTSTAT_RECVRST = 0x23, /* other SCSI devices asserted a SCSI RST */
> > + BTSTAT_DISCONNECT = 0x24, /* target device reconnected improperly */
> > + /* (w/o tag) */
> > + BTSTAT_BUSRESET = 0x25, /* host adapter issued BUS device reset */
> > + BTSTAT_ABORTQUEUE = 0x26, /* abort queue generated */
> > + BTSTAT_HASOFTWARE = 0x27, /* host adapter software error */
> > + BTSTAT_HATIMEOUT = 0x30, /* host adapter hardware timeout error */
> > + BTSTAT_SCSIPARITY = 0x34, /* SCSI parity error detected */
> > +};
> > +
> > +/*
> > + * Register offsets.
> > + *
> > + * These registers are accessible both via i/o space and mm i/o.
> > + */
> > +
> > +enum PVSCSIRegOffset {
> > + PVSCSI_REG_OFFSET_COMMAND = 0x0,
> > + PVSCSI_REG_OFFSET_COMMAND_DATA = 0x4,
> > + PVSCSI_REG_OFFSET_COMMAND_STATUS = 0x8,
> > + PVSCSI_REG_OFFSET_LAST_STS_0 = 0x100,
> > + PVSCSI_REG_OFFSET_LAST_STS_1 = 0x104,
> > + PVSCSI_REG_OFFSET_LAST_STS_2 = 0x108,
> > + PVSCSI_REG_OFFSET_LAST_STS_3 = 0x10c,
> > + PVSCSI_REG_OFFSET_INTR_STATUS = 0x100c,
> > + PVSCSI_REG_OFFSET_INTR_MASK = 0x2010,
> > + PVSCSI_REG_OFFSET_KICK_NON_RW_IO = 0x3014,
> > + PVSCSI_REG_OFFSET_DEBUG = 0x3018,
> > + PVSCSI_REG_OFFSET_KICK_RW_IO = 0x4018,
> > +};
> > +
> > +/*
> > + * Virtual h/w commands.
> > + */
> > +
> > +enum PVSCSICommands {
> > + PVSCSI_CMD_FIRST = 0, /* has to be first */
> > +
> > + PVSCSI_CMD_ADAPTER_RESET = 1,
> > + PVSCSI_CMD_ISSUE_SCSI = 2,
> > + PVSCSI_CMD_SETUP_RINGS = 3,
> > + PVSCSI_CMD_RESET_BUS = 4,
> > + PVSCSI_CMD_RESET_DEVICE = 5,
> > + PVSCSI_CMD_ABORT_CMD = 6,
> > + PVSCSI_CMD_CONFIG = 7,
> > + PVSCSI_CMD_SETUP_MSG_RING = 8,
> > + PVSCSI_CMD_DEVICE_UNPLUG = 9,
> > +
> > + PVSCSI_CMD_LAST = 10 /* has to be last */
> > +};
> > +
> > +#define PVSCSI_COMMAND_PROCESSING_SUCCEEDED (0)
> > +#define PVSCSI_COMMAND_PROCESSING_FAILED (-1)
> > +#define PVSCSI_COMMAND_NOT_ENOUGH_DATA (-2)
> > +
> > +/*
> > + * Command descriptor for PVSCSI_CMD_RESET_DEVICE --
> > + */
> > +
> > +struct PVSCSICmdDescResetDevice {
> > + uint32_t target;
> > + uint8_t lun[8];
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSICmdDescResetDevice PVSCSICmdDescResetDevice;
> > +
> > +/*
> > + * Command descriptor for PVSCSI_CMD_ABORT_CMD --
> > + *
> > + * - currently does not support specifying the LUN.
> > + * - pad should be 0.
> > + */
> > +
> > +struct PVSCSICmdDescAbortCmd {
> > + uint64_t context;
> > + uint32_t target;
> > + uint32_t pad;
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSICmdDescAbortCmd PVSCSICmdDescAbortCmd;
> > +
> > +/*
> > + * Command descriptor for PVSCSI_CMD_SETUP_RINGS --
> > + *
> > + * Notes:
> > + * - reqRingNumPages and cmpRingNumPages need to be power of two.
> > + * - reqRingNumPages and cmpRingNumPages need to be different from 0,
> > + * - reqRingNumPages and cmpRingNumPages need to be inferior to
> > + * PVSCSI_SETUP_RINGS_MAX_NUM_PAGES.
> > + */
> > +
> > +#define PVSCSI_SETUP_RINGS_MAX_NUM_PAGES 32
> > +struct PVSCSICmdDescSetupRings {
> > + uint32_t reqRingNumPages;
> > + uint32_t cmpRingNumPages;
> > + uint64_t ringsStatePPN;
> > + uint64_t reqRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
> > + uint64_t cmpRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSICmdDescSetupRings PVSCSICmdDescSetupRings;
> > +
> > +/*
> > + * Command descriptor for PVSCSI_CMD_SETUP_MSG_RING --
> > + *
> > + * Notes:
> > + * - this command was not supported in the initial revision of the h/w
> > + * interface. Before using it, you need to check that it is supported by
> > + * writing PVSCSI_CMD_SETUP_MSG_RING to the 'command' register, then
> > + * immediately after read the 'command status' register:
> > + * * a value of -1 means that the cmd is NOT supported,
> > + * * a value != -1 means that the cmd IS supported.
> > + * If it's supported the 'command status' register should return:
> > + * sizeof(PVSCSICmdDescSetupMsgRing) / sizeof(uint32_t).
> > + * - this command should be issued _after_ the usual SETUP_RINGS so that the
> > + * RingsState page is already setup. If not, the command is a nop.
> > + * - numPages needs to be a power of two,
> > + * - numPages needs to be different from 0,
> > + * - pad should be zero.
> > + */
> > +
> > +#define PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES 16
> > +
> > +struct PVSCSICmdDescSetupMsgRing {
> > + uint32_t numPages;
> > + uint32_t pad;
> > + uint64_t ringPPNs[PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES];
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSICmdDescSetupMsgRing PVSCSICmdDescSetupMsgRing;
> > +
> > +enum PVSCSIMsgType {
> > + PVSCSI_MSG_DEV_ADDED = 0,
> > + PVSCSI_MSG_DEV_REMOVED = 1,
> > + PVSCSI_MSG_LAST = 2,
> > +};
> > +
> > +/*
> > + * Msg descriptor.
> > + *
> > + * sizeof(struct PVSCSIRingMsgDesc) == 128.
> > + *
> > + * - type is of type enum PVSCSIMsgType.
> > + * - the content of args depends on the type of event being delivered.
> > + */
> > +
> > +struct PVSCSIRingMsgDesc {
> > + uint32_t type;
> > + uint32_t args[31];
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSIRingMsgDesc PVSCSIRingMsgDesc;
> > +
> > +struct PVSCSIMsgDescDevStatusChanged {
> > + uint32_t type; /* PVSCSI_MSG_DEV _ADDED / _REMOVED */
> > + uint32_t bus;
> > + uint32_t target;
> > + uint8_t lun[8];
> > + uint32_t pad[27];
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSIMsgDescDevStatusChanged PVSCSIMsgDescDevStatusChanged;
> > +
> > +/*
> > + * Rings state.
> > + *
> > + * - the fields:
> > + * . msgProdIdx,
> > + * . msgConsIdx,
> > + * . msgNumEntriesLog2,
> > + * .. are only used once the SETUP_MSG_RING cmd has been issued.
> > + * - 'pad' helps to ensure that the msg related fields are on their own
> > + * cache-line.
> > + */
> > +
> > +struct PVSCSIRingsState {
> > + uint32_t reqProdIdx;
> > + uint32_t reqConsIdx;
> > + uint32_t reqNumEntriesLog2;
> > +
> > + uint32_t cmpProdIdx;
> > + uint32_t cmpConsIdx;
> > + uint32_t cmpNumEntriesLog2;
> > +
> > + uint8_t pad[104];
> > +
> > + uint32_t msgProdIdx;
> > + uint32_t msgConsIdx;
> > + uint32_t msgNumEntriesLog2;
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSIRingsState PVSCSIRingsState;
> > +
> > +/*
> > + * Request descriptor.
> > + *
> > + * sizeof(RingReqDesc) = 128
> > + *
> > + * - context: is a unique identifier of a command. It could normally be any
> > + * 64bit value, however we currently store it in the serialNumber variable
> > + * of struct SCSI_Command, so we have the following restrictions due to the
> > + * way this field is handled in the vmkernel storage stack:
> > + * * this value can't be 0,
> > + * * the upper 32bit need to be 0 since serialNumber is a uint32_t.
> > + * Currently tracked as PR 292060.
> > + * - dataLen: contains the total number of bytes that need to be transferred.
> > + * - dataAddr:
> > + * * if PVSCSI_FLAG_CMD_WITH_SG_LIST is set: dataAddr is the PA of the first
> > + * s/g table segment, each s/g segment is entirely contained on a single
> > + * page of physical memory,
> > + * * if PVSCSI_FLAG_CMD_WITH_SG_LIST is NOT set, then dataAddr is the PA of
> > + * the buffer used for the DMA transfer,
> > + * - flags:
> > + * * PVSCSI_FLAG_CMD_WITH_SG_LIST: see dataAddr above,
> > + * * PVSCSI_FLAG_CMD_DIR_NONE: no DMA involved,
> > + * * PVSCSI_FLAG_CMD_DIR_TOHOST: transfer from device to main memory,
> > + * * PVSCSI_FLAG_CMD_DIR_TODEVICE: transfer from main memory to device,
> > + * * PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB: reserved to handle CDBs larger than
> > + * 16bytes. To be specified.
> > + * - vcpuHint: vcpuId of the processor that will be most likely waiting for the
> > + * completion of the i/o. For guest OSes that use lowest priority message
> > + * delivery mode (such as windows), we use this "hint" to deliver the
> > + * completion action to the proper vcpu. For now, we can use the vcpuId of
> > + * the processor that initiated the i/o as a likely candidate for the vcpu
> > + * that will be waiting for the completion.
> > + * - bus should be 0: we currently only support bus 0 for now.
> > + * - unused should be zero'd.
> > + */
> > +
> > +#define PVSCSI_FLAG_CMD_WITH_SG_LIST (1 << 0)
> > +#define PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB (1 << 1)
> > +#define PVSCSI_FLAG_CMD_DIR_NONE (1 << 2)
> > +#define PVSCSI_FLAG_CMD_DIR_TOHOST (1 << 3)
> > +#define PVSCSI_FLAG_CMD_DIR_TODEVICE (1 << 4)
> > +
> > +#define PVSCSI_KNOWN_FLAGS \
> > + (PVSCSI_FLAG_CMD_WITH_SG_LIST | \
> > + PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB | \
> > + PVSCSI_FLAG_CMD_DIR_NONE | \
> > + PVSCSI_FLAG_CMD_DIR_TOHOST | \
> > + PVSCSI_FLAG_CMD_DIR_TODEVICE)
> > +
> > +struct PVSCSIRingReqDesc {
> > + uint64_t context;
> > + uint64_t dataAddr;
> > + uint64_t dataLen;
> > + uint64_t senseAddr;
> > + uint32_t senseLen;
> > + uint32_t flags;
> > + uint8_t cdb[16];
> > + uint8_t cdbLen;
> > + uint8_t lun[8];
> > + uint8_t tag;
> > + uint8_t bus;
> > + uint8_t target;
> > + uint8_t vcpuHint;
> > + uint8_t unused[59];
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSIRingReqDesc PVSCSIRingReqDesc;
> > +
> > +/*
> > + * Scatter-gather list management.
> > + *
> > + * As described above, when PVSCSI_FLAG_CMD_WITH_SG_LIST is set in the
> > + * RingReqDesc.flags, then RingReqDesc.dataAddr is the PA of the first s/g
> > + * table segment.
> > + *
> > + * - each segment of the s/g table contains a succession of struct
> > + * PVSCSISGElement.
> > + * - each segment is entirely contained on a single physical page of memory.
> > + * - a "chain" s/g element has the flag PVSCSI_SGE_FLAG_CHAIN_ELEMENT set in
> > + * PVSCSISGElement.flags and in this case:
> > + * * addr is the PA of the next s/g segment,
> > + * * length is undefined, assumed to be 0.
> > + */
> > +
> > +struct PVSCSISGElement {
> > + uint64_t addr;
> > + uint32_t length;
> > + uint32_t flags;
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSISGElement PVSCSISGElement;
> > +
> > +/*
> > + * Completion descriptor.
> > + *
> > + * sizeof(RingCmpDesc) = 32
> > + *
> > + * - context: identifier of the command. The same thing that was specified
> > + * under "context" as part of struct RingReqDesc at initiation time,
> > + * - dataLen: number of bytes transferred for the actual i/o operation,
> > + * - senseLen: number of bytes written into the sense buffer,
> > + * - hostStatus: adapter status,
> > + * - scsiStatus: device status,
> > + * - pad should be zero.
> > + */
> > +
> > +struct PVSCSIRingCmpDesc {
> > + uint64_t context;
> > + uint64_t dataLen;
> > + uint32_t senseLen;
> > + uint16_t hostStatus;
> > + uint16_t scsiStatus;
> > + uint32_t pad[2];
> > +} QEMU_PACKED;
> > +
> > +typedef struct PVSCSIRingCmpDesc PVSCSIRingCmpDesc;
> > +
> > +/*
> > + * Interrupt status / IRQ bits.
> > + */
> > +
> > +#define PVSCSI_INTR_CMPL_0 (1 << 0)
> > +#define PVSCSI_INTR_CMPL_1 (1 << 1)
> > +#define PVSCSI_INTR_CMPL_MASK MASK(2)
> > +
> > +#define PVSCSI_INTR_MSG_0 (1 << 2)
> > +#define PVSCSI_INTR_MSG_1 (1 << 3)
> > +#define PVSCSI_INTR_MSG_MASK (MASK(2) << 2)
> > +
> > +#define PVSCSI_INTR_ALL_SUPPORTED MASK(4)
> > +
> > +/*
> > + * Number of MSI-X vectors supported.
> > + */
> > +#define PVSCSI_MAX_INTRS 24
> > +
> > +/*
> > + * Enumeration of supported MSI-X vectors
> > + */
> > +#define PVSCSI_VECTOR_COMPLETION 0
> > +
> > +/*
> > + * Misc constants for the rings.
> > + */
> > +
> > +#define PVSCSI_MAX_NUM_PAGES_REQ_RING PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
> > +#define PVSCSI_MAX_NUM_PAGES_CMP_RING PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
> > +#define PVSCSI_MAX_NUM_PAGES_MSG_RING PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES
> > +
> > +#define PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE \
> > + (VMW_PAGE_SIZE / sizeof(struct PVSCSIRingReqDesc))
> > +
> > +#define PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE \
> > + (VMW_PAGE_SIZE / sizeof(PVSCSIRingCmpDesc))
> > +
> > +#define PVSCSI_MAX_NUM_MSG_ENTRIES_PER_PAGE \
> > + (VMW_PAGE_SIZE / sizeof(PVSCSIRingMsgDesc))
> > +
> > +#define PVSCSI_MAX_REQ_QUEUE_DEPTH \
> > + (PVSCSI_MAX_NUM_PAGES_REQ_RING * PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE)
> > +
> > +#define PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES 1
> > +#define PVSCSI_MEM_SPACE_INTR_STATUS_NUM_PAGES 1
> > +#define PVSCSI_MEM_SPACE_MISC_NUM_PAGES 2
> > +#define PVSCSI_MEM_SPACE_KICK_IO_NUM_PAGES 2
> > +#define PVSCSI_MEM_SPACE_MSIX_NUM_PAGES 2
> > +
> > +enum PVSCSIMemSpace {
> > + PVSCSI_MEM_SPACE_COMMAND_PAGE = 0,
> > + PVSCSI_MEM_SPACE_INTR_STATUS_PAGE = 1,
> > + PVSCSI_MEM_SPACE_MISC_PAGE = 2,
> > + PVSCSI_MEM_SPACE_KICK_IO_PAGE = 4,
> > + PVSCSI_MEM_SPACE_MSIX_TABLE_PAGE = 6,
> > + PVSCSI_MEM_SPACE_MSIX_PBA_PAGE = 7,
> > +};
> > +
> > +#define PVSCSI_MEM_SPACE_NUM_PAGES \
> > + (PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES + \
> > + PVSCSI_MEM_SPACE_INTR_STATUS_NUM_PAGES + \
> > + PVSCSI_MEM_SPACE_MISC_NUM_PAGES + \
> > + PVSCSI_MEM_SPACE_KICK_IO_NUM_PAGES + \
> > + PVSCSI_MEM_SPACE_MSIX_NUM_PAGES)
> > +
> > +#define PVSCSI_MEM_SPACE_SIZE (PVSCSI_MEM_SPACE_NUM_PAGES * VMW_PAGE_SIZE)
> > +
> > +#endif /* VMW_PVSCSI_H */
> > diff --git a/trace-events b/trace-events
> > index 412f7e4..66037a1 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -761,6 +761,42 @@ pc87312_info_ide(uint32_t base) "base 0x%x"
> > pc87312_info_parallel(uint32_t base, uint32_t irq) "base 0x%x, irq %u"
> > pc87312_info_serial(int n, uint32_t base, uint32_t irq) "id=%d, base 0x%x, irq %u"
> >
> > +# hw/pvscsi.c
> > +pvscsi_rings_mgr_init_data(uint32_t txr_len_log2, uint32_t rxr_len_log2) "TX/RX rings logarithms set to %d/%d"
> > +pvscsi_rings_mgr_init_msg(uint32_t len_log2) "MSG ring logarithm set to %d"
> > +pvscsi_rings_mgr_flush_cmp_ring(uint64_t filled_cmp_ptr) "new production counter of completion ring is 0x%"PRIx64""
> > +pvscsi_rings_mgr_flush_msg_ring(uint64_t filled_cmp_ptr) "new production counter of message ring is 0x%"PRIx64""
> > +pvscsi_update_irq_level(bool raise, uint64_t mask, uint64_t status) "interrupt level set to %d (MASK: 0x%"PRIx64", STATUS: 0x%"PRIx64")"
> > +pvscsi_update_irq_msi(void) "sending MSI notification"
> > +pvscsi_cmp_ring_put(unsigned long addr) "got completion descriptor 0x%lx"
> > +pvscsi_msg_ring_put(unsigned long addr) "got message descriptor 0x%lx"
> > +pvscsi_complete_request(uint64_t context, uint64_t len, uint8_t sense_key) "completion: ctx: 0x%"PRIx64", len: 0x%"PRIx64", sense key: %u"
> > +pvscsi_get_sg_list(int nsg, size_t size) "get SG list: depth: %u, size: %lu"
> > +pvscsi_get_next_sg_elem(uint32_t flags) "unknown flags in SG element (val: 0x%x)"
> > +pvscsi_command_complete_not_found(uint32_t tag) "can't find request for tag 0x%x"
> > +pvscsi_command_complete_data_run(void) "not all data required for command transferred"
> > +pvscsi_command_complete_sense_len(int len) "sense information length is %d bytes"
> > +pvscsi_convert_sglist(uint64_t context, unsigned long addr, uint32_t resid) "element: ctx: 0x%"PRIx64" addr: 0x%lx, len: %ul"
> > +pvscsi_process_req_descr(uint8_t cmd, uint64_t ctx) "SCSI cmd 0x%x, ctx: 0x%"PRIx64""
> > +pvscsi_process_req_descr_unknown_device(void) "command directed to unknown device rejected"
> > +pvscsi_process_req_descr_invalid_dir(void) "command with invalid transfer direction rejected"
> > +pvscsi_process_io(unsigned long addr) "got descriptor 0x%lx"
> > +pvscsi_on_cmd_noimpl(const char* cmd) "unimplemented command %s ignored"
> > +pvscsi_on_cmd_reset_dev(uint32_t tgt, int lun, void* dev) "PVSCSI_CMD_RESET_DEVICE[target %u lun %d (dev 0x%p)]"
> > +pvscsi_on_cmd_arrived(const char* cmd) "command %s arrived"
> > +pvscsi_on_cmd_abort(uint64_t ctx, uint32_t tgt) "command PVSCSI_CMD_ABORT_CMD for ctx 0x%"PRIx64", target %u"
> > +pvscsi_on_cmd_unknown(uint64_t cmd_id) "unknown command %"PRIx64""
> > +pvscsi_on_cmd_unknown_data(uint32_t data) "data for unknown command 0x:%x"
> > +pvscsi_io_write(const char* cmd, uint64_t val) "%s write: %"PRIx64""
> > +pvscsi_io_write_unknown(unsigned long addr, unsigned sz, uint64_t val) "unknown write address: 0x%lx size: %u bytes value: 0x%"PRIx64""
> > +pvscsi_io_read(const char* cmd, uint64_t status) "%s read: 0x%"PRIx64""
> > +pvscsi_io_read_unknown(unsigned long addr, unsigned sz) "unknown read address: 0x%lx size: %u bytes"
> > +pvscsi_init_msi_fail(int res) "failed to initialize MSI, error %d"
> > +pvscsi_state(const char* state) "starting %s ..."
> > +pvscsi_register(void) "PVSCSI QEMU device emulation registered"
> > +pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) "%s page: %"PRIx64""
> > +pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages: %u"
> > +
> > # xen-all.c
> > xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
> > xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
> >
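For reference, the size claims in the quoted header comments (sizeof(RingReqDesc) = 128, sizeof(RingCmpDesc) = 32, sizeof(struct PVSCSIRingMsgDesc) == 128) can be checked at compile time. What follows is only a minimal sketch, assuming a C11 compiler that accepts GCC-style attributes. SKETCH_PACKED stands in for QEMU_PACKED, and the helper functions are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU_PACKED; assumes a GCC/Clang-compatible compiler. */
#define SKETCH_PACKED __attribute__((packed))

#define VMW_PAGE_SIZE 4096
#define PVSCSI_SETUP_RINGS_MAX_NUM_PAGES 32

/* Field layouts copied from the quoted vmw_pvscsi.h. */
struct PVSCSIRingReqDesc {
    uint64_t context;
    uint64_t dataAddr;
    uint64_t dataLen;
    uint64_t senseAddr;
    uint32_t senseLen;
    uint32_t flags;
    uint8_t cdb[16];
    uint8_t cdbLen;
    uint8_t lun[8];
    uint8_t tag;
    uint8_t bus;
    uint8_t target;
    uint8_t vcpuHint;
    uint8_t unused[59];
} SKETCH_PACKED;

struct PVSCSIRingCmpDesc {
    uint64_t context;
    uint64_t dataLen;
    uint32_t senseLen;
    uint16_t hostStatus;
    uint16_t scsiStatus;
    uint32_t pad[2];
} SKETCH_PACKED;

struct PVSCSIRingMsgDesc {
    uint32_t type;
    uint32_t args[31];
} SKETCH_PACKED;

/* The build fails here if a layout drifts from the documented size. */
_Static_assert(sizeof(struct PVSCSIRingReqDesc) == 128, "request descriptor");
_Static_assert(sizeof(struct PVSCSIRingCmpDesc) == 32, "completion descriptor");
_Static_assert(sizeof(struct PVSCSIRingMsgDesc) == 128, "message descriptor");

/* Hypothetical helpers mirroring the *_ENTRIES_PER_PAGE macros. */
size_t req_entries_per_page(void)
{
    return VMW_PAGE_SIZE / sizeof(struct PVSCSIRingReqDesc);
}

size_t cmp_entries_per_page(void)
{
    return VMW_PAGE_SIZE / sizeof(struct PVSCSIRingCmpDesc);
}

size_t msg_entries_per_page(void)
{
    return VMW_PAGE_SIZE / sizeof(struct PVSCSIRingMsgDesc);
}

/* Mirrors PVSCSI_MAX_REQ_QUEUE_DEPTH. */
size_t max_req_queue_depth(void)
{
    return PVSCSI_SETUP_RINGS_MAX_NUM_PAGES * req_entries_per_page();
}
```

With 4096-byte pages this gives 32 request or message descriptors and 128 completion descriptors per page, so the largest request ring holds 32 * 32 = 1024 entries.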
>
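The SETUP_MSG_RING comment in the quoted header describes a small probe: write the command number to the 'command' register, then read 'command status', where -1 means unsupported. A rough sketch of that handshake against a fake register file is below. The struct fake_pvscsi and the fake_* and probe_msg_ring functions are hypothetical illustrations, not the device model from the patch, and the packed attribute again assumes a GCC/Clang-compatible compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PVSCSI_CMD_SETUP_MSG_RING 8
#define PVSCSI_COMMAND_PROCESSING_FAILED (-1)
#define PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES 16

/* Layout copied from the quoted header. */
struct PVSCSICmdDescSetupMsgRing {
    uint32_t numPages;
    uint32_t pad;
    uint64_t ringPPNs[PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES];
} __attribute__((packed));

/* Hypothetical stand-in for the device's command registers. */
struct fake_pvscsi {
    bool has_msg_ring;       /* false models the initial h/w revision */
    uint32_t command_status; /* value the guest reads back */
};

/* Model of a write to the 'command' register. */
void fake_write_command(struct fake_pvscsi *dev, uint32_t cmd)
{
    if (cmd == PVSCSI_CMD_SETUP_MSG_RING && dev->has_msg_ring) {
        /* Supported: report how many 32-bit data words the command takes. */
        dev->command_status =
            (uint32_t)(sizeof(struct PVSCSICmdDescSetupMsgRing) /
                       sizeof(uint32_t));
    } else {
        dev->command_status = (uint32_t)PVSCSI_COMMAND_PROCESSING_FAILED;
    }
}

/* Model of a read from the 'command status' register. */
uint32_t fake_read_command_status(const struct fake_pvscsi *dev)
{
    return dev->command_status;
}

/* Guest-side probe: true when the msg ring command is available. */
bool probe_msg_ring(struct fake_pvscsi *dev)
{
    fake_write_command(dev, PVSCSI_CMD_SETUP_MSG_RING);
    return fake_read_command_status(dev) != (uint32_t)-1;
}
```

For the layout above the supported case reads back 136 / 4 = 34 words, which is the sizeof(PVSCSICmdDescSetupMsgRing) / sizeof(uint32_t) value the comment mentions.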
>
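The scatter-gather comment in the quoted header can be illustrated with a small walker over a fake guest memory. This is a sketch under several assumptions, not the code from the patch: the value of PVSCSI_SGE_FLAG_CHAIN_ELEMENT is not defined in the quoted header and is assumed to be bit 0 here, the walk is assumed to stop once dataLen bytes are covered, and guest_mem, read_sg_elem, write_sg_elem and walk_sg_list are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value; the quoted header names the flag but does not define it. */
#define PVSCSI_SGE_FLAG_CHAIN_ELEMENT (1u << 0)

/* Layout copied from the quoted header. */
struct PVSCSISGElement {
    uint64_t addr;
    uint32_t length;
    uint32_t flags;
} __attribute__((packed));

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_NUM_PAGES 4u

/* Fake guest physical memory; a PA is simply an offset into this array. */
static uint8_t guest_mem[SKETCH_NUM_PAGES * SKETCH_PAGE_SIZE];

/* Copy one s/g element out of fake guest memory; 0 on success, -1 if out of range. */
int read_sg_elem(uint64_t pa, struct PVSCSISGElement *out)
{
    if (pa > sizeof(guest_mem) - sizeof(*out)) {
        return -1;
    }
    memcpy(out, &guest_mem[pa], sizeof(*out));
    return 0;
}

/* Test helper: place one s/g element at the given PA. */
int write_sg_elem(uint64_t pa, uint64_t addr, uint32_t length, uint32_t flags)
{
    struct PVSCSISGElement e = { addr, length, flags };

    if (pa > sizeof(guest_mem) - sizeof(e)) {
        return -1;
    }
    memcpy(&guest_mem[pa], &e, sizeof(e));
    return 0;
}

/*
 * Walk the list starting at first_pa until data_len bytes are covered.
 * Returns the number of data (non-chain) elements visited, or -1 on a
 * malformed list. *covered receives the number of bytes accounted for.
 */
int walk_sg_list(uint64_t first_pa, uint64_t data_len, uint64_t *covered)
{
    uint64_t pa = first_pa;
    uint64_t remaining = data_len;
    int data_elems = 0;
    int steps = 0;

    while (remaining > 0) {
        struct PVSCSISGElement e;

        /* Bound the walk so a chain loop cannot spin forever. */
        if (++steps > 1024 || read_sg_elem(pa, &e) != 0) {
            return -1;
        }
        if (e.flags & PVSCSI_SGE_FLAG_CHAIN_ELEMENT) {
            pa = e.addr; /* jump to the next segment; length is ignored */
            continue;
        }
        if (e.length == 0) {
            return -1; /* a data element that makes no progress */
        }
        remaining -= (e.length < remaining) ? e.length : remaining;
        data_elems++;
        pa += sizeof(e); /* next element in the same segment */
    }
    *covered = data_len - remaining;
    return data_elems;
}
```

As a usage example, a first segment holding two data elements (100 and 200 bytes) followed by a chain element pointing at a second page with one 300-byte element is walked as three data elements covering 600 bytes.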
* Re: [Qemu-devel] [PATCH 1/1 V6] VMWare PVSCSI paravirtual device implementation
2013-04-18 9:38 ` Dmitry Fleytman
@ 2013-04-18 10:54 ` Paolo Bonzini
2013-04-18 12:11 ` Dmitry Fleytman
0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2013-04-18 10:54 UTC (permalink / raw)
To: Dmitry Fleytman; +Cc: Yan Vugenfirer, qemu-devel, Anthony Liguori
On 18/04/2013 11:38, Dmitry Fleytman wrote:
>
> > +static void
> > +pvscsi_free_queue(PVSCSIRequestList *req_list)
>
> This shouldn't be needed.
>
>
>
> Doesn't one need to clear completion queue on reset command arrival from
> driver?
>
It should happen in qbus_reset_all. The scsi-disk device will cancel
pending requests, and these will be moved from the pending_queue to the
completion_queue. I think you can call pvscsi_process_completion_queue
instead of pvscsi_free_queue.
Paolo
>
> > +{
> > + PVSCSIRequest *pvscsi_req;
> > +
> > + while (!QTAILQ_EMPTY(req_list)) {
> > + pvscsi_req = QTAILQ_FIRST(req_list);
> > + QTAILQ_REMOVE(req_list, pvscsi_req, next);
> > + g_free(pvscsi_req);
> > + }
> > +}
> > +
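The reset flow suggested above (the bus reset cancels pending requests, cancelled requests land on the completion queue, and the completion queue is then processed rather than freed) can be sketched with a toy model. Everything here is hypothetical: the sketch_* types and functions are simplified stand-ins, plain linked lists replace QTAILQ, and the completion step is only assumed to report each request and then release it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified request and queue types (not QEMU's QTAILQ). */
struct sketch_req {
    unsigned long ctx;
    struct sketch_req *next;
};

struct sketch_queue {
    struct sketch_req *head;
    struct sketch_req *tail;
};

struct sketch_dev {
    struct sketch_queue pending_queue;
    struct sketch_queue completion_queue;
    unsigned completions_reported;
};

/* Append a request to the tail of a queue. */
static void queue_push(struct sketch_queue *q, struct sketch_req *r)
{
    r->next = NULL;
    if (q->tail) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
}

/* Detach and return the head of a queue, or NULL if it is empty. */
static struct sketch_req *queue_pop(struct sketch_queue *q)
{
    struct sketch_req *r = q->head;

    if (r) {
        q->head = r->next;
        if (!q->head) {
            q->tail = NULL;
        }
        r->next = NULL;
    }
    return r;
}

/* Queue a new in-flight request; 0 on success, -1 on allocation failure. */
int sketch_submit(struct sketch_dev *d, unsigned long ctx)
{
    struct sketch_req *r = malloc(sizeof(*r));

    if (!r) {
        return -1;
    }
    r->ctx = ctx;
    queue_push(&d->pending_queue, r);
    return 0;
}

/* Stand-in for the bus reset: cancelling moves pending to completion. */
void sketch_bus_reset(struct sketch_dev *d)
{
    struct sketch_req *r;

    while ((r = queue_pop(&d->pending_queue)) != NULL) {
        queue_push(&d->completion_queue, r);
    }
}

/* Stand-in for processing the completion queue: report, then release. */
void sketch_process_completion_queue(struct sketch_dev *d)
{
    struct sketch_req *r;

    while ((r = queue_pop(&d->completion_queue)) != NULL) {
        d->completions_reported++;
        free(r);
    }
}

/* Reset handler: no separate free-the-queue helper is needed. */
void sketch_on_reset(struct sketch_dev *d)
{
    sketch_bus_reset(d);
    sketch_process_completion_queue(d);
}
```

In this model every request that was in flight at reset time is both reported and released, so both queues end up empty without a dedicated free routine.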
* Re: [Qemu-devel] [PATCH 1/1 V6] VMWare PVSCSI paravirtual device implementation
2013-04-18 10:54 ` Paolo Bonzini
@ 2013-04-18 12:11 ` Dmitry Fleytman
0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Fleytman @ 2013-04-18 12:11 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Yan Vugenfirer, qemu-devel, Anthony Liguori
On Thu, Apr 18, 2013 at 1:54 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 18/04/2013 11:38, Dmitry Fleytman wrote:
> >
> > > +static void
> > > +pvscsi_free_queue(PVSCSIRequestList *req_list)
> >
> > This shouldn't be needed.
> >
> >
> >
> > Doesn't one need to clear completion queue on reset command arrival from
> > driver?
> >
>
> It should happen in qbus_reset_all. The scsi-disk device will cancel
> pending requests, and these will be moved from the pending_queue to the
> completion_queue. I think you can call pvscsi_process_completion_queue
> instead of pvscsi_free_queue.
>
> Paolo
>
>
Ok, sounds reasonable.
I'll send the updated patch soon.
> >
> > > +{
> > > + PVSCSIRequest *pvscsi_req;
> > > +
> > > + while (!QTAILQ_EMPTY(req_list)) {
> > > + pvscsi_req = QTAILQ_FIRST(req_list);
> > > + QTAILQ_REMOVE(req_list, pvscsi_req, next);
> > > + g_free(pvscsi_req);
> > > + }
> > > +}
> > > +
>
>
end of thread
Thread overview: 6+ messages
2013-04-08 18:39 [Qemu-devel] [PATCH 0/1 V6] VMWare PVSCSI paravirtual device implementation Dmitry Fleytman
2013-04-08 18:39 ` [Qemu-devel] [PATCH 1/1 " Dmitry Fleytman
2013-04-10 9:33 ` Paolo Bonzini
2013-04-18 9:38 ` Dmitry Fleytman
2013-04-18 10:54 ` Paolo Bonzini
2013-04-18 12:11 ` Dmitry Fleytman