* [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support
From: Hannes Reinecke @ 2015-11-27 14:58 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Johannes Thumshirn, Stefan Hajnoczi, Hannes Reinecke, qemu-devel,
Alexander Graf
Hi all,
here is an updated version of the patchset that enables ALUA and
simplified active/passive multipath support in qemu.
This patchset relies on having _two_ block devices configured,
and two SCSI disks pointing to those block devices with the
_same_ 'wwn' property and unique 'port_group' properties.
I know, this is a bit of a nasty hack, but I hope to add
proper multipath support (with several SCSI devices pointing /
linking to the same block device) in the near future.
It also implements an 'alua_policy' property, which allows for
simulating an 'active/passive' multipath setup.
And for testing I've implemented a 'block_disconnect' HMP command,
which simulates a link failure for the attached devices.
I wouldn't object if someone declares this a gross hack, but with
it I can finally simulate real-life multipath failover and do
some functional multipath-tools testing without having to resort
to real hardware.
As usual, comments and reviews are welcome.
Hannes Reinecke (8):
scsi-disk: Add 'port_group' property
scsi-disk: Add 'alua_state' property
scsi-disk: Implement 'REPORT TARGET PORT GROUPS'
scsi-disk: Implement 'SET TARGET PORT GROUPS'
scsi-disk: implement ALUA policy
scsi-disk: Allow READ CAPACITY in standby
scsi-disk: Implement 'alua_preferred' option
block: Implement 'block_disconnect' HMP command
block.c | 5 +
block/block-backend.c | 8 +
blockdev.c | 18 ++
hmp-commands.hx | 14 +
hmp.c | 10 +
hmp.h | 1 +
hw/scsi/megasas.c | 4 +
hw/scsi/scsi-bus.c | 20 ++
hw/scsi/scsi-disk.c | 654 +++++++++++++++++++++++++++++++++++++++++
hw/scsi/virtio-scsi.c | 5 +
include/block/block.h | 1 +
include/block/block_int.h | 1 +
include/block/scsi.h | 18 ++
include/hw/scsi/scsi.h | 8 +
include/sysemu/block-backend.h | 1 +
qapi/block-core.json | 21 ++
qmp-commands.hx | 26 ++
17 files changed, 815 insertions(+)
--
1.8.4.5
* [Qemu-devel] [PATCH 1/8] scsi-disk: Add 'port_group' property
From: Hannes Reinecke @ 2015-11-27 14:58 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Johannes Thumshirn, Stefan Hajnoczi, Hannes Reinecke, qemu-devel,
Alexander Graf
Each SCSI target port can have a 'target port group' identifier.
Management software uses this identifier to group individual
I_T_L nexuses together, e.g. when assembling a multipath topology.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
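A minimal standalone sketch (not part of the patch) of the designator
this change appends to the Device Identification VPD page. The byte
values mirror the hunk below; build_tpg_designator() and put_be16()
are hypothetical helper names used only for illustration, with
put_be16() standing in for QEMU's stw_be_p().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value in big-endian byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/*
 * Append a target port group designator at offset 'len' of a buffer
 * with capacity 'cap' and return the new length.
 * Hypothetical helper, not a QEMU function.
 */
static size_t build_tpg_designator(uint8_t *buf, size_t len, size_t cap,
                                   uint16_t port_group)
{
    assert(cap >= len && cap - len >= 8);
    buf[len++] = 0x61;  /* protocol identifier SAS, code set binary */
    buf[len++] = 0x95;  /* PIV, association target port, type 5 (TPG) */
    buf[len++] = 0;     /* reserved */
    buf[len++] = 4;     /* designator length */
    buf[len++] = 0;     /* reserved */
    buf[len++] = 0;     /* reserved */
    put_be16(&buf[len], port_group);
    return len + 2;
}
```

The designator always occupies eight bytes: a four-byte header followed
by two reserved bytes and the 16-bit group number.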
hw/scsi/scsi-disk.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4797d83..f544f43 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -79,6 +79,7 @@ struct SCSIDiskState
uint64_t wwn;
uint64_t port_wwn;
uint16_t port_index;
+ uint16_t port_group;
uint64_t max_unmap_size;
uint64_t max_io_size;
QEMUBH *bh;
@@ -658,6 +659,14 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
stw_be_p(&outbuf[buflen + 2], s->port_index);
buflen += 4;
}
+ if (s->port_group) {
+ outbuf[buflen++] = 0x61; // SAS / Binary
+ outbuf[buflen++] = 0x95; // PIV / Target port / target port group
+ outbuf[buflen++] = 0; // reserved
+ outbuf[buflen++] = 4;
+ stw_be_p(&outbuf[buflen + 2], s->port_group);
+ buflen += 4;
+ }
break;
}
case 0xb0: /* block limits */
@@ -2670,6 +2679,7 @@ static Property scsi_hd_properties[] = {
DEFINE_PROP_UINT64("wwn", SCSIDiskState, wwn, 0),
DEFINE_PROP_UINT64("port_wwn", SCSIDiskState, port_wwn, 0),
DEFINE_PROP_UINT16("port_index", SCSIDiskState, port_index, 0),
+ DEFINE_PROP_UINT16("port_group", SCSIDiskState, port_group, 0),
DEFINE_PROP_UINT64("max_unmap_size", SCSIDiskState, max_unmap_size,
DEFAULT_MAX_UNMAP_SIZE),
DEFINE_PROP_UINT64("max_io_size", SCSIDiskState, max_io_size,
@@ -2720,6 +2730,7 @@ static Property scsi_cd_properties[] = {
DEFINE_PROP_UINT64("wwn", SCSIDiskState, wwn, 0),
DEFINE_PROP_UINT64("port_wwn", SCSIDiskState, port_wwn, 0),
DEFINE_PROP_UINT16("port_index", SCSIDiskState, port_index, 0),
+ DEFINE_PROP_UINT16("port_group", SCSIDiskState, port_group, 0),
DEFINE_PROP_UINT64("max_io_size", SCSIDiskState, max_io_size,
DEFAULT_MAX_IO_SIZE),
DEFINE_PROP_END_OF_LIST(),
@@ -2785,6 +2796,7 @@ static Property scsi_disk_properties[] = {
DEFINE_PROP_UINT64("wwn", SCSIDiskState, wwn, 0),
DEFINE_PROP_UINT64("port_wwn", SCSIDiskState, port_wwn, 0),
DEFINE_PROP_UINT16("port_index", SCSIDiskState, port_index, 0),
+ DEFINE_PROP_UINT16("port_group", SCSIDiskState, port_group, 0),
DEFINE_PROP_UINT64("max_unmap_size", SCSIDiskState, max_unmap_size,
DEFAULT_MAX_UNMAP_SIZE),
DEFINE_PROP_UINT64("max_io_size", SCSIDiskState, max_io_size,
--
1.8.4.5
* [Qemu-devel] [PATCH 2/8] scsi-disk: Add 'alua_state' property
From: Hannes Reinecke @ 2015-11-27 14:59 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Johannes Thumshirn, Stefan Hajnoczi, Hannes Reinecke, qemu-devel,
Alexander Graf
To support asymmetric logical unit access (ALUA) we need to store
the ALUA state in the device structure.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
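A minimal standalone sketch (not part of the patch) of how the stored
ALUA state gates commands. It condenses the switch added to
scsi_disk_emulate_command() below into one function; alua_check() and
the OP_* / ALUA_* names are hypothetical, and OP_MODE_SENSE stands in
for the whole mode/log/reservation/diagnostic group of opcodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {  /* ALUA states, values as defined in the patch */
    ALUA_ACTIVE_OPTIMIZED     = 0x0,
    ALUA_ACTIVE_NON_OPTIMIZED = 0x1,
    ALUA_STANDBY              = 0x2,
    ALUA_UNAVAILABLE          = 0x3,
    ALUA_TRANSITION           = 0xf,
};

enum {  /* SCSI opcodes used in this sketch */
    OP_REQUEST_SENSE   = 0x03,
    OP_INQUIRY         = 0x12,
    OP_MODE_SENSE      = 0x1a,
    OP_READ_10         = 0x28,
    OP_REPORT_LUNS     = 0xa0,
    OP_MAINTENANCE_IN  = 0xa3,
    OP_MAINTENANCE_OUT = 0xa4,
};

#define SA_REPORT_TARGET_PORT_GROUPS 0x0a

/*
 * Return 0 if the command may proceed, otherwise the ASCQ to report
 * with NOT READY / ASC 0x04 (0x0a transition, 0x0b standby,
 * 0x0c unavailable).
 */
static uint8_t alua_check(uint8_t state, uint8_t opcode,
                          uint8_t service_action)
{
    bool standby_ok = false, unavailable_ok = false, transition_ok = false;

    state &= 0x0f;
    if (state == ALUA_ACTIVE_OPTIMIZED ||
        state == ALUA_ACTIVE_NON_OPTIMIZED) {
        return 0;
    }

    switch (opcode) {
    case OP_MAINTENANCE_IN:
        if ((service_action & 0x1f) != SA_REPORT_TARGET_PORT_GROUPS) {
            break;              /* other service actions are rejected */
        }
        /* fall through */
    case OP_INQUIRY:
    case OP_REPORT_LUNS:
    case OP_REQUEST_SENSE:
        standby_ok = unavailable_ok = transition_ok = true;
        break;
    case OP_MAINTENANCE_OUT:
        standby_ok = unavailable_ok = true;
        break;
    case OP_MODE_SENSE:
        standby_ok = true;
        break;
    default:
        break;
    }

    if (state == ALUA_STANDBY && !standby_ok) {
        return 0x0b;
    }
    if (state == ALUA_UNAVAILABLE && !unavailable_ok) {
        return 0x0c;
    }
    if (state == ALUA_TRANSITION && !transition_ok) {
        return 0x0a;
    }
    return 0;
}
```

As in the patch, states other than standby, unavailable and transition
are not gated by this check.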
hw/scsi/scsi-bus.c | 20 +++++++++
hw/scsi/scsi-disk.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++
include/block/scsi.h | 13 ++++++
include/hw/scsi/scsi.h | 8 ++++
4 files changed, 158 insertions(+)
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index fd1171e..56a4d33 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1294,6 +1294,21 @@ const struct SCSISense sense_code_LUN_NOT_READY = {
.key = NOT_READY, .asc = 0x04, .ascq = 0x03
};
+/* LUN not ready, asymmetric access state transition */
+const struct SCSISense sense_code_STATE_TRANSITION = {
+ .key = NOT_READY, .asc = 0x04, .ascq = 0x0a
+};
+
+/* LUN not ready, target port in standby state */
+const struct SCSISense sense_code_STATE_STANDBY = {
+ .key = NOT_READY, .asc = 0x04, .ascq = 0x0b
+};
+
+/* LUN not ready, target port in unavailable state */
+const struct SCSISense sense_code_STATE_UNAVAILABLE = {
+ .key = NOT_READY, .asc = 0x04, .ascq = 0x0c
+};
+
/* LUN not ready, Medium not present */
const struct SCSISense sense_code_NO_MEDIUM = {
.key = NOT_READY, .asc = 0x3a, .ascq = 0x00
@@ -1409,6 +1424,11 @@ const struct SCSISense sense_code_DEVICE_INTERNAL_RESET = {
.key = UNIT_ATTENTION, .asc = 0x29, .ascq = 0x04
};
+/* Unit attention, Asymmetric Access State changed */
+const struct SCSISense sense_code_ASYMMETRIC_ACCESS_STATE_CHANGED = {
+ .key = UNIT_ATTENTION, .asc = 0x2a, .ascq = 0x06
+};
+
/* Data Protection, Write Protected */
const struct SCSISense sense_code_WRITE_PROTECTED = {
.key = DATA_PROTECT, .asc = 0x27, .ascq = 0x00
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index f544f43..583cacd 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -33,6 +33,7 @@ do { printf("scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
#include "hw/scsi/scsi.h"
#include "block/scsi.h"
#include "sysemu/sysemu.h"
+#include "qapi/visitor.h"
#include "sysemu/block-backend.h"
#include "sysemu/blockdev.h"
#include "hw/block/block.h"
@@ -80,6 +81,7 @@ struct SCSIDiskState
uint64_t port_wwn;
uint16_t port_index;
uint16_t port_group;
+ uint8_t alua_state;
uint64_t max_unmap_size;
uint64_t max_io_size;
QEMUBH *bh;
@@ -1890,6 +1892,63 @@ static int32_t scsi_disk_emulate_command(SCSIRequest *req, uint8_t *buf)
break;
}
+ if ((s->alua_state & 0x0f) != ALUA_STATE_ACTIVE_OPTIMIZED &&
+ (s->alua_state & 0x0f) != ALUA_STATE_ACTIVE_NON_OPTIMIZED) {
+ bool standby_allowed = true;
+ bool unavailable_allowed = true;
+ bool transition_allowed = true;
+
+ switch(req->cmd.buf[0]) {
+ case MAINTENANCE_IN:
+ if ((req->cmd.buf[1] & 31) != MI_REPORT_TARGET_PORT_GROUPS) {
+ transition_allowed = false;
+ unavailable_allowed = false;
+ standby_allowed = false;
+ }
+ /* Fallthrough */
+ case INQUIRY:
+ case REPORT_LUNS:
+ case REQUEST_SENSE:
+ break;
+ case MAINTENANCE_OUT:
+ transition_allowed = false;
+ break;
+ case PERSISTENT_RESERVE_IN:
+ case PERSISTENT_RESERVE_OUT:
+ case LOG_SENSE:
+ case LOG_SELECT:
+ case MODE_SENSE:
+ case MODE_SENSE_10:
+ case MODE_SELECT:
+ case MODE_SELECT_10:
+ case RECEIVE_DIAGNOSTIC:
+ case SEND_DIAGNOSTIC:
+ transition_allowed = false;
+ unavailable_allowed = false;
+ break;
+ default:
+ transition_allowed = false;
+ unavailable_allowed = false;
+ standby_allowed = false;
+ break;
+ }
+ if ((s->alua_state & 0x0f) == ALUA_STATE_STANDBY &&
+ !standby_allowed) {
+ scsi_check_condition(r, SENSE_CODE(STATE_STANDBY));
+ return 0;
+ }
+ if ((s->alua_state & 0x0f) == ALUA_STATE_UNAVAILABLE &&
+ !unavailable_allowed) {
+ scsi_check_condition(r, SENSE_CODE(STATE_UNAVAILABLE));
+ return 0;
+ }
+ if ((s->alua_state & 0x0f) == ALUA_STATE_TRANSITION &&
+ !transition_allowed) {
+ scsi_check_condition(r, SENSE_CODE(STATE_TRANSITION));
+ return 0;
+ }
+ }
+
/*
* FIXME: we shouldn't return anything bigger than 4k, but the code
* requires the buffer to be as big as req->cmd.xfer in several
@@ -2156,6 +2215,21 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, uint8_t *buf)
return 0;
}
+ switch ((s->alua_state & 0x0f)) {
+ case ALUA_STATE_ACTIVE_OPTIMIZED:
+ case ALUA_STATE_ACTIVE_NON_OPTIMIZED:
+ break;
+ case ALUA_STATE_STANDBY:
+ scsi_check_condition(r, SENSE_CODE(STATE_STANDBY));
+ return 0;
+ case ALUA_STATE_UNAVAILABLE:
+ scsi_check_condition(r, SENSE_CODE(STATE_UNAVAILABLE));
+ return 0;
+ case ALUA_STATE_TRANSITION:
+ scsi_check_condition(r, SENSE_CODE(STATE_TRANSITION));
+ return 0;
+ }
+
len = scsi_data_cdb_xfer(r->req.cmd.buf);
switch (command) {
case READ_6:
@@ -2819,11 +2893,54 @@ static void scsi_disk_class_initfn(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_scsi_disk_state;
}
+static void scsi_disk_get_alua_state(Object *obj, Visitor *v, void *opaque,
+ const char *name, Error **errp)
+{
+ SCSIDiskState *s = OBJECT_CHECK(SCSIDiskState, obj, "scsi-disk");
+ uint8_t alua_state = s->alua_state & 0x0f;
+
+ visit_type_uint8(v, &alua_state, name, errp);
+}
+
+static void scsi_disk_set_alua_state(Object *obj, Visitor *v, void *opaque,
+ const char *name, Error **errp)
+{
+ SCSIDiskState *s = OBJECT_CHECK(SCSIDiskState, obj, "scsi-disk");
+ uint8_t alua_state;
+ Error *local_err = NULL;
+
+ visit_type_uint8(v, &alua_state, name, &local_err);
+ if (local_err) {
+ goto out;
+ }
+ if (alua_state > 3) {
+ error_setg(&local_err, "Invalid ALUA state %d\n", alua_state);
+ goto out;
+ }
+
+ s->alua_state = alua_state;
+ scsi_device_set_ua(&s->qdev, SENSE_CODE(ASYMMETRIC_ACCESS_STATE_CHANGED));
+
+out:
+ if (local_err) {
+ error_propagate(errp, local_err);
+ }
+}
+
+static void scsi_disk_instance_initfn(Object *obj)
+{
+ object_property_add(obj, "alua_state", "uint8",
+ scsi_disk_get_alua_state,
+ scsi_disk_set_alua_state, NULL, NULL, NULL);
+ object_property_set_int(obj, ALUA_STATE_ACTIVE_OPTIMIZED, "alua_state", NULL);
+}
+
static const TypeInfo scsi_disk_info = {
.name = "scsi-disk",
.parent = TYPE_SCSI_DEVICE,
.instance_size = sizeof(SCSIDiskState),
.class_init = scsi_disk_class_initfn,
+ .instance_init = scsi_disk_instance_initfn,
};
static void scsi_disk_register_types(void)
diff --git a/include/block/scsi.h b/include/block/scsi.h
index a311341..a9d0f64 100644
--- a/include/block/scsi.h
+++ b/include/block/scsi.h
@@ -151,6 +151,11 @@ const char *scsi_command_name(uint8_t cmd);
#define SAI_READ_CAPACITY_16 0x10
/*
+ * MAINTENANCE IN subcodes
+ */
+#define MI_REPORT_TARGET_PORT_GROUPS 0xa
+
+/*
* READ POSITION service action codes
*/
#define SHORT_FORM_BLOCK_ID 0x00
@@ -306,4 +311,12 @@ const char *scsi_command_name(uint8_t cmd);
#define MMC_PROFILE_HDDVD_RW_DL 0x005A
#define MMC_PROFILE_INVALID 0xFFFF
+#define ALUA_STATE_ACTIVE_OPTIMIZED 0x0
+#define ALUA_STATE_ACTIVE_NON_OPTIMIZED 0x1
+#define ALUA_STATE_STANDBY 0x2
+#define ALUA_STATE_UNAVAILABLE 0x3
+#define ALUA_STATE_LBA_DEPENDENT 0x4
+#define ALUA_STATE_OFFLINE 0xE
+#define ALUA_STATE_TRANSITION 0xF
+
#endif
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index cdaf0f8..76f3e49 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -185,6 +185,12 @@ void scsi_bus_legacy_handle_cmdline(SCSIBus *bus, Error **errp);
extern const struct SCSISense sense_code_NO_SENSE;
/* LUN not ready, Manual intervention required */
extern const struct SCSISense sense_code_LUN_NOT_READY;
+/* LUN not ready, asymmetric access state transition */
+extern const struct SCSISense sense_code_STATE_TRANSITION;
+/* LUN not ready, Target Port in standby state */
+extern const struct SCSISense sense_code_STATE_STANDBY;
+/* LUN not ready, Target Port in unavailable state */
+extern const struct SCSISense sense_code_STATE_UNAVAILABLE;
/* LUN not ready, Medium not present */
extern const struct SCSISense sense_code_NO_MEDIUM;
/* LUN not ready, medium removal prevented */
@@ -231,6 +237,8 @@ extern const struct SCSISense sense_code_MEDIUM_CHANGED;
extern const struct SCSISense sense_code_REPORTED_LUNS_CHANGED;
/* Unit attention, Device internal reset */
extern const struct SCSISense sense_code_DEVICE_INTERNAL_RESET;
+/* Unit attention, Asymmetric Access State changed */
+extern const struct SCSISense sense_code_ASYMMETRIC_ACCESS_STATE_CHANGED;
/* Data Protection, Write Protected */
extern const struct SCSISense sense_code_WRITE_PROTECTED;
/* Data Protection, Space Allocation Failed Write Protect */
--
1.8.4.5
* [Qemu-devel] [PATCH 3/8] scsi-disk: Implement 'REPORT TARGET PORT GROUPS'
From: Hannes Reinecke @ 2015-11-27 14:59 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Johannes Thumshirn, Stefan Hajnoczi, Hannes Reinecke, qemu-devel,
Alexander Graf
Implement support for the REPORT TARGET PORT GROUPS SCSI command.
Note that target port groups are referenced per SCSI wwn,
which might be connected to different hosts. So we need to
walk the entire qtree to find all eligible SCSI devices.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
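A minimal standalone sketch (not part of the patch) of the response
layout that scsi_emulate_report_target_port_groups() fills in below:
a 4-byte big-endian length, then per group an 8-byte descriptor
followed by one 4-byte descriptor per port. The sketch works on a
plain array instead of walking the qtree; struct tpg, build_rtpg() and
MAX_PORTS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PORTS 4

struct tpg {                    /* hypothetical description of one group */
    uint16_t id;                /* target port group number */
    uint8_t state;              /* ALUA state byte */
    uint8_t nports;             /* number of valid entries in port[] */
    uint16_t port[MAX_PORTS];   /* relative target port identifiers */
};

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Build the response; returns total bytes written, or 0 if it won't fit. */
static size_t build_rtpg(uint8_t *buf, size_t cap, const struct tpg *groups,
                         size_t ngroups, uint8_t supported)
{
    size_t need = 4, off = 4, i, j;

    for (i = 0; i < ngroups; i++) {
        assert(groups[i].nports <= MAX_PORTS);
        need += 8 + 4 * (size_t)groups[i].nports;
    }
    if (need > cap) {
        return 0;
    }
    memset(buf, 0, need);

    for (i = 0; i < ngroups; i++) {
        buf[off] = groups[i].state;         /* access state */
        buf[off + 1] = supported;           /* supported-states bitmask */
        put_be16(&buf[off + 2], groups[i].id);
        buf[off + 7] = groups[i].nports;    /* target port count */
        off += 8;
        for (j = 0; j < groups[i].nports; j++) {
            put_be16(&buf[off + 2], groups[i].port[j]);
            off += 4;
        }
    }

    /* Return data length excludes the 4-byte header itself. */
    buf[0] = (uint8_t)((off - 4) >> 24);
    buf[1] = (uint8_t)((off - 4) >> 16);
    buf[2] = (uint8_t)((off - 4) >> 8);
    buf[3] = (uint8_t)((off - 4) & 0xff);
    return off;
}
```

Two groups with one port each give 4 + 2 * (8 + 4) = 28 bytes, with a
return data length of 24.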
hw/scsi/scsi-disk.c | 159 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 158 insertions(+), 1 deletion(-)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 583cacd..8dabed3 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -795,6 +795,11 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
outbuf[4] = 36 - 5;
}
+ /* Enable TGPS bit */
+ if (s->wwn) {
+ outbuf[5] = 0x10;
+ }
+
/* Sync data transfer and TCQ. */
outbuf[7] = 0x10 | (req->bus->info->tcq ? 0x02 : 0);
return buflen;
@@ -1819,6 +1824,97 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
scsi_write_same_complete, data);
}
+typedef struct PortGroupEnumerate {
+ int numgrp;
+ uint64_t wwn;
+ uint16_t grp[16];
+ uint8_t alua_state[16];
+ uint16_t alua_mask;
+} PortGroupEnumerate;
+
+static void qbus_enumerate_port_group(PortGroupEnumerate *, BusState *);
+
+static void qdev_enumerate_port_group(PortGroupEnumerate *pg, DeviceState *dev)
+{
+ BusState *child;
+
+ if (!strcmp(object_get_typename(OBJECT(dev->parent_bus)), TYPE_SCSI_BUS)) {
+ SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev.qdev, dev);
+ DPRINTF("wwn 0x%" PRIx64 " pg %u state %x\n",
+ s->wwn, s->port_group, (s->alua_state & 0x0f));
+ if (s->wwn == pg->wwn) {
+ bool pg_found = false;
+ int i;
+
+ for (i = 0; i < pg->numgrp; i++) {
+ if (pg->grp[i] == s->port_group) {
+ pg_found = true;
+ break;
+ }
+ }
+ if (!pg_found) {
+ pg->grp[pg->numgrp] = s->port_group;
+ pg->alua_state[pg->numgrp] = s->alua_state;
+ pg->alua_mask |= 1 << (s->alua_state & 0x0f);
+ pg->numgrp++;
+ }
+ }
+ }
+ QLIST_FOREACH(child, &dev->child_bus, sibling) {
+ qbus_enumerate_port_group(pg, child);
+ }
+}
+
+static void qbus_enumerate_port_group(PortGroupEnumerate *pg, BusState *bus)
+{
+ BusChild *kid;
+
+ QTAILQ_FOREACH(kid, &bus->children, sibling) {
+ DeviceState *dev = kid->child;
+ qdev_enumerate_port_group(pg, dev);
+ }
+}
+
+typedef struct PortDescEnumerate {
+ int numdesc;
+ uint64_t wwn;
+ uint16_t port_group;
+ uint8_t *desc;
+} PortDescEnumerate;
+
+static void qbus_enumerate_port_desc(PortDescEnumerate *, BusState *);
+
+static void qdev_enumerate_port_desc(PortDescEnumerate *pd, DeviceState *dev)
+{
+ BusState *child;
+
+ if (!strcmp(object_get_typename(OBJECT(dev->parent_bus)), TYPE_SCSI_BUS)) {
+ SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev.qdev, dev);
+ if (s->wwn == pd->wwn &&
+ s->port_group == pd->port_group) {
+ pd->desc[0] = 0;
+ pd->desc[1] = 0;
+ pd->desc[2] = (s->port_index >> 8) & 0xff;
+ pd->desc[3] = s->port_index & 0xff;
+ pd->desc += 4;
+ pd->numdesc++;
+ }
+ }
+ QLIST_FOREACH(child, &dev->child_bus, sibling) {
+ qbus_enumerate_port_desc(pd, child);
+ }
+}
+
+static void qbus_enumerate_port_desc(PortDescEnumerate *pd, BusState *bus)
+{
+ BusChild *kid;
+
+ QTAILQ_FOREACH(kid, &bus->children, sibling) {
+ DeviceState *dev = kid->child;
+ qdev_enumerate_port_desc(pd, dev);
+ }
+}
+
static void scsi_disk_emulate_write_data(SCSIRequest *req)
{
SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -1860,6 +1956,54 @@ static void scsi_disk_emulate_write_data(SCSIRequest *req)
}
}
+static int scsi_emulate_report_target_port_groups(SCSIDiskState *s, uint8_t *inbuf)
+{
+ uint8_t *p = inbuf;
+ PortGroupEnumerate pg = { 0 }; /* alua_mask is OR-ed into during enumeration */
+ PortDescEnumerate pd;
+ int buflen = 0, i;
+
+ if (!s->wwn) {
+ return -1;
+ }
+
+ pg.numgrp = 0;
+ pg.wwn = s->wwn;
+
+ if (sysbus_get_default())
+ qbus_enumerate_port_group(&pg, sysbus_get_default());
+
+ if (pg.numgrp == 0) {
+ return -1;
+ }
+ DPRINTF("wwn 0x%" PRIx64 " %d port groups \n", s->wwn, pg.numgrp);
+ p = &inbuf[4];
+ for (i = 0; i < pg.numgrp; i++) {
+ pd.numdesc = 0;
+ pd.wwn = s->wwn;
+ pd.port_group = pg.grp[i];
+ pd.desc = &p[8];
+ buflen += 8;
+ qbus_enumerate_port_desc(&pd, sysbus_get_default());
+ DPRINTF("pg %x: %d port descriptors\n", pg.grp[i], pd.numdesc);
+ p[0] = pg.alua_state[i];
+ p[1] = pg.alua_mask;
+ p[2] = (pg.grp[i] >> 8) & 0xff;
+ p[3] = pg.grp[i] & 0xff;
+ p[7] = pd.numdesc;
+ p += 8 + pd.numdesc * 4;
+ buflen += pd.numdesc * 4;
+ }
+ if (buflen) {
+ inbuf[0] = (buflen >> 24) & 0xff;
+ inbuf[1] = (buflen >> 16) & 0xff;
+ inbuf[2] = (buflen >> 8) & 0xff;
+ inbuf[3] = buflen & 0xff;
+ }
+
+ return buflen + 4;
+}
+
static int32_t scsi_disk_emulate_command(SCSIRequest *req, uint8_t *buf)
{
SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -2058,6 +2202,19 @@ static int32_t scsi_disk_emulate_command(SCSIRequest *req, uint8_t *buf)
goto illegal_request;
}
break;
+ case MAINTENANCE_IN:
+ if ((req->cmd.buf[1] & 31) == MI_REPORT_TARGET_PORT_GROUPS) {
+ DPRINTF("MI REPORT TARGET PORT GROUPS\n");
+ memset(outbuf, 0, req->cmd.xfer);
+ buflen = scsi_emulate_report_target_port_groups(s, outbuf);
+ if (buflen < 0) {
+ goto illegal_request;
+ }
+ break;
+ }
+ DPRINTF("Unsupported Maintenance In\n");
+ goto illegal_request;
+ break;
case MECHANISM_STATUS:
buflen = scsi_emulate_mechanism_status(s, outbuf);
if (buflen < 0) {
@@ -2918,7 +3075,7 @@ static void scsi_disk_set_alua_state(Object *obj, Visitor *v, void *opaque,
goto out;
}
- s->alua_state = alua_state;
+ s->alua_state = (s->alua_state & 0xf0) | alua_state;
scsi_device_set_ua(&s->qdev, SENSE_CODE(ASYMMETRIC_ACCESS_STATE_CHANGED));
out:
--
1.8.4.5
* [Qemu-devel] [PATCH 4/8] scsi-disk: Implement 'SET TARGET PORT GROUPS'
From: Hannes Reinecke @ 2015-11-27 14:59 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Johannes Thumshirn, Stefan Hajnoczi, Hannes Reinecke, qemu-devel,
Alexander Graf
Implement 'SET TARGET PORT GROUPS' handling. The port states are
switched exactly as indicated in the command; no strategy is
implemented. This might cause issues with standard Linux behaviour,
which will only switch the passive path to 'active' and leave the
former active path alone.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
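A minimal standalone sketch (not part of the patch) of walking a
SET TARGET PORT GROUPS parameter list: a 4-byte header followed by
4-byte descriptors carrying the requested state and the group number.
parse_stpg() and struct stpg_desc are hypothetical names. Two
assumptions differ from the hunk below: the sketch rejects lists whose
length is not a whole number of descriptors, and it masks the state
with 0x0f where the patch uses 0x7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct stpg_desc {              /* hypothetical decoded descriptor */
    uint16_t port_group;
    uint8_t state;
};

/*
 * Decode up to 'max' descriptors from a parameter list of 'len' bytes.
 * Returns the number of descriptors, or -1 if the list is malformed
 * or holds more descriptors than 'out' can take.
 */
static int parse_stpg(const uint8_t *buf, size_t len,
                      struct stpg_desc *out, size_t max)
{
    size_t off;
    int n = 0;

    assert(buf != NULL && out != NULL);
    if (len < 4 || (len - 4) % 4 != 0) {
        return -1;
    }
    for (off = 4; off < len; off += 4) {
        if ((size_t)n == max) {
            return -1;
        }
        out[n].state = buf[off] & 0x0f;
        out[n].port_group = (uint16_t)((buf[off + 2] << 8) | buf[off + 3]);
        n++;
    }
    return n;
}
```

Decoding everything first and acting on it afterwards keeps a
malformed list from changing any port state.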
hw/scsi/scsi-disk.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++
include/block/scsi.h | 5 +++
2 files changed, 123 insertions(+)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 8dabed3..52c73be 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1915,6 +1915,111 @@ static void qbus_enumerate_port_desc(PortDescEnumerate *pd, BusState *bus)
}
}
+typedef struct PortGroupSetEnumerate {
+ uint64_t wwn;
+ uint16_t port_group;
+ uint8_t alua_state;
+} PortGroupSetEnumerate;
+
+static void qbus_enumerate_set_port(PortGroupSetEnumerate *, BusState *);
+
+static void qdev_enumerate_set_port(PortGroupSetEnumerate *ps, DeviceState *dev)
+{
+ BusState *child;
+
+ if (!strcmp(object_get_typename(OBJECT(dev->parent_bus)), TYPE_SCSI_BUS)) {
+ SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev.qdev, dev);
+ if (s->wwn == ps->wwn &&
+ s->port_group == ps->port_group) {
+ printf("pg %x: switch ALUA state %x -> %x\n",
+ s->port_group, (s->alua_state & 0x0f), ps->alua_state);
+ s->alua_state = (s->alua_state & 0xf0) | ps->alua_state;
+ scsi_device_set_ua(&s->qdev,
+ SENSE_CODE(ASYMMETRIC_ACCESS_STATE_CHANGED));
+ }
+ }
+ QLIST_FOREACH(child, &dev->child_bus, sibling) {
+ qbus_enumerate_set_port(ps, child);
+ }
+}
+
+static void qbus_enumerate_set_port(PortGroupSetEnumerate *ps, BusState *bus)
+{
+ BusChild *kid;
+
+ QTAILQ_FOREACH(kid, &bus->children, sibling) {
+ DeviceState *dev = kid->child;
+ qdev_enumerate_set_port(ps, dev);
+ }
+}
+
+static void scsi_emulate_set_target_port_groups(SCSIDiskReq *r, uint8_t *inbuf)
+{
+ SCSIRequest *req = &r->req;
+ SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+ uint32_t buflen = scsi_data_cdb_xfer(r->req.cmd.buf);
+ uint8_t *p = inbuf;
+ PortGroupEnumerate pg;
+ PortGroupSetEnumerate ps;
+ int i, pg_found = 0;
+
+ if (!s->wwn) {
+ scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+ }
+
+ pg.numgrp = 0;
+ pg.wwn = s->wwn;
+ qbus_enumerate_port_group(&pg, sysbus_get_default());
+
+ p = &inbuf[4];
+ /* Validate input before continuing */
+ while (p < inbuf + buflen) {
+ uint16_t port_group;
+ uint8_t alua_state;
+
+ port_group = ((uint16_t)p[2] << 8) + p[3];
+ alua_state = p[0] & 0x7;
+
+ for (i = 0; i < pg.numgrp; i++) {
+ if ((port_group == pg.grp[i]) &&
+ (alua_state == (pg.alua_state[i] & 0x0f))) {
+ printf("pg %x: port already in state %d\n",
+ pg.grp[i], (pg.alua_state[i] & 0x0f));
+ pg_found++;
+ }
+ }
+ p += 4;
+ }
+ if (pg_found == pg.numgrp) {
+ printf("all ports in requested state\n");
+ scsi_req_complete(&r->req, GOOD);
+ return;
+ }
+
+ p = &inbuf[4];
+ while (p < inbuf + buflen) {
+ uint16_t port_group;
+ uint8_t alua_state;
+
+ port_group = ((uint16_t)p[2] << 8) + p[3];
+ alua_state = p[0] & 0x7;
+
+ if (port_group == s->port_group) {
+ printf("pg %x: explicit switch current ALUA state "
+ "%x -> %x\n",
+ port_group, (s->alua_state & 0x0f), alua_state);
+ s->alua_state = (s->alua_state & 0xf0) | alua_state;
+ } else {
+ ps.wwn = s->wwn;
+ ps.port_group = port_group;
+ ps.alua_state = alua_state;
+ qbus_enumerate_set_port(&ps, sysbus_get_default());
+ }
+ p += 4;
+ }
+ scsi_req_complete(&r->req, GOOD);
+}
+
static void scsi_disk_emulate_write_data(SCSIRequest *req)
{
SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -1951,6 +2056,16 @@ static void scsi_disk_emulate_write_data(SCSIRequest *req)
scsi_disk_emulate_write_same(r, r->iov.iov_base);
break;
+ case MAINTENANCE_OUT:
+ if ((req->cmd.buf[1] & 31) == MO_SET_TARGET_PORT_GROUPS) {
+ DPRINTF("MO SET TARGET PORT GROUPS\n");
+ scsi_emulate_set_target_port_groups(r, r->iov.iov_base);
+ break;
+ }
+ DPRINTF("Unsupported Maintenance Out\n");
+ scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+ break;
+
default:
abort();
}
@@ -2215,6 +2330,9 @@ static int32_t scsi_disk_emulate_command(SCSIRequest *req, uint8_t *buf)
DPRINTF("Unsupported Maintenance In\n");
goto illegal_request;
break;
+ case MAINTENANCE_OUT:
+ DPRINTF("Maintenance Out (len %lu)\n", (unsigned long)r->req.cmd.xfer);
+ break;
case MECHANISM_STATUS:
buflen = scsi_emulate_mechanism_status(s, outbuf);
if (buflen < 0) {
diff --git a/include/block/scsi.h b/include/block/scsi.h
index a9d0f64..47a25b8 100644
--- a/include/block/scsi.h
+++ b/include/block/scsi.h
@@ -156,6 +156,11 @@ const char *scsi_command_name(uint8_t cmd);
#define MI_REPORT_TARGET_PORT_GROUPS 0xa
/*
+ * MAINTENANCE OUT subcodes
+ */
+#define MO_SET_TARGET_PORT_GROUPS 0xa
+
+/*
* READ POSITION service action codes
*/
#define SHORT_FORM_BLOCK_ID 0x00
--
1.8.4.5
* [Qemu-devel] [PATCH 5/8] scsi-disk: implement ALUA policy
From: Hannes Reinecke @ 2015-11-27 14:59 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Johannes Thumshirn, Stefan Hajnoczi, Hannes Reinecke, qemu-devel,
Alexander Graf
Implement ALUA policies 'optimized-standby' and 'optimized-nonoptimized'
to emulate active/passive multipath configurations.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
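A minimal standalone sketch (not part of the patch) of the pairing
rule behind the two policies: with exactly two port groups, setting
one group to the policy's primary state puts the other into the
secondary state, and vice versa. alua_policy_lookup(),
alua_policy_peer_state() and struct alua_policy are hypothetical
names; the policy strings and state values are the ones used in the
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {  /* ALUA states, values as defined earlier in the series */
    ALUA_ACTIVE_OPTIMIZED     = 0x0,
    ALUA_ACTIVE_NON_OPTIMIZED = 0x1,
    ALUA_STANDBY              = 0x2,
};

struct alua_policy {            /* hypothetical: the two states of a policy */
    uint8_t primary;
    uint8_t secondary;
};

/* Translate a policy name into its state pair; -1 for unknown names. */
static int alua_policy_lookup(const char *name, struct alua_policy *p)
{
    assert(name != NULL && p != NULL);
    if (!strcmp(name, "optimized-standby")) {
        p->primary = ALUA_ACTIVE_OPTIMIZED;
        p->secondary = ALUA_STANDBY;
        return 0;
    }
    if (!strcmp(name, "optimized-nonoptimized")) {
        p->primary = ALUA_ACTIVE_OPTIMIZED;
        p->secondary = ALUA_ACTIVE_NON_OPTIMIZED;
        return 0;
    }
    return -1;
}

/*
 * Given the state requested for one port group, return the state the
 * other group has to take, or -1 if the policy does not allow the
 * requested state at all.
 */
static int alua_policy_peer_state(const struct alua_policy *p,
                                  uint8_t requested)
{
    if (requested == p->primary) {
        return p->secondary;
    }
    if (requested == p->secondary) {
        return p->primary;
    }
    return -1;
}
```

With "optimized-standby", a request to make one group active/optimized
therefore implies that the other group goes to standby.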
hw/scsi/scsi-disk.c | 211 ++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 204 insertions(+), 7 deletions(-)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 52c73be..59d09e4 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -89,6 +89,7 @@ struct SCSIDiskState
char *serial;
char *vendor;
char *product;
+ char *alua_policy;
bool tray_open;
bool tray_locked;
};
@@ -795,9 +796,14 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
outbuf[4] = 36 - 5;
}
- /* Enable TGPS bit */
+ /* Enable TGPS bits */
if (s->wwn) {
- outbuf[5] = 0x10;
+ if (s->alua_policy &&
+ !strcmp(s->alua_policy, "optimized-standby")) {
+ outbuf[5] = 0x30;
+ } else {
+ outbuf[5] = 0x10;
+ }
}
/* Sync data transfer and TCQ. */
@@ -1958,18 +1964,53 @@ static void scsi_emulate_set_target_port_groups(SCSIDiskReq *r, uint8_t *inbuf)
SCSIRequest *req = &r->req;
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
uint32_t buflen = scsi_data_cdb_xfer(r->req.cmd.buf);
+ uint16_t current_port_group = (uint16_t)-1;
+ uint16_t alternate_port_group = (uint16_t)-1;
+ uint8_t primary_alua_state = ALUA_STATE_ACTIVE_OPTIMIZED;
+ uint8_t secondary_alua_state = ALUA_STATE_STANDBY;
+ uint8_t new_current_state, new_alternate_state = 0;
uint8_t *p = inbuf;
PortGroupEnumerate pg;
PortGroupSetEnumerate ps;
- int i, pg_found = 0;
+ bool switch_current, switch_alternate;
+ int i, pg_found = 0, primary_state_found = 0, secondary_state_found = 0;
if (!s->wwn) {
scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+ return;
+ }
+
+ if (!s->alua_policy) {
+ printf("No ALUA policy set\n");
+ scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+ return;
}
pg.numgrp = 0;
pg.wwn = s->wwn;
qbus_enumerate_port_group(&pg, sysbus_get_default());
+ if (s->alua_policy) {
+ if (pg.numgrp != 2) {
+ printf("ALUA policy can not handle %d port groups", pg.numgrp);
+ scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+ return;
+ }
+ if (!strcmp(s->alua_policy, "optimized-nonoptimized")) {
+ primary_alua_state = ALUA_STATE_ACTIVE_OPTIMIZED;
+ secondary_alua_state = ALUA_STATE_ACTIVE_NON_OPTIMIZED;
+ }
+ for (i = 0; i < pg.numgrp; i++) {
+ if (pg.grp[i] == s->port_group) {
+ current_port_group = pg.grp[i];
+ new_current_state = (pg.alua_state[i] & 0x0f);
+ } else {
+ alternate_port_group = pg.grp[i];
+ new_alternate_state = (pg.alua_state[i] & 0x0f);
+ }
+ }
+ } else {
+ current_port_group = s->port_group;
+ }
p = &inbuf[4];
/* Validate input before continuing */
@@ -1980,6 +2021,24 @@ static void scsi_emulate_set_target_port_groups(SCSIDiskReq *r, uint8_t *inbuf)
port_group = ((uint16_t)p[2] << 8) + p[3];
alua_state = p[0] & 0x7;
+ if (s->alua_policy) {
+ if ((port_group != current_port_group) &&
+ (port_group != alternate_port_group)) {
+ printf("pg %x: port_group not handled by policy", port_group);
+ scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+ return;
+ }
+ if (alua_state == primary_alua_state) {
+ primary_state_found++;
+ } else if (alua_state == secondary_alua_state) {
+ secondary_state_found++;
+ } else {
+ printf("pg %x: state %x not handled by policy\n",
+ port_group, alua_state);
+ scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+ return;
+ }
+ }
for (i = 0; i < pg.numgrp; i++) {
if ((port_group == pg.grp[i]) &&
(alua_state == (pg.alua_state[i] & 0x0f))) {
@@ -1990,12 +2049,22 @@ static void scsi_emulate_set_target_port_groups(SCSIDiskReq *r, uint8_t *inbuf)
}
p += 4;
}
+ if (s->alua_policy) {
+ if (primary_state_found != 1 &&
+ secondary_state_found != 1) {
+ printf("State change forbidden by policy\n");
+ scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+ return;
+ }
+ }
if (pg_found == pg.numgrp) {
printf("all ports in requested state\n");
scsi_req_complete(&r->req, GOOD);
return;
}
+ switch_current = true;
+ switch_alternate = true;
p = &inbuf[4];
while (p < inbuf + buflen) {
uint16_t port_group;
@@ -2003,20 +2072,44 @@ static void scsi_emulate_set_target_port_groups(SCSIDiskReq *r, uint8_t *inbuf)
port_group = ((uint16_t)p[2] << 8) + p[3];
alua_state = p[0] & 0x7;
-
- if (port_group == s->port_group) {
+ if (port_group == current_port_group) {
printf("pg %x: explicit switch current ALUA state "
"%x -> %x\n",
port_group, (s->alua_state & 0x0f), alua_state);
s->alua_state = (s->alua_state & 0xf0) | alua_state;
+ new_current_state = alua_state;
+ switch_current = false;
} else {
ps.wwn = s->wwn;
ps.port_group = port_group;
ps.alua_state = alua_state;
+ new_alternate_state = alua_state;
qbus_enumerate_set_port(&ps, sysbus_get_default());
+ switch_alternate = false;
}
p += 4;
}
+
+ if (s->alua_policy) {
+ ps.wwn = s->wwn;
+ if (switch_current) {
+ ps.port_group = current_port_group;
+ if (new_alternate_state == primary_alua_state) {
+ ps.alua_state = secondary_alua_state;
+ } else {
+ ps.alua_state = primary_alua_state;
+ }
+ qbus_enumerate_set_port(&ps, sysbus_get_default());
+ } else if (switch_alternate) {
+ ps.port_group = alternate_port_group;
+ if (new_current_state == primary_alua_state) {
+ ps.alua_state = secondary_alua_state;
+ } else {
+ ps.alua_state = primary_alua_state;
+ }
+ qbus_enumerate_set_port(&ps, sysbus_get_default());
+ }
+ }
scsi_req_complete(&r->req, GOOD);
}
@@ -3168,6 +3261,39 @@ static void scsi_disk_class_initfn(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_scsi_disk_state;
}
+static void scsi_disk_get_alua_policy(Object *obj, Visitor *v, void *opaque,
+ const char *name, Error **errp)
+{
+ SCSIDiskState *s = OBJECT_CHECK(SCSIDiskState, obj, "scsi-disk");
+
+ visit_type_str(v, &s->alua_policy, name, errp);
+}
+
+static void scsi_disk_set_alua_policy(Object *obj, Visitor *v, void *opaque,
+ const char *name, Error **errp)
+{
+ SCSIDiskState *s = OBJECT_CHECK(SCSIDiskState, obj, "scsi-disk");
+ char *alua_policy;
+ Error *local_err = NULL;
+
+ visit_type_str(v, &alua_policy, name, &local_err);
+ if (local_err) {
+ goto out;
+ }
+
+ if (strcmp(alua_policy, "optimized-standby") &&
+ strcmp(alua_policy, "optimized-nonoptimized")) {
+ error_setg(&local_err, "Invalid ALUA policy %s\n", alua_policy);
+ goto out;
+ }
+ g_free(s->alua_policy);
+ s->alua_policy = alua_policy;
+out:
+ if (local_err) {
+ error_propagate(errp, local_err);
+ }
+}
+
static void scsi_disk_get_alua_state(Object *obj, Visitor *v, void *opaque,
const char *name, Error **errp)
{
@@ -3192,9 +3318,76 @@ static void scsi_disk_set_alua_state(Object *obj, Visitor *v, void *opaque,
error_setg(&local_err, "Invalid ALUA state %d\n", alua_state);
goto out;
}
+ if (s->alua_policy) {
+ uint8_t primary_alua_state = ALUA_STATE_ACTIVE_OPTIMIZED;
+ uint8_t secondary_alua_state = ALUA_STATE_STANDBY;
+ PortGroupEnumerate pg;
+ PortGroupSetEnumerate ps;
+ bool switch_to_primary = false;
+ int i;
- s->alua_state = (s->alua_state & 0xf0) | alua_state;
- scsi_device_set_ua(&s->qdev, SENSE_CODE(ASYMMETRIC_ACCESS_STATE_CHANGED));
+ if (!strcmp(s->alua_policy, "optimized-nonoptimized")) {
+ primary_alua_state = ALUA_STATE_ACTIVE_OPTIMIZED;
+ secondary_alua_state = ALUA_STATE_ACTIVE_NON_OPTIMIZED;
+ }
+
+ if (alua_state != primary_alua_state &&
+ alua_state != secondary_alua_state) {
+ error_setg(&local_err, "ALUA state %d forbidden by policy\n",
+ alua_state);
+ goto out;
+ }
+ if (!s->wwn) {
+ error_setg(&local_err, "No WWN set\n");
+ goto out;
+ }
+ pg.numgrp = 0;
+ pg.wwn = s->wwn;
+
+ if (sysbus_get_default()) {
+ qbus_enumerate_port_group(&pg, sysbus_get_default());
+ }
+ if (pg.numgrp == 0) {
+ error_setg(&local_err, "No port group found for %" PRIx64 "\n",
+ s->wwn);
+ goto out;
+ }
+ if (pg.numgrp > 2) {
+ error_setg(&local_err, "Too many port groups for policy\n");
+ goto out;
+ }
+ for (i = 0; i < pg.numgrp; i++) {
+ if (pg.grp[i] == s->port_group) {
+ if ((pg.alua_state[i] & 0x0f) == alua_state) {
+ /* Nothing to be done */
+ goto out;
+ }
+ if (alua_state == primary_alua_state) {
+ switch_to_primary = true;
+ }
+ printf("pg %x: implicit switch primary to new ALUA state %d\n",
+ s->port_group, alua_state);
+ s->alua_state = (s->alua_state & 0xf0) | alua_state;
+ scsi_device_set_ua(&s->qdev,
+ SENSE_CODE(ASYMMETRIC_ACCESS_STATE_CHANGED));
+ continue;
+ }
+ ps.port_group = pg.grp[i];
+ }
+ ps.wwn = s->wwn;
+ if (switch_to_primary) {
+ ps.alua_state = secondary_alua_state;
+ } else {
+ ps.alua_state = primary_alua_state;
+ }
+ printf("pg %x: implicit switch secondary to new ALUA state %d\n",
+ ps.port_group, alua_state);
+ qbus_enumerate_set_port(&ps, sysbus_get_default());
+ } else {
+ s->alua_state = (s->alua_state & 0xf0) | alua_state;
+ scsi_device_set_ua(&s->qdev,
+ SENSE_CODE(ASYMMETRIC_ACCESS_STATE_CHANGED));
+ }
out:
if (local_err) {
@@ -3208,6 +3401,10 @@ static void scsi_disk_instance_initfn(Object *obj)
scsi_disk_get_alua_state,
scsi_disk_set_alua_state, NULL, NULL, NULL);
object_property_set_int(obj, ALUA_STATE_ACTIVE_OPTIMIZED, "alua_state", NULL);
+ object_property_add(obj, "alua_policy", "str",
+ scsi_disk_get_alua_policy,
+ scsi_disk_set_alua_policy, NULL, NULL, NULL);
+ object_property_set_str(obj, "", "alua_policy", NULL);
}
static const TypeInfo scsi_disk_info = {
--
1.8.4.5
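A minimal standalone sketch of the pairing rule that the 'alua_policy'
hunks above appear to implement, assuming exactly two port groups: when
one group is moved to the policy's primary state, the other group takes
the secondary state, and vice versa. All sketch_* names are hypothetical
and not part of the patch; the numeric values are the SPC asymmetric
access state codes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* SPC asymmetric access state codes (hypothetical names for this sketch). */
enum {
    SKETCH_ALUA_ACTIVE_OPTIMIZED     = 0x0,
    SKETCH_ALUA_ACTIVE_NON_OPTIMIZED = 0x1,
    SKETCH_ALUA_STANDBY              = 0x2,
};

/* Map a policy name to its (primary, secondary) state pair.
 * Returns false for a policy name the sketch does not know. */
static bool sketch_policy_states(const char *policy,
                                 uint8_t *primary, uint8_t *secondary)
{
    if (!strcmp(policy, "optimized-standby")) {
        *primary = SKETCH_ALUA_ACTIVE_OPTIMIZED;
        *secondary = SKETCH_ALUA_STANDBY;
        return true;
    }
    if (!strcmp(policy, "optimized-nonoptimized")) {
        *primary = SKETCH_ALUA_ACTIVE_OPTIMIZED;
        *secondary = SKETCH_ALUA_ACTIVE_NON_OPTIMIZED;
        return true;
    }
    return false;
}

/* Given the new state requested for one port group, return the state
 * the other port group has to take, or -1 if the policy forbids the
 * requested state (or the policy name is unknown). */
static int sketch_peer_state(const char *policy, uint8_t new_state)
{
    uint8_t primary, secondary;

    if (!sketch_policy_states(policy, &primary, &secondary)) {
        return -1;
    }
    if (new_state == primary) {
        return secondary;
    }
    if (new_state == secondary) {
        return primary;
    }
    return -1;
}
```

With "optimized-standby", putting one group into standby makes the peer
active/optimized; a request for active/non-optimized is rejected.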
* [Qemu-devel] [PATCH 6/8] scsi-disk: Allow READ CAPACITY in standby
2015-11-27 14:58 [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support Hannes Reinecke
` (4 preceding siblings ...)
2015-11-27 14:59 ` [Qemu-devel] [PATCH 5/8] scsi-disk: implement ALUA policy Hannes Reinecke
@ 2015-11-27 14:59 ` Hannes Reinecke
2015-11-27 14:59 ` [Qemu-devel] [PATCH 7/8] scsi-disk: Implement 'alua_preferred' option Hannes Reinecke
` (2 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2015-11-27 14:59 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Johannes Thumshirn, Stefan Hajnoczi, Hannes Reinecke, qemu-devel,
Alexander Graf
SPC does not mandate that READ CAPACITY be allowed in standby,
but Linux currently relies on a valid capacity. Otherwise
requests will be retried from sd_prep_fn() and I/O will
never complete.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
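A standalone sketch of the CDB test this patch needs: READ CAPACITY(10)
is its own opcode, while READ CAPACITY(16) is SERVICE ACTION IN(16) with
service action 0x10. The sketch_* names and macros are hypothetical and
not taken from the patch. Note that the sketch masks byte 1 with 0x1f,
which to my understanding is the five-bit SERVICE ACTION field, whereas
the hunk below masks with 0x31. Both accept 0x10, but they differ for
other service actions (for example 0x12 & 0x31 is also 0x10).

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode and service action values as defined by SBC/SPC
 * (hypothetical macro names for this sketch). */
#define SKETCH_READ_CAPACITY_10      0x25
#define SKETCH_SERVICE_ACTION_IN_16  0x9e
#define SKETCH_SAI_READ_CAPACITY_16  0x10

/* Return true if the CDB is READ CAPACITY(10) or READ CAPACITY(16).
 * The caller must pass a CDB of at least two bytes. */
static bool sketch_is_read_capacity(const uint8_t *cdb)
{
    switch (cdb[0]) {
    case SKETCH_READ_CAPACITY_10:
        return true;
    case SKETCH_SERVICE_ACTION_IN_16:
        /* The service action lives in the low five bits of byte 1. */
        return (cdb[1] & 0x1f) == SKETCH_SAI_READ_CAPACITY_16;
    default:
        return false;
    }
}
```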
hw/scsi/scsi-disk.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 59d09e4..b3ab890 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2278,6 +2278,22 @@ static int32_t scsi_disk_emulate_command(SCSIRequest *req, uint8_t *buf)
transition_allowed = false;
unavailable_allowed = false;
break;
+ case SERVICE_ACTION_IN_16:
+ /*
+ * READ CAPACITY is not required by SPC,
+ * but Linux (currently) relies on a
+ * valid capacity, otherwise requests will
+ * be retried from sd.c:sd_prep_fn() and
+ * optimized-standby failover won't work.
+ */
+ if ((req->cmd.buf[1] & 0x31) != SAI_READ_CAPACITY_16) {
+ standby_allowed = false;
+ }
+ /* Fallthrough */
+ case READ_CAPACITY_10:
+ transition_allowed = false;
+ unavailable_allowed = false;
+ break;
default:
transition_allowed = false;
unavailable_allowed = false;
--
1.8.4.5
* [Qemu-devel] [PATCH 7/8] scsi-disk: Implement 'alua_preferred' option
2015-11-27 14:58 [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support Hannes Reinecke
` (5 preceding siblings ...)
2015-11-27 14:59 ` [Qemu-devel] [PATCH 6/8] scsi-disk: Allow READ CAPACITY in standby Hannes Reinecke
@ 2015-11-27 14:59 ` Hannes Reinecke
2015-11-27 14:59 ` [Qemu-devel] [PATCH 8/8] block: Implement 'block_disconnect' HMP command Hannes Reinecke
2015-12-10 8:26 ` [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support Stefan Hajnoczi
8 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2015-11-27 14:59 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Johannes Thumshirn, Stefan Hajnoczi, Hannes Reinecke, qemu-devel,
Alexander Graf
Implement an option to set the 'preferred path' bit in the
REPORT TARGET PORT GROUPS output.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
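A small sketch of the byte layout the accessors below rely on: bit 7 of
the stored state byte carries the 'preferred' flag and the low nibble
carries the asymmetric access state, so either can be updated without
disturbing the other. The sketch_* helpers are hypothetical and only
mirror the bit operations visible in this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ALUA_PREF_BIT   0x80  /* 'preferred' flag, bit 7 */
#define SKETCH_ALUA_STATE_MASK 0x0f  /* asymmetric access state */

/* Set or clear the preferred bit, leaving the state nibble untouched. */
static uint8_t sketch_set_pref(uint8_t byte, bool pref)
{
    if (pref) {
        return (uint8_t)(byte | SKETCH_ALUA_PREF_BIT);
    }
    return (uint8_t)(byte & ~SKETCH_ALUA_PREF_BIT);
}

/* Read back the preferred bit. */
static bool sketch_get_pref(uint8_t byte)
{
    return (byte & SKETCH_ALUA_PREF_BIT) != 0;
}

/* Replace the state nibble, leaving the upper bits (including the
 * preferred bit) untouched. */
static uint8_t sketch_set_state(uint8_t byte, uint8_t state)
{
    return (uint8_t)((byte & 0xf0) | (state & SKETCH_ALUA_STATE_MASK));
}
```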
hw/scsi/scsi-disk.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index b3ab890..07e0c28 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -3411,6 +3411,38 @@ out:
}
}
+static void scsi_disk_get_alua_pref(Object *obj, Visitor *v, void *opaque,
+ const char *name, Error **errp)
+{
+ SCSIDiskState *s = OBJECT_CHECK(SCSIDiskState, obj, "scsi-disk");
+ bool pref = !!(s->alua_state & 0x80);
+
+ visit_type_bool(v, &pref, name, errp);
+}
+
+static void scsi_disk_set_alua_pref(Object *obj, Visitor *v, void *opaque,
+ const char *name, Error **errp)
+{
+ SCSIDiskState *s = OBJECT_CHECK(SCSIDiskState, obj, "scsi-disk");
+ bool pref;
+ Error *local_err = NULL;
+
+ visit_type_bool(v, &pref, name, &local_err);
+ if (local_err) {
+ goto out;
+ }
+
+ if (pref) {
+ s->alua_state |= 0x80;
+ } else {
+ s->alua_state &= ~0x80;
+ }
+out:
+ if (local_err) {
+ error_propagate(errp, local_err);
+ }
+}
+
static void scsi_disk_instance_initfn(Object *obj)
{
object_property_add(obj, "alua_state", "uint8",
@@ -3421,6 +3453,11 @@ static void scsi_disk_instance_initfn(Object *obj)
scsi_disk_get_alua_policy,
scsi_disk_set_alua_policy, NULL, NULL, NULL);
object_property_set_str(obj, "", "alua_policy", NULL);
+
+ object_property_add(obj, "alua_preferred", "bool",
+ scsi_disk_get_alua_pref,
+ scsi_disk_set_alua_pref, NULL, NULL, NULL);
+ object_property_set_bool(obj, false, "alua_preferred", NULL);
}
static const TypeInfo scsi_disk_info = {
--
1.8.4.5
* [Qemu-devel] [PATCH 8/8] block: Implement 'block_disconnect' HMP command
2015-11-27 14:58 [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support Hannes Reinecke
` (6 preceding siblings ...)
2015-11-27 14:59 ` [Qemu-devel] [PATCH 7/8] scsi-disk: Implement 'alua_preferred' option Hannes Reinecke
@ 2015-11-27 14:59 ` Hannes Reinecke
2015-11-27 18:00 ` Eric Blake
2015-12-10 8:26 ` [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support Stefan Hajnoczi
8 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2015-11-27 14:59 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Hannes Reinecke, Stefan Hajnoczi, qemu-devel, Alexander Graf,
Johannes Thumshirn, Hannes Reinecke
Implement a 'block_disconnect' HMP command to simulate a device
communication / link failure.
Signed-off-by: Hannes Reinecke <hare@suse.com>
---
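The mechanism in the diff below boils down to a per-backend flag that
the monitor command toggles and that the HBA emulation checks before
handling a command. A self-contained model of that flow, with
hypothetical sketch_* names and status codes that do not exist in QEMU:

```c
#include <stdbool.h>

/* Hypothetical status codes for the sketch. */
enum sketch_status {
    SKETCH_OK = 0,
    SKETCH_TRANSPORT_FAILURE = 1,
};

/* Stand-in for the block backend; only the flag matters here. */
struct sketch_backend {
    bool disconnected;
};

/* A single backend instance used for illustration. */
static struct sketch_backend sketch_disk = { .disconnected = false };

/* Models the monitor command: record the simulated link state. */
static void sketch_block_disconnect(struct sketch_backend *b, bool disconnect)
{
    b->disconnected = disconnect;
}

/* Models the HBA request path: fail early while the link is down,
 * otherwise let the command proceed. */
static enum sketch_status sketch_handle_cmd(const struct sketch_backend *b)
{
    if (b->disconnected) {
        return SKETCH_TRANSPORT_FAILURE;
    }
    return SKETCH_OK;
}
```

The data behind the backend stays accessible to QEMU throughout; only
the guest-visible command path reports a failure.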
block.c | 5 +++++
block/block-backend.c | 8 ++++++++
blockdev.c | 18 ++++++++++++++++++
hmp-commands.hx | 14 ++++++++++++++
hmp.c | 10 ++++++++++
hmp.h | 1 +
hw/scsi/megasas.c | 4 ++++
hw/scsi/virtio-scsi.c | 5 +++++
include/block/block.h | 1 +
include/block/block_int.h | 1 +
include/sysemu/block-backend.h | 1 +
qapi/block-core.json | 21 +++++++++++++++++++++
qmp-commands.hx | 26 ++++++++++++++++++++++++++
13 files changed, 115 insertions(+)
diff --git a/block.c b/block.c
index 3a7324b..0b48c8f 100644
--- a/block.c
+++ b/block.c
@@ -3130,6 +3130,11 @@ void bdrv_lock_medium(BlockDriverState *bs, bool locked)
}
}
+bool bdrv_is_disconnected(BlockDriverState *bs)
+{
+ return bs->disconnected;
+}
+
BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
{
BdrvDirtyBitmap *bm;
diff --git a/block/block-backend.c b/block/block-backend.c
index 9889e81..da55bf8 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1062,6 +1062,14 @@ void blk_op_unblock_all(BlockBackend *blk, Error *reason)
}
}
+bool blk_is_disconnected(BlockBackend *blk)
+{
+ if (blk->bs) {
+ return bdrv_is_disconnected(blk->bs);
+ }
+ return false;
+}
+
AioContext *blk_get_aio_context(BlockBackend *blk)
{
if (blk->bs) {
diff --git a/blockdev.c b/blockdev.c
index fc85128..2dbd895 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2821,6 +2821,24 @@ out:
aio_context_release(aio_context);
}
+void qmp_block_disconnect(bool has_device, const char *device,
+ bool has_node_name, const char *node_name,
+ bool disconnect, Error **errp)
+{
+ Error *local_err = NULL;
+ BlockDriverState *bs;
+
+ bs = bdrv_lookup_bs(has_device ? device : NULL,
+ has_node_name ? node_name : NULL,
+ &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+
+ bs->disconnected = disconnect;
+}
+
static void block_job_cb(void *opaque, int ret)
{
/* Note that this function may be executed from another AioContext besides
diff --git a/hmp-commands.hx b/hmp-commands.hx
index bb52e4d..992e10c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -74,6 +74,20 @@ resizes image files, it can not resize block devices like LVM volumes.
ETEXI
{
+ .name = "block_disconnect",
+ .args_type = "device:B,disconnect:b",
+ .params = "disconnect",
+ .help = "disconnect a block device",
+ .mhandler.cmd = hmp_block_disconnect,
+ },
+
+STEXI
+@item block_disconnect
+@findex block_disconnect
+Simulate a block device disconnect while a guest is running.
+ETEXI
+
+ {
.name = "block_stream",
.args_type = "device:B,speed:o?,base:s?",
.params = "device [speed [base]]",
diff --git a/hmp.c b/hmp.c
index 2140605..60d11b0 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1049,6 +1049,16 @@ void hmp_block_resize(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, &err);
}
+void hmp_block_disconnect(Monitor *mon, const QDict *qdict)
+{
+ const char *device = qdict_get_str(qdict, "device");
+ bool disconnect = qdict_get_try_bool(qdict, "disconnect", false);
+ Error *err = NULL;
+
+ qmp_block_disconnect(true, device, false, NULL, disconnect, &err);
+ hmp_handle_error(mon, &err);
+}
+
void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
{
const char *device = qdict_get_str(qdict, "device");
diff --git a/hmp.h b/hmp.h
index a8c5b5a..e2f5e81 100644
--- a/hmp.h
+++ b/hmp.h
@@ -56,6 +56,7 @@ void hmp_set_link(Monitor *mon, const QDict *qdict);
void hmp_block_passwd(Monitor *mon, const QDict *qdict);
void hmp_balloon(Monitor *mon, const QDict *qdict);
void hmp_block_resize(Monitor *mon, const QDict *qdict);
+void hmp_block_disconnect(Monitor *mon, const QDict *qdict);
void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
void hmp_snapshot_blkdev_internal(Monitor *mon, const QDict *qdict);
void hmp_snapshot_delete_blkdev_internal(Monitor *mon, const QDict *qdict);
diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index d7dc667..b2942b7 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -1678,6 +1678,10 @@ static int megasas_handle_scsi(MegasasState *s, MegasasCmd *cmd,
return MFI_STAT_DEVICE_NOT_FOUND;
}
+ if (blk_is_disconnected(sdev->conf.blk)) {
+ return MFI_STAT_LD_OFFLINE;
+ }
+
if (cmd->frame->header.cdb_len > 16) {
trace_megasas_scsi_invalid_cdb_len(
mfi_frame_desc[cmd->frame->header.frame_cmd], is_logical,
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 7655401..748b777 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -534,6 +534,11 @@ bool virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
virtio_scsi_complete_cmd_req(req);
return false;
}
+ if (blk_is_disconnected(d->conf.blk)) {
+ req->resp.cmd.response = VIRTIO_SCSI_S_TRANSPORT_FAILURE;
+ virtio_scsi_complete_cmd_req(req);
+ return false;
+ }
if (s->dataplane_started) {
assert(blk_get_aio_context(d->conf.blk) == s->ctx);
}
diff --git a/include/block/block.h b/include/block/block.h
index 73edb1a..ea5ebf3 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -398,6 +398,7 @@ int bdrv_is_sg(BlockDriverState *bs);
int bdrv_enable_write_cache(BlockDriverState *bs);
void bdrv_set_enable_write_cache(BlockDriverState *bs, bool wce);
bool bdrv_is_inserted(BlockDriverState *bs);
+bool bdrv_is_disconnected(BlockDriverState *bs);
int bdrv_media_changed(BlockDriverState *bs);
void bdrv_lock_medium(BlockDriverState *bs, bool locked);
void bdrv_eject(BlockDriverState *bs, bool eject_flag);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4012e36..60ff193 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -371,6 +371,7 @@ struct BlockDriverState {
int sg; /* if true, the device is a /dev/sg* */
int copy_on_read; /* if true, copy read backing sectors into image
note this is a reference count */
+ bool disconnected;
bool probed;
BlockDriver *drv; /* NULL means no media */
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index f4a68e2..2578996 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -144,6 +144,7 @@ bool blk_is_inserted(BlockBackend *blk);
bool blk_is_available(BlockBackend *blk);
void blk_lock_medium(BlockBackend *blk, bool locked);
void blk_eject(BlockBackend *blk, bool eject_flag);
+bool blk_is_disconnected(BlockBackend *blk);
int blk_get_flags(BlockBackend *blk);
int blk_get_max_transfer_length(BlockBackend *blk);
void blk_set_guest_block_size(BlockBackend *blk, int align);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index f97c250..e3009bf 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -754,6 +754,27 @@
'size': 'int' }}
##
+# @block_disconnect
+#
+# Simulate block device disconnect while a guest is running.
+#
+# Either @device or @node-name must be set but not both.
+#
+# @device: #optional the name of the device to get the image resized
+#
+# @node-name: #optional graph node name to get the image resized (Since 2.0)
+#
+# @disconnect: true for disconnecting the device
+#
+# Returns: nothing on success
+# If @device is not a valid block device, DeviceNotFound
+#
+##
+{ 'command': 'block_disconnect', 'data': { '*device': 'str',
+ '*node-name': 'str',
+ 'disconnect': 'bool' }}
+
+##
# @NewImageMode
#
# An enumeration that tells QEMU how to set the backing file path in
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 9d8b42f..5181d54 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1052,6 +1052,32 @@ Example:
EQMP
+
+ {
+ .name = "block_disconnect",
+ .args_type = "device:s?,node-name:s?,disconnect:b",
+ .mhandler.cmd_new = qmp_marshal_block_disconnect,
+ },
+
+SQMP
+block_disconnect
+----------------
+
+Simulate a block device disconnect while a guest is running.
+
+Arguments:
+
+- "device": the device's ID, must be unique (json-string)
+- "node-name": the node name in the block driver state graph (json-string)
+- "disconnect": whether to simulate a device disconnect (json-bool)
+
+Example:
+
+-> { "execute": "block_disconnect", "arguments": { "device": "scratch", "disconnect": true } }
+<- { "return": {} }
+
+EQMP
+
{
.name = "block-stream",
.args_type = "device:B,base:s?,speed:o?,backing-file:s?,on-error:s?",
--
1.8.4.5
* Re: [Qemu-devel] [PATCH 8/8] block: Implement 'block_disconnect' HMP command
2015-11-27 14:59 ` [Qemu-devel] [PATCH 8/8] block: Implement 'block_disconnect' HMP command Hannes Reinecke
@ 2015-11-27 18:00 ` Eric Blake
0 siblings, 0 replies; 16+ messages in thread
From: Eric Blake @ 2015-11-27 18:00 UTC (permalink / raw)
To: Hannes Reinecke, Paolo Bonzini
Cc: Johannes Thumshirn, Stefan Hajnoczi, Hannes Reinecke, qemu-devel,
Alexander Graf
On 11/27/2015 07:59 AM, Hannes Reinecke wrote:
> Implement a 'block_disconnect' HMP command to simulate a device
> communication / link failure.
>
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
> +++ b/qapi/block-core.json
> @@ -754,6 +754,27 @@
> 'size': 'int' }}
>
> ##
> +# @block_disconnect
New QMP commands should favor '-' rather than '_'; this should be
'block-disconnect'.
> +#
> +# Simulate block device disconnect while a guest is running.
> +#
> +# Either @device or @node-name must be set but not both.
> +#
> +# @device: #optional the name of the device to get the image resized
Bad copy-and-paste? We aren't resizing anything here.
> +#
> +# @node-name: #optional graph node name to get the image resized (Since 2.0)
And again. And since the command is new, you don't need a '(Since 2.0)'.
> +#
> +# @disconnect: true for disconnecting the device
> +#
> +# Returns: nothing on success
> +# If @device is not a valid block device, DeviceNotFound
> +#
> +##
Missing a '# Since 2.6' line.
> +{ 'command': 'block_disconnect', 'data': { '*device': 'str',
> + '*node-name': 'str',
> + 'disconnect': 'bool' }}
Mutually-exclusive 'device' vs. 'node-name' is awkward. For new
commands, it is sufficient to just use 'node' (and accept both node
names for the direct node to disconnect, or a device name to detach the
BDS node plugged in to that device).
> +block_disconnect
> +----------------
> +
> +Simulate a block device disconnect while a guest is running.
> +
> +Arguments:
> +
> +- "device": the device's ID, must be unique (json-string)
> +- "node-name": the node name in the block driver state graph (json-string)
Awkward that you aren't mentioning the mutual exclusion above; and again
I think that a single parameter is better than two mutually exclusive ones.
> +- "disconnect": whether to simulate a device disconnect (json-bool)
Do I again call the command with 'disconnect':false to undo the
disconnect? That sounds like a double-negative. It might make more
sense to have:
{ 'command':'block-set-connection', 'data': { 'node':'str',
'connected':'bool' } }
where I pass 'connected':false to disconnect, and 'connected':true to
reconnect.
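A rough illustration of the interface shape suggested above, not code
from the patch: a single 'node' reference that matches either a device
name or a node name, and a positive 'connected' flag. The sketch_*
names and the "node0" node name are made up for the example; "scratch"
is the device name from the QMP example in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block node that has both names. */
struct sketch_node {
    const char *device;     /* device name */
    const char *node_name;  /* graph node name */
    bool connected;
};

/* Example table with a single node. */
static struct sketch_node sketch_nodes[] = {
    { "scratch", "node0", true },
};

/* Resolve a reference that may be either a device or a node name. */
static struct sketch_node *sketch_lookup(const char *ref)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_nodes) / sizeof(sketch_nodes[0]); i++) {
        if (!strcmp(sketch_nodes[i].device, ref) ||
            !strcmp(sketch_nodes[i].node_name, ref)) {
            return &sketch_nodes[i];
        }
    }
    return NULL;
}

/* connected=false disconnects, connected=true reconnects.
 * Returns false if the reference does not resolve. */
static bool sketch_set_connection(const char *ref, bool connected)
{
    struct sketch_node *n = sketch_lookup(ref);

    if (!n) {
        return false;
    }
    n->connected = connected;
    return true;
}
```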
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support
2015-11-27 14:58 [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support Hannes Reinecke
` (7 preceding siblings ...)
2015-11-27 14:59 ` [Qemu-devel] [PATCH 8/8] block: Implement 'block_disconnect' HMP command Hannes Reinecke
@ 2015-12-10 8:26 ` Stefan Hajnoczi
2015-12-10 9:13 ` Hannes Reinecke
8 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2015-12-10 8:26 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Johannes Thumshirn, Paolo Bonzini, qemu-devel, Alexander Graf
On Fri, Nov 27, 2015 at 03:58:58PM +0100, Hannes Reinecke wrote:
> here's now an updated version to enable ALUA and simplified
> active/passive multipath support for qemu.
>
> This patchset relies on having _two_ block devices configured,
> and two SCSI disks pointing to those block devices with the
> _same_ 'wwn' property and unique 'port_group' properties.
> I know, this is a bit of a nasty hack, but I hope to add
> proper multipath support (with several SCSI devices pointing /
> linking to the same block device) in the near future.
>
> It also implements a 'alua_policy', which allows for simulating
> an 'active/passive' multipath setup.
>
> And for testing I've implemented a 'block_disconnect' HMP command,
> which simulates a link failure for the attached devices.
>
> I wouldn't object if someone declares this a gross hack, but with
> it I can finally simulate real-life multipath failover and do
> some functional multipath-tools testing withouth having to recurse
> on using real hardware.
I'm not familiar with how ALUA works but have been thinking about a
multipath problem:
If the host has SCSI disks that are marked 'offline' then QEMU will
refuse to start up since it cannot open the block device (ENXIO).
Does it make sense to allow guests to start in this condition?
I think we'd need to notice when the disk comes back online and notify
the guest.
Stefan
* Re: [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support
2015-12-10 8:26 ` [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support Stefan Hajnoczi
@ 2015-12-10 9:13 ` Hannes Reinecke
2015-12-14 7:24 ` Stefan Hajnoczi
0 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2015-12-10 9:13 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Johannes Thumshirn, Paolo Bonzini, qemu-devel, Alexander Graf
On 12/10/2015 09:26 AM, Stefan Hajnoczi wrote:
> On Fri, Nov 27, 2015 at 03:58:58PM +0100, Hannes Reinecke wrote:
>> here's now an updated version to enable ALUA and simplified
>> active/passive multipath support for qemu.
>>
>> This patchset relies on having _two_ block devices configured,
>> and two SCSI disks pointing to those block devices with the
>> _same_ 'wwn' property and unique 'port_group' properties.
>> I know, this is a bit of a nasty hack, but I hope to add
>> proper multipath support (with several SCSI devices pointing /
>> linking to the same block device) in the near future.
>>
>> It also implements a 'alua_policy', which allows for simulating
>> an 'active/passive' multipath setup.
>>
>> And for testing I've implemented a 'block_disconnect' HMP command,
>> which simulates a link failure for the attached devices.
>>
>> I wouldn't object if someone declares this a gross hack, but with
>> it I can finally simulate real-life multipath failover and do
>> some functional multipath-tools testing withouth having to recurse
>> on using real hardware.
>
> I'm not familiar with how ALUA works but have been thinking about a
> multipath problem:
>
> If the host has SCSI disks that are marked 'offline' then QEMU will
> refuse to start up since it cannot open the block device (ENXIO).
>
Define 'offline'.
If this means the ALUA state 'offline' then we wouldn't have to
worry; ALUA state 'offline' essentially means "Yeah, there's
something here, but I won't tell you and you cannot access it.".
And any transitions to and from 'offline' are essentially
vendor-specific.
In short: Do not use it.
If OTOH it means the 'block_disconnect' state, this is something which
should/needs to be implemented in the HBA emulation for simulating
a link failure.
qemu itself should be able to access the device and it should start
up perfectly normal, so we shouldn't get any ENXIO errors.
(Obviously, if _all_ disks are in 'disconnect' state the guest
wouldn't start up as it cannot read any data. But that's beside the
point.)
> Does it make sense to allow guests to start in this condition?
>
Sorta. At least it'd be good to allow this, if only for debugging.
> I think we'd need to notice when the disk comes back online and notify
> the guest.
>
Nope. That's something the HBA emulation is responsible for.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* Re: [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support
2015-12-10 9:13 ` Hannes Reinecke
@ 2015-12-14 7:24 ` Stefan Hajnoczi
2015-12-14 7:35 ` Hannes Reinecke
0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2015-12-14 7:24 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Johannes Thumshirn, Paolo Bonzini, qemu-devel, Alexander Graf
On Thu, Dec 10, 2015 at 10:13:17AM +0100, Hannes Reinecke wrote:
> On 12/10/2015 09:26 AM, Stefan Hajnoczi wrote:
> >On Fri, Nov 27, 2015 at 03:58:58PM +0100, Hannes Reinecke wrote:
> >>here's now an updated version to enable ALUA and simplified
> >>active/passive multipath support for qemu.
> >>
> >>This patchset relies on having _two_ block devices configured,
> >>and two SCSI disks pointing to those block devices with the
> >>_same_ 'wwn' property and unique 'port_group' properties.
> >>I know, this is a bit of a nasty hack, but I hope to add
> >>proper multipath support (with several SCSI devices pointing /
> >>linking to the same block device) in the near future.
> >>
> >>It also implements a 'alua_policy', which allows for simulating
> >>an 'active/passive' multipath setup.
> >>
> >>And for testing I've implemented a 'block_disconnect' HMP command,
> >>which simulates a link failure for the attached devices.
> >>
> >>I wouldn't object if someone declares this a gross hack, but with
> >>it I can finally simulate real-life multipath failover and do
> >>some functional multipath-tools testing withouth having to recurse
> >>on using real hardware.
> >
> >I'm not familiar with how ALUA works but have been thinking about a
> >multipath problem:
> >
> >If the host has SCSI disks that are marked 'offline' then QEMU will
> >refuse to start up since it cannot open the block device (ENXIO).
> >
> Define 'offline'.
> If this means the ALUA state 'offline' then we wouldn't have to worry; ALUA
> state 'offline' essentially means "Yeah, there's something here, but I won't
> tell you and you cannot access it.".
> And any transitions to and from 'offline' are essentially vendor-specific.
> In short: Do not use it.
>
> If OTOH means the 'block_disconnect' state this is something which
> should/needs to be implemented in the HBA emulation for simulating
> a link failure.
> qemu itself should be able to access the device and it should start up
> perfectly normal, so we shouldn't get any ENXIO errors.
>
> (Obviously, if _all_ disks are in 'disconnect' state the guest wouldn't
> start up as it cannot read any data. But that's beside the point.)
I'm referring to scsi_device_set_state(scmd->device, SDEV_OFFLINE) in
Linux. This is the state where the host block device cannot be opened
or accessed.
Stefan
* Re: [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support
2015-12-14 7:24 ` Stefan Hajnoczi
@ 2015-12-14 7:35 ` Hannes Reinecke
2015-12-15 3:02 ` Stefan Hajnoczi
0 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2015-12-14 7:35 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Johannes Thumshirn, Paolo Bonzini, qemu-devel, Alexander Graf
On 12/14/2015 08:24 AM, Stefan Hajnoczi wrote:
> On Thu, Dec 10, 2015 at 10:13:17AM +0100, Hannes Reinecke wrote:
>> On 12/10/2015 09:26 AM, Stefan Hajnoczi wrote:
>>> On Fri, Nov 27, 2015 at 03:58:58PM +0100, Hannes Reinecke wrote:
>>>> here's now an updated version to enable ALUA and simplified
>>>> active/passive multipath support for qemu.
>>>>
>>>> This patchset relies on having _two_ block devices configured,
>>>> and two SCSI disks pointing to those block devices with the
>>>> _same_ 'wwn' property and unique 'port_group' properties.
>>>> I know, this is a bit of a nasty hack, but I hope to add
>>>> proper multipath support (with several SCSI devices pointing /
>>>> linking to the same block device) in the near future.
>>>>
>>>> It also implements a 'alua_policy', which allows for simulating
>>>> an 'active/passive' multipath setup.
>>>>
>>>> And for testing I've implemented a 'block_disconnect' HMP command,
>>>> which simulates a link failure for the attached devices.
>>>>
>>>> I wouldn't object if someone declares this a gross hack, but with
>>>> it I can finally simulate real-life multipath failover and do
>>>> some functional multipath-tools testing withouth having to recurse
>>>> on using real hardware.
>>>
>>> I'm not familiar with how ALUA works but have been thinking about a
>>> multipath problem:
>>>
>>> If the host has SCSI disks that are marked 'offline' then QEMU will
>>> refuse to start up since it cannot open the block device (ENXIO).
>>>
>> Define 'offline'.
>> If this means the ALUA state 'offline' then we wouldn't have to worry; ALUA
>> state 'offline' essentially means "Yeah, there's something here, but I won't
>> tell you and you cannot access it.".
>> And any transitions to and from 'offline' are essentially vendor-specific.
>> In short: Do not use it.
>>
>> If OTOH means the 'block_disconnect' state this is something which
>> should/needs to be implemented in the HBA emulation for simulating
>> a link failure.
>> qemu itself should be able to access the device and it should start up
>> perfectly normal, so we shouldn't get any ENXIO errors.
>>
>> (Obviously, if _all_ disks are in 'disconnect' state the guest wouldn't
>> start up as it cannot read any data. But that's beside the point.)
>
> I'm referring to scsi_device_set_state(scmd->device, SDEV_OFFLINE) in
> Linux. This is the state where the host block device cannot be opened
> or accessed.
>
Which means the device is declared dead by the SCSI stack.
And qemu does _very_ well not to start in these circumstances.
However, this behaviour is not influenced nor modified by the ALUA
patchset but is rather a different topic.
<rambling>
Setting a device 'offline' is the final step in SCSI EH, which means that
SCSI EH has exhausted its options and doesn't know how to fix the
device.
However, in modern systems this typically happens when SCSI EH kicks
in during a (transport) link disconnect, as then every single step
in SCSI EH will fail. (Which also means that SCSI EH is woefully
inadequate for FC, but that's a different topic.)
But as this is a transport issue, _all_ respective drivers should be
aware of this, and should have been modified _not_ to start SCSI EH
when the transport link is severed.
So the very fact that SCSI EH is started means that there's an issue
with the driver, which really needs to be fixed first.
Hence I think qemu is right here, as the underlying reason for the
'offline' device should be fixed first.
</rambling>
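The escalation argument above can be illustrated with a toy model. This is only a sketch and not kernel code: the enum names, run_eh() and the two callbacks are hypothetical, and the step order only roughly follows the Linux SCSI EH ladder (abort, device reset, target reset, bus reset, host reset). It shows why a severed link, where every step fails, always ends with the device set offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy escalation ladder; names are illustrative, not the kernel's. */
enum eh_step {
    EH_ABORT,
    EH_DEVICE_RESET,
    EH_TARGET_RESET,
    EH_BUS_RESET,
    EH_HOST_RESET,
    EH_STEP_COUNT
};

enum sdev_state { SDEV_STATE_RUNNING, SDEV_STATE_OFFLINE };

/* Callback that attempts one recovery step and reports success. */
typedef bool (*eh_try_fn)(enum eh_step step, void *opaque);

/* Walk the ladder; the first successful step recovers the device. */
enum sdev_state run_eh(eh_try_fn try_step, void *opaque)
{
    assert(try_step != NULL);
    for (int step = EH_ABORT; step < EH_STEP_COUNT; step++) {
        if (try_step((enum eh_step)step, opaque)) {
            return SDEV_STATE_RUNNING;
        }
    }
    /* Every step failed: EH has run out of options. */
    return SDEV_STATE_OFFLINE;
}

/* Link is down: nothing can reach the device, so every step fails. */
bool link_down(enum eh_step step, void *opaque)
{
    (void)step;
    (void)opaque;
    return false;
}

/* A wedged device that recovers once it gets a device reset. */
bool device_reset_works(enum eh_step step, void *opaque)
{
    (void)opaque;
    return step == EH_DEVICE_RESET;
}
```

With these callbacks, run_eh(link_down, NULL) ends in SDEV_STATE_OFFLINE, while run_eh(device_reset_works, NULL) ends in SDEV_STATE_RUNNING. That is the point made above: during a link outage EH cannot succeed, so it should not be started at all.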
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* Re: [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support
2015-12-14 7:35 ` Hannes Reinecke
@ 2015-12-15 3:02 ` Stefan Hajnoczi
2015-12-15 6:49 ` Hannes Reinecke
0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2015-12-15 3:02 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Johannes Thumshirn, Paolo Bonzini, qemu-devel, Alexander Graf
On Mon, Dec 14, 2015 at 08:35:43AM +0100, Hannes Reinecke wrote:
> On 12/14/2015 08:24 AM, Stefan Hajnoczi wrote:
> >On Thu, Dec 10, 2015 at 10:13:17AM +0100, Hannes Reinecke wrote:
> >>On 12/10/2015 09:26 AM, Stefan Hajnoczi wrote:
> >>>On Fri, Nov 27, 2015 at 03:58:58PM +0100, Hannes Reinecke wrote:
> >>>>here's now an updated version to enable ALUA and simplified
> >>>>active/passive multipath support for qemu.
> >>>>
> >>>>This patchset relies on having _two_ block devices configured,
> >>>>and two SCSI disks pointing to those block devices with the
> >>>>_same_ 'wwn' property and unique 'port_group' properties.
> >>>>I know, this is a bit of a nasty hack, but I hope to add
> >>>>proper multipath support (with several SCSI devices pointing /
> >>>>linking to the same block device) in the near future.
> >>>>
> >>>>It also implements an 'alua_policy', which allows for simulating
> >>>>an 'active/passive' multipath setup.
> >>>>
> >>>>And for testing I've implemented a 'block_disconnect' HMP command,
> >>>>which simulates a link failure for the attached devices.
> >>>>
> >>>>I wouldn't object if someone declares this a gross hack, but with
> >>>>it I can finally simulate real-life multipath failover and do
> >>>>some functional multipath-tools testing without having to resort
> >>>>to real hardware.
> >>>
> >>>I'm not familiar with how ALUA works but have been thinking about a
> >>>multipath problem:
> >>>
> >>>If the host has SCSI disks that are marked 'offline' then QEMU will
> >>>refuse to start up since it cannot open the block device (ENXIO).
> >>>
> >>Define 'offline'.
> >>If this means the ALUA state 'offline' then we wouldn't have to worry; ALUA
> >>state 'offline' essentially means "Yeah, there's something here, but I won't
> >>tell you and you cannot access it.".
> >>And any transitions to and from 'offline' are essentially vendor-specific.
> >>In short: Do not use it.
> >>
> >>If OTOH it means the 'block_disconnect' state, this is something which
> >>should/needs to be implemented in the HBA emulation for simulating
> >>a link failure.
> >>qemu itself should be able to access the device and it should start up
> >>perfectly normally, so we shouldn't get any ENXIO errors.
> >>
> >>(Obviously, if _all_ disks are in 'disconnect' state the guest wouldn't
> >>start up as it cannot read any data. But that's beside the point.)
> >
> >I'm referring to scsi_device_set_state(scmd->device, SDEV_OFFLINE) in
> >Linux. This is the state where the host block device cannot be opened
> >or accessed.
> >
> Which means the device is declared dead by the SCSI stack.
> And qemu does _very_ well not to start in these circumstances.
>
> However, this behaviour is neither influenced nor modified by the ALUA patchset
> but is rather a different topic.
>
> <rambling>
> Setting a device 'offline' is the final step in SCSI EH, which means that SCSI EH has
> exhausted its options and doesn't know how to fix the device.
> However, in modern systems this typically happens when SCSI EH kicks in
> during a (transport) link disconnect, as then every single step in SCSI EH
> will fail. (Which also means that SCSI EH is woefully inadequate for FC, but
> that's a different topic.)
> But as this is a transport issue, _all_ respective drivers should be aware
> of this, and should have been modified _not_ to start SCSI EH when the
> transport link is severed.
> So the very fact that SCSI EH is started means that there's an issue with
> the driver, which really needs to be fixed first.
> Hence I think qemu is right here, as the underlying reason for the 'offline'
> device should be fixed first.
> </rambling>
Interesting, thanks for explaining.
Stefan
* Re: [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support
2015-12-15 3:02 ` Stefan Hajnoczi
@ 2015-12-15 6:49 ` Hannes Reinecke
0 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2015-12-15 6:49 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Johannes Thumshirn, Paolo Bonzini, qemu-devel, Alexander Graf
On 12/15/2015 04:02 AM, Stefan Hajnoczi wrote:
> On Mon, Dec 14, 2015 at 08:35:43AM +0100, Hannes Reinecke wrote:
>> On 12/14/2015 08:24 AM, Stefan Hajnoczi wrote:
>>> On Thu, Dec 10, 2015 at 10:13:17AM +0100, Hannes Reinecke wrote:
>>>> On 12/10/2015 09:26 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Nov 27, 2015 at 03:58:58PM +0100, Hannes Reinecke wrote:
>>>>>> here's now an updated version to enable ALUA and simplified
>>>>>> active/passive multipath support for qemu.
>>>>>>
>>>>>> This patchset relies on having _two_ block devices configured,
>>>>>> and two SCSI disks pointing to those block devices with the
>>>>>> _same_ 'wwn' property and unique 'port_group' properties.
>>>>>> I know, this is a bit of a nasty hack, but I hope to add
>>>>>> proper multipath support (with several SCSI devices pointing /
>>>>>> linking to the same block device) in the near future.
>>>>>>
>>>>>> It also implements an 'alua_policy', which allows for simulating
>>>>>> an 'active/passive' multipath setup.
>>>>>>
>>>>>> And for testing I've implemented a 'block_disconnect' HMP command,
>>>>>> which simulates a link failure for the attached devices.
>>>>>>
>>>>>> I wouldn't object if someone declares this a gross hack, but with
>>>>>> it I can finally simulate real-life multipath failover and do
>>>>>> some functional multipath-tools testing without having to resort
>>>>>> to real hardware.
>>>>>
>>>>> I'm not familiar with how ALUA works but have been thinking about a
>>>>> multipath problem:
>>>>>
>>>>> If the host has SCSI disks that are marked 'offline' then QEMU will
>>>>> refuse to start up since it cannot open the block device (ENXIO).
>>>>>
>>>> Define 'offline'.
>>>> If this means the ALUA state 'offline' then we wouldn't have to worry; ALUA
>>>> state 'offline' essentially means "Yeah, there's something here, but I won't
>>>> tell you and you cannot access it.".
>>>> And any transitions to and from 'offline' are essentially vendor-specific.
>>>> In short: Do not use it.
>>>>
>>>> If OTOH it means the 'block_disconnect' state, this is something which
>>>> should/needs to be implemented in the HBA emulation for simulating
>>>> a link failure.
>>>> qemu itself should be able to access the device and it should start up
>>>> perfectly normally, so we shouldn't get any ENXIO errors.
>>>>
>>>> (Obviously, if _all_ disks are in 'disconnect' state the guest wouldn't
>>>> start up as it cannot read any data. But that's beside the point.)
>>>
>>> I'm referring to scsi_device_set_state(scmd->device, SDEV_OFFLINE) in
>>> Linux. This is the state where the host block device cannot be opened
>>> or accessed.
>>>
>> Which means the device is declared dead by the SCSI stack.
>> And qemu does _very_ well not to start in these circumstances.
>>
>> However, this behaviour is neither influenced nor modified by the ALUA patchset
>> but is rather a different topic.
>>
>> <rambling>
>> Setting a device 'offline' is the final step in SCSI EH, which means that SCSI EH has
>> exhausted its options and doesn't know how to fix the device.
>> However, in modern systems this typically happens when SCSI EH kicks in
>> during a (transport) link disconnect, as then every single step in SCSI EH
>> will fail. (Which also means that SCSI EH is woefully inadequate for FC, but
>> that's a different topic.)
>> But as this is a transport issue, _all_ respective drivers should be aware
>> of this, and should have been modified _not_ to start SCSI EH when the
>> transport link is severed.
>> So the very fact that SCSI EH is started means that there's an issue with
>> the driver, which really needs to be fixed first.
>> Hence I think qemu is right here, as the underlying reason for the 'offline'
>> device should be fixed first.
>> </rambling>
>
> Interesting, thanks for explaining.
>
And thinking about it some more, we _could_ map the SCSI 'offline'
state onto ALUA 'offline', and allow qemu to start nevertheless.
(It could get the interesting bits like INQUIRY from sysfs, so it
doesn't _actually_ have to do I/O on startup).
Then we could have an HMP/QMP command for resetting the SCSI status
back to 'running', which should allow I/O to start properly.
Hmm. Lemme see ...
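A rough sketch of what that mapping might look like, reading the host device state from sysfs instead of doing I/O. The helper names are hypothetical and nothing here is taken from the patchset. The ALUA codes are the SPC asymmetric access state values. Treating 'running' as active/optimized and every other state as 'unavailable' is purely an assumption for illustration; a real implementation would presumably honour the configured 'alua_state' property.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ALUA asymmetric access states (values as defined in SPC). */
enum alua_state {
    ALUA_ACTIVE_OPTIMIZED = 0x0,
    ALUA_STANDBY          = 0x2,
    ALUA_UNAVAILABLE      = 0x3,
    ALUA_OFFLINE          = 0xe
};

/*
 * Hypothetical helper: map the Linux SCSI device state string (as shown
 * in the sysfs 'state' attribute) onto an ALUA state.
 * Assumption: anything other than 'running' or 'offline' is reported
 * as 'unavailable'.
 */
enum alua_state alua_state_from_sdev_state(const char *state)
{
    assert(state != NULL);
    if (strcmp(state, "offline") == 0) {
        return ALUA_OFFLINE;
    }
    if (strcmp(state, "running") == 0) {
        return ALUA_ACTIVE_OPTIMIZED;
    }
    return ALUA_UNAVAILABLE;
}

/*
 * Hypothetical helper: read /sys/block/<disk>/device/state into buf,
 * stripping the trailing newline. Returns 0 on success, -1 on error.
 */
int read_sdev_state(const char *disk, char *buf, size_t len)
{
    char path[256];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/device/state", disk);
    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

The reset command could then write 'running' back to the same sysfs attribute, which is how the state is reset by hand on the host today.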
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Thread overview: 16+ messages
2015-11-27 14:58 [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support Hannes Reinecke
2015-11-27 14:58 ` [Qemu-devel] [PATCH 1/8] scsi-disk: Add 'port_group' property Hannes Reinecke
2015-11-27 14:59 ` [Qemu-devel] [PATCH 2/8] scsi-disk: Add 'alua_state' property Hannes Reinecke
2015-11-27 14:59 ` [Qemu-devel] [PATCH 3/8] scsi-disk: Implement 'REPORT TARGET PORT GROUPS' Hannes Reinecke
2015-11-27 14:59 ` [Qemu-devel] [PATCH 4/8] scsi-disk: Implement 'SET " Hannes Reinecke
2015-11-27 14:59 ` [Qemu-devel] [PATCH 5/8] scsi-disk: implement ALUA policy Hannes Reinecke
2015-11-27 14:59 ` [Qemu-devel] [PATCH 6/8] scsi-disk: Allow READ CAPACITY in standby Hannes Reinecke
2015-11-27 14:59 ` [Qemu-devel] [PATCH 7/8] scsi-disk: Implement 'alua_preferred' option Hannes Reinecke
2015-11-27 14:59 ` [Qemu-devel] [PATCH 8/8] block: Implement 'block_disconnect' HMP command Hannes Reinecke
2015-11-27 18:00 ` Eric Blake
2015-12-10 8:26 ` [Qemu-devel] [PATCH RFC 0/8] scsi-disk: Active/passive ALUA support Stefan Hajnoczi
2015-12-10 9:13 ` Hannes Reinecke
2015-12-14 7:24 ` Stefan Hajnoczi
2015-12-14 7:35 ` Hannes Reinecke
2015-12-15 3:02 ` Stefan Hajnoczi
2015-12-15 6:49 ` Hannes Reinecke